Advanced Modelling: Retail Use Case
Advanced Modelling: Retail Use Case - Realizing Extreme Performance with SAP HANA
Abani Pattanayak, SAP HANA COE (Delivery)
Robert Guglietti, SAP Strategic Advisory Services
Jako Blagoev, SAP HANA COE (AGS)
We all know HANA is lightning fast and is incredible piece of software (and hardware). There are many articles, white paper and benchmark papers showcasing performance of HANA. My favorite KPI (irrespective of hardware configuration) is that, HANA can process around 2bn scans /second /core and 16m aggregations/sec/core.
So your overall query performance is highly dependent on
- No of records scanned/processed
- Hardware configuration (No of Core/CPU and available dynamic memory)
- Type of calculations i.e. Sum / Count (aggregation) vs. DISTINCT Count. SUM or Count is a simple aggregation of data, where as DISTINCT count is a much more complex calculation depending on the cardinality of the dataset. The distinct count operation will always be slower than a simple aggregation and this is true for any hardware configuration and any type of database.
- Finally the design of the data model – This plays a big part for realizing optimum query performance. Unfortunately there is no ONE fits for all solution.
So purely from an academic point, we can bring down the query execution time by creating multiple parallel processes (assuming there are enough CPU processes available). However this may create CPU bottleneck in case high concurrency scenario resulting higher query execution time..
Please note, #1 to 2 is almost fixed for a particular project or use case. Types of calculations are also fixed for the project (but we can be creative here: discussed in later part of the paper). So query performance ultimately boils down to design of the data model (and design options are also limited on the feature of the Product)
We are going to discuss POS (Point of Sale) use case in retail sector. The most critical measure in a retail use case seems to be Average Check which is “Sales Amount” / “No of Unique Transactions”.
A typical POS scenario deals with tens of billions rows of data.
For this use case,
- We’ve 11B rows over 25 months (7B unique transactions) in the FACT tables.
- 4.5TB (9-node scale out system, 512GB / 40 cores on each node)
- Size of the FACT table: 1.7TB
- Size of the Master data tables 200GB
Since this is use case is from an actual customer project, we can’t discuss the specific business scenarios and customer specific KPIs. However to explain the concept, we'll consider the following business Query.
Trend of Average Check (Year over Year) for the last 12 months:
- Drill-down on Geography Hierarchy – all the way to Retail Stores
- Drill-down on Product Hierarchy – all the way to UPC/SKU level
Trend of Average Check (Year over Year) for the last 12 months for the company will
- Scan through almost all records in the fact table (as no filters are applied)
- Average Check requires computation of Unique Transactions (i.e. Distinct Count on 3-fields namely Store ID, Calday & Transaction ID)
With proper data-model and right partitioning strategy (discussed below), we can get the following performance
- Sum (Sales Amount) or even complex KPI on Sales Amount for the whole dataset (i.e. all 11B rows) takes around 3 secs.
- Sum (Distinct Transaction Count) for the same dataset takes around 20 secs and complex KPIs on Transaction Count takes around 27-30 secs.
- The drill-down on Geography hierarchy and Product hierarchy is significantly faster, because now filters (for Geography or Product) are applied.
So we are able to execute the the Trend of Average Check (Year over Year) for the last 12 months for the whole company query under 30 secs.
However, for 5 concurrent users the average query performance was 2.5+mins and we could only execute a maximum of 7 concurrent users. We've analyzed the models thoroughly using performance trace and it is a clear case of CPU bottleneck and laws of physics in action (i.e. CPU cycles required to compute distinct count).
Our target is to have a Query performance of less than 30 secs for 25 concurrent users.
Design Consideration for the Base Data Model:
Please note, the most resource-hog part of this use case has been computing Transaction Count.
Transaction Count = Distinct Count (Store ID, Calday & Transaction ID)
So to efficient compute Transaction Count, we used the following steps.
1. Right Partitioning of the FACT table:
StoreID, Calday, TransactionID (and few other fields) were part of the primary key of the FACT table. So the FACT table(s) were partitioned as below
- Level 1: HASH partition on StoreID
- Level 2: Range partition on Month (12 partitions)
2. Replicating the Master Data Tables to Each Node:
All the master data tables (used in the attribute view) were replicated to all nodes slave nodes.
3. HANA Models:
We have to make sure, we've the most efficient models possible. All the filters were pushed down to the lowest level of the models. (There are many articles out there discussing the best practice for designing efficient HANA models). At a high-level
- Transaction Counter is computed in the Analytic View
- Calc View to wrap the Analytic View(s) with KEEP_FLAG = true for both StoreID and Month
With KEEP_FLAG set, Store ID and Month will always be selected from the underlying Analytic Views. Selecting these two fields part of the query will push down the Transaction Counters computation to individual partitions of the HANA nodes. This will reduce the data transfer between the partitions and nodes, hence better query performance.
With this, we are able to execute the Query of Trend of Average Check (Year over Year) for the last 12 months for the company under 30 secs.
But as I mentioned earlier, the Query performance was significantly worse with high concurrency. So at this stage, we are pretty much limited by the available CPU cycles. Obviously adding more hardware will improve the Concurrency and Query performance.
Design Consideration for Advanced Data Model:
Our bottleneck has been computation of distinct transaction count. HANA with the OLAP engine is very efficient in aggregating the data. So the thought is to convert this DISTINCT COUNT operation to SUM operation.
So our new design is to pre-compute the Transaction Count and store it for higher level queries and for the detailed level queries, computes the transaction count on the fly (as before).
So we create two aggregation tables as below
- Store Aggregation Table: “Sales Amount” & “Transaction Count” at Store, Calday, Half-Hour level
- Product Catgeory Aggregation Table: “Sales Amount” & “Transaction Count” at Store, Calday, Half-Hour & Category level
Size of the Store Aggregation Table: 20GB (for 25 months of data)
Size of the Category Aggregation Table: 150GB (for 25 months of data)
- 2 Analytic Views on the “Aggregation” Tables
- Integrate these 2 Analytic View in the Calculation view using a UNION node, with CONSTANT mapping
- GRAIN_FLAG = ‘STORE’ for “Store” Specific Analytic View
- GRAIN_FLAG = ‘CATEGORY’ for “Category” Specific Analytic View
- GRAIN_FLAG = ‘DETAIL’ for the Original Analytic/Calculation View
So now depending on the Query, we can use GRAIN_FLAG = ‘STORE’ or ‘CATEGORY’ or ‘DETAIL’ to read data from the appropriate Analytic View.
All the business logic and complex KPIs were defined in calculation views on top of this Calculation View. So all the business logic is at one place and work the same, irrespective of the data coming from the original tables or the aggregated tables. This GRAIN_FLAG field is cascaded to the subsequent calculation views for use.
So now depending on the Query, we can use GRAIN_FLAG = ‘STORE’ or ‘CATEGORY’ or ‘DETAIL’ to read data efficiently. So there are two big gains with
the aggregation table approach
a. the DISTINCT count is converted to a SUM. So there is massive drop in CPU usage (which gives very high-concurrency)
b. Also the volume of records in the aggregated table is small, so we the SUM operation is also faster (though only a little)
But how do we seamlessly switch between these views, without asking the end users?
This is where the Aggregate Aware function in Universe (or IDT) comes into play. (Again this is not a new feature and is there for a while). With appropriate configuration, the universe can automatically create the SQL Query with right value to GRAIN_FLAG in a WHERE clause.
Aggregate Aware in Universe:
Here is how the Aggregate Aware was set-up in Universe/IDT. As I mentioned this is not a new feature, your BI Expert should be already aware of this.
So with the aggregation solution in place, here are the Query performance for 25 concurrent users.
Trend of Average Check (Year over Year) for the last 12 months for the whole company: 10secs
- Drill-down on Geography Hierarchy – all the way to Retail Stores: 10secs
- Drill-down on Product Hierarchy – all the way to UPC/SKU level: 10 - 15secs
Please note, we are not recommending you to go on creating aggregate tables for all scenarios. However you can certainly consider this as an option after you have exhausted all other design options.