Tracking History on HANA: Handling Slowly Changing Dimensions in HANA with SLT Data Replication

Aug 2013

CONTRIBUTORS

Abani Pattanayak, SAP HANA Center of Excellence

Jody Hesch, SAP Business Analytic Services

Mark Wozniak, SAP HANA Solution Center

Mahesh Sardesai, SAP Database Technology Services

1. BACKGROUND:

A customer needs the capability to report against current (“as is”) and historical (“as was”) data for certain dimensions in their SAP HANA data mart system.

Dimensions whose values change over time, and whose changes are captured, are referred to as Slowly Changing Dimensions (SCDs). (Technically there are different types of SCDs, meeting different business requirements. Those described in this document are SCDs of Type 2/6.)

Capturing SCDs is a well-understood task for data warehouses and data marts, and can be handled in various ways, including via the History Preservation transformation in SAP Business Objects Data Services.

2. PROBLEM STATEMENT:

In many SAP HANA data mart solutions, data is replicated from source systems into HANA via the SAP LT Replication Server (SLT). SLT does not come with history preservation features “out-of-the-box”, so there is a challenge in finding the best way to preserve history (in other words, how to track slowly changing dimensions).

An ideal approach should include:

- Good performance on large datasets*

- Ease of implementation

- Maintainability over time

*Some dimension tables on the customer’s current retail system exceed 300 million rows, so good performance is particularly important in cases like these.

3. SOLUTION APPROACHES:

The following documentation describes three solution approaches to capturing Slowly Changing Dimensions in a HANA system with SLT as the replication technology. The first two approaches are outlined briefly. The third approach, which has significant advantages over the first two, is outlined in detail.

All three approaches involve creating stored procedures and executing them on a scheduled basis to populate a history table, which we also call an ‘effective’ table. The effective table has, at a minimum, the same fields as the source table plus the validity period fields VALID_FROM and VALID_TO. Its primary key consists of the source table’s key fields plus VALID_TO.
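For illustration, a minimal sketch of such an effective table in SQL. The table and column names (MATERIAL_EFF, MATNR, WERKS, MAKTX) are hypothetical, not part of the solution itself:

CREATE COLUMN TABLE "MATERIAL_EFF" (
    MATNR      NVARCHAR(18),             -- source key field
    WERKS      NVARCHAR(4),              -- source key field
    MAKTX      NVARCHAR(40),             -- example tracked attribute
    VALID_FROM DATE,                     -- start of validity period
    VALID_TO   DATE,                     -- end of validity period
    PRIMARY KEY (MATNR, WERKS, VALID_TO) -- source keys plus VALID_TO
);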

The effective table can be modeled as an attribute view joined to the Data Foundation in an Analytic View via a Temporal Join. Please see the SP6 SAP HANA Developer’s Guide, p. 175, found at http://help.sap.com/hana_appliance#section6 for details on this join type.
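Conceptually, the temporal join resolves each fact record against the dimension version that was valid on the fact’s date. In plain SQL (reusing the hypothetical MATERIAL_EFF table above and an assumed fact table SALES), the equivalent logic is:

-- 'As was' reporting: pick the dimension version valid on the posting date
SELECT s.ORDER_ID, s.POSTING_DATE, m.MAKTX
    FROM "SALES" s
    JOIN "MATERIAL_EFF" m
      ON  m.MATNR = s.MATNR
      AND m.WERKS = s.WERKS
      AND s.POSTING_DATE BETWEEN m.VALID_FROM AND m.VALID_TO;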

Approach 1 involves doing full table comparisons between source and effective tables and updating the effective table accordingly. (In this and the following descriptions, ‘source table’ refers to the HANA table that has been replicated from the source system.) This approach is the worst-performing of the three and requires the most effort to implement for each set of dimensions for which SCDs are tracked.

Approach 2 involves creating database triggers on the source table to capture the delta for INSERT/UPDATE/DELETE operations. This approach performs better than Approach 1 and is easier to implement, but can still be challenging to maintain over time. It also runs into performance issues when delta volumes are high.

Approach 3 entails capturing the operation type of each source data record (INSERT, UPDATE, DELETE) and flagging records accordingly. A stored procedure then populates/updates the historical (“as was”) table with the source data and the respective validity periods. (Additional fields such as a “Current Flag” can easily be included in the following approach.) Only the deltas from the source table are required (except for the initial load), and no trigger is needed. Also, the SLT configuration for this approach can be applied (reused) on any table. As such, Approach 3 is the best-performing, most maintainable and easiest to implement of the three approaches.

3.1 APPROACH 1: Table Compare

Approach 1 involves the following steps (in SQLScript pseudocode) in a stored procedure that compares the source and effective tables and updates the effective table accordingly; a concrete sketch follows the outline.

IF effective table is empty THEN

- Populate effective table with a copy of the source table
- Setting VALID_FROM = ‘1900-01-01’
- Setting VALID_TO = ‘9999-12-31’

ELSE

1) SELECT all records that differ between
      a. the source table and
      b. the currently valid records in the history table
   INTO a variable called change_table

2) UPDATE all records in the history table
      a. that correspond to records in change_table
      b. setting VALID_TO = yesterday (expiring them)

3) INSERT into the history table all source records
      a. that correspond to records in change_table on all field values (not just keys)
      b. setting VALID_FROM = today
      c. setting VALID_TO = ‘9999-12-31’
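The following is a minimal SQLScript sketch of this logic. It assumes a source table SRC and an effective table SRC_EFF with a single key column ID and one tracked attribute ATTR; all names are hypothetical:

CREATE PROCEDURE SCD_TABLE_COMPARE LANGUAGE SQLSCRIPT AS
BEGIN
    DECLARE lv_count INTEGER;

    SELECT COUNT(*) INTO lv_count FROM "SRC_EFF";

    IF :lv_count = 0 THEN
        -- Initial load
        INSERT INTO "SRC_EFF" (ID, ATTR, VALID_FROM, VALID_TO)
            SELECT ID, ATTR, '1900-01-01', '9999-12-31' FROM "SRC";
    ELSE
        -- New, changed or deleted records: the symmetric difference between
        -- the source table and the currently valid slice of the effective table
        change_table = (SELECT ID, ATTR FROM "SRC"
                        EXCEPT
                        SELECT ID, ATTR FROM "SRC_EFF" WHERE VALID_TO = '9999-12-31')
                       UNION
                       (SELECT ID, ATTR FROM "SRC_EFF" WHERE VALID_TO = '9999-12-31'
                        EXCEPT
                        SELECT ID, ATTR FROM "SRC");

        -- Expire the currently valid versions of all affected records
        UPDATE "SRC_EFF" SET VALID_TO = ADD_DAYS(CURRENT_DATE, -1)
            WHERE VALID_TO = '9999-12-31'
              AND ID IN (SELECT ID FROM :change_table);

        -- Re-insert the latest versions of records still present in the source
        INSERT INTO "SRC_EFF" (ID, ATTR, VALID_FROM, VALID_TO)
            SELECT s.ID, s.ATTR, CURRENT_DATE, '9999-12-31'
            FROM "SRC" s
            WHERE s.ID IN (SELECT ID FROM :change_table);
    END IF;
END;

Note that the full-table EXCEPT/UNION comparison is exactly what makes this approach expensive on large tables.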

3.2 APPROACH 2: DB Triggers

Step 1

Add a trigger to the source table that is executed ON INSERT and inserts the corresponding record into the effective table.

Step 2

Add a trigger to the source table that is executed ON DELETE and updates the corresponding record in the effective table (i.e. ‘expires’ that record by setting VALID_TO = yesterday). This should reference the old state of the record (REFERENCING OLD ROW <myoldrow>).

Step 3

Add a trigger to the source table that is executed ON UPDATE and

a.  Updates the corresponding record in the effective table (i.e. ‘expires’ that record by setting VALID_TO = yesterday). This should reference the old state of the record (REFERENCING OLD ROW <myoldrow>).

b. Inserts a new record in the effective table (VALID_FROM = today and VALID_TO = ‘9999-12-31’). This should reference the new state of the record (REFERENCING NEW ROW <mynewrow>)
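As an illustration, a minimal sketch of the ON UPDATE trigger from this step, reusing the hypothetical SRC / SRC_EFF tables and key column ID from above (the ON INSERT and ON DELETE triggers are analogous):

CREATE TRIGGER "SRC_UPDATE_TRG"
    AFTER UPDATE ON "SRC"
    REFERENCING OLD ROW myoldrow, NEW ROW mynewrow
    FOR EACH ROW
BEGIN
    -- 'Expire' the currently valid version of the record
    UPDATE "SRC_EFF" SET VALID_TO = ADD_DAYS(CURRENT_DATE, -1)
        WHERE ID = :myoldrow.ID AND VALID_TO = '9999-12-31';

    -- Insert the new version with an open validity period
    INSERT INTO "SRC_EFF" (ID, ATTR, VALID_FROM, VALID_TO)
        VALUES (:mynewrow.ID, :mynewrow.ATTR, CURRENT_DATE, '9999-12-31');
END;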

Step 4

Create a stored procedure to initialize the effective table with the following logic.

IF effective table is empty THEN

- Populate effective table with a copy of the source table
- Setting VALID_FROM = ‘1900-01-01’
- Setting VALID_TO = ‘9999-12-31’

One of the drawbacks of Approach 2 is the performance of the triggers when large sets of data are changed in the source table (mass inserts / updates / deletes).

We ran DELETE statements with filters that would impact different numbers of rows in the source table and arrived at the following measurements.

Statement 'DELETE FROM "ERPACC_RPDCLNT200"."MARC" WHERE WERKS = 'X133''

successfully executed in 790 ms 578 µs  (server processing time: 753 ms 555 µs) - Rows Affected: 500

Statement 'DELETE FROM "ERPACC_RPDCLNT200"."MARC" WHERE WERKS = 'X520''

successfully executed in 25.422 seconds  (server processing time: 23.823 seconds) - Rows Affected: 7394

Statement 'DELETE FROM "ERPACC_RPDCLNT200"."MARC" WHERE WERKS = 'X220''

successfully executed in 31.323 seconds  (server processing time: 30.734 seconds) - Rows Affected: 15011

Statement 'DELETE FROM "ERPACC_RPDCLNT200"."MARC" WHERE WERKS = 'X521''

successfully executed in 7:22.061 minutes  (server processing time: 7:20.096 minutes) - Rows Affected: 85827

Statement 'DELETE FROM "ERPACC_RPDCLNT200"."MARC" WHERE WERKS = 'X423''

successfully executed in 3:26:14.500 hours  (server processing time: 3:26:10.385 hours) - Rows Affected: 303485

As the measurements above show, this approach works fine when the chunk size (the number of records processed or deleted in one go) is in the range of 20K to 50K records. However, if more than 50K records are updated, deleted or inserted at once, trigger performance degrades to the point where this approach is no longer workable.

For this reason and others discussed already, we recommend the following approach (Approach 3).

3.3 APPROACH 3: SLT Configuration

Step 1

Define table deviation on the source table. (Again, ‘source’ refers to the replicated table in HANA, and the ‘target’ table is the transformed one, i.e. what we have been calling the ‘effective’ table. From an SLT perspective, however, the HANA table is the ‘target’ table.)

Define the table deviation using the Edit Table Structure option in the IUUC_REPL_TABSTG tab in SLT.

Add two fields (ZZSLT_FLAG and ZZTIMESTAMP) to store the record type and a timestamp. These fields are configured via the Table Deviation option:

a.  ZZSLT_FLAG : NVARCHAR(1): To store record type (‘D’ - DELETE, ‘U’ – UPDATE, ‘I’ – INSERT/New)

b. ZZTIMESTAMP: NVARCHAR(14): To store the timestamp (YYYYMMDDHHMMSS)

Step 2

Define a transformation rule for the table (YMARC in this example) in the IUUC_ASS_RULE_MAP tab.

Export Field Name: MANDT. We chose the first field of the table (MANDT); any field from the table can be used.

Import Parameter 1: ‘<WA_R_YMARC>’ is the internal name used to address the receiver work area.

Insert Include Name: This is the ABAP include we need to create for the transformation.

Step 3

Create the ABAP include using t-code SE38 in the SLT system.

*&---------------------------------------------------------------------*
*&  Include           ZIUUC_DELETE
*&---------------------------------------------------------------------*

FIELD-SYMBOLS: <ls_data>      TYPE any,
               <lv_operation> TYPE any,
               <lv_delete>    TYPE any,
               <lv_timestamp> TYPE any.

ASSIGN (i_p1) TO <ls_data>.

DATA tstamp LIKE tzonref-tstamps.
DATA d TYPE d VALUE '20121224'.
DATA t TYPE t VALUE '235500'.

d = sy-datum.
t = sy-uzeit.

SET COUNTRY 'US'.
CONVERT DATE d TIME t INTO TIME STAMP tstamp TIME ZONE 'UTC-5'.

" Assign the timestamp to the ZZTIMESTAMP field
ASSIGN COMPONENT 'ZZTIMESTAMP' OF STRUCTURE <ls_data> TO <lv_timestamp>.
IF sy-subrc = 0.
  <lv_timestamp> = tstamp.
ENDIF.

ASSIGN COMPONENT 'IUUC_OPERAT_FLAG' OF STRUCTURE <ls_data> TO <lv_operation>.

" For delete operations
IF sy-subrc = 0 AND <lv_operation> = 'D'.

  " Change this to an update operation, so the record is not deleted in HANA
  <lv_operation> = 'U'.

  " Update ZZSLT_FLAG to store 'D' (for delete)
  ASSIGN COMPONENT 'ZZSLT_FLAG' OF STRUCTURE <ls_data> TO <lv_delete>.
  IF sy-subrc = 0.
    <lv_delete> = 'D'.
  ENDIF.

" For all other operations
ELSEIF sy-subrc = 0.

  " Update ZZSLT_FLAG to store the appropriate record type
  ASSIGN COMPONENT 'ZZSLT_FLAG' OF STRUCTURE <ls_data> TO <lv_delete>.
  IF sy-subrc = 0.
    <lv_delete> = <lv_operation>.
  ENDIF.

ENDIF.

Step 4

Set up replication of the table using SAP HANA Studio.

Step 5

Delete and update some records in the YMARC table.

The deleted and updated records are now reflected in ZZSLT_FLAG (‘D’ and ‘U’ respectively). Because the ABAP include converts deletes into updates, deleted records remain physically present in the replicated table.
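A quick way to verify this is a query along the following lines (the key columns MATNR and WERKS are assumptions based on the standard MARC structure):

SELECT MATNR, WERKS, ZZSLT_FLAG, ZZTIMESTAMP
    FROM "ERPACC_RPDCLNT200"."YMARC"
    WHERE ZZSLT_FLAG IS NOT NULL
    ORDER BY ZZTIMESTAMP DESC;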

Step 6

Create a stored procedure to update the history table. The details of the stored procedure will be specific to the customer solution. At a minimum, the following steps should be included, in addition to ‘housekeeping’ code such as error handling.

-- STEP 1: Initial load
-- If the target table is empty, load the source table into the history table,
-- setting VALID_FROM = '1900-01-01' and VALID_TO = '9999-12-31'.

-- STEP 2: Record expiration
-- UPDATE the history table, setting VALID_TO = yesterday, for all records in the
-- history table that correspond to a record from the source table WHERE
-- ZZSLT_FLAG IN ('D', 'U'). Make sure the source table is filtered against
-- ZZTIMESTAMP*. This expires deleted and updated records.

-- STEP 3: Inserting new records
-- INSERT source data into the history table, setting VALID_FROM = today and
-- VALID_TO = '9999-12-31', for source records WHERE ZZSLT_FLAG <> 'D'.
-- Make sure the source table is filtered against ZZTIMESTAMP*.
-- This ensures that new records (INSERT) are captured in the history table, and
-- that the latest versions of changed records (UPDATE) are captured as well.

* It is recommended that additional housekeeping code be included in your stored procedure framework to capture the timestamps of successful procedure executions. The above steps can then be filtered via ZZTIMESTAMP > :last_successful_execution_date, where last_successful_execution_date is a scalar variable populated from a query against a procedure execution status table.
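Putting these steps together, a minimal SQLScript sketch might look as follows. YMARC is the replicated table from the example above; the history table YMARC_EFF, its simplified column list, and the execution log table PROC_EXEC_LOG are illustrative assumptions:

CREATE PROCEDURE UPDATE_YMARC_HISTORY LANGUAGE SQLSCRIPT AS
BEGIN
    DECLARE lv_count INTEGER;
    DECLARE lv_last_run NVARCHAR(14);

    SELECT COUNT(*) INTO lv_count FROM "YMARC_EFF";

    -- Delta pointer: timestamp of the last successful execution
    SELECT IFNULL(MAX(LAST_RUN), '19000101000000') INTO lv_last_run
        FROM "PROC_EXEC_LOG" WHERE PROC_NAME = 'UPDATE_YMARC_HISTORY';

    IF :lv_count = 0 THEN
        -- STEP 1: initial load
        INSERT INTO "YMARC_EFF" (MATNR, WERKS, EKGRP, VALID_FROM, VALID_TO)
            SELECT MATNR, WERKS, EKGRP, '1900-01-01', '9999-12-31' FROM "YMARC";
    ELSE
        -- STEP 2: expire deleted and updated records
        UPDATE "YMARC_EFF" e SET VALID_TO = ADD_DAYS(CURRENT_DATE, -1)
            WHERE e.VALID_TO = '9999-12-31'
              AND EXISTS (SELECT 1 FROM "YMARC" s
                          WHERE s.MATNR = e.MATNR AND s.WERKS = e.WERKS
                            AND s.ZZSLT_FLAG IN ('D', 'U')
                            AND s.ZZTIMESTAMP > :lv_last_run);

        -- STEP 3: insert the latest versions of new and updated records
        INSERT INTO "YMARC_EFF" (MATNR, WERKS, EKGRP, VALID_FROM, VALID_TO)
            SELECT s.MATNR, s.WERKS, s.EKGRP, CURRENT_DATE, '9999-12-31'
                FROM "YMARC" s
                WHERE s.ZZSLT_FLAG <> 'D'
                  AND s.ZZTIMESTAMP > :lv_last_run;
    END IF;

    -- Housekeeping: advance the delta pointer for the next run
    UPSERT "PROC_EXEC_LOG" (PROC_NAME, LAST_RUN)
        VALUES ('UPDATE_YMARC_HISTORY',
                TO_NVARCHAR(CURRENT_TIMESTAMP, 'YYYYMMDDHH24MISS'))
        WITH PRIMARY KEY;
END;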

Step 7: Schedule stored procedure execution

If SAP Business Objects Data Services (BODS) is part of your landscape, procedures can be scheduled daily (or as often as required) via BODS.

If BODS is not an option, the following link from the SAP HANA Academy demonstrates how to schedule data loads via the cron scheduler. The same approach can be adapted to schedule your stored procedure calls.

http://www.saphana.com/docs/DOC-2909

4. OPEN QUESTIONS:

There are a few open issues with all three approaches, which should be considered before applying these solutions to a production environment.

1.    As noted above, the granularity of the effective table is at the day level. So what happens if there are multiple changes to a record in a single day? Do you have to track all of these changes, or just the last one?

2.    What happens if something goes wrong with the SLT server and you need to recreate the configuration and replicate the table again? You have to make sure the Timestamp delta pointer is appropriately adjusted.

5. CONCLUSION:

Please note that each of the above approaches may work, depending on the data volume in the table and the number of delta records.

For example, Approach 1 may work just fine for low data volumes, say the site master (T001W), where the typical volume is 100 to 10K records. Approach 2 may be an even better fit for this scenario, since there is no need to schedule a procedure on a daily basis; note, however, that the triggers must be recreated if the source table is ever dropped.

Approach 2 will also work for high-volume scenarios, but only if the delta records flow into HANA continuously, so that the rate does not exceed roughly 10K records per minute at any given time.

Approach 3 will work in almost all scenarios. In short, Approach 2 is better than Approach 1, and Approach 3 is the best option of the three.