cancel
Showing results for 
Search instead for 
Did you mean: 

early arriving facts

former_member214071
Participant
0 Kudos

good day everyone,

what's the best or most common way to handle the case when the fact data arrive early than dimension data.  how do you handle for instance a sales fact row that salesmanid does not yet exists on salesman dimension table?

Thanks in advance.

Accepted Solutions (0)

Answers (1)

Answers (1)

former_member187605
Active Contributor
0 Kudos

The approach hasn't changed for over more than 10 years: http://www.kimballgroup.com/2004/08/design-tip-57-early-arriving-facts/

former_member214071
Participant
0 Kudos

Hello Dirk,

I understand the logic, my question is how to implement it in DS.  I've seen a video where implements to insert a "dummy" row on dimension table if incoming rows in fact tables has a record that has not matching dimension fk. but the dataflow tends to be a little bit tricky and it does for two dimensions fields only,  if the fact table contains 10 or more fk colums to dimension tables the dataflow could become pretty big. (not sure about impact on performance)

According what I've read another possible solution could be using a "stage" table for the fact data, for what I understand the fact rows are inserted on this stage table first, then join to dimensions and insert on real fact table only the rows that have valid fk links to dimension tables, then somehow delete in the stage table the rows that were inserted to the real fact table, so in next runs when dimension data is complete all the remaining rows on the stage table will pass to the fact table.  Still have to figure to implement this logic on the DS tool.

Thanks.

former_member187605
Active Contributor
0 Kudos

I don't do dummy rows in dimension tables. I add extra columns to my facts table, one for each natural foreign key from the source.

When loading the facts table, do the normal processing, i.e. lookup the surrogate key from the dimension (or use an outer join). For every dimension record that does not exist (the loookup returns null), store the natural key in the corresponding column.

Build a 2nd dataflow that uses the facts table both as source and target. Make sure all normal row types are changed into updates (map operation, autocorrect load). Select only those records that have a not-null value in at least one of the extra columns. Do a lookup of the surrogate key for every not-null natural key.

Run that datflow at any later point in time:

1/. The dimension record has been added in the meantime: the lookup will return the natural key; put in its place and blank out the extra column.

2/. The dimension record is not there yet: leave your fact record unchanged and cross your fingers, because it may arrive late and you can process it next time.

There's almost no processing overhead, assuming the number of early arriving facts is low.

There's almost no storage penalty when using column store tables (HANA ).