cancel
Showing results for 
Search instead for 
Did you mean: 

integer vs varchar joins

rmuhuri
Participant
0 Kudos

Hi

I have come across a project where they have used a lot of ETL and done the datawarehousing the old classical way i.e generating integer Surrogate Keys. I believe they have done for performance as integer joins performs better that varchar joins . The dev system has 57 million fact table rows and dim tables have anywhere from 100 K to 1 m rows . So prod tables would be even horrendously larger.

so leaving aside modelling optimization techniques like partition pruning/ query pruning / etc etc . what is the best join strategy for just large tables .

The join fields are 10 characters , one option could be converting the alpha numeric characters to ascii or unicode but 10 characters would be blown to a 20 byte integers , so a join on just a big field would be worse .

any idea's on join optimization .

PS : we will be using SDI , so we can do complex transformation on real time .

Accepted Solutions (1)

Accepted Solutions (1)

lbreddemann
Active Contributor
0 Kudos

You seem to assume that SAP HANA performs operations on the actual values of columns in a table.

That's not the case.

Instead, most of the very base operations like filter, join, project are performed with so called value-vectors. Only when the data needs to be materialized, e.g. to create a human readable output, the actual values are pulled from the so called dictionaries put in place.

The openHPI and openSAP courses about in memory databases cover that quite nicely.

So, to answer your question: don't bother "optimizing" your table design by making funky data type choices.

If you need a surrogate key, then create one.

If you don't need it,don't.

You can save storage by picking e.g. a numeric data type for numbers instead of a character data type - which is important in itself. But this won't have a tremendous impact on the join performance.

rmuhuri
Participant
0 Kudos

Thanks Lar . This is a modification / performance optimization project . The earlier implementation team had used extensive ETL process to have integer surrogate keys .

We want to simplify and yet not lose the performance . Your insight helps . Thanks

Also would creating inverted index on the varchar key help the performance ?

Answers (0)