Solved: HANA Merge and Optimize Compression process

Former Member · ‎10-18-2014

Hi,

I'd love to know if anyone has any insight into MergeDog works. The best article I can find is 2 years old: http://www.sdn.sap.com/irj/scn/go/portal/prtroot/docs/library/uuid/10681c09-6276-2f10-78b8-986c68efb...

What I understand is that when you load, you load into the delta store, which is columnar, but isn't sorted or compressed, so inserts are fast. This has a penalty of read performance, so you periodically merge into the main store. Easy so far.

There is a token process - defaulting to 2 tokens per table by default (parameter token_per_table). You can force this, by using:

MERGE DELTA OF TABLE WITH PARAMETERS ('FORCED_MERGE' = 'ON')

This is supposed to use all available resources to merge, at the expense of system performance. In my system, it doesn't do this - instead using just 3 processes for the pre-merge check (which presumably evaluates which partitions/tables need merging) and then just one process for the merge itself. I have big tables, so the merge takes forever.

Now, some while after loading the tables, when the system is quiet, Mergedog wakes up and scans my tables again. It then goes and compresses the partitions using thread method "optimize compression". It is possible to force an optimize compression evaluation using:

MERGE DELTA OF TABLE WITH PARAMETERS ('OPTIMIZE_COMPRESSION' = 'ON')

I guessed that syntax, it isn't in any reference guide I could find. But it only causes an evaluation, and it won't run if you have high system load anyhow.

So does anyone understand how this thing works, how to force an optimize compression, how to get it to use more cores and finish faster? And whilst we're there... what does optimize compression actually do? Does it improve query performance in most cases and does it generally improve compression? Presumably this depends on the data in the table, and whether the change in entropy means a different compression technique would make a difference? Why is it needed? Surely when the merge process happens, it could happen any time since HANA builds a new main store anyhow, so it could easily recompress using a different algorithm during every merge?

My guess is it reads the statistics of the table and defines a compression algorithm for the dictionary and attribute vector (runlength, prefix etc.) and then recompresses the table using the most appropriate compression technique.

This is all incredibly clever and in 99% of cases it means you never need to touch the system, it is self-tuning and requires no maintenance. But there are extreme circumstances (like the one I'm in) where I really want to control this process!

Guessing about the only person who can answer this is but would be fascinated by anyone who understands this process!

John

lbreddemann · ‎10-18-2014

Hey John,

short on time, so just a brief response to "what does optimize compression actually do?".

(I believe I've covered that stuff in more detail in the book... ).

Optimize compression tries to figure out

a) after what column a table should be sorted and

b) the compression algorithm for the value vector for each column.

Obviously the overall compression efficiency depends on the type of data and the distribution of this data per column.

Since we reconstruct tuples based on the relative position (offset position) of the entries in the respective value vectors, the sort order for tuples has to be the same in all columns.

So, the tuple that is on position 42 in the first column has to be on position 42 in all columns.

Now, the goal is clear: find the sort order and the compression algorithm (you know... DEFAULT, INDIRECT, CLUSTER, RLE..) that allows best overall compression.

Still with me? good.

The single reason that makes one compression algorithm better for a specific column than the other is the data distribution. Depending on things like having one absolutely most common value or a recurring pattern of values and so on, different algorithms can be used.

Stupid example: color indicating column for cars (yeah, the car analogy again... whatever ). Let's say it has ten colors in total but red really stands out with over one third of all cars.

It would probably make sense to sort the tuples by color then and apply RLE to the color red entries.

Alright... as anyone with some exposure to optimization problems will guess finding the "best" combination might take some time. Also, with our usual database situations the actual kind of data and its distribution within the columns doesn't change that often.

Once the initial loading phase is passed, the data distribution is actually quite stable for a while.

Running the optimization every time with a delta merge would be pointless.

So, the compression optimization only is performed when a lot of data has changed - or when it's asked for manually.

Bottom line: the compression optimization run tries to find the optimal compression types for all columns of a table. The actual compression is performed with every delta merge.

Ok, here you go, now you know -

A great weekend to everybody.

- Lars

Former Member · ‎03-28-2016

This message was moderated.

Former Member · ‎10-18-2014

Hi John,

Great question to which I don't know the answer, but appreciate its importance. I'm sure Lars will jump in, but I think I heard somewhere that delta store doesn't have to be columnar, but rather a row store which would make inserts faster. it would be nice to confirm that, but your question stands of course.

Thanks,

greg

HANA Merge and Optimize Compression process

Accepted Solutions (1)

Accepted Solutions (1)

Answers (2)

Answers (2)

SAP scripting findbyID

Re: FB60 does not post. No error message is displa...

Re: Integrate an external task system to Cloud ALM...

Re: SAP BW - Master Data Delta Load

Re: How to add dynamically formcell or button tabl...