
HANA Merge and Optimize Compression process

Former Member
0 Kudos

Hi,

I'd love to know if anyone has any insight into how MergeDog works. The best article I can find is 2 years old: http://www.sdn.sap.com/irj/scn/go/portal/prtroot/docs/library/uuid/10681c09-6276-2f10-78b8-986c68efb...

What I understand is that when you load, you load into the delta store, which is columnar but isn't sorted or compressed, so inserts are fast. This comes at a read-performance penalty, so you periodically merge into the main store. Easy so far.
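(For anyone reading along: this is the kind of query I use to see what is sitting in delta versus main. It's only a sketch against the standard M_CS_TABLES monitoring view; column names may differ slightly between revisions.)

-- Delta vs. main store footprint per column table, largest deltas first
SELECT schema_name,
       table_name,
       raw_record_count_in_delta,
       ROUND(memory_size_in_delta / 1024 / 1024, 2) AS delta_mb,
       ROUND(memory_size_in_main  / 1024 / 1024, 2) AS main_mb,
       last_merge_time
FROM   m_cs_tables
ORDER  BY memory_size_in_delta DESC;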

There is a token process, defaulting to 2 tokens per table (parameter token_per_table). You can force a merge by using:

MERGE DELTA OF TABLE WITH PARAMETERS ('FORCED_MERGE' = 'ON')

This is supposed to use all available resources to merge, at the expense of system performance. In my system, it doesn't do this - instead using just 3 processes for the pre-merge check (which presumably evaluates which partitions/tables need merging) and then just one process for the merge itself. I have big tables, so the merge takes forever.
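To see what mergedog is actually configured with (including token_per_table), I just read the ini contents. A sketch, assuming the standard M_INIFILE_CONTENTS view; 'mergedog' is the section name I see on my revision:

-- Current mergedog settings from indexserver.ini (token_per_table, decision functions, etc.)
SELECT file_name, layer_name, section, key, value
FROM   m_inifile_contents
WHERE  section = 'mergedog'
ORDER  BY key;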

Now, some time after loading the tables, when the system is quiet, MergeDog wakes up and scans my tables again. It then goes and compresses the partitions using thread method "optimize compression". It is possible to force an optimize compression evaluation using:

MERGE DELTA OF TABLE WITH PARAMETERS ('OPTIMIZE_COMPRESSION' = 'ON')

I guessed that syntax; it isn't in any reference guide I could find. But it only triggers an evaluation, and the compression won't run anyway if system load is high.
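The way I check whether the evaluation actually kicked anything off is via the merge statistics. Again a sketch, assuming M_DELTA_MERGE_STATISTICS and its TYPE/MOTIVATION columns behave on your revision as they do on mine; the table name is just a placeholder:

-- What mergedog actually did to one table recently; on my system the compression
-- optimization runs show up with a different TYPE than the plain merges
SELECT start_time,
       type,
       motivation,
       execution_time,
       merged_delta_records,
       success
FROM   m_delta_merge_statistics
WHERE  table_name = 'MY_BIG_TABLE'    -- placeholder
ORDER  BY start_time DESC;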

So does anyone understand how this thing works, how to force an optimize compression, and how to get it to use more cores and finish faster? And whilst we're there... what does optimize compression actually do? Does it improve query performance in most cases, and does it generally improve compression? Presumably this depends on the data in the table, and on whether the change in entropy means a different compression technique would make a difference. Why is it needed in the first place? Surely it could happen at every merge, since HANA builds a new main store anyhow and could easily recompress with a different algorithm each time?

My guess is it reads the statistics of the table and defines a compression algorithm for the dictionary and attribute vector (runlength, prefix etc.) and then recompresses the table using the most appropriate compression technique.
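For what it's worth, you can at least see what the optimizer decided per column. A sketch against M_CS_COLUMNS (COMPRESSION_TYPE is the column I'm after; the table name is a placeholder):

-- Compression type chosen for each column, biggest columns first
SELECT column_name,
       compression_type,
       distinct_count,
       ROUND(uncompressed_size    / 1024 / 1024, 2) AS uncompressed_mb,
       ROUND(memory_size_in_total / 1024 / 1024, 2) AS compressed_mb
FROM   m_cs_columns
WHERE  table_name = 'MY_BIG_TABLE'    -- placeholder
ORDER  BY uncompressed_size DESC;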

This is all incredibly clever and in 99% of cases it means you never need to touch the system, it is self-tuning and requires no maintenance. But there are extreme circumstances (like the one I'm in) where I really want to control this process!

I'm guessing I know about the only person who can answer this, but I'd be fascinated to hear from anyone who understands this process!

John

Accepted Solutions (1)


lbreddemann
Active Contributor
0 Kudos

Hey John,

short on time, so just a brief response to "what does optimize compression actually do?".

(I believe I've covered that stuff in more detail in the book... ).

Optimize compression tries to figure out

a) which column a table should be sorted by, and

b) the compression algorithm for the value vector for each column.

Obviously the overall compression efficiency depends on the type of data and the distribution of this data per column.

Since we reconstruct tuples based on the relative position (offset position) of the entries in the respective value vectors, the sort order for tuples has to be the same in all columns.

So, the tuple that is on position 42 in the first column has to be on position 42 in all columns.

Now, the goal is clear: find the sort order and the compression algorithm (you know... DEFAULT, INDIRECT, CLUSTER, RLE..) that allows best overall compression.

Still with me? Good.

The single thing that makes one compression algorithm better than another for a specific column is the data distribution. Depending on things like having one clearly most common value, or a recurring pattern of values, and so on, different algorithms can be used.

Stupid example: a color column for cars (yeah, the car analogy again... whatever). Let's say there are ten colors in total, but red really stands out, covering over one third of all cars.

It would probably make sense to sort the tuples by color then and apply RLE to the color red entries.
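If you want to poke at this yourself, a throwaway sketch (with only a handful of rows the optimizer will most likely stick to DEFAULT, but the mechanics are the same; with real data volumes the picture gets interesting):

-- Toy version of the colour example
CREATE COLUMN TABLE cars (id INTEGER, color NVARCHAR(10));
INSERT INTO cars VALUES (1, 'RED');
INSERT INTO cars VALUES (2, 'RED');
INSERT INTO cars VALUES (3, 'BLUE');

MERGE DELTA OF cars;                                                  -- move delta into main
MERGE DELTA OF cars WITH PARAMETERS ('OPTIMIZE_COMPRESSION' = 'ON');  -- ask for the evaluation

-- see what the colour column ended up with
SELECT column_name, compression_type
FROM   m_cs_columns
WHERE  table_name = 'CARS';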

Alright... as anyone with some exposure to optimization problems will guess, finding the "best" combination might take some time. Also, in our usual database situations the actual kind of data and its distribution within the columns doesn't change that often.

Once the initial loading phase is passed, the data distribution is actually quite stable for a while.

Running the optimization every time with a delta merge would be pointless.

So, the compression optimization is only performed when a lot of data has changed - or when it's asked for manually.

Bottom line: the compression optimization run tries to find the optimal compression types for all columns of a table. The actual compression is performed with every delta merge.
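If you want to see how far a table has drifted since its last compression run, something like this works (a sketch; as far as I recall the decision is based on LAST_COMPRESSED_RECORD_COUNT in M_CS_TABLES, but check the column names on your revision):

-- Change volume since the last compression optimization, per table
SELECT schema_name,
       table_name,
       record_count,
       last_compressed_record_count,
       record_count - last_compressed_record_count AS changed_since_last_run
FROM   m_cs_tables
WHERE  schema_name = 'MY_SCHEMA'      -- placeholder
ORDER  BY changed_since_last_run DESC;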

Ok, here you go, now you know -

A great weekend to everybody.

- Lars

Former Member
0 Kudos

Hello Lars,

Can you explain this? We are using BW 7.31 with HANA revision 97.02 and have an unpartitioned DSO table with 1.4 billion rows, at 110 GB in memory and 120 GB on disk.

We deleted 400 million rows and ran:

MERGE DELTA OF TABLE-NAME WITH PARAMETERS ('FORCED_MERGE' = 'ON');

This ran successfully, and used about 300 GB of memory to perform the merge.


Now it is 168 GB in memory and 320 GB on disk!
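For reference, this is how I'm measuring it (a sketch; the join to the persistence statistics view and the DSO table name are just how it looks on our system):

-- Memory vs. disk footprint of the DSO active table (name is a placeholder)
SELECT t.table_name,
       ROUND(t.memory_size_in_total / 1024 / 1024 / 1024, 1) AS memory_gb,
       ROUND(p.disk_size            / 1024 / 1024 / 1024, 1) AS disk_gb,
       t.raw_record_count_in_main,
       t.raw_record_count_in_delta
FROM   m_cs_tables t
JOIN   m_table_persistence_statistics p
  ON   p.schema_name = t.schema_name
 AND   p.table_name  = t.table_name
WHERE  t.table_name = '/BIC/AZDSO0100';   -- placeholder DSO active table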


Thanks


PS I have your book.


Tom





Former Member
0 Kudos

Deleted records (unless located at the end of the table) can only be removed by a compression optimization, not by a merge. Additionally it may take some time until the persistence garbage collector kicks in and cleans up some relics on disk.

Former Member
0 Kudos

Right. Try:

MERGE DELTA OF TABLE-NAME WITH PARAMETERS ('OPTIMIZE_COMPRESSION' = 'ON')

We discussed this in some detail here: https://scn.sap.com/thread/3637730

You will also probably have to create a savepoint:

ALTER SYSTEM SAVEPOINT

and if you want to clear actual disk space, defragment:

ALTER SYSTEM RECLAIM DATAVOLUME 120 DEFRAGMENT

You could of course just leave HANA to itself, and it will manage everything nicely without you getting involved.

Former Member
0 Kudos

Thanks John,

I looked today and HANA hasn't done any housekeeping; maybe it doesn't work on weekends.

More details:

The first time I ran the forced delta merge, it merged 1.6 million records, then failed with "not enough memory for optimization."

I saw this blog and ran MERGE DELTA OF TABLE-NAME WITH PARAMETERS ('OPTIMIZE_COMPRESSION' = 'ON'). This made no difference.

Next I ran MERGE DELTA OF TABLE-NAME WITH PARAMETERS ('FORCED_MERGE' = 'ON') again. It merged another 73,000 records and completed with no errors.

I have not tried another MERGE DELTA OF TABLE-NAME WITH PARAMETERS ('OPTIMIZE_COMPRESSION' = 'ON').

Again, no difference. We have 1 TB of memory, and while doing this work I need to manually unload a large number of tables to avoid OOMs, and need to do it in a quiet time.


We have gone for an easy workaround, which works:

1) Copy the data we want to save to another partitioned DSO.

2) Run the SQL command "truncate table" on the 168 GB table

3) Partition it with RSDU_WODSO_REPART_HD

4) Problem solved


I know truncate table reduces the size in a few minutes; I tested this on a copy of this database in our TEST system.
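At SQL level, steps 1 and 2 boil down to something like the following (table names and filter are placeholders; in BW the copy of course went into a proper DSO, and the repartitioning itself was done with the report, not with SQL):

-- Step 1: keep only the rows we still need (placeholder target and filter)
CREATE COLUMN TABLE zdso_keep AS
  (SELECT * FROM "/BIC/AZDSO0100" WHERE calday >= '20150101');

-- Step 2: empty the big table; both delta and main content are gone immediately
TRUNCATE TABLE "/BIC/AZDSO0100";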


have a good weekend


Tom







Former Member
0 Kudos

Right, that would have also been my next suggestion, at least to repartition.

The problem is that your partition got too big at 1.5b rows, compared to the amount of memory you have available for the optimize compression, so it fails. You've done the right thing by repartitioning it; now each partition can be individually managed for compression.
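For non-BW tables, the SQL-level equivalent would be something like this (table and column names are placeholders; for BW-managed tables stick to the repartitioning report like Tom did):

-- Hash-partition the table so each part stays a manageable size
ALTER TABLE "MY_BIG_TABLE" PARTITION BY HASH ("DOC_NUMBER") PARTITIONS 8;

-- Afterwards merges (and compression runs) can be triggered per partition
MERGE DELTA OF "MY_BIG_TABLE" PART 3;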

Former Member
0 Kudos

Hi Lars,

You mentioned "the compression optimization is only performed when a lot of data has changed - or when it's asked for manually". Do you know if a HANA upgrade will trigger this process automatically?

We upgraded our HANA from SPS07 to SPS10. After the upgrade completed, we found HANA performance was very slow because optimize compression had started against multiple big tables we have in HANA, which made the HANA system unusable for hours. Do you know if there is a way to control this behavior?

Thanks,

Xiaogang

Former Member
0 Kudos

In the past, range-partitioned tables (e.g. BW fact tables) were compressed too rarely. This has changed with SPS 10. So if you mainly see range-partitioned tables being compressed, this improvement in SPS 10 can be the reason. As a consequence you should see a significantly reduced memory footprint. Theoretically you can (temporarily) set the auto_decision_func of the compression optimization back to the previous default value, but usually this shouldn't be required.
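To check whether auto_decision_func has been customized (and in which ini section it lives) you can read the ini contents; the SET statement below is only the generic pattern, and the actual value to use should come from SAP support rather than guesswork:

-- Current setting (if any) of the compression optimization decision function
SELECT file_name, layer_name, section, key, value
FROM   m_inifile_contents
WHERE  key = 'auto_decision_func';

-- Generic change pattern, left commented out on purpose:
-- ALTER SYSTEM ALTER CONFIGURATION ('indexserver.ini', 'SYSTEM')
--   SET ('<section>', 'auto_decision_func') = '<previous default>' WITH RECONFIGURE;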

Former Member
0 Kudos

Hi Martin,

Thanks for the reply. In this case, do you know if there is anything we can do to accelerate the compression process, e.g. change the configuration to add more threads for the compression job? We asked the same question to SAP via an OSS message, but the reply was not clear: they say it's doable, but that adding additional threads could make the system slower for other tasks. We have tons of free CPU capacity available and can change the configuration back after the compression job completes, so I just wonder if by chance you know the answer.

Thanks,

Xiaogang.

Former Member
0 Kudos

I have now listed some parameters in SAP Note 2222250 -> "How can workload management be configured for CPU and threads?" -> "Compression optimization". Normally adjustments shouldn't be required.
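Before changing anything from the note, it is worth checking what is actually running and configured. A sketch against the standard monitoring views; the LIKE filters are only a convenience and may need adjusting for your revision:

-- Threads currently busy with compression optimization work
SELECT host, thread_type, thread_method, thread_state, COUNT(*) AS threads
FROM   m_service_threads
WHERE  LOWER(thread_method) LIKE '%compression%'
GROUP  BY host, thread_type, thread_method, thread_state;

-- Compression optimization related parameters as currently set
SELECT file_name, layer_name, section, key, value
FROM   m_inifile_contents
WHERE  section LIKE '%optimize_compression%';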

Answers (2)


Former Member
0 Kudos

This message was moderated.

Former Member
0 Kudos

Hi John,

Great question to which I don't know the answer, but I appreciate its importance. I'm sure Lars will jump in, but I think I heard somewhere that the delta store doesn't have to be columnar, but rather a row store, which would make inserts faster. It would be nice to confirm that, but your question stands of course.

Thanks,

greg

Former Member
0 Kudos

I can answer your question, if not my own.

The delta store is columnar; I used to think it was row-based until Hasso corrected one of my blogs. The column format is in general a good balance of read/write performance under most circumstances.

There is, however, a row cache that you don't see on any architecture slides, which comes into play when high-frequency inserts are required and the delta store can't keep up. Its contents are committed to the delta store as soon as possible. I believe it was introduced in SPS06, and it's what gives HANA awesome OLTP performance.

Now you know

Former Member
0 Kudos

Thanks for clearing that up, then