Solved: Table partition in scale out architecture

Former Member · ‎04-25-2014

Hello,

We have a huge table with 857M records which it's key is GUID (In our reports we never select records from this table with the key\GUID) . Our system is scale out of 4 indexservers - 512G each.

We partitioned the table with 2 levels:

1.HASH for the key/GUID with number of partitions equal to the number of hosts = 4 in our case.
13 partitions of range according to the periods.

The table before the partition:

The table after the partition:

Questions:

Because the key has a key we were forced to partition according to it although we never use it, Will it improve the performance?
The memory consumed by the table was grown significantly from 73G to 107G and also he size on disk. Why?
After the partitione finished all the partition are located in one indexserver. I know I can move them but does it make sense? On the one hand the calculation will be parlalize between 4 indexservers on the other hand the traffic of data between the servers will be huge. Because the key of the table doesn’t have any meaning I guess that every select above this table will use all the 4 indexservers and have a lot of traffic.

What do you think?

Thanks,

Amir

lbreddemann · ‎04-26-2014

Somehow I know this partitioning scheme...

Ok, briefly to the questions:

1. Because the key has a key we were forced to partition according to it although we never use it, Will it improve the performance?

Why would it?

And what performance are you talking about?

Storing new data in the table?

Updating existing data?

Performing a delta merge?

Selecting data?

2. The memory consumed by the table was grown significantly from 73G to 107G and also he size on disk. Why?

While the primary key column (your GUID) cannot be compressed anyway, most of the other columns very likely can and will be compressed.

Now you copy the one single value catalog for each column 13 times and very likely it will contain the same set of values for each partition.

On top of that, the additional compression on each column might be less effective as now the data is distributed in a different way in each column.

3. After the partitione finished all the partition are located in one indexserver. I know I can move them but does it make sense? On the one hand the calculation will be parlalize between 4 indexservers on the other hand the traffic of data between the servers will be huge. Because the key of the table doesn’t have any meaning I guess that every select above this table will use all the 4 indexservers and have a lot of traffic.

Do you plan to shuffle the majority of all the data over from the slave nodes? Why would you want to do that?

Given the fact that you don't ever seem to access your data via the primary key (why are you using a expensive GUID then??) you could also simply go for round robin partitioning.

Also, the 13 range partitions... 12 month + others, right?

Really, before you even consider doing such a pattern, be absolutely sure about the request pattern of the reports you are going to run agains these tables.

And, again, if you don't provide the HASH key when accessing this table, the range partitioning won't be evaluated. Or expressed otherwise: you won't get what you guess you'd get.

- Lars

Table partition in scale out architecture

Accepted Solutions (1)

Accepted Solutions (1)

Answers (0)

Re: SAP MDK: Attachment delete button not working ...

Re: SAP Datasphere connection to SAP ARIBA (SAP RI...

Re: Problem with Select-Options

Re: SAP FIORI My Inbox - QM

Re: SAP MDK: Attachment delete button not working ...