on 09-10-2008 11:14 AM
Hi,
Could someone please explain the relationship between partitioning and compression in a cube?
Thanks
You've gotten some good responses, so I won't repeat them, but I would just like to clarify the compression process a bit. It is NOT actually "compressing" the data in the same sense as when we use database compression or file zipping and replace repeating values with tokens.
In BW, the compression process moves the data from the F fact table to the E fact table. Along the way, it removes the Request ID from fact rows (actually, I believe it just sets the value to 0). This allows the data in the cube (to varying degrees) to be summarized more, reducing the number of fact records that end up in the E fact table. So the compressed E fact table can be smaller due to the reduced number of rows in it, not because the data has actually been "compressed".
e.g. Let's say you load store sales data to a cube every day of the month and the time grain of this cube is Calendar Month. By the end of the month, you would have 30 Requests in the F fact table - each Request representing a day's sales. So each store has 30 rows of data for that month. Now if we compress the cube and remove the 30 different Request IDs, the compression process will summarize those 30 rows to just 1 row of sales data for each store (remember the time grain is calendar month), resulting in a 30-to-1 reduction in the data volume after compression (assuming each store has sales every day of the month). It's pretty easy to see that this reduction in the number of rows that a query must read should result in faster query execution.
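The 30-to-1 collapse described above can be sketched in a few lines of Python. This is only a conceptual model of what the database does during compression; the table layout and field names (store, calmonth, sales) are made up for illustration and are not the actual BW fact-table columns.

```python
from collections import defaultdict

# Hypothetical F fact table rows: (request_id, store, calmonth, sales).
# 30 daily load requests for one store, all in the same calendar month.
f_fact = [(req, "STORE_A", "2008-09", 100.0) for req in range(1, 31)]

def compress(rows):
    """Set the Request ID to 0 and sum the key figures over the
    remaining characteristics, as BW compression conceptually does."""
    e_fact = defaultdict(float)
    for _req, store, calmonth, sales in rows:
        e_fact[(0, store, calmonth)] += sales
    return [(req, store, month, kf)
            for (req, store, month), kf in e_fact.items()]

compressed = compress(f_fact)
print(len(f_fact), "->", len(compressed))  # 30 -> 1
print(compressed[0])                       # (0, 'STORE_A', '2008-09', 3000.0)
```

Once the Request ID is gone, the 30 rows share identical characteristic values, so they collapse into one row whose key figure is the monthly total.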
So the degree of compression is dependent on the granularity of your cube AND the data. Some cubes, due to the data model, can yield little to no reduction in the number of rows (e.g. a Material inventory cube). Also - some SAP documentation refers to Compression as Condensing.
The Compression process can optionally perform Zero Elimination, which can further reduce the number of rows in the compressed fact table (again depending on the data). Rows in the F fact table where all the KF values are zero are eliminated, as are any rows where the characteristic values match and all the KFs net to 0 once the rows are summarized. Zero elimination even goes to the effort of eliminating previously compressed rows in the E fact table that had a KF with a non-zero value but, after being updated with newer data, now have all KFs = 0.
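Zero elimination can be sketched the same way: summarize first, then drop any result row whose key figures net to zero. Again, the row layout here is invented for illustration.

```python
from collections import defaultdict

# Hypothetical rows: (store, calmonth, quantity).
rows = [
    ("STORE_A", "2008-09", 50.0),
    ("STORE_A", "2008-09", -50.0),  # reversal: nets to zero after summarizing
    ("STORE_B", "2008-09", 0.0),    # key figure already zero
    ("STORE_C", "2008-09", 20.0),
]

def compress_with_zero_elimination(rows):
    """Summarize by characteristics, then drop rows whose
    key figures net to 0 (zero elimination)."""
    totals = defaultdict(float)
    for store, month, qty in rows:
        totals[(store, month)] += qty
    return {k: v for k, v in totals.items() if v != 0.0}

result = compress_with_zero_elimination(rows)
print(result)  # {('STORE_C', '2008-09'): 20.0}
```

Both the already-zero row and the pair that nets to zero disappear; only the genuinely non-zero row survives into the compressed table.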
Hope this helps.
Hi Rob,
Compression: The request ID is deleted from the cube and data is moved from the F table to the E table. Compression is used to reduce the size of data in cubes.
Partition: It's done using one of the two criteria 0CALMONTH or 0FISCPER. We use partitioning to improve query performance.
Partitioning works on the E table. So for your partitioning to take effect, you need to compress the data first.
Regards,
Victor
Hi Rob,
I can only talk about my experience with Oracle databases. The following may or may not be true for other databases.
First you have to remember that there are E* and F* tables for an InfoCube. For each Request loaded to an InfoCube, the F* table gets a new partition. If you compress the InfoCube, the F* partitions are merged into the (usually) single E* partition. So during the lifecycle of an InfoCube you create partitions each time you load a new request, and you remove partitions once you compress.
You can define partitions for the E* table. These partitions are usually based on a timescale. They are always dependent on an InfoObject of the cube. You can only delete these partitions by deleting the content of the E* tables and reloading (at least if you stay within the BW tools).
Partitions - regardless of E* or F* type - can improve performance, because Oracle is able to read data from partitions in parallel.
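The benefit can be sketched conceptually in Python, though in reality partition pruning and parallel reads happen inside the database optimizer, not in application code; the month values and figures below are made up.

```python
# Conceptual model: the E fact table as a set of month partitions.
partitions = {
    "2008-08": [("STORE_A", 100.0), ("STORE_B", 80.0)],
    "2008-09": [("STORE_A", 120.0)],
    "2008-10": [("STORE_B", 90.0)],
}

def query_sales(partitions, months):
    """Only partitions matching the filter are touched;
    the rest are pruned and never read."""
    hit = [p for p in partitions if p in months]
    total = sum(kf for p in hit for _store, kf in partitions[p])
    return hit, total

hit, total = query_sales(partitions, {"2008-09", "2008-10"})
print(hit, total)  # ['2008-09', '2008-10'] 210.0
```

A query filtered on two months scans only two of the three partitions; the database can also dispatch those scans in parallel.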
Kind regards,
Jürgen
Hi,
Partitioning: You use partitioning to split the total dataset for an InfoProvider into several smaller, physically independent and redundancy-free units. This separation improves system performance when you analyze data or delete data from the InfoProvider.
Prerequisite:
You can only partition a dataset using one of the two partitioning criteria calendar month (0CALMONTH) or fiscal year/period (0FISCPER). At least one of the two InfoObjects must be contained in the InfoProvider.
Compression: When you load data into the InfoCube, entire requests can be added at the same time. Each of these requests has its own request ID, which is included in the fact table in the package dimension. This makes it possible to pay particular attention to individual requests. One advantage of the request ID concept is that you can subsequently delete complete requests from the InfoCube.
However, the request ID concept can also cause the same data record (where all characteristics are the same except the request ID) to appear more than once in the fact table. This unnecessarily increases the volume of data and affects system performance when you analyze data, since each time you execute a query, the system has to perform aggregation using the request ID.
You can eliminate these disadvantages by compressing data and bringing data from different requests together into one single request (request ID 0).
Also refer to these links.
Hope it helps....
Priti
Hi,
There is no direct relation between partitioning and compression of the cube, except that both are used to increase query performance.
Another connection is that both are applied to the E fact table of the InfoCube.
REDDY
Think of partitions as being mini cubes per partitioned characteristic value at the database level.
So if you run a query across two fiscal periods - the SQL gets split in two and hits two small cubes in parallel.
(In reality it doesn't really work that way - but conceptually think of it like that.)
Thus if any query uses fiscal periods as a range in the filter, the response time is super quick (of course with BIA this all goes out the window).
You can only partition the compressed portion of the cube.
i.e. again, as explained in the post above - there are logically two cubes, uncompressed and compressed (and the compressed one, as mentioned above, is partitioned across disk).