
How to find "good" aggregate candidates?

marco_simon2
Participant

Hi specialists,

I'd like to know how one can find good aggregate candidates - without hopping from BW user to BW user and asking them "which characteristics do you mainly use?".

I know that RSRT offers the possibility of showing the aggregate that would have been used if it existed.

The catch: aggregates are defined on a per-cube basis - not on a per-query basis.

It would be nice if the BI statistics could provide information about the most frequently used characteristic combinations per cube. But I have not found such information yet.

So my question is: how do you determine good aggregate candidates - without "guessing" and "user-hopping"?

Accepted Solutions (0)

Answers (2)

yogesh8984
Contributor

Hi,

The characteristics that are present in the selection screens of the reports are good candidates to build aggregates on.

Regards,

Yogesh

marco_simon2
Participant

Are they?

Generally, only the characteristics that are used for selections and that are part of the drill-down are relevant for aggregates.

If I've got 10 variables on the variable selection screen and in 80% of all cases only 3 of them are used, then my aggregate would contain 7 useless characteristics and would be over-populated...

Besides that: the variable selection screen is query-dependent - so it doesn't help at all, because I'm looking for a query-independent basis for determining characteristics that are valuable for aggregates.

IMHO the only way to find these valuable characteristic combinations is to analyze the way users report on the business warehouse data. And this user behaviour is, IMHO, reflected in the BI statistics (which are based on the tables mentioned by Jürgen). But so far I haven't found any information about "characteristic usage" in there.

Edited by: Marco Simon on Sep 19, 2008 4:20 PM

yogesh8984
Contributor

Obviously you have to have an understanding of the reports and of how they will be executed by the users - I just pointed out a direction. In any case you have to do a thorough analysis before settling on aggregates, as having a large number of them can wreak havoc.

Regards,

Yogesh

Former Member

Hello Marco,

your approach to finding aggregates is really stimulating.

Okay, some suggestions which came to me in the last hours:

- I didn't find any hint in the RSDDSTAT tables on how to get the drill-downs used by the users. Perhaps someone else has an idea.

- It is possible to activate InfoObjects to be reported on by the RSDDSTAT tables. I have never tried this, but it may be that each filter setting and each drill-down leaves its traces.

- Perhaps finding out the cardinality of the dimensions will help. This should be possible by checking the number of entries in the dimension tables (see the sketch below). So what can I do with this information? I can try to find out whether any of the bigger dimensions can be aggregated. But this requires a lot of semantic knowledge about what the characteristics in the dimension are and what they mean.
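To make the cardinality idea concrete, here is a minimal sketch. The cube name "SALESCUBE" is made up, and the /BIC/D<cube><n> pattern is the usual naming convention for the dimension tables of customer cubes - adjust both to your system:

REPORT z_dim_cardinality.

* Sketch: compare the cardinality of an InfoCube's dimensions by
* counting the rows in its dimension tables. "SALESCUBE" is a
* hypothetical cube name used for illustration only.
DATA: lt_dimtabs TYPE STANDARD TABLE OF tabname,
      lv_tab     TYPE tabname,
      lv_count   TYPE i.

APPEND '/BIC/DSALESCUBE1' TO lt_dimtabs.
APPEND '/BIC/DSALESCUBE2' TO lt_dimtabs.
APPEND '/BIC/DSALESCUBE3' TO lt_dimtabs.

LOOP AT lt_dimtabs INTO lv_tab.
  " Dynamic FROM clause: count the entries per dimension table.
  SELECT COUNT(*) FROM (lv_tab) INTO lv_count.
  WRITE: / lv_tab, lv_count.
ENDLOOP.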

That's it for the moment. I think my subconscious will work on this problem during the weekend. Perhaps something will come up.

Have a nice weekend,

Jürgen

marco_simon2
Participant

Hi Jürgen,

first of all: thanks for those suggestions and for picking up and thinking through my approach.

The second point you mention is (at least I guess so) what I've been calling "BI statistics" all along.

Have a look at SPRO > IMG > "Activate Technical Content in SAP NetWeaver BI"

(although I'd be surprised if you didn't know that already).

http://help.sap.com/saphelp_nw70/helpdata/DE/f2/e81c3b85e6e939e10000000a11402f/content.htm

http://help.sap.com/saphelp_nw70/helpdata/EN/f2/e81c3b85e6e939e10000000a11402f/content.htm

Thanks to your subconscious in advance!!

Late Edit:

Perhaps you haven't noticed this yet:

There's a nice eClass tutorial by Gary Nolan, titled "Best Practice for Creating and Optimizing Aggregates in SAP BW".

https://www.sdn.sap.com/irj/sdn/go/portal/prtroot/docs/library/uuid/e55aaca6-0301-0010-928e-af44060b...

https://www.sdn.sap.com/irj/sdn/go/portal/prtroot/docs/media/uuid/c9accf89-0701-0010-78b5-d37040534c...

I believe this tutorial to be a good step in the right direction.

Former Member

Hi Marco,

unfortunately my subconscious didn't come up with any new ideas.

Regarding my second point: I only added the hint about the InfoObjects because, so far, I have only activated the monitoring of cubes and queries for the technical content. That is because I expect the technical content in BI to grow by leaps and bounds once I activate it for every available reporting object. Even in our - not yet fully engaged - BI system, the RSDDSTAT* tables take up about 10% of the database space at the moment (with "not yet fully engaged" I mean that only 10% of the data providers are monitored). You have to watch the content of the RSDDSTAT* tables very carefully, because they will not get smaller when you delete data from them (at least not unless you trigger the appropriate Oracle reorganization tools). So deleting the content of the RSDDSTAT tables once per month or week is not optional, unless you have unlimited disk space. And I found it a bit awkward to see some of the RSDDSTAT* tables ranked number one in size.
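For completeness, a trivial sketch of how the growth can be watched - RSDDSTAT_OLAP and RSDDSTAT_DM are the 7.0 statistics tables (on 3.x it is just RSDDSTAT):

* Sketch: watch the row counts of the 7.0 statistics tables. Deleting
* rows will not shrink the tables on Oracle without a reorganization,
* but the counts at least show the growth trend.
DATA lv_rows TYPE i.

SELECT COUNT(*) FROM rsddstat_olap INTO lv_rows.
WRITE: / 'RSDDSTAT_OLAP:', lv_rows.

SELECT COUNT(*) FROM rsddstat_dm INTO lv_rows.
WRITE: / 'RSDDSTAT_DM:', lv_rows.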

Okay, back to topic:

All my optimizations today are based on semantic knowledge of the data - knowing that certain user groups will use certain levels of aggregation because of the semantics in the cube. Everything else is just trying an aggregate and monitoring whether it is used. I am aware that this is not a good idea for large environments where creating an aggregate is time consuming. But I was able to keep almost all of our cubes small through MultiProvider partitioning, and creating aggregates for a sub-cube of a MultiProvider to see how they are used can be a good way to test.

And back to your original idea: nice approach, but I don't see a way to do it with the available tools.

I will keep an eye on this thread. Perhaps someone else offers some interesting insights.

Kind regards,

Jürgen

Former Member

Hi Marco,

the first question would be: what is a good aggregate?

Usually this is an aggregate that is used often and that really speeds up queries.

I only try to find aggregates when users start to complain, because maintaining them can be very time consuming. So, first rule: create no aggregate unless you have no other choice.

If users complain, I go to the RSDDSTAT tables. Unfortunately you did not mention whether you use 3.5 or 7.0 - the two releases use different RSDDSTAT tables. But they have one thing in common: they deliver information about DB performance, the number of read entries, and the number of used entries. An aggregate will only help you if the performance problem lies in the DB time; everything else is cache, release, or client.

Okay, assuming you found a lot of queries with a long DB time: then you can look at the number of read entries. If the read entries and the used entries are on the same level, no aggregate will help you. If there are significantly more read entries than used entries, an aggregate may help (example: 100,000 read and 20,000 used is worth considering an aggregate for). If you find such queries, you have to take a look at their characteristics and find out which one causes the large number of reads. Then, hopefully, you will find out via RSRT which aggregate would be useful.
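A minimal sketch of that analysis against the 3.x table RSDDSTAT (in 7.0 the data is split across RSDDSTAT_OLAP and RSDDSTAT_DM). The field names QDBSEL, QDBTRANS and QTIMEDB are quoted from memory - please verify them in SE11 before using this:

* Sketch: list queries with a long DB time and a bad read/used ratio.
* Field names quoted from memory - verify them in SE11.
DATA: lv_cube   TYPE c LENGTH 30,
      lv_sel    TYPE p,
      lv_trans  TYPE p,
      lv_dbtime TYPE p DECIMALS 2.

SELECT infocube qdbsel qdbtrans qtimedb
       FROM rsddstat
       INTO (lv_cube, lv_sel, lv_trans, lv_dbtime)
       WHERE qtimedb > 30.               " more than 30 seconds DB time
  IF lv_sel >= 5 * lv_trans.             " e.g. 100,000 read vs 20,000 used
    WRITE: / lv_cube, lv_sel, lv_trans, lv_dbtime.
  ENDIF.
ENDSELECT.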

There is one category of aggregates that comes in handy: aggregates on hierarchy levels. If some of your queries use hierarchies, it may be useful to create aggregates on a middle hierarchy tier - especially if the users tend to start at a hierarchy node and drill down.

Those are some thoughts to consider. Hope it helps.

For more, I would suggest the most interesting SAP training I ever did: TEWA50 BW Query Tuning. This training is always mentioned in the EarlyWatch Alert, but I couldn't find any entry for it in the SAP training schedule since BI 7.0.

Kind regards,

Jürgen

marco_simon2
Participant

Hi Jürgen,

unfortunately your post does not give me much new information - I already knew the approach you described (considering the rows-selected / rows-transferred ratio).

But that method works on a per-query basis again - I'm looking for a more general way, one for the crowd of people who use different reports and complain about bad performance whenever they request data from cube "xyz". I'd like to figure out which characteristic combinations are used most, in order to build a fitting aggregate and make the majority of the complaining users happy.
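The closest I have come so far is a per-cube view: summing the read/used numbers over all queries of a cube at least ranks the cubes by overall aggregation potential, even if it still doesn't reveal the characteristic combinations. A rough sketch, again with the 3.x field names from memory (verify in SE11):

* Sketch: sum the read/used numbers over all queries per cube to rank
* the cubes by aggregation potential. Field names from memory.
DATA: lv_cube  TYPE c LENGTH 30,
      lv_sel   TYPE p,
      lv_trans TYPE p.

SELECT infocube SUM( qdbsel ) SUM( qdbtrans )
       FROM rsddstat
       INTO (lv_cube, lv_sel, lv_trans)
       GROUP BY infocube.
  WRITE: / lv_cube, lv_sel, lv_trans.
ENDSELECT.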

Anyway - in general your description will certainly help - so I assigned points.

Late Edit: Sorry - I'm talking about BI 7.0