Removing an aggregate for a dimension

Former Member
0 Kudos

Hello all,

I am new to BI, and I was asked to help fix a performance issue for one of the reports.  One of the things that I noticed is that the generated SQL has a MAX on an nvarchar string (200 characters long), so overall report performance is very bad (> 9 minutes for 500K records); but if I execute the script by hand as seen in SCRIPT2, it takes less than a minute.

--SCRIPT2
SELECT
  SUM( Table2.Item_count ),
  Table1.NAME,
  Table1.ID
FROM
  Table2
  INNER JOIN Table1 ON ( Table1.ID = Table2.IntID )
GROUP BY
  Table1.ID,
  Table1.NAME

Below is a simplified version of the SQL script generated from the .UNV & .UNX:

--SCRIPT1
SELECT
  SUM( Table2.Item_count ),
  MAX( Table1.NAME ),
  Table1.ID
FROM
  Table2
  INNER JOIN Table1 ON ( Table1.ID = Table2.IntID )
GROUP BY
  Table1.ID

My question: can I change a universe parameter so that the default handling of the string is not to aggregate it, but to place it in the GROUP BY as seen in SCRIPT2?  This impacts many other reports.
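For context, a quick sketch (not from the thread - tiny SQLite stand-ins for Table1/Table2 with made-up data): when NAME is functionally dependent on ID, the MAX( ... ) form and the GROUP BY form return identical rows, which is why the SCRIPT2 rewrite is semantically safe and the MAX over a wide string column is pure overhead.

```python
# Illustrative sketch (not from the thread): SQLite stand-ins for
# Table1/Table2 with made-up data.  Because NAME is functionally
# dependent on ID, MAX(NAME) with GROUP BY ID and NAME moved into
# the GROUP BY produce the same result rows.
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE Table1 (ID INTEGER, NAME TEXT);
    CREATE TABLE Table2 (IntID INTEGER, Item_count INTEGER);
    INSERT INTO Table1 VALUES (1, 'Apple'), (2, 'Banana');
    INSERT INTO Table2 VALUES (1, 10), (1, 5), (2, 7);
""")

script1 = """
    SELECT SUM(Table2.Item_count), MAX(Table1.NAME), Table1.ID
      FROM Table2 INNER JOIN Table1 ON (Table1.ID = Table2.IntID)
    GROUP BY Table1.ID
"""
script2 = """
    SELECT SUM(Table2.Item_count), Table1.NAME, Table1.ID
      FROM Table2 INNER JOIN Table1 ON (Table1.ID = Table2.IntID)
    GROUP BY Table1.ID, Table1.NAME
"""

r1 = sorted(con.execute(script1).fetchall())
r2 = sorted(con.execute(script2).fetchall())
print(r1)        # [(7, 'Banana', 2), (15, 'Apple', 1)]
print(r1 == r2)  # True
```

The results match; the difference on a real database is only in the work the server has to do, which is consistent with the > 9 min vs < 1 min gap reported above.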

Thanks in advance for any help and suggestions.

Accepted Solutions (0)

Answers (3)

Former Member
0 Kudos

Updated info:

Using a BO 4.0 SP4 patch 9 environment.

Occurs for strings of any type in my environment (nvarchar/varchar/char) -- I don't have text fields.

Former Member
0 Kudos

Most likely the object which uses the expression Table1.NAME is defined as a measure.  If it is a varchar field, it should be defined as a dimension.

Former Member
0 Kudos

Durgamadhab, it is a dimension not a measure.

Former Member
0 Kudos

It should not do a MAX for a dimension.  Is this a common behavior for all measures in this universe?  Are any other universes affected?

Also check whether the projection for the measure is SUM or not.

Also, how is this dimension defined?  I hope there is no MAX defined in the dimension itself.

Former Member
0 Kudos

There is no common behavior across all measures.  If I use any measure - summed values, counts, averages, etc. - and attach a string definition (name, code, etc.), the SQL will have MAX( name ), MAX( code ), etc., which makes no sense to me at all, as all of these string definitions are dimensions and should not need an aggregate function.

In the IDT, under the SQL definition, the SELECT portion states: Table1.NAME so there is no aggregation function.  I also looked at the script using the Show Script button and it pulls SELECT Table1.NAME from Table1.

In the UDT, under the fx (formula) screen, the Select portion also has Table1.NAME so there is no aggregation.

Former Member
0 Kudos

How is the measure defined?  Does it have the aggregation function defined correctly in its definition, like SUM( table.measure )?

Former Member
0 Kudos

For each measure defined, the aggregate is defined correctly - such as sum( ... ), count( distinct ... ), count( ... ), min( ... ), avg( ... ), etc.

Former Member
0 Kudos

Is it just doing this for nvarchar columns, while varchar columns are fine?

XI3.1 doesn't like nvarchar and doesn't understand nvarchar(max) at all.

This may be a carry-over from that which SAP haven't sorted out.

Try changing the SQL to cast(table.column as varchar(200)) and see how that affects performance.

Former Member
0 Kudos

All nvarchar and varchar columns.  There are no varchar(max) or nvarchar(max) values.  Most of them fall in the (n)varchar(100) to (256) range.

Working in version 4.0 SP4 Patch 9.

In researching this issue, I saw an obscure article on a blog/forum that mentioned a universe parameter that could be checked/activated.  However, I don't have that link, as I was using someone else's computer at the time - and no, I cannot retrieve it, as the history was flushed.  I do remember that the title was not even about this issue but about something else entirely.

To re-iterate, the issue is this:

SELECT MAX( TBL.NAME ) AS COLUMN_01, -- does not matter if it is varchar or nvarchar
       SUM( TBL.VALUE1 ) AS COLUMN_02,
       SUM( TBL.VALUE2 ) AS COLUMN_03,
       AVG( TBL.VALUE3 ) AS COLUMN_04
  FROM TBL
GROUP BY
       TBL.ID

The article stated that enabling a universe parameter would change ALL queries from the above to the script below, which performs better.

SELECT TBL.NAME AS COLUMN_01, -- does not matter if it is varchar or nvarchar
       SUM( TBL.VALUE1 ) AS COLUMN_02,
       SUM( TBL.VALUE2 ) AS COLUMN_03,
       AVG( TBL.VALUE3 ) AS COLUMN_04
  FROM TBL
GROUP BY
       TBL.ID,
       TBL.NAME

Former Member
0 Kudos

If it's the obscure blog that I think it might be, they may have "borrowed" my idea of setting BLOB_COMPARISON to Yes

Former Member
0 Kudos

Mark, I read a description of what the BLOB_COMPARISON parameter does:

"Specifies if a query can be generated with a DISTINCT statement when a BLOB file is used in the SELECT statement. It is related to the setting No Duplicate Row in the query properties."

but I still don't understand why it would cause the SQL to generate differently - changing MAX( ... ) to a GROUP BY ( ... )?

Former Member
0 Kudos

Oh, I know what it says, but does SAP software always do what it claims?

What is the setting of the ANSI_92 parameter?

Former Member
0 Kudos

You are correct about SAP software not always doing what it claims, as I am learning all about that now.

ANSI_92 parameter = YES

Not sure if these will help, but here are some of the settings that I think are correlated.

BLOB_COMPARISON = NO

CORRECT_AGGREGATED_CONDITIONS_IF_DRILL = NO

DISTINCT_VALUES = DISTINCT

FORCE_SORTED_LOV = NO (some of the string columns are LOV's)

SHORTCUT_BEHAVIOR = GLOBAL

Former Member
0 Kudos

Give changing BLOB_COMPARISON to YES a try; it worked for something similar in XI3.

Former Member
0 Kudos

I tried it, but it did not work.  However, I did create a new object called TBL.NAME (no index), which has no index keys on it, and applied it to the query.  It did work SQL-wise, meaning that it kept the ID in the filter (WHERE clause) and placed the name column in the GROUP BY.

The output query looks like:

SELECT TBL."NAME (not indexed)" AS COLUMN_01, -- does not matter if it is varchar or nvarchar
       SUM( TBL.VALUE1 ) AS COLUMN_02,
       SUM( TBL.VALUE2 ) AS COLUMN_03,
       AVG( TBL.VALUE3 ) AS COLUMN_04
  FROM TBL
 WHERE TBL.ID IN ( 1, 2, 3 )
GROUP BY
       TBL.ID,
       TBL."NAME (not indexed)"

But I do not want to add 60 to 70 extra fields unless it is necessary.  Do you have any other suggestions?

Thanks.

Former Member
0 Kudos

No, you're right, you shouldn't have to.

Are you sure that you've been given tables, and not views with hints?

I've seen it where a DBA has set up a separate schema with just views in it so the nasty BO developers cannot get at his precious data.

Another thing to check - in your connection details, do you have anything set up in the ConnectInit box?


Former Member
0 Kudos

James, I've been having a bit more of a think about this and realised another situation where you will get this.  Your objects will have index keys set up.  Have a look at the properties of one of your text objects and remove the keys.  I reckon you'll see the MAXes disappear - the TBL.ID value in the GROUP BY clause rang a bell with me, and I realised that's what it would be.

If you have a star schema, I'd seriously question the usage of them - happy to continue that discussion in this thread as to the benefits and limitations of them.

Former Member
0 Kudos

Hi Mark,

Sorry for the delay in replying.  In the midst (end?) of a release cycle so the last week and next couple of weeks are going to be hectic.

To answer your questions starting from the Feb 6, 2014 date,

1) I am working off of tables (not views) for about 95% of the universe.

2) The ConnectInit box? I am not yet familiar with it.  Can you point out the location of it?

3) I tried it out and it does work for the one object that I had, but wouldn't it affect the prompt/filter?  I haven't had enough time to do a full test on the ramifications of that change.

For example, wouldn't the removal of the indices cause the filter (where clause) to grab strings instead of id's?

WHERE ItemName in ( N'Apple', N'Banana', N'Cherry') --WHERE ItemName in ( @prompt( ' select item', 'N', multi ... ) )

instead of

WHERE ItemId in (5, 17, 200, 1201) --WHERE ItemId in ( @prompt( ' select item', 'N', multi ... ) )

Or is it using the DB server's indices?

4) I had a quick answer from the lead dev and he states that is a star schema right now.  Not sure if I fully understand that yet but will read up on that.

I would love to hear your thoughts on the benefits and limitations of them.

Looking in your profile, I noticed that you were working on the Webi SDK and I have a separate question regarding SDK's but don't want to put it in this thread so as to not clutter this one.  Should I start a separate thread?

BTW, thanks for all the information so far.

Former Member
0 Kudos

Index awareness is a notional thing. It is called index awareness but is actually not. It is about the user providing details on the Keys tab of the object in Designer. The reason that it is called index awareness is that it assumes that you are following best practice and indexing your key columns. Hence, you are making the universe aware of columns that, in a textbook best practice deployment, are indexed.

Please note that you do not need your key columns to be indexed for this to work, you just need your database's referential integrity to be correct. I'll run through the basics of it.

Say you have a retail data warehouse. You buy goods from suppliers, receive them into your warehouse, deliver them to your stores and your stores sell the products to customers. As such, you would have a series of fact tables and probably aggregated fact tables, all of which have their own surrogate keys and contexts and all the other good best practice stuff. Now let's say you had a Warehouse dimension and a Deliveries Fact table, the Warehouse dimension is one of the arms of the star from the deliveries fact table, which records which products we shipped from our warehouses to our stores.

From that, there's an object called Warehouse Name in our universe.

It is defined as D_WAREHOUSE.WAREHOUSE_NAME

In the Keys section, I would have a Primary Key of D_WAREHOUSE.WAREHOUSE_ID

I would also have a Foreign Key of F_DELIVERIES.WAREHOUSE_ID

So, that's the object.

I want to know about deliveries from warehouses because we want to sharpen up logistics with fuel costs spiralling.  So, I build a query in Webi.  I drag Warehouse Name, Store Name and Number of Pallets into my results, and in my conditions I have my predefined condition to restrict to Last Month's Deliveries, and I choose Warehouse Name = Worksop because I want to see how my deliveries look in the UK's East Midlands area and whether I can utilise my under-used warehouses in Yorkshire or the West Midlands.

Without the Keys enabled, I would get, roughly (and without ANSI_92 set to Yes)

SELECT D_W.WAREHOUSE_NAME, D_S.STORE_NAME, SUM( F_D.NO_OF_PALLETS )
  FROM F_DELIVERIES F_D, D_WAREHOUSE D_W, D_STORE D_S, D_DELIVERY_CALENDAR D_DC
 WHERE D_W.WAREHOUSE_ID = F_D.WAREHOUSE_ID
   AND my other joins here..
   AND D_W.WAREHOUSE_NAME = 'Worksop'
GROUP BY etc....

I'm sure you get the picture on how SQL looks

Now, with keys enabled, what would happen is the foreign key awareness would kick in: the join on D_W.WAREHOUSE_ID = F_D.WAREHOUSE_ID would not be generated, and the D_WAREHOUSE table would not be used.  Instead, BO would quickly look up the WAREHOUSE_ID for Worksop and put it into a where clause as:

F_D.WAREHOUSE_ID = 123456 or whatever it may be.

Sounds great if you aren't using anything else from D_WAREHOUSE - and it is: you've eliminated one table and a join from your query, and you're looking up against an indexed column (that said, I'd have indexed the Warehouse Name column as well, it being a likely candidate for where clauses and group by clauses).
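The before/after rewrite described above can be sketched end to end (an illustrative SQLite toy with made-up keys and pallet counts, not the real warehouse schema): resolve the dimension value to its surrogate key once, then hit the fact table directly and skip the dimension join.

```python
# Illustrative sketch of the index-awareness rewrite (made-up schema/data):
# resolve 'Worksop' to its surrogate key first, then filter the fact
# table directly, eliminating the dimension join from the main query.
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE D_WAREHOUSE (WAREHOUSE_ID INTEGER, WAREHOUSE_NAME TEXT);
    CREATE TABLE F_DELIVERIES (WAREHOUSE_ID INTEGER, NO_OF_PALLETS INTEGER);
    INSERT INTO D_WAREHOUSE VALUES (123456, 'Worksop'), (123457, 'Leeds');
    INSERT INTO F_DELIVERIES VALUES (123456, 40), (123456, 25), (123457, 10);
""")

# Without index awareness: join the dimension to filter on the name.
joined = con.execute("""
    SELECT SUM(F_D.NO_OF_PALLETS)
      FROM F_DELIVERIES F_D, D_WAREHOUSE D_W
     WHERE D_W.WAREHOUSE_ID = F_D.WAREHOUSE_ID
       AND D_W.WAREHOUSE_NAME = 'Worksop'
""").fetchone()[0]

# With index awareness: one key lookup, then no join in the main query.
key = con.execute(
    "SELECT WAREHOUSE_ID FROM D_WAREHOUSE WHERE WAREHOUSE_NAME = 'Worksop'"
).fetchone()[0]
rewritten = con.execute(
    "SELECT SUM(NO_OF_PALLETS) FROM F_DELIVERIES WHERE WAREHOUSE_ID = ?",
    (key,),
).fetchone()[0]

print(joined, rewritten)  # 65 65
```

Same total either way; the rewritten form simply does less work per row, which is the whole appeal of the feature when it applies.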

However, this only works when you choose the Warehouse Name from a list - if you type it manually or respond to a prompt it won't work. It also won't work for a derived table because you won't have a key (unless that has changed since I last looked).

It will also affect the LOVs and make them show duplicates, e.g. SCDs and Product Category/Store Region values would duplicate because the surrogate key makes each row unique.  You also cannot use the technique of adding "ALL" to your list of values, for the same reason.

Also it doesn't work for aggregate aware objects (or again didn't used to) because the key definitions will be all over the place.

Hopefully that's cleared it up for you - someone has set this up with the best of intentions without fully appreciating its limitations.  I'd have expected performance to improve for you, but obviously not.

(Oh, one final thing before I forget, I gave up on the SDK and let our C# developer take over - his aptitude for programming far outweighs my patience with it!)

former_member182521
Active Contributor
0 Kudos

You may need to set the default aggregation on the Table1.NAME object to none in its object properties.

Former Member
0 Kudos

Manikandan, in the UDT, I am looking at the properties tab for that dimension and the only things that I see are as follows:

Qualification (set to Dimension)

Associate a List of Values: checked

Allow users to edit this list of values: unchecked

Automatic refresh before use: checked

Hierarchical Display: unchecked

Export with universe: checked

Delegate search: unchecked

Other buttons on the screen:

"Restore Default", "Edit...", "Display...".

I do not see anything related to default aggregation under the properties tab.

Also, in the IDT, under the business layer (.blx file), there is no properties tab, but there is a "Custom Properties" tab that has a grid (objects & values).  There is also an advanced 'tab' that does not have anything about setting the default aggregation.