
Union with Constant Values

Former Member
0 Kudos

Hello All,

I have been reading about using a union with constant values when dealing with multiple analytic views (where each analytic view has its own unique measures).  In our example, though, a union alone won't work because the dimensions (characteristics) are also different in each analytic view, so really a join is necessary.  Does a union with constant values solve this at the calculation view level, i.e., will it combine the data into one record like a join?  We don't think so, but want to make sure.  The documentation states that joins carry a severe performance cost (slow), but unions won't work for us.  The final output is characteristics and measures from multiple tables.  So:

1.  Does a union with constant values work here?  (We think this only means setting the measure to a constant 0, and that each analytic view must still have identical characteristics in its output; if so, this doesn't work.)

2.  If a union with constant values doesn't work, should we do all these table joins at the analytic level (one analytic view with many tables joined in its foundation), or still have multiple analytic views and join at the calculation view level?

The problem at the calculation view level (graphical) is that a join can only have two inputs.  So if we have four individual analytic views, these "joins" turn into a "pyramid".  We are not sure which is better for performance.

Thanks!

Accepted Solutions (1)

rajarshi_muhuri
Active Participant
0 Kudos

I am in a similar situation where we are testing performance.

I have BSEG joined to BKPF (the line item and header tables) as the fact table, and depending on the calculation we connect different dimension tables such as CSKS, CE4, and ANLA.

Ideally I don't want to connect the dimension tables at the line item level, where there are millions of records (ROW level).  I want to connect the dimension tables after aggregation (SET level).

One of the options I am exploring is to pass the fact table into the calculation view and put an "aggregation" node on top of the column table (fact table).  This aggregates the column fact table and reduces the record set.

If I use the graphical option to do the joins, it creates a pyramid structure.  But this allows only two inputs per join, and only upon completion does execution move to the upper join in the pyramid structure.

However, if you use SQL script, you can join multiple tables using multiple statements/procedures, and these statements run fully in parallel.

So if the tables are big, I gain on performance, even though a pure graphical model normally performs better than an SQL-scripted model.
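To illustrate, here is a minimal SQLScript sketch of that idea; all schema, table, and column names below are invented, not from the actual model:

BEGIN
  -- read only the needed columns from the fact table
  lt_fact = CE_COLUMN_TABLE("MYSCHEMA"."FACT", ["CC_ID", "ASSET_ID", "AMOUNT"]);
  -- aggregate first (SET level) to shrink the row set
  lt_agg = CE_AGGREGATION(:lt_fact, [SUM("AMOUNT") AS "AMOUNT"], ["CC_ID", "ASSET_ID"]);
  -- the dimension reads are independent statements, so the engine can run them in parallel
  lt_cc = CE_COLUMN_TABLE("MYSCHEMA"."DIM_CSKS", ["CC_ID", "CC_NAME"]);
  lt_an = CE_COLUMN_TABLE("MYSCHEMA"."DIM_ANLA", ["ASSET_ID", "ASSET_NAME"]);
  -- each CE_JOIN still takes exactly two inputs
  lt_j1 = CE_JOIN(:lt_agg, :lt_cc, ["CC_ID"]);
  var_out = CE_JOIN(:lt_j1, :lt_an, ["ASSET_ID"]);
END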

Former Member
0 Kudos

Hello Rajarshi,

Couple of questions:

1.  Are you saying that writing the SQL performs better than the graphical mode (with the "pyramid" joins)?

2.  Can you explain the difference between the Aggregation node and the Projection node?  We have not touched the Aggregation node, so we are not sure how to use it yet.

Thanks so much!

rajarshi_muhuri
Active Participant
0 Kudos

For a simple scenario or smaller tables, the graphical modeling option provides optimum performance.

But doing calculations after aggregation in the calculation view improves performance for complex scenarios (i.e., deeply nested IFs) and big table sizes.

About joins after aggregation: I don't have a conclusive answer yet, but joins are very expensive, so if we can do joins after aggregation (on the reduced set), I feel it performs better.  (I am still testing on BIG data; I will let you know.)

2. Analytic views are aggregated inherently, so an aggregation node on an analytic view will give the same result as a projection node.  So there is no point using an aggregation node over an analytic view.

But if you are placing a column table (fact table) in the calculation view, an aggregation node will aggregate the table.

To explain better, say we have a column table:

Attribute    Measure
A            100
A            200
B            100
C            500

The aggregation node will do this:

A    300
B    100
C    500

The projection node will give:

A    100
A    200
B    100
C    500

But an analytic view (already aggregated) on that table would be:

A    300
B    100
C    500

so an aggregation or projection node on top of it would do nothing, and the output would be the same:

A    300
B    100
C    500
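In plain SQL terms, the two nodes behave roughly like this (schema and table names invented):

-- projection node: plain column selection, no grouping
SELECT "ATTRIBUTE", "MEASURE"
FROM "MYSCHEMA"."FACT";

-- aggregation node: GROUP BY with the measure in SUM mode
SELECT "ATTRIBUTE", SUM("MEASURE") AS "MEASURE"
FROM "MYSCHEMA"."FACT"
GROUP BY "ATTRIBUTE";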

rajarshi_muhuri
Active Participant
0 Kudos

Nancy

I was trying out different techniques; however, the results are inconclusive.

It seems that the OLAP engine (aggregation engine) is on the same level as the JOIN engine.  So for smaller dimension tables, it's a penalty to join later (which makes sense).

However, if there is even one calculation, execution goes one notch down in performance and hits the calculation engine.  So if there is any calculation, it is definitely better to do it after aggregation has happened, at the calculation view level.  CE functions are quite optimised and perform well if graphical modelling cannot achieve the desired solution.

Finally, about having big tables go as column tables into the calc view with an aggregation node on top: here I am finding that one needs to study the data well, and only for a well-chosen number of facets does the aggregation do something meaningful.  However, business needs might/will dictate additional facets (attributes), which will make the GROUP BY clause not give any meaningful aggregates.

former_member184768
Active Contributor
0 Kudos

Hi Nancy / Rajarshi,

If the calc view is created using SQL script which utilizes ONLY CE functions and is based on TABLES, then I think the performance should be similar to that of the graphical option.

The CE function for JOIN is also restricted to TWO objects only, hence I don't think it differs from the graphical option from an execution perspective.

But if the SQL script uses an object based on a complex SELECT statement with multiple table joins, then I don't think it will be executed directly by the calc engine.  Execution then goes to the SQL parser and SQL engine, and in my opinion it will have an impact on performance.

Hence for higher data volumes, I would currently prefer CE function based development (script or graphical) rather than a JOIN using a complex SELECT statement.
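Roughly, the two scripting styles compare like this (table and column names are made up); only the first stays entirely within the calc engine:

-- CE route: two inputs per join, hence the chained "pyramid"
lt_j1 = CE_JOIN(:lt_t1, :lt_t2, ["K1"]);
lt_j2 = CE_JOIN(:lt_j1, :lt_t3, ["K2"]);

-- SQL route: one complex SELECT joining several tables at once,
-- which is handed to the SQL parser/engine instead
lt_j3 = SELECT a."K1", a."M", b."D1", c."D2"
        FROM "T1" a
        JOIN "T2" b ON a."K1" = b."K1"
        JOIN "T3" c ON a."K2" = c."K2";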

Regards,

Ravi

ranjit_alapati
Participant
0 Kudos
  • Graphical calculation views are internally converted into CE functions and executed by default in the calculation engine, where performance is better (but by setting certain properties it is possible to execute them in the SQL engine)
  • Script-based calculation views with CE functions are also executed in the calculation engine
  • Script-based calculation views that contain imperative logic/statements such as IF, CASE, FOR, and WHILE loops will not enable parallel execution, and performance could be bad; see the sketch below
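For example, a small sketch of the difference (names and the input parameter iv_mode are hypothetical):

-- declarative: a pure dataflow step, free to run in parallel
lt_a = CE_PROJECTION(:lt_in, ["ATTR", "MEASURE"], '"MEASURE" > 0');

-- imperative: control flow that forces sequential execution
IF :iv_mode = 'X' THEN
  lt_b = CE_PROJECTION(:lt_in, ["ATTR", "MEASURE"]);
ELSE
  lt_b = CE_AGGREGATION(:lt_in, [SUM("MEASURE") AS "MEASURE"], ["ATTR"]);
END IF;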

Answers (1)

former_member184768
Active Contributor
0 Kudos

Hi Nancy,

Quite an interesting scenario.

1) Union with constant values: picture this... it will bring in the data from the different analytic views.  The measures would be in SUM mode and the attributes would be in the GROUP BY clause.  Hence, if the attribute values from the different analytic views match, then chances are that the measures from the different analytic views will come together in a single row.

But as you mentioned, the attributes are also different, so grouping by different attributes may not get them into a single row.
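In plain SQL, the mechanism looks roughly like this (view and column names invented): each source pads the measures it does not own with a constant, and rows whose attribute values match then collapse under the GROUP BY:

SELECT "CUSTOMER",
       SUM("REVENUE") AS "REVENUE",
       SUM("COST") AS "COST"
FROM (
  SELECT "CUSTOMER", "REVENUE", 0 AS "COST" FROM "AN_SALES"
  UNION ALL
  SELECT "CUSTOMER", 0 AS "REVENUE", "COST" FROM "AN_COSTS"
) u
GROUP BY "CUSTOMER";

If the two sources group by different attributes, the rows simply stay apart instead of merging into one.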

2) Using JOIN: yes, as of now only two views can be used in a JOIN, which means you end up creating a "pyramid" structure.  But considering the MPP architecture of HANA, that should not hurt performance.  The query against this will anyway be split into parallel queries for execution.

Hence, from my point of view, since UNION is not a feasible option functionally, JOIN is the only way to go.

Regards,

Ravi

Former Member
0 Kudos

Hello Ravi,

I will award the correct answer points with this reply (I was going to with the last reply, but wasn't sure whether my question would then be closed so that I couldn't ask the question below).

So it sounds like we are on the right path.  Can you tell me what MPP stands for (I assume some type of parallel processing) and where we can read up on HANA and MPP?

Thanks!

former_member184768
Active Contributor
0 Kudos

Hi Nancy,

You are right: MPP stands for the Massively Parallel Processing architecture of HANA.

You can read about it in the HANA Master Guide or numerous documents on HANA Architecture.

In your HANA Studio, please go to Help -> Quick Launch -> Help (right-hand side, bottom) -> Documentation -> HANA Appliance -> HANA Master Guide.

Also you can do a quick search for HANA MPP.

Please also refer to this nice blog on the topic: http://scn.sap.com/community/in-memory-business-data-management/blog/2010/10/28/what-does-sap-mean-b...

Regards,

Ravi

Former Member
0 Kudos

Hello Ravi,

We are very confused now and were hoping you could help explain further.

We've read many postings about the Projection and Aggregation nodes in calculation views.  Our models are all based on analytic views (which have their own joins with attribute views).  The question is: should we use the projection node at all in the calculation view?  We thought analytic views are aggregated data by default (OLAP)?  If we should use projection, where do we place it?  I have seen two papers where one has the projection before the union node (i.e., between the analytic view and the union node in the calculation view) and the other has it after the union node, between the union node and the output node.

Lastly, we've read that calculated measures should be done at the calculation view level for performance reasons (as aggregation has already happened).  Is this true?

Help is greatly appreciated.

Thanks!

rajarshi_muhuri
Active Participant
0 Kudos

I have seen two papers where one has the projection before the union node (i.e., between the analytic view and the union node in the calculation view) and the other has it after the union node, between the union node and the output node.

Can you please share the links to those two papers.  But yes, it depends on the scenario.

A projection node can be used to narrow down / filter the results.

We thought analytic views are aggregated data by default (OLAP)?

Yes, analytic views are aggregated by default.

Lastly, we've read that calculated measures should be done at the calculation view for performance reasons (as aggregation has happened already)?  Is this true?

For big tables and complex calculations, this is true.
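A rough plain-SQL illustration (invented names): in the second query the expression is evaluated once per group instead of once per record; for a simple linear expression like this the results are identical:

-- calculated before aggregation: evaluated for every row
SELECT "CUSTOMER", SUM("REVENUE" - "COST") AS "MARGIN"
FROM "FACT"
GROUP BY "CUSTOMER";

-- calculated after aggregation: evaluated once per group
SELECT "CUSTOMER", SUM("REVENUE") - SUM("COST") AS "MARGIN"
FROM "FACT"
GROUP BY "CUSTOMER";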

Former Member
0 Kudos

Hi all,

Very happy to find this post; it is almost the most valuable post I have seen so far about HANA. :) It hits lots of issues I have met in my project.

I still have one question about the projection nodes after the analytic view and the union.  Rajarshi mentioned they can narrow down the results.  Is there a performance benefit from narrowing down?  As I understand it, HANA generates the executed calculation model based on the designed model and the current query on the model.  Regardless of how many columns are involved in the designed model, if the query doesn't touch those columns, they are not read from the table by HANA.  That means the narrowing down is done automatically.

Is this true in your opinion?  Thanks!

Former Member
0 Kudos

To add another question: I have read a lot about filtering data as early as possible (preferably at table level).  But as with the column narrow-down, it seems the executed calculation model works top-down.  My understanding is that the filter criteria at each level of the model (from the query down to the attribute view) will be pushed down to table level.  So there is no difference between putting the filter directly on the table and letting HANA push it down at query time.  Is this right?

former_member184768
Active Contributor
0 Kudos

Hi,

Couple of points:

  • Projection after union can help you filter the data by rows, similar to a WHERE condition but at a lower level. It depends on how you define the projection filter: if the projection filter is based on a calculated column, then the filtering will happen in the calc engine, which may have a negative impact on performance. Please refer to the document here on query pruning.
  • Your understanding is correct. HANA performs column stripping, which means the columns that are not selected for reporting are not queried. But the projection can also help optimize the result set further by pushing down the filters.
  • Filter criteria push-down depends on how the filtering is implemented. As mentioned, if there is a calculated column on which the filtering is to be performed, then it is not pushed down. You can check this in the query's visual plan; see the sketch below.
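To make the last point concrete, a small sketch (invented names):

-- filter on a base column: can be pushed down to the table scan
SELECT "ATTR", SUM("MEASURE") AS "MEASURE"
FROM "FACT"
WHERE "REGION" = 'EMEA'
GROUP BY "ATTR";

-- filter on a calculated column: evaluated only after the
-- expression is computed, so the full row set is read first
SELECT *
FROM (SELECT "ATTR", "MEASURE" * "RATE" AS "CONV" FROM "FACT") t
WHERE "CONV" > 1000;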

Hope this helps.

Regards,

Ravi

Former Member
0 Kudos

Hi Ravi,

Thanks for the reply.  Those are very helpful points, as is the link to your article.  If I understand correctly, you are saying the system can push the filter down, but sometimes it doesn't maximize the push-down, as in the example without the constant column in your article.  So in that situation maybe we can help the system by giving it some "hints", e.g., manually filtering on the table or using a constant column.  Is this understanding correct?

Again about the column stripping: when designing the data model, should I consider how to narrow the columns at the lower levels (attribute view / analytic view), or just forward the columns as-is up to the top, since HANA will do the column stripping at query time anyway?

I believe the visualized plan is a very useful tool.  However, I cannot find much information about it on the internet.  Do you know where I can find documents that explain the visualized plan in more detail?

Thank you very much!

Best Regards,

Hai