
The SAP HANA Engines?

Jonathan_Haun
Participant
0 Kudos

I posted the following questions on saphana.com but never received a response. Anyone care to take a look? http://www.saphana.com/message/4304#4304

From what I know there are three basic engines used to process queries in SAP HANA: the Join Engine, the OLAP Engine, and the Calculation Engine. What I am not clear on is under what scenarios these engines are invoked. My basic understanding is as follows, but more information would be appreciated:

  1. Join Engine
    1. Used when querying an Attribute View
    2. Questions:
      1. What if I execute a standard ANSI-92 SQL statement from a BI tool or in SAP HANA Studio? Does this also use the Join Engine?
      2. If my Analytic View foundation is joined to attribute views, are both the OLAP and Join Engines used?
      3. Is this engine embedded in the indexserver?
      4. How does the optimizer play a role with this engine?
  2. OLAP Engine
    1. Analytic Views (without derived columns) use the OLAP Engine
    2. Questions:
      1. Can this only be invoked from the column views in the _SYS_BIC schema that have the term "OLAP" in their name?
      2. What syntax is used to create these column views?
      3. Can I create my own column views (outside of activating an information view)?
      4. Is this engine embedded in the indexserver?
      5. How does the optimizer play a role with this engine?
  3. Calculation Engine
    1. Calculation Views or Analytic Views with derived columns use this engine
    2. Questions:
      1. Can this only be invoked from the column views in the _SYS_BIC schema?
      2. What syntax is used to create these column views?
      3. Can I create my own column views (outside of activating an information view)?
      4. Is this engine embedded in the indexserver?
      5. How does the optimizer play a role with this engine?

Accepted Solutions (1)

lbreddemann
Active Contributor

Hi Jonathan,

not sure if I can answer all your questions, but let's see how far I get...

Jonathan Haun wrote:

  1. Join Engine
    1. Used when querying an Attribute View
    2. Questions:
      1. What if I execute a standard ANSI-92 SQL statement from a BI tool or in SAP HANA Studio? Does this also use the Join Engine?
      2. If my Analytic View foundation is joined to attribute views, are both the OLAP and Join Engines used?
      3. Is this engine embedded in the indexserver?
      4. How does the optimizer play a role with this engine?

The join engine is also used when you run plain SQL. Actually, it's not too difficult to check which engine is used to process a query: simply run the Plan Visualization on your query.

The Plan Operators (POPs) you find in there - represented by the boxes in the graph - are named based on a naming scheme like this:

  • JE<operator_name> => Join Engine (Attribute views, plain SQL)
  • BW<operator_name> => OLAP Engine (Analytic views)
  • CE<operator_name> => Calc Engine (Analytic views with calculated attributes, Calculation views)
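
For a quick check without the graphical tool, here is a minimal sketch using the documented EXPLAIN PLAN statement; the view name is borrowed from an example later in this thread, and the EXECUTION_ENGINE column of SYS.EXPLAIN_PLAN_TABLE is an assumption about your revision:

-- record the plan for a query against an analytic view runtime object
EXPLAIN PLAN SET STATEMENT_NAME = 'engine_check' FOR
SELECT ORDERED_CAL_YEAR, SUM(SALESAMOUNT) AS SALESAMOUNT
FROM "_SYS_BIC"."saphana.internetsales/ANLV_INTERNETSALES/olap"
GROUP BY ORDERED_CAL_YEAR;

-- inspect the recorded operators; EXECUTION_ENGINE gives a coarse hint
-- (e.g. COLUMN, ROW, OLAP), while the JE*/BW*/CE* POP names are only
-- visible in the graphical Plan Visualization
SELECT OPERATOR_NAME, OPERATOR_DETAILS, EXECUTION_ENGINE
FROM SYS.EXPLAIN_PLAN_TABLE
WHERE STATEMENT_NAME = 'engine_check';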

Answers to the questions:

A1: Depends on what objects you reference in your query. If you're just querying plain base tables then the join engine will be used. As soon as you reference an analytic or calculation view, the other engines will be employed as well.

A2: Nope - during activation of the analytic views, the joins in the attribute views get 'flattened' and included in the analytic view run time object. Only the OLAP engine will be used then.

A3: All the query execution is happening in the indexserver - so, yes, join/olap/calc engine are all major parts of the indexserver.

Jonathan Haun wrote:

  1. OLAP Engine
    1. Analytic Views (without derived columns) use the OLAP Engine
    2. Questions:
      1. Can this only be invoked from the column views in the _SYS_BIC schema that have the term "OLAP" in their name?
      2. What syntax is used to create these column views?
      3. Can I create my own column views (outside of activating an information view)?
      4. Is this engine embedded in the indexserver?
      5. How does the optimizer play a role with this engine?

A1: Technically, the run time objects of analytic views and attribute views are so-called column views. These can also be created without the modeler, and in that case the views can be placed in arbitrary schemas, not just the _SYS_BIC schema.

For modeled views however, the _SYS_BIC schema is the only location for the run time objects.

A2: Not officially documented, but you may check the "source code" of the run time objects by displaying the definition.
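
As an illustration, one way to display such a definition is to read it from the system catalog; a minimal sketch, assuming the view name from the examples in this thread:

-- the DEFINITION column of SYS.VIEWS holds the "source code" of the
-- activated column view
SELECT DEFINITION
FROM SYS.VIEWS
WHERE SCHEMA_NAME = '_SYS_BIC'
  AND VIEW_NAME = 'saphana.internetsales/ANLV_INTERNETSALES/olap';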

A3: Sure you can - if you can 😉 As I wrote, this is not officially documented or supported, so use at your own risk.

Besides, you don't have any transport management with these views - so this really might not be something for production scenarios.

A4: see above. Yes!

A5: There are actually a couple of optimizers that work on different levels of query processing. So the answer here is: yes.

Key for Analytic views is to keep in mind that these are specifically designed to execute star schema queries. So the query optimization process 'knows' the scenario and optimizes according to this star schema scenario.

Jonathan Haun wrote:

Calculation Engine

  1. Calculation Views or Analytic Views with derived columns use this engine
  2. Questions:
    1. Can this only be invoked from the column views in the _SYS_BIC schema?
    2. What syntax is used to create these column views?
    3. Can I create my own column views (outside of activating an information view)?
    4. Is this engine embedded in the indexserver?
    5. How does the optimizer play a role with this engine?

A1: Nope - you'd also see the Calc Engine being used when running SQLScript procedures.

A2: Even less documented outside SAP than the column views. Still, you could check the source code, but you'd be unpleasantly surprised to find a large chunk of JSON syntax.

A3: Technically yes, again - if you could ...

A4: Sure it is.

A5: The Calc Engine is pretty special, and I'd recommend giving the SAP Notes, including their attachments, a read.


Hope that these answers give you something to work with.

Cheers, Lars

Jonathan_Haun
Participant
0 Kudos

Lars,

Based on your JOIN ENGINE answer:

"A2: Nope - during activation of the analytic views, the joins in the attribute views get 'flattened' and included in the analytic view run time object. Only the OLAP engine will be used then"

Using the example plan image at the bottom, I noticed that it starts with a "Join" operation and then moves into BWxxxxxx operations. Obviously the Attribute Views must be joined to the Analytic Foundation, but I was never sure which engine processed these joins until you provided the response above. Is the flattening operation occurring at run time or only when the column view is activated? What does the plan mean by "Join" in the image below?

SQL:

SELECT ORDERED_CAL_YEAR, SUM(SALESAMOUNT) AS SALESAMOUNT
FROM "_SYS_BIC"."saphana.internetsales/ANLV_INTERNETSALES/olap"
GROUP BY ORDERED_CAL_YEAR

Jonathan_Haun
Participant
0 Kudos

BTW: Thanks for taking the time to answer these questions. This is very informative.

lbreddemann
Active Contributor
0 Kudos

Hi Jonathan,

Just open the join box (the little triangle in the upper right corner of the join box).

You'll find that there are several BWJoin... steps being performed.

AFAIK the "flattening" is done during activation.

That means after activating the analytic view, the whole join structure is a star schema query, and it doesn't matter whether you join a table in the "data foundation" or in the "logical join" - it boils down to the same star query.

That in mind it becomes clear what attribute views are for: reusable definitions of dimensions.

- Lars

Former Member
0 Kudos

Thanks, Lars Breddemann. Very informative.

Jonathan_Haun
Participant
0 Kudos

Lars,

Thanks again for your guidance on this question. I received a response on saphana.com from another SAP employee, but I am not sure about some of the answers. Beyond the conversation about analytic views, he mentioned that sometimes the row engine was invoked. Do you know anything about row engine usage with analytic views?

http://www.saphana.com/message/4397

lbreddemann
Active Contributor
0 Kudos

Hi again

In your example plans, the row engine was not involved, and I actually don't see a way to design an analytic view to use the row engine.

However, you could e.g. write a SQL query that reads data from an analytic view and uses functionality that is only supported in the row engine (e.g. the date conversion *with* character pattern or some of the window functions).

In this case the result set from the analytic view would likely be materialized and handed over to the row engine.
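
To make that concrete, a hypothetical sketch (view and column names borrowed from the earlier example in this thread): the inner aggregation can run in the OLAP engine, while the window function would force a hand-over to the row engine:

-- inner query: star schema aggregation (OLAP engine territory)
-- outer query: window function, which the row engine handles
SELECT ORDERED_CAL_YEAR, SALESAMOUNT,
       RANK() OVER (ORDER BY SALESAMOUNT DESC) AS SALES_RANK
FROM (
  SELECT ORDERED_CAL_YEAR, SUM(SALESAMOUNT) AS SALESAMOUNT
  FROM "_SYS_BIC"."saphana.internetsales/ANLV_INTERNETSALES/olap"
  GROUP BY ORDERED_CAL_YEAR
);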

A similar effect would occur, if you mix plain SQL and CE functions in a calculation view.

Therefore, the HA300 course is not as inaccurate as you may think, but it surely doesn't cover the inner workings of the engines.

- Lars

Jonathan_Haun
Participant
0 Kudos

Lars,

Thanks again. I understand the concepts now. I have been visualizing plans for a variety of queries using multiple scenarios over the last few days. I can see how different design approaches can invoke different engines. One interesting item I noticed is that when I activate an analytic view that contains a calculated column, two different column views are created in _SYS_BIC. One column view is coded to leverage the OLAP engine, and its name ends in /olap. I also noticed that it is void of some of the attributes defined as calculated columns. The second column view is activated using the CREATE CALCULATION SCENARIO command. It appears to be built off the parent OLAP column view but using a distinct syntax. In addition, this view has all of the calculated columns that were missing from the OLAP column view.

From a SAP BusinessObjects Universe design standpoint, it is important that I leverage the best-performing column view. More importantly, this information helps me understand which engine I will invoke based on how I design the Universe and/or the Information View. Thanks again for filling in the gaps.

Regards,

Jonathan Haun

lbreddemann
Active Contributor
0 Kudos

Okay...

that one is easy and I kinda mentioned it already.

Once you start having a calculated column in an analytic view, the runtime object <package_name>/<view_name> is created as a wrapper calculation view.

In this wrapper, the <package_name>/<view_name>/olap runtime object (which is the pure analytic view without the calculated column) is referenced.

The thing here is: even if you don't ask for the calculated column, the query will still need to go through the calc. engine in this case.

So it really makes sense to spend some thought on where exactly a calculated column needs to be introduced (if it is actually required).
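
If you want to see both runtime objects side by side, here is a small sketch against the system catalog (the view name is the example from earlier in this thread; the VIEW_TYPE column is assumed to distinguish the OLAP object from the CALC wrapper):

-- both the wrapper and the /olap object should show up here
SELECT VIEW_NAME, VIEW_TYPE
FROM SYS.VIEWS
WHERE SCHEMA_NAME = '_SYS_BIC'
  AND VIEW_NAME LIKE 'saphana.internetsales/ANLV_INTERNETSALES%';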

- Lars

Jonathan_Haun
Participant
0 Kudos

Lars,

Again, thanks for all the information on this topic. I had a follow-up to one of your statements. You stated: "That means after activating the analytic view, the whole join structure is a star schema query and it doesn't matter whether you join a table in the "data foundation" or in the "logical join" - it boils down to the same star query. That in mind it becomes clear what attribute views are for: reusable definitions of dimensions."

My question is:

I don't remember the source, but I was reading a HANA guide that stated that joining in the AV Analytic Foundation would lead to poor performance. That was the extent of the comment. However, as you stated, there is technically no difference because joins are all treated the same once activated. I confirmed this when I visualized the plan of an example in my environment. I'm thinking the point of that comment was that joins between large tables or high-cardinality tables anywhere in the Analytic View can slow performance. I have experienced this with joins between (for example) a 10M row order header and a 150M row order line item table. However, if I use (for example) Data Services to pre-join these tables and then load them into a single table in SAP HANA, I will eliminate the join cost and increase the overall performance. To that extent, is it better to merge large fact or transaction tables before loading them into HANA to avoid the cost of joining in the OLAP Engine? Keep in mind I am referencing examples like the Order Header and Order Line Item fact tables.

On a side note, SAP HANA is still faster than any database I have experienced. It still manages to join the large tables faster than traditional databases, but I like to squeeze the absolute highest performance out of my solutions.

lbreddemann
Active Contributor
0 Kudos

hmmm hmmm hmmm...

somebody wrote that something is slow somewhere... No way to argue here 🙂

And it's actually impossible to give an answer to questions like yours, too.

Analytic views solve a very specific problem, star queries that is.

If you actually want to do star query processing, you know, one fact table, all dimensions only joined to this central table, all filters applied to the dimensions only, then the analytic view is the weapon of choice.

It's built for precisely this.

If this is not what you're after, then maybe there are better ways.

Prejoining during the data load might be an option, as the fastest piece of work still is the one you don't need to do.

Up to now, I've seen very different use cases for most of the mentioned options (and others as well) and usually the implementation variant that got chosen in the end wasn't totally clear from the start.

Alternative approaches are usually tried and tested and of course measured and based on this, a decision is made on what to use.

Which leads me to the all-time sneak-out answer: it depends.

Cheers,

Lars

Former Member
0 Kudos

Hi Lars,

Referring to your answer:

"The join engine is also used, when you run plain SQL. Actually it's not too difficult to check what engine is used to process a query: simply run the Plan Visualization on your query.

The Plan Operators (POPs) you find in there - represented by the boxes in the graph - are named based on a naming scheme like this:

  • JE<operator_name> => Join Engine (Attribute views, plain SQL)
  • BW<operator_name> => Olap Engine (Analytic views)
  • CE<operator_name> => Calc Engine (Analytic views with calculated attributes, Calculation views)"

I tried to search blogs for this functionality, but all I could find was the 'Explain Plan' command from http://www.saphana.com/docs/DOC-2377

I am sorry, but I could not find the option for Plan Visualization in SAP HANA Studio. Can you please help me locate this particular functionality? I am using SAP HANA Studio Version 52.

Regards,

Shreepad

henrique_pinto
Active Contributor
0 Kudos

In the SQL Editor, highlight a query, right click on it with the mouse and choose "Explain Plan".

lbreddemann
Active Contributor
0 Kudos

Hi Shreepad,

I explained how to open the plan visualization before. It can be found close to the Explain Plan option in the context menu in the SQL editor window.

You can also navigate to it via the Shared SQL Cache list or the Expensive statements list.

Again, the option is to be found in the context menu.

- Lars

justin_molenaur2
Contributor
0 Kudos

First of all - I want to say that this post has been incredibly helpful in understanding some of the visualization and execution plans, really informative.

I was also attempting to understand when the various engines kick in for processing, and I am observing a difference compared to what Jonathan had described.

- Created analytic view AN_TEST_REFERENTIAL

- One entry in _SYS_BIC.copa-poc/AN_TEST_REFERENTIAL was observed

- In the execution plan, only the BWPop (OLAP engine) was observed with the following statement

SELECT MATNR, SUM(SALES_VAL) FROM "_SYS_BIC"."copa-poc/AN_TEST_REFERENTIAL"
GROUP BY MATNR

All is good at this point, and the behavior is as expected.

- Add a calculated measure, simply SALES_VAL_X2, which is SALES_VAL * 2

- Still only one entry in _SYS_BIC; no additional column view is created. Additionally, the definition of the '/olap' column view contains the definition of the newly calculated column.

- In the execution plan, even though I am requesting the calculated column, only the BWPop (OLAP engine) seems to be firing with this statement:

SELECT MATNR, SUM(SALES_VAL), SUM(SALES_VAL_X2) FROM "_SYS_BIC"."copa-poc/AN_TEST_REFERENTIAL" GROUP BY MATNR

This just seems different from what was described, the core difference being that my assumption was that anytime a calculated column was introduced, the Calculation Engine would have to take over.

Any thoughts?

Thanks,

Justin

Jonathan_Haun
Participant
0 Kudos

Justin,

There is a difference between a calculated measure and a calculated column. Most calculated measures will stay in the OLAP engine. However, if you had a calculated column (let's say you concatenated two columns), then you would see two column views created in _SYS_BIC: one that covers the base star schema using the OLAP Engine, and another that builds on that to process the calculated column in the calculation engine. The column view ending in "/olap" is the base column view. The one without "/olap" is the calculation view. Also notice the different icons. Again, this only happens when there is something in your design that has to be managed by the calculation engine.
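
To illustrate the distinction in plain SQL terms (a sketch only: the view and the MATNR/SALES_VAL columns come from your example, WERKS is an invented second attribute):

-- calculated measure: computed on aggregated values, stays OLAP engine friendly
SELECT MATNR, SUM(SALES_VAL) * 2 AS SALES_VAL_X2
FROM "_SYS_BIC"."copa-poc/AN_TEST_REFERENTIAL"
GROUP BY MATNR;

-- calculated attribute: a new per-row column (e.g. a concatenation),
-- the kind of thing that pulls the calculation engine into a model
SELECT MATNR || '-' || WERKS AS MAT_PLANT, SUM(SALES_VAL) AS SALES_VAL
FROM "_SYS_BIC"."copa-poc/AN_TEST_REFERENTIAL"
GROUP BY MATNR, WERKS;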

justin_molenaur2
Contributor
0 Kudos

Ahhh, that is the key here: calculated/derived column vs. a calculated measure. I was under the impression that ANY calculated column would force the calc engine, which is incorrect. This is very helpful, as it tells me that pushing any calculated measures into the model is relatively cheap, since it stays in the OLAP engine AND also performs the calc after aggregation (unless otherwise specified).

Now, after adding a simple calculated column based on the value of MATNR, I can see the CE engine firing, and the additional column views are created exactly as you have shown.

Thanks a bunch for the super fast response!

Justin

justin_molenaur2
Contributor
0 Kudos

Jonathan, just one further question. Since two objects are created in the _SYS_BIC schema that together form the model, is only one of these exposed for use by the BOBJ tools? I.e., does IDT see both of these? If so, I assume the /olap one is the one to use? The same question would be true for AAO and Explorer.

Thanks!

Justin

Jonathan_Haun
Participant
0 Kudos

Assuming that you have been granted SELECT on the CVIEW in the _SYS_BIC schema, you will see both (actually everything) within IDT. When designing your Universe, you can choose between one or the other or both (using @Aggregate_aware). If you choose the Analytic View version, you lose the calculated column but stay in the OLAP Engine for processing. If you choose the calculation view version, you gain the calculated columns, but the calculation engine will be used (in addition to the OLAP Engine).

Jonathan_Haun
Participant
0 Kudos

Also,

I believe Explorer only sees one when designing the InfoSpace, and AAO only sees one as well. They both use the calculation view version if memory serves me correctly, but they might use some intelligence in the backend to automatically switch (need SAP to clarify). That process is a component of the reporting tool's design. However, when designing with a Universe you have more direct control.

Thanks,

justin_molenaur2
Contributor
0 Kudos

Thanks for the super fast response. In the absence of BOE at the moment, I was also able to test this through Lumira and Excel and found that you could only see one of these objects for use in analysis. So there does seem to be something in the backend that "tells" the tool that only one of the objects can be consumed directly.

Thanks for confirming the behavior through IDT. I assume the typical path would be to consume the calculation view version.

Thanks,

Justin

Former Member
0 Kudos

Hi Lars,

Indeed it seems "it depends"

Above you mentioned, "...after activating the analytic view, the whole join structure is a star schema query and it doesn't matter whether you join a table in the "data foundation" or in the"logical join" - it boils down to the same star query."

It seems it does matter (depend). Following some other commentary and investigation, others (e.g. Ravi) and I have seen significantly faster performance by adding tables in the Data Foundation rather than modeling them as Attribute Views (i.e. contrary to "best practice"). Also, VizPlan results show different plans depending on how the star schema is modeled. Here's the discussion - your commentary would be valued! http://scn.sap.com/thread/3433018

former_member182302
Active Contributor
0 Kudos

Hello Lars,

I saw the option "SQL Engine" in a calculation view.

Does that mean I am telling the system to execute the query using the "SQL Engine"? What are the benefits of choosing this option?

Regards,

Krishna Tangudu

Jonathan_Haun
Participant
0 Kudos

Krishna,

In some cases the SQL Engine is faster at processing calculation views built to the standards mentioned in the modeling guide (quoted below).

Copied from page 58 of the SAP HANA Modeling Guide:

"You can switch the execution of the calculation view to the SQL Engine rather that the Column Engine for better performance by selecting the SQL Engine value in the Execute In property. When this property is set to SQL Engine, the calculation engine generates SQL code for each node at runtime and combines them into one big SQL statement which is then passed to the SQL interface. This might be beneficial since the SQL interface also involves standard SQL optimizer which implements a lot more optimization rules than the Calculation Engine. For example, the most important rule is join ordering determination."

Link to Guide:

http://help.sap.com/hana/SAP_HANA_Modeling_Guide_en.pdf

former_member182302
Active Contributor
0 Kudos

Thanks Jonathan for the explanation and references, I understand it now.

Regards,

Krishna Tangudu

justin_molenaur2
Contributor
0 Kudos

I had an interesting observation and thought to revisit this thread as it's relevant to the topic.

I am facilitating some testing of other applications feeding from HANA. In much of the previous testing using views to "extract" data from HANA, very wide reads on views have caused extremely high memory usage, and only one single thread can be observed running for a long time.

However, when a very wide select (think SELECT *) is executed directly against the column table, it looks like there is another process going on. There appear to be MANY small reads/threads popping up for just a few seconds and then falling off. When I look at the details in performance/threads, I can see the fetched record count is constantly increasing, and the visualized plan looks like this.

Since no specific engine is called out here on the components (JE/BW/CE) and based on the thread activity mentioned above, I have to assume there are just many small reads occurring in the SQL engine itself.

What is interesting to me is this buffer concept. I previously observed that in the other engines, the entire result is built before really pushing out for consumption, whereas here it looks like small pieces are being constantly added to this "buffer". It's really slow, but really cheap when I am monitoring the performance.

The thread details

At some later point

Lars, any info you can provide about what's going on "under the covers" here? Is there some arbitrary cursor size that just keeps adding to some buffer/providing results to the consumer and then keeps moving on, unlike the OLAP/Calc engines that attempt to build the entire set? From the threads, I can see that each "packet" is right at 1,000 rows.

The expectation was that large reads/extracts out of HANA would be very expensive to materialize the results, but this seems to contradict that assumption - it's just much slower.

Regards,

Justin

justin_molenaur2
Contributor
0 Kudos

After some investigation, it looks like the small result set/thread behavior is a direct result of a consuming application parameter. In this case, the Sqoop tool is passing data into Hadoop, and the parameter defaulted to 1,000 rows at a time.

What is not crystal clear to me at this point is how this type of parameter is actually passed to HANA during a JDBC connection. From what I can see, there is nothing in the SQL statement itself nor anything in the connection string, but it shows up in the intermittent threads, and a query on "SYS"."M_CONNECTIONS" tells me that there is a max size for the connection.

Thread view

M_CONNECTIONS View
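
For reference, a hedged sketch of the kind of catalog query used here (the exact column set of M_CONNECTIONS varies by revision; FETCHED_RECORD_COUNT is the counter that is also visible in the thread monitor):

-- watch the extract connection; the fetched record count should grow
-- in increments of the client-side fetch size (1,000 rows for Sqoop here)
SELECT CONNECTION_ID, CLIENT_HOST, CLIENT_PID,
       CONNECTION_STATUS, FETCHED_RECORD_COUNT
FROM "SYS"."M_CONNECTIONS"
WHERE CONNECTION_STATUS = 'RUNNING';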

This is of interest (at least to me) since this clearly has an impact on, and provides a way to control, the amount of resources that consuming applications/processes use on HANA.

Always something new,

Justin

lbreddemann
Active Contributor
0 Kudos

Hi Justin

To be honest, I have not seen or investigated this feature yet. My first shot at it would be this:

For simple statements like yours, that don't require actual computation, aggregation or joining but look like a mass "dump of data", materializing the result set first would not be the optimal choice.

Too much memory would be allocated just for spilling out raw data, where the typical use case for such a query (e.g. "...let's dump it into a csv file...") usually is not performance critical.

Therefore it makes sense to materialize the result set chunk-wise.

I would guess that the size of these chunks is more or less automatically estimated by the SQLDBC wire protocol (which is also used by JDBC/ODBC) and I am not aware of any option to influence this.

All in all I'd say this is an optimisation for a rather specific use case in order to not overload the system memory with "data dumps", similar to what you have with direct path exports in Oracle that don't pass the cache.

- Lars

upamanyu_mukherjee
Participant
0 Kudos

Hi Lars,

Thanks for all the information that you share! I would like to know whether a wrapper calculation view is created on top of an analytic view when we create an input parameter in the AV, and what its impact on performance is. In my scenario, I am using an input parameter (derived from a table) to apply a dynamic date filter on my central table in the data foundation.

Regards,

Upamanyu

lbreddemann
Active Contributor
0 Kudos

Hey Upamanyu,

Whether or not a wrapper gets created is something you can easily find out yourself by checking the column views in the _SYS_BIC schema.

The performance impact... well, since you either have input parameters or not, there is not really an alternative, is there?

Typically, input parameters are used to reduce the processed data early, so performance might actually get better.
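
For reference, input parameters are supplied at query time via the PLACEHOLDER syntax; a minimal sketch with invented view and parameter names:

-- the parameter value is applied inside the view, so the date filter
-- can reduce the data early rather than on the full result set
SELECT ORDERED_CAL_YEAR, SUM(SALESAMOUNT) AS SALESAMOUNT
FROM "_SYS_BIC"."saphana.internetsales/CV_INTERNETSALES"
     ('PLACEHOLDER' = ('$$P_DATE_FROM$$', '2013-01-01'))
GROUP BY ORDERED_CAL_YEAR;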

- Lars

upamanyu_mukherjee
Participant
0 Kudos

Hi Lars,

Thanks for your input. However, my concern was that, as per best practices, we are advised to avoid wrappers and shifting between different engines.

But , at the same time reducing the data set is also crucial.

So does it mean that there is no other way of implementing input parameters other than a wrapper getting created, or is there another way out?

Would like to hear your thoughts...

Regards,

Upamanyu

lbreddemann
Active Contributor
0 Kudos

Correct. Input parameters lead to wrappers.

So what?

Why does everyone seem to follow "best practices", "golden rules" and the like as strict rules?

Above all, build your model in a way so that it does its job correctly.

My next concern would be: is the model easy to understand? Would anybody be able to look at it six months from now and understand what it is doing and why?

Only after that piece I would be concerned about engine switches, wrappers or performance.

You can tweak and tune for performance - sure. But typically, if you mess up the first steps, doing performance tuning is a lot harder.

That's my 2 cts on the "best practices" topic.

- Lars

Jonathan_Haun
Participant
0 Kudos

I would recommend that you visualize the execution plan of the SQL to compare the execution and performance of both options. What you will likely find is that the calculation view wrapper is insignificant to the execution of the query.

The benefit of an"enforced" input parameter and variable is that you can force users to reduce the amount of data that that the OLAP engine has to process and increase the overall performance of the query. My take is that a reduction of records will significantly offset the time it takes the calculation search to do its part. In all likelihood, unless you also have calculated columns in your AV, the calculation view wrapper will be insignificant to the execution of the query.

To me, a best practice is a way of "warning" you that your design "might" be problematic. Best practices are guidelines with multiple caveats and assumptions. However, examining and comparing execution plans is one definitive way to know that your design is sound and optimal.

upamanyu_mukherjee
Participant
0 Kudos

...But that's worth a million

Thanks Again,

Upamanyu

upamanyu_mukherjee
Participant
0 Kudos

Thanks for your insight...my model does not include calculated columns...however I would definitely try and compare the execution plans...

Regards,

Upamanyu

Answers (3)

Former Member
0 Kudos

I am guessing that you would want to keep your queries isolated to a single "engine" to avoid passing data structures between the different query sub-routines. E.g., if designing a schema from scratch for OLAP-type queries, I could create my tables in a star-schema fashion to minimize the joins. However, what becomes difficult is getting the best OLAP performance when I need to reuse an existing relational structure, which may require many joins for dimensional and fact results.

Anyone want to comment on that or share experiences here? I'm trying to dig deeper into what it really means with statements like "HANA brings OLAP and OLTP together". My theory is that for non-trivial data volumes this is more a slider on a spectrum, where to fully optimize one of these you must make trade-offs for the other.

Former Member
0 Kudos

Hello,

Since this post is still the best overview in regard to the engines, I would like to add an architectural question to it. All architectural overviews I found show a more or less similar picture to http://en.wikipedia.org/wiki/File:Hana.jpg. In Lars's book I saw a better overview with much more detail, but the BW (OLAP) and JE (Join) engines are still always missing. Does someone know to which part of the index server these engines belong? Are they part of the calculation engine, or is the join engine part of the column store engine, or are they independent at all?

Thanks

Patrik

lbreddemann
Active Contributor
0 Kudos

Hi Patrik,

A discussion of the different SAP HANA engines didn't make it into our book, as it is mainly aimed at administrators (and the book otherwise would have gotten too thick in print).

Anyhow, the processing "engines" are not separate processes within SAP HANA. Basically they are different code paths or subroutines, if you will, that are executed depending on the type of query.

E.g. for simple joins, most often the join engine code path will be used. When the query is recognized as an OLAP query, most often the OLAP "engine" code path is used instead.

All of this execution takes place in the same indexserver process, often in parallel at the same time.

From an architectural point of view the column store really is the "base layer" and the join engine, the OLAP engine and the calculation engine can call this base layer to access the data.

The engines can also call each other (e.g. the calc. engine can invoke an OLAP engine query) - so there's no strict layering/hierarchy present there.

Also - just to make life more interesting - the functionality of the engines is overlapping. For example all three engines have different code to join tables.

Not sure this provides you the answers you were looking for; maybe there'll be demand for another book focusing on query execution one day...

- Lars

Former Member
0 Kudos

Hello Lars,

I was just a bit confused because in your book on page 39 (Figure 1.7) only the Calc Engine is mentioned. With this explanation I now have a rough understanding of what's going on in HANA 🙂 Thanks a lot!

Patrik

Former Member
0 Kudos

Hi Lars et al.

Thank you for all the information and insights shared on this thread. It is really helpful. I had a couple of questions regarding the execution of scripted calculation views:

  1. Does the engine used to execute scripted calculation views vary based on CE functions vs. plain SQL in the script node?

- CE Functions => Execution in Calculation engine (hence more optimized)

- SQLScript without CE functions => Execution in Calculation + SQL engine.

If the above holds, then at which step in the below execution plan is this determined?

Source: http://help.sap.com/openSAP/HANA1/openSAP_HANA1_Week_03_Unit_03_Procedure_CE_Functions_Presentation....

http://scn.sap.com/thread/3197847

On executing "Visualize Plan" on two scripted views based on the same query (one with SQL and the other using CE functions), I see that the SQL-based scripted calc view is executed via 'CeSqlPop' to query the underlying analytic view, while the CE-function-based calculation view pushes down the sub-execution to an analytical search followed by a calculation search.

If someone could explain this or point to the correct resources & documentation, it would be really helpful.

Thanks,

Kulsum

lbreddemann
Active Contributor
0 Kudos

Hi there

the details of the inner workings of the SAP HANA execution engines and optimizers are not officially documented or available outside SAP.

What I can tell is that the picture above is partly outdated and partly a simplification and you really shouldn't conclude performance characteristics from it.

So I can only comment on the point


- CE Functions => Execution in Calculation engine (hence more optimized)


This is a wrong understanding.

CE functions are not more optimized, faster, more efficient or whatnot.

The SQL commands processed by HANA internally will use the same code as the CE functions.

The major benefit of using SQL however is this: with CE functions you tell the database what computation steps to do. It's very low-level interaction with the query processing, and the developer must take care of a whole lot of things - many of them not available to decide at development time.

With SQL this goes away.


Just like other SQL databases, your SQL statement specifies the semantics of your query. You tell the database what conditions the result set should fulfill but not how to best compute it.

As the decision of the most optimal (fastest) way to compute a query result depends on many variables (underlying data, provided conditions, software capabilities, resource availability, etc..) this decision is best taken at the moment of query execution - not when you design the query.

That way your query can make use of newer, improved HANA capabilities and HANA can adapt its execution strategy, without ever changing your code.
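
To make the contrast concrete, a hypothetical sketch (schema, table, and table type names are invented): the CE variant prescribes each processing step, while the SQL variant only states the desired result and leaves the strategy to the optimizer:

-- CE-function style: the developer dictates the individual steps
CREATE PROCEDURE get_sales_ce (OUT result "MYSCHEMA"."TT_SALES")
LANGUAGE SQLSCRIPT READS SQL DATA AS
BEGIN
  tab    = CE_COLUMN_TABLE("MYSCHEMA"."SALES", ["MATNR", "SALES_VAL"]);
  result = CE_AGGREGATION(:tab, [SUM("SALES_VAL") AS "SALES_VAL"], ["MATNR"]);
END;

-- SQL style: only the semantics are given; the optimizer picks the plan
CREATE PROCEDURE get_sales_sql (OUT result "MYSCHEMA"."TT_SALES")
LANGUAGE SQLSCRIPT READS SQL DATA AS
BEGIN
  result = SELECT "MATNR", SUM("SALES_VAL") AS "SALES_VAL"
           FROM "MYSCHEMA"."SALES"
           GROUP BY "MATNR";
END;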

So, again, stop using CE functions and use more current HANA revisions (SPS 8 is two major releases old now)!

As a last word: looking at the available literature on query optimization for the more classic players in the DBMS market you will find that it is a pretty complex topic.

However, most of the row-store, disk-based products follow similar approaches when it comes to query optimization. Common access paths and join implementations (index search, hash/sort-merge/nested-loop joins, ...) are used throughout most of them and are usually even understood conceptually by non-DB experts.

And herein lies a little problem - those shared, well-known techniques that are so easy to understand are massive simplifications of what actually happens.

With the column-store-based, in-memory, highly parallel data processing in SAP HANA, the applied execution methods, and with them the approaches to optimizing their use for a query, are a lot more complex.

What I'm saying is this: the topic is complex and highly volatile, as the developers are constantly trying to improve the support for ever-new usage patterns. Don't expect easy answers, and don't hope that what you know about SPS8 optimization will hold true in SPS10...

- Lars

Former Member
0 Kudos

Thank you for the prompt reply.

Former Member
0 Kudos

Hi All,

Thanks for this wonderful post. I have seen a real performance issue when attribute views (ATs) with calculated fields are used in an AV. When I used the same tables from the ATs and joined them in the data foundation, there was a 1000% improvement. But that is against SAP's best practice of using attribute views. Digging more, I found the root cause of the performance issue was my calculated fields in the ATs. In the visualized plan, when tables are joined at the table level, only the OLAP engine is used even with calculated fields, whereas with ATs with calculated fields the CALC engine also comes into play.

Anyway, if anyone can explain what to do and what not to do when building models, it would be helpful.

Regards

Purnaram.k

lbreddemann
Active Contributor
0 Kudos

Hi Purnaram,

You might want to have a look at the HANA modeling workshop sessions presented at SAP TechEd.

Also, the updated HA300 modeling courses from SAP Education include at least parts of the dos, don'ts, and how-to tips.

- Lars