
MaxDB 7.7 - performance problem with self joins

Former Member
0 Kudos

My query does a self join four times, and performance in MaxDB compared to Oracle is very slow. Comparing the explain plans, it looks like each self join in MaxDB utilizes the wrong index, and the page counts seem to be way too high. I have the same DDL structure in both Oracle and MaxDB.

I tried MaxDB configured with auto stats and default sampling; I also tried 100% sampling and ran the stats by hand, and the same plan is chosen by the optimizer. 100% sampling is OK because the tables in question are relatively small and static (a few hundred MB).

I don't want to put a hint in every query that does a self join for MaxDB, because that is a code-maintenance nightmare. I'm not sure whether forcing another index access would be faster anyway, because I haven't tested it.

Any idea why the MaxDB self join is performing so badly?
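The query itself is not posted in the thread, but the general shape of a four-way self join can be sketched. Everything below is hypothetical: the column names are loosely based on the index columns in the explain plans, and the `parentid` link used as the join predicate is invented purely for illustration. A runnable toy version using Python's built-in sqlite3:

```python
import sqlite3

# Toy stand-in for the REGION table.  The real schema is not posted in the
# thread; column names are loosely based on the index columns shown in the
# explain plans, and the parentid link is purely hypothetical.
con = sqlite3.connect(":memory:")
con.execute("""
    CREATE TABLE region (
        regionid     INTEGER PRIMARY KEY,
        regiontypeid INTEGER,
        name         TEXT,
        parentid     INTEGER
    )
""")
con.executemany(
    "INSERT INTO region VALUES (?, ?, ?, ?)",
    [
        (1, 1, "country", None),
        (2, 2, "state",   1),
        (3, 3, "county",  2),
        (4, 4, "city",    3),
        (5, 5, "zip",     4),
    ],
)

# Four self joins: the same table appears five times (r1..r5), each alias
# joined back to the previous one -- the shape the poster describes.
result = con.execute("""
    SELECT r1.name, r2.name, r3.name, r4.name, r5.name
      FROM region r1
      JOIN region r2 ON r2.parentid = r1.regionid
      JOIN region r3 ON r3.parentid = r2.regionid
      JOIN region r4 ON r4.parentid = r3.regionid
      JOIN region r5 ON r5.parentid = r4.regionid
""").fetchall()
print(result)
```

With five rows there is only one qualifying chain; the point is only the query shape, not the optimizer behavior, which sqlite will not reproduce.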

MaxDB explain plan:

R3  XA4REGION  RANGE CONDITION FOR INDEX          7395
      ONLY INDEX ACCESSED
      REGIONTYPEID  (USED INDEX COLUMN)
      NAME          (USED INDEX COLUMN)
      EFFDATE       (USED INDEX COLUMN)
R5  XA1REGION  JOIN VIA MULTIPLE INDEXED COLUMNS  7395
      NAME          (USED INDEX COLUMN)
      REGIONTYPEID  (USED INDEX COLUMN)
      REGIONGROUPID (USED INDEX COLUMN)
      TAXAREAID     (USED INDEX COLUMN)
R4  XA1REGION  JOIN VIA MULTIPLE INDEXED COLUMNS  7395
      NAME          (USED INDEX COLUMN)
      REGIONTYPEID  (USED INDEX COLUMN)
      REGIONGROUPID (USED INDEX COLUMN)
      TAXAREAID     (USED INDEX COLUMN)
R2  XA1REGION  JOIN VIA MULTIPLE INDEXED COLUMNS  7395
      NAME          (USED INDEX COLUMN)
      REGIONTYPEID  (USED INDEX COLUMN)
      REGIONGROUPID (USED INDEX COLUMN)
      TAXAREAID     (USED INDEX COLUMN)
R1  XA1REGION  JOIN VIA MULTIPLE INDEXED COLUMNS  7395
      NAME          (USED INDEX COLUMN)
      REGIONTYPEID  (USED INDEX COLUMN)
      REGIONGROUPID (USED INDEX COLUMN)
      TAXAREAID     (USED INDEX COLUMN)

NO TEMPORARY RESULTS CREATED
SHOW RESULT IS COPIED, COSTVALUE IS 24724

Here's the Oracle explain plan for the same query:

---------------------------------------------------------------------------------
| Id  | Operation              | Name      | Rows | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------
|   0 | SELECT STATEMENT       |           |    1 |   170 |    14  (15)| 00:00:01 |
|   1 |  SORT ORDER BY         |           |    1 |   170 |    14  (15)| 00:00:01 |
|   2 |   NESTED LOOPS         |           |    1 |   170 |    13   (8)| 00:00:01 |
|   3 |    NESTED LOOPS        |           |    1 |   136 |    11  (10)| 00:00:01 |
|   4 |     NESTED LOOPS       |           |    1 |   102 |     9  (12)| 00:00:01 |
|*  5 |      HASH JOIN         |           |    1 |    68 |     7  (15)| 00:00:01 |
|*  6 |       INDEX RANGE SCAN | XA4REGION |   64 |  2176 |     3   (0)| 00:00:01 |
|*  7 |       INDEX RANGE SCAN | XA4REGION |   88 |  2992 |     3   (0)| 00:00:01 |
|*  8 |      INDEX RANGE SCAN  | XA5REGION |    1 |    34 |     2   (0)| 00:00:01 |
|*  9 |     INDEX RANGE SCAN   | XA5REGION |    1 |    34 |     2   (0)| 00:00:01 |
|* 10 |    INDEX RANGE SCAN    | XA5REGION |    1 |    34 |     2   (0)| 00:00:01 |
---------------------------------------------------------------------------------


Accepted Solutions (0)

Answers (1)


lbreddemann
Active Contributor
0 Kudos

Hi Joe,

before going into the technical details here, I have to make a few remarks:

<b>1. Comparing explain plans/costs between different database products just doesn't make sense.</b>

There are so many differences in how queries can and will be handled between Oracle, MaxDB, and any other database.

Having the "same DDL structure" in both systems is very often nonsense.

The right question is: my query takes n (milli)seconds on system A; what can I do to make it at least as fast on system B?

<b>2. A four-time self join? It's likely your performance issue is on the design layer.</b>

If I look at your Oracle explain plan, there are several index range scans used in nested loops plus a sort order by, and yet the biggest chunk of data handled in any one step is only 88 rows!

And that although you state that the table is "a few hundred MB" in size. Are you using bind variables on the Oracle side? Are there histograms involved to capture a skewed data distribution?

Usually, for a rather even distribution, the selectivity calculation is not so different between the two systems...

<b>3. Database stats: performance problems are not solved by updating the statistics.</b>

Only in rare cases, where the DBA did not take care of his duties and left the stats completely alone, is the necessary information missing.

There is no red button marked "update statistics" that suddenly makes your performance better.

OK, now to your question.

Please provide this information when you ask about performance issues:

The query itself.
The table definition (mark the primary key!).
The index definitions (the order of columns is highly important).
The optimizer stats.
The database parameter setup.

Given that, there is a good chance we can help you with your query.

KR Lars

Former Member
0 Kudos

1. I agree that the same DDL structure is usually not sufficient due to nuances between vendor optimizer engines, but most vendor optimizer engines are similar. Most optimizers' deficiencies can be overcome by additional indexes or optimizer hints. The goal is the fastest query performance across vendor engines with the least amount of code changes, for an application that tries to be database-engine agnostic. That's one of the reasons for using a JDBC driver in an application! Developers like to think the database engine doesn't matter - but we know differently.

2. I agree the four-time self join is a design detriment, but it's legacy. The Oracle optimizer utilizes secondary indexes and judges that the secondary indexes can be range scanned to get to the data faster than the primary indexes. Oracle judges correctly. There are no histograms. Bind variables are being used on both MaxDB and Oracle. MaxDB utilizes the primary index for the self join and is slow. If I delete the secondary indexes in MaxDB, the execution time actually improves.

3. Your comment about statistics gathering is nonsensical. If the statistics are gathered on a small sample of data, and that sample is not a good representation of the selectivity of the data, the optimizer will choose the wrong access path based on the information it has. Isn't that why the sample size is configurable for statistics generation?
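The sampling point can be made concrete with a toy experiment (plain Python, no database involved; the numbers are invented for illustration): on a heavily skewed column, a small sample sees almost none of the rare values, so any distinct-count or selectivity estimate derived from it is far off.

```python
import random

random.seed(42)  # make the experiment repeatable

# Skewed column: one dominant value (99% of rows) plus 1,000 rare values.
column = ["common"] * 99_000 + [f"rare-{i}" for i in range(1_000)]
random.shuffle(column)

true_distinct = len(set(column))   # 1,001 distinct values in reality

# A 1% sample, roughly what a low statistics sampling rate would examine.
sample = random.sample(column, 1_000)
sample_distinct = len(set(sample))

# The sample mostly contains "common" and only a handful of rare values,
# so a naive estimate of the number of distinct values is wildly low.
print(true_distinct, sample_distinct)
```

Real optimizers use estimators that extrapolate from the sample rather than counting it raw, but the underlying problem (a skewed distribution poorly represented by a small sample) is the same.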

Posting the specific code and DDL structure for an application that is not open source wouldn't be a good idea. I'll have to work with a dedicated SAP developer.

I am looking for insight into how the MaxDB optimizer handles self joins, not to bash MaxDB. The Oracle explain plan was added for comparison. I will add that SQL Server's explain plan is comparable to Oracle's but requires fewer indexes. Unfortunately, DB2 suffers from the same poor performance on this self join as MaxDB.

lbreddemann
Active Contributor
0 Kudos

> 1. I agree that the same DDL structure is usually not sufficient due to nuances between vendor optimizer engines, but most vendor optimizer engines are similar.

That's not true - they are VERY different in many ways.

> Most optimizers' deficiencies can be overcome by additional indexes or optimizer hints. The goal is the fastest query performance across vendor engines with the least amount of code changes, for an application that tries to be database-engine agnostic. That's one of the reasons for using a JDBC driver in an application! Developers like to think the database engine doesn't matter - but we know differently.

>

> 2. I agree the four-time self join is a design detriment, but it's legacy. The Oracle optimizer utilizes secondary indexes and judges that the secondary indexes can be range scanned to get to the data faster than the primary indexes. Oracle judges correctly.

It does so for Oracle databases - this may not be the right thing for any other DBMS.

> There are no histograms. Bind variables are being used on both MaxDB and Oracle. MaxDB utilizes the primary index for the self join and is slow. If I delete the secondary indexes in MaxDB, the execution time actually improves.

Your MaxDB explain plan proves that it uses a secondary index instead of the primary key.

In MaxDB, every time you need a secondary index plus a table access, this leads to (at least) two B* trees being accessed.

Oracle uses direct pointers in its indexes (ROWIDs), which enable it to jump from the index leaf level straight to the indexed data. That can make a big difference.
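That difference can be modeled crudely, with plain dictionaries standing in for the trees (the costs are purely illustrative; a real B* tree traversal is several page accesses, and the direct ROWID fetch still reads a data block):

```python
# Toy model: a MaxDB-style secondary index stores the primary key, so a
# lookup needs two tree probes (secondary index, then primary key tree).
# An Oracle-style index stores a ROWID, so a single probe of the index
# yields a direct pointer to the row.
primary = {pk: f"row-{pk}" for pk in range(100)}                    # PK tree
by_name = {f"name-{pk}": pk for pk in range(100)}                   # index -> PK
by_name_rowid = {f"name-{pk}": f"row-{pk}" for pk in range(100)}    # index -> "ROWID"

probes = {"maxdb": 0, "oracle": 0}

def lookup_maxdb(name):
    probes["maxdb"] += 1        # probe the secondary index
    pk = by_name[name]
    probes["maxdb"] += 1        # probe the primary key tree
    return primary[pk]

def lookup_oracle(name):
    probes["oracle"] += 1       # probe the index; the ROWID points at the row
    return by_name_rowid[name]

for i in range(100):
    assert lookup_maxdb(f"name-{i}") == lookup_oracle(f"name-{i}")

print(probes)
```

Per row fetched through the secondary index, the MaxDB-style scheme does twice the tree probes; in a four-way self join that factor applies on every joined alias.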

> 3. Your comment about statistics gathering is nonsensical. If the statistics are gathered on a small sample of data, and that sample is not a good representation of the selectivity of the data, the optimizer will choose the wrong access path based on the information it has. Isn't that why the sample size is configurable for statistics generation?

The feature to set a sample size for statistics exists to make gathering statistics on very large tables/indexes possible at all. In SAP environments there are well-documented heuristics for which sample sizes to use with which table sizes to get a good trade-off between precision and the runtime needed to gather the statistics.

My point was: if there are no administrative mistakes, like using too small a sample size, then just collecting new statistics won't help.

> Posting the specific code and DDL structure for an application that is not open source wouldn't be a good idea. I'll have to work with a dedicated SAP developer.

Hmm... so it IS an SAP application, and not so "legacy" as you mentioned. Therefore the design could possibly be changed.

Nevertheless, this is an interesting point: do you really consider a single table/query design out of a relational database with >40K tables significant enough to count as a violation of a license restriction?

Basically, it's not about revealing application logic or functionality up to the point where somebody could reuse it.

> I am looking for insight into how the MaxDB optimizer handles self joins, not to bash MaxDB. The Oracle explain plan was added for comparison. I will add that SQL Server's explain plan is comparable to Oracle's but requires fewer indexes. Unfortunately, DB2 suffers from the same poor performance on this self join as MaxDB.

Sorry - but that won't help anybody.

Still, the offer stands: bring in something to reproduce the problem, and you might get some help with this.

KR Lars

Former Member
0 Kudos

You are correct about the secondary index in the MaxDB explain plan. I incorrectly stated it was not using a secondary index, when in fact it was utilizing one of the secondary indexes. What I should have typed is that it's not utilizing the other secondary index available to the optimizer.

The third-party application is not just an SAP application; it also runs with Oracle, PeopleSoft, and JD Edwards.

Yes, the physical data model could be changed for SAP to allow performance optimization specific to MaxDB.

Thank you for your insight.

Former Member
0 Kudos

It turns out there was a MaxDB kernel issue. The MaxDB group made a one-off kernel for me that solved the problem.