cancel
Showing results for 
Search instead for 
Did you mean: 

CBO of MaxDB 7.7.06.07 - ORDER BY issue

Former Member
0 Kudos

Hello,

I made some tests to become more familiar with the CBO of MaxDB. Two identical tables were created, ZTEST6 and ZTEST7. The fields are:

MANDT CLNT3 - primary key

UNIID CHAR12- primary key

OTHID CHAR12

TEXT CHAR120

A secondary index on fields MANDT+OTHID was created too. Both tables have 100.000 records. Now I executed

SELECT

  • FROM "ZTEST7"

WHERE

"MANDT" = ? AND "UNIID" BETWEEN ? AND ?

ORDER BY "MANDT" , "UNIID"

and

SELECT

  • FROM "ZTEST6"

WHERE

"MANDT" = ? AND "UNIID" BETWEEN ? AND ?

ORDER BY "MANDT" , "OTHID"

The parameter in the WHERE clause were the same, in both cases 80.000 records were selected.

The first statement uses a full table scan with a costvalue of 2731. That is what I expected.

The second statement used a RANGE CONDITION FOR INDEX on fields MANDT+OTHID with a costvalue of 3722. So it looks like the ORDER BY statment confused the CBO to go via a different access path with a worse costvalue. Does anybody has an explanation for that behavior?

Regards BErnd

Accepted Solutions (0)

Answers (5)

Answers (5)

Former Member
0 Kudos

>>Sorry but you are comparing oranges and apples here.

10 tables were created with the same statement with the same data and the same order. So I guess they should have the same structure on the disks too. Each table occupies appr. 4200 pages (33MB) and they were created in a sorted order. Because of that procedure, I think ZTEST6 and ZTEST7 are identically.

Thanks for explanation what RESULT IS (NOT) COPIED in the execution plan means, I did not know before.

I am using the monitors in SAPGui, transaction DB50.

Now I will read your MaxDB blog related to space oddities. Done, very helpful to understand that sorted loads occupy less space in the target compared to the source.

In summary now I know that the sorting costs are covered when calculating the costvalue. As far as I remember, in DB2 the sorting effort is not that clear visible.

I wanted to close the ticket, but now I removed as recommended MANDT from the secondary index of table ZTEST6, because MANDT is part of the primary index. The result is confusing me even more, the costvalue is significant higher then before for a statement that should get benefit of the secondary index.

SELECT * FROM ZTEST6"

WHERE

"MANDT" = '001' AND "OTHID" BETWEEN '100000010000' AND '100000090000'

ORDER BY

"MANDT" , "OTHID"

Execution plan:

COLUMN OR INDEX STRATEGY PAGECOUNT

ZTEST6~Z01 RANGE CONDITION FOR INDEX 918

OTHID (USED INDEX COLUMN)

MANDT (USED KEY COLUMN)

RESULT IS COPIED , COSTVALUE IS 14406

QUERYREWRITE - APPLIED RULES:

DistinctPullUp 1

Former Member
0 Kudos

Hi,

now you are back to sorting "RESULT IS COPIED "

by using index Z01, result is only sorted by OTHID and you require -

ORDER BY "MANDT" , "OTHID" - so it has to be resorted but as optimizer decided to pick Z01 , this plan is(should be) still

cheaper then using primary index (try it disapbling secondary index and compare costs)

Regards

Ivan

lbreddemann
Active Contributor
0 Kudos

>

> >>Sorry but you are comparing oranges and apples here.

> 10 tables were created with the same statement with the same data and the same order. So I guess they should have the same structure on the disks too. Each table occupies appr. 4200 pages (33MB) and they were created in a sorted order. Because of that procedure, I think ZTEST6 and ZTEST7 are identically.

Why guess about this?

Why don't you just check the size of the tables in DB50?

Be aware that the effect of inserting data in a sorted manner will be gone as soon as you insert/change/delete data that makes a B*Tree rebalancing necessary.

> In summary now I know that the sorting costs are covered when calculating the costvalue. As far as I remember, in DB2 the sorting effort is not that clear visible.

To my knowledge DB2 also clearly states whether a sort will be done or not in the execution plan and it assigns a cost factor to it.

> I wanted to close the ticket, but now I removed as recommended MANDT from the secondary index of table ZTEST6, because MANDT is part of the primary index. The result is confusing me even more, the costvalue is significant higher then before for a statement that should get benefit of the secondary index.

Ok, Ivan already delivered the correct answer here.

Well, from what you wrote and asked I came to believe that you may start with reading the available documentation first.

Check out the Wiki pages about the [SQL Optimizer|http://wiki.sdn.sap.com/wiki/display/MaxDB/SQL+Optimizer].

regards,

Lars

Former Member
0 Kudos

Something went wrong with the IEplorer.

Resource monitzor:

#P/E #R/E avr.runtime

SELECT * from ZTEST7 17200 80001 68,279

SELECT * from ZTEST6 242000 80001 13,256

Why the statement on ZTEST6 is so much faster then that one on ZTEST7, even when more pages needs to be read, I do not know. May be because of the data cache.

Regards Bernd

lbreddemann
Active Contributor
0 Kudos

PART 2

Statement 1.2: ORDER BY primary key for ZTEST7:


SELECT * FROM "ZTEST7"
WHERE
"MANDT" = '001' AND "UNIID" BETWEEN '100000000000' AND '100000080000'
ORDER BY "MANDT" , "UNIID"


TABLENAME                        | COLUMN_OR_INDEX                  | STRATEGY                                 | PAGECOUNT  |
-------------------------------- | -------------------------------- | ---------------------------------------- | ---------- |
ZTEST7                           |                                  | RANGE CONDITION FOR KEY                  |        776 |
                                 | MANDT                            |      (USED KEY COLUMN)                   |            |
                                 | UNIID                            |      (USED KEY COLUMN)                   |            |
SQLCURS_2                        |                                  |    RESULT IS NOT COPIED , COSTVALUE IS   |        621 |
SQLCURS_2                        |                                  | QUERYREWRITE - APPLIED RULES:            |            |
SQLCURS_2                        |                                  |    DistinctPullUp                        |          1 |

Now, what's that?

How can it be that the same statement comes up with the same execution plan but different costs for the ZTEST7 table??

The reason for that is that I created ZTEST7 via INSERT (SELECT * FROM ZTEST6).

By doing this, the rows where put into the target table in the same order as the the primary key is defined.

Doing that leads to right-most page splits - a common effect in B*Tree implementaions.

See my blog [MaxDB - Space oddities, take II|http://www.sdn.sap.com/irj/scn/weblogs?blog=/pub/wlg/13002] [original link is broken] [original link is broken] [original link is broken]; for more on this.

So I think we can agree, that we cannot simply run two different statements on two supposedly equal tables and still expect to have the same table.

--> check next part of the reply

lbreddemann
Active Contributor
0 Kudos

Part 3

Anyway. The interesting question still is: "Wouldn't it be better to do a table scan anyhow?"

Let's check this by focussing just on the ZTEST6, the table that currently has the supportive index:

Statement 2.2.: ORDER BY supported by index


SELECT
* FROM "ZTEST6"
WHERE
"MANDT" = '001' AND "UNIID" BETWEEN '100000000000' AND '100000080000'
ORDER BY "MANDT" , "OTHID"

 TABLENAME                        | COLUMN_OR_INDEX                  | STRATEGY                                 | PAGECOUNT
 -------------------------------- | -------------------------------- | ---------------------------------------- | ----------
 ZTEST6                           | ZTEST6~Z01                       | RANGE CONDITION FOR INDEX                |        991
                                  | MANDT                            |      (USED INDEX COLUMN)                 |
                                  | MANDT                            |      (USED KEY COLUMN)                   |
                                  | UNIID                            |      (USED KEY COLUMN)                   |
 SQLCURS_2                        |                                  |    RESULT IS NOT COPIED , COSTVALUE IS   |       1873
 SQLCURS_2                        |                                  | QUERYREWRITE - APPLIED RULES:            |
 SQLCURS_2                        |                                  |    DistinctPullUp                        |          1

As expected the costs to read the data in a different order are higher now.

But we see that the result needed no materialization to be delivered (RESULT IS NOT COPIED).

Now, what would be the effort for this table without the index?

To check this we could use a hint, drop the index or simply disable it from being considered by the optimizer.

Let's go for the latter option:


alter index "ZTEST6~Z01" on ztest6 disable

and run the explain again (see part 4 ... how I 'love' this forum editor.... grrr):

lbreddemann
Active Contributor
0 Kudos

PART 4

-> Index disabled, same statement...


TABLENAME                        | COLUMN_OR_INDEX                  | STRATEGY                                 | PAGECOUNT  |
-------------------------------- | -------------------------------- | ---------------------------------------- | ---------- |
ZTEST6                           |                                  | RANGE CONDITION FOR KEY                  |       1083 |
                                 | MANDT                            |      (USED KEY COLUMN)                   |            |
                                 | UNIID                            |      (USED KEY COLUMN)                   |            |
SQLCURS_2                        |                                  |      RESULT IS COPIED   , COSTVALUE IS   |       4390 |
SQLCURS_2                        |                                  | QUERYREWRITE - APPLIED RULES:            |            |
SQLCURS_2                        |                                  |    DistinctPullUp                        |          1 |

Now that's the alternative to be considered here!

Without the index, the resultset needs to be materialized (RESULT IS COPIED) to sort it.

This leads to tremendous costs of 4390 topping the 1873 that were expected when the index was available.

Concerning your resource monitor data: make very sure to understand how your client app is fetching the data.

Depending on this, the values may highly differ in the resource monitor.

E.g. when using DB Studio only a few hundred rows are initally fetched and you would have to scroll the list down to fetch all rows.

Hope this helps to resolve your confusion.

regards,

Lars

Former Member
0 Kudos

Hello,

I do not want to drop the secondary index, but I will remove MANDT from the index and try again.

Both tables are sorted, I did not make to much effort for creating data.

100.000 records

first record MANDT= 001 and UNIID=100000000000 and OTHID=100000000000

last record MANDT= 001 and UNIID=100000099999 and OTHID=100000099999

Execution plan details are:

TABLENAME COLUMN OR INDEX STRATEGY PAGECOUNT

ZTEST7 RANGE CONDITION FOR KEY 4167

MANDT (USED KEY COLUMN)

UNIID (USED KEY COLUMN)

SHOW RESULT IS NOT COPIED , COSTVALUE IS 2731

SHOW QUERYREWRITE - APPLIED RULES:

SHOW DistinctPullUp 1

TABLENAME COLUMN OR INDEX STRATEGY PAGECOUNT

ZTEST6 ZTEST6~Z01 RANGE CONDITION FOR INDEX 991

MANDT (USED INDEX COLUMN)

OTHID (USED INDEX COLUMN)

MANDT (USED KEY COLUMN)

SHOW RESULT IS NOT COPIED , COSTVALUE IS 3538

SHOW QUERYREWRITE - APPLIED RULES:

SHOW DistinctPullUp 1

I guess the costvalue is exactly the number of pages the respective statement needs to read. So at least both can be compared.

From the resource monitor I got the following data:

#P/E #R/E avr.runtime

SELECT * from ZTEST7 17200 80001 68,279

SELECT * from ZTEST6 242000 80001 13,256

lbreddemann
Active Contributor
0 Kudos

Hello again.

Sorry but you are comparing oranges and apples here.

You've two statements with different order by conditions.

One of your tables supports this order by via an index.

The other one doesn't.

What you should look at instead of counting optimizer costs is: how big would the effort be to run the statement without a supporting storage structure?

Let's check:

Statement 1.1: ORDER BY primary key for ZTEST6:


SELECT * FROM "ZTEST6"
WHERE
"MANDT" = '001' AND "UNIID" BETWEEN '100000000000' AND '100000080000'
ORDER BY "MANDT" , "UNIID"

TABLENAME                        | COLUMN_OR_INDEX                  | STRATEGY                                 | PAGECOUNT  |
-------------------------------- | -------------------------------- | ---------------------------------------- | ---------- |
ZTEST6                           |                                  | RANGE CONDITION FOR KEY                  |       1083 |
                                 | MANDT                            |      (USED KEY COLUMN)                   |            |
                                 | UNIID                            |      (USED KEY COLUMN)                   |            |
SQLCURS_2                        |                                  |    RESULT IS NOT COPIED , COSTVALUE IS   |        882 |
SQLCURS_2                        |                                  | QUERYREWRITE - APPLIED RULES:            |            |
SQLCURS_2                        |                                  |    DistinctPullUp                        |          1 |

--> see next part of reply!

Former Member
0 Kudos

>

> The first statement uses a full table scan with a costvalue of 2731. That is what I expected.

> The second statement used a RANGE CONDITION FOR INDEX on fields MANDT+OTHID with a costvalue of 3722. So it looks like the ORDER BY statment confused the CBO to go via a different access path with a worse costvalue. Does anybody has an explanation for that behavior?

>

> Regards BErnd

Hi,

as far as I know cost value is relative value , not absolute number and you should only compare cost values of SAME statement

->Try dropping secondary index on table ZTEST6, then run second statement. You should get cost value you can compare to 3722.

regards

Ivan

lbreddemann
Active Contributor
0 Kudos

> as far as I know cost value is relative value , not absolute number and you should only compare cost values of SAME statement

> ->Try dropping secondary index on table ZTEST6, then run second statement. You should get cost value you can compare to 3722.

Hi Ivan (happy new year by the way ;-).

The important point here is to see that the optimizer actually did the right thing by avoiding the extra work of sorting the data.

On a very basic level the work required for both statements is that:

Statement a) (ORDER BY primary key):

Read 80% of all rows via the primary B*TREE and get the order of rows automatically by doing that.

Statement b) (ORDBER BY secondary key)

Read 80% and sort them. As this is lots of data, this leads to materialized intermediate resultsets.

OR

Read the whole secondary index B*TREE to find all rows that fit to the search conditions and get the order by using the sedondary index.

So all in all the decision here is: what is more expensive? Looking up 80.000 records in order of the secondary key or reading/writing/re-reading 80.000 records directly from the primary B*TREE?

Since in MaxDB we generally try to avoid intermediate resultsets the index range scan wins in this case.

regards,

Lars

lbreddemann
Active Contributor
0 Kudos

Hello Bernd,

we need a bit more information.

Please post the exact explain plan output.

Also it would be helpful to know more about the data you put into the tables.

Anyhow, since your second query requires the data to be sorted via "MANDT" , "OTHID" it is likely to be much quicker to just read all fitting rows in the correct order via the secondary index.

Be aware that MaxDB stores the full primary key as a logical pointer to the rows in the table.

That means that you don't need to add columns (like MANDT) to any secondary index if they are already part of the primary key.

It also means, that MaxDB can evaluate the UNID condition already on the index.

So basically, MaxBD can read along the index, ignoring the non-fitting rows and still get all rows in the wanted order without the need of re-sorting an intermediate result set.

To me the optimizer does the right thing here.

regards,

Lars