
Oracle "DISTINCT" vs. "SORT + Delete adjacent duplicates"

peter_strauss
Participant
0 Kudos

Hello,

I have a simple "select distinct" statement on a small table. This single statement accounts for over 80% of all CPU time in the DB.

There are quite a lot of comments that in ABAP code SELECT DISTINCT should be avoided and replaced by a SELECT into an internal table followed by SORT + DELETE ADJACENT DUPLICATES.

I cannot understand why this would be a general recommendation.

Is ABAP better than Oracle (or other databases) at filtering duplicates?

I have a feeling that if I do this I will reduce CPU time on the DB but increase it (maybe to a greater extent) on the application servers.

Are there certain criteria where this recommendation makes sense?

At a general level this is a question about ABAP; however, I would be interested in the opinion of Oracle experts, so I'm posting here.

Accepted Solutions (1)

Former Member
0 Kudos

Hi Peter,

Because "SELECT DISTINCT" performs a sort operation. If you look at the same query with and without the DISTINCT keyword, both execution plans should be the same. Under these circumstances the data read strategy does not differ between the two queries; the only difference is the sort operation performed while reading the dataset with DISTINCT. Using SELECT DISTINCT is not a performance penalty when the table contains only a few records, but it becomes a serious performance problem when the dataset to be sorted is large.

For this reason, it is good to analyze the characteristics of the table first, before deciding how to access the data.

From the ABAP point of view, you just read the dataset (for example via a secondary key), sort it in memory and then delete the duplicates there, so the whole operation is done in memory. I am not completely sure what happens behind the scenes of the DISTINCT statement at the Oracle level, but I am fairly sure that more additional operations are performed there than for a simple ABAP call.

However, it is important to understand why the records are duplicated in the table in the first place. Another approach may be to apply the normalization forms and read the data with the correct indexes to optimize your application.

Best regards,

Orkun Gedik

peter_strauss
Participant
0 Kudos

Hello Orkun,

I did a little testing on the production system.

Assuming that the sort operation would also be done in ABAP I just ran the SQL select on the DB without the DISTINCT clause.

The surprising thing was that the select WITHOUT the distinct clause was actually slower (and had higher CPU time).

The number of records returned without the distinct was close to the number of rows in the table, so I realised that there was no point using the index. Running the same select, including the DISTINCT and adding a FULL hint reduced the CPU time by about half.

(The table only has about 10,000 rows, so a full table scan doesn't take long at all.)
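For reference, a sketch of the hinted variant, using the placeholder table and field names that appear in the autotrace output further down (the exact statement is not reproduced here, and the hint placement is my own):

-- Sketch only: placeholder names, hint placement assumed
SELECT /*+ FULL(TABLEA) */ DISTINCT "FIELD1", "FIELD2", "FIELD3", "FIELD4"
  FROM "TABLEA"
 WHERE "MANDT" = :A0 AND "FIELD1" = :A1 AND "FIELD3" = :A2 AND "FIELD5" = :A3;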

I take your point about normalization.

Back to my original question: I suspect that the advice to always use DELETE ADJACENT DUPLICATES rather than DISTINCT might come from the fact that when a DISTINCT clause is used the ABAP table buffer is bypassed. However, my experience with the current problem is that it is impossible to make a general statement in favour of DELETE ADJACENT DUPLICATES over DISTINCT, and that it may actually make the situation worse.

Kind regards,

Peter

Former Member
0 Kudos

>> The surprising thing was that the select WITHOUT the distinct clause was actually slower (and had higher CPU time).

Make sure that the performance difference is not caused by reading the data from the application server buffer and/or the database buffer cache.

Maybe it is better to share both SQL statements, the tkprof output and the execution plans here, so we can discuss the results in more detail.
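For example, a raw SQL trace for tkprof can be produced roughly like this (just a generic sketch; in an SAP system the trace is usually taken via ST05/ST12 instead):

-- Sketch only: generic 10046 trace, then format the resulting trace file with tkprof
alter session set tracefile_identifier = 'distinct_test';
alter session set events '10046 trace name context forever, level 12';
-- run the statement under test here
alter session set events '10046 trace name context off';
-- then on the database server: tkprof <tracefile>.trc distinct_test.prf sys=no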

Best regards,

Orkun Gedik

peter_strauss
Participant
0 Kudos

Thanks for the offer, but I think this will lead us off topic.

Former Member
0 Kudos

I don't think that the DISTINCT statement generally performs worse than the equivalent ABAP statements. It cannot be a general rule, as I noted previously.

Best regards,

Orkun Gedik

Answers (2)

stefan_koehler
Active Contributor
0 Kudos

Hi Peter,

> This single statement accounts for over 80% of all CPU time in the DB.

Where do you get this information from? AWR? How often is this single SQL executed in a specific time frame? Is the CPU load caused by a single run or by the sum of all executions? Just looking at the shared pool (e.g. via DBACOCKPIT) gives you misleading values, as other SQLs are aged out and so on.

> There are quite a lot of comments that in ABAP code SELECT DISTINCT should be avoided and replaced by a SELECT into an internal table followed by SORT + DELETE ADJACENT DUPLICATES.

That might work pretty well if the data set is not very large, but just think about the memory needed if you have a large dataset. The large amount of data has to be transferred to the application server first and processed afterwards.

> Because, "SELECT DISTINCT" does sort operation. If you take a look at same queries with the DISTINCT statement and without the DISTINCT statement, both execution plans should be same.

@Orkun: No, it certainly does not. That was the case before Oracle 10g; Oracle introduced hash aggregation in 10g. Here is just a tiny test case on my 11.2.0.3.2 database.

http://oracle-randolf.blogspot.de/2011/01/hash-aggregation.html

SYS@T11:133> create table TESTAB as select * from dba_objects;

SYS@T11:133> select distinct(owner) from TESTAB;
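If you want to verify which aggregation operation was chosen, one way is to display the plan of the last executed statement right afterwards, for example:

-- Show the plan of the last statement executed in this session;
-- on 10g and later it typically shows HASH UNIQUE instead of SORT UNIQUE
select * from table(dbms_xplan.display_cursor);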

> The surprising thing was that the select WITHOUT the distinct clause was actually slower (and had higher CPU time).

The question here is - are the execution plans the same? How did you find out the "higher CPU time"? CPU is usually caused by excessive logical I/O or specific functions (like hashing and so on). Oracle SQL Monitor is perfect for determining what causes the CPU usage.

> Running the same select, including the DISTINCT and adding a FULL hint reduced the CPU time by about half. (the table only has about 10000 rows so a full table scan doesn't take long at all).

That observation confirms the suspicion about a different execution plan. It seems like an index access and the corresponding table access (by rowid) cause unnecessary logical I/O (maybe a bad clustering factor as well), and high CPU time is the consequence. Execution plan information would make everything clear (especially with possible filters). By the way, the number of rows has nothing to do with the work that needs to be done by a full table scan (high water mark): you can have a table with only one row and still read hundreds of GBs by FTS.

> lack of official supporting documentation is what brought me here.

ABAP developers are usually not very good SQL developers, and so they do what they know best: transfer everything to the ABAP application server and handle it there. This might work at first (when the dataset is not huge and the parallel usage is not high), but unfortunately it will not scale properly.

The database engine is able to do it much more efficiently in many cases (if the SQL is written well and the database structure fits), but you need to know how to handle the different database platforms to get the best result.

Regards

Stefan

peter_strauss
Participant
0 Kudos

> Where do you get this information from? ...

Yes, the information came from the DBACOCKPIT cache analysis. I understand that other SQL is aged out of the cache; however, this single statement accounts for more than 10 times the executions and CPU of any other statement displayed, so it seemed worth taking a look at (though I understand that saying "80%" is meaningless in this context).

To be cynical, if I can tune this statement it will look good to the customer based on information from the SQL cache.

> The question here is - are the execution plans the same?

Yes, the execution plans were the same. I don't expect you to believe this without evidence, but it's Friday afternoon here in Japan and I'm heading home in ten minutes max. I'll update later.

Thank you for your help!

Kind regards,

Peter

Back again.

Here is the Autotrace for both statements (one with the distinct clause and the other without).

SQL> SELECT DISTINCT "FIELD1", "FIELD2", "FIELD3", "FIELD4"
       FROM "TABLEA"
      WHERE "MANDT" = :A0 AND "FIELD1" = :A1 AND "FIELD3" = :A2
        AND "FIELD5" = :A3;

1125 rows selected.

Elapsed: 00:00:00.04

Execution Plan

----------------------------------------------------------

Plan hash value: 2879723666

---------------------------------------------------------------------------------------------

| Id  | Operation                    | Name         | Rows  | Bytes | Cost (%CPU)| Time     |

---------------------------------------------------------------------------------------------

|   0 | SELECT STATEMENT             |              |  1231 | 32006 |     8  (25)| 00:00:01 |

|   1 |  HASH UNIQUE                 |              |  1231 | 32006 |     7  (15)| 00:00:01 |

|   2 |   TABLE ACCESS BY INDEX ROWID| TABLEA     |  1231 | 32006 |     6   (0)| 00:00:01 |

|*  3 |    INDEX RANGE SCAN          | TABLEA~ID1 |  1477 |       |     2   (0)| 00:00:01 |

---------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):

---------------------------------------------------

   3 - access("MANDT"=:A0 AND "FIELD1"=:A1 AND "FIELD3"=:A2 AND "FIELD5"=:A3)

Statistics

----------------------------------------------------------

          0  recursive calls

          0  db block gets

        157  consistent gets

          0  physical reads

          0  redo size

      28625  bytes sent via SQL*Net to client

       1302  bytes received via SQL*Net from client

         76  SQL*Net roundtrips to/from client

          0  sorts (memory)

          0  sorts (disk)

       1125  rows processed

SQL> SELECT "FIELD1", "FIELD2", "FIELD3", "FIELD4"
       FROM "TABLEA"
      WHERE "MANDT" = :A0 AND "FIELD1" = :A1 AND "FIELD3" = :A2
        AND "FIELD5" = :A3;

8696 rows selected.

Elapsed: 00:00:00.09

Execution Plan

----------------------------------------------------------

Plan hash value: 2427608097

--------------------------------------------------------------------------------------------

| Id  | Operation                   | Name         | Rows  | Bytes | Cost (%CPU)| Time     |

--------------------------------------------------------------------------------------------

|   0 | SELECT STATEMENT            |              |  1231 | 32006 |     7  (15)| 00:00:01 |

|   1 |  TABLE ACCESS BY INDEX ROWID| TABLEA     |  1231 | 32006 |     6   (0)| 00:00:01 |

|*  2 |   INDEX RANGE SCAN          | TABLEA~ID1 |  1477 |       |     2   (0)| 00:00:01 |

--------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):

---------------------------------------------------

   2 - access("MANDT"=:A0 AND "FIELD1"=:A1 AND "FIELD3"=:A2 AND "FIELD5"=:A3)

Statistics

----------------------------------------------------------

          0  recursive calls

          0  db block gets

       1311  consistent gets

          0  physical reads

          0  redo size

     196181  bytes sent via SQL*Net to client

       6857  bytes received via SQL*Net from client

        581  SQL*Net roundtrips to/from client

          0  sorts (memory)

          0  sorts (disk)

       8696  rows processed

SQL> set autotrace off

SQL> select count(*) from "TABLEA";

  COUNT(*)

----------

      9848

Elapsed: 00:00:00.01

SQL>

This was the second execution.

I can't figure out why the select with the DISTINCT is faster. I thought the same number of rows would be fetched first and then the HASH UNIQUE operation applied to the fetched dataset, but what we see is that the number of consistent gets is lower when the DISTINCT is included. Can you make sense of this?

By the way, you can see that this select is very small; it is just executed very often.

There is no CPU bottleneck on the system, so tuning this statement is not critical.

Peter


Message was edited by: Peter Strauss Finished off my post

Former Member
0 Kudos

Interesting, I did a bunch of tests and can observe the same behavior. I suspected that with DISTINCT over an index the database could skip a few blocks compared to the full select scenario, but that does not seem to be the case. Then again, SQL*Plus might be a bad tool for testing stuff like this.

One other difference is obviously the size of the result set:

76  SQL*Net roundtrips to/from client

<- versus ->

581  SQL*Net roundtrips to/from client

And that is a good reason to do the data reduction on the database server.

It is usually good practice to use SELECT SINGLE FROM table WHERE col = :a1 on an indexed column to check for a certain value; it is very seldom necessary to look at all possible values.

Cheers Michael

stefan_koehler
Active Contributor
0 Kudos

Hi Peter,

Thanks for the provided examples; your observations are clear now. Michael nearly hit on the reason why you see such a (marginal) performance difference between "00:00:00.04" and "00:00:00.09".

> I thought the same number of rows would be fetched first and then the HASH UNIQUE operation applied to the fetched dataset, but what we see is that the number of consistent gets is lower when the DISTINCT is included. Can you make sense of this?

First of all, you are using AUTOTRACE, which gives no granularity at all about where the buffer gets occur ... however:

Yes, but in the case of a DISTINCT much less data needs to be transferred/fetched from the database to the client. Fetches are limited by the array size (which is determined dynamically by the SAP DBSL depending on codepage, table and column definitions).

Here is a tiny example for clarification (I used literals and hints to get the same execution plan, because it makes no difference here). My example is a more drastic case, but it demonstrates the impact of the array size even better.

-- Building test environment on Oracle 11.2.0.3.2

SYS@T11:12> create table TABLEA (MANDT VARCHAR(10), FIELD1 VARCHAR(10), FIELD2 VARCHAR(10), FIELD3 VARCHAR(10), FIELD4 VARCHAR(10), FIELD5 VARCHAR(10));

SYS@T11:12> create index TABLEAI on TABLEA(MANDT, FIELD1, FIELD3, FIELD5);

SYS@T11:12> begin

for i in 1 .. 10000 loop

    insert into TABLEA values('010','AAAA','BBBB','CCCC','DDDD','EEEE');

end loop;

commit;

end;

/

-- SELECTs with default array size(15)

SYS@T11:12> select /*+ GATHER_PLAN_STATISTICS INDEX(TABLEA TABLEAI)  */ distinct FIELD1 , FIELD2 , FIELD3 , FIELD4

     from TABLEA

     where "MANDT" = '010' and "FIELD1" = 'AAAA' and "FIELD3" = 'CCCC' and "FIELD5" = 'EEEE';

As you can see, 10,000 rows are fetched from the index and the table (86 buffer gets in sum), but only one row needs to be transferred to the client (ID 0).

-- SELECTs with default array size (15)

SYS@T11:12> select /*+ GATHER_PLAN_STATISTICS INDEX(TABLEA TABLEAI)  */ FIELD1 , FIELD2 , FIELD3 , FIELD4

     from TABLEA

     where "MANDT" = '010' and "FIELD1" = 'AAAA' and "FIELD3" = 'CCCC' and "FIELD5" = 'EEEE';

In this case 10,000 rows need to be transferred to the client (ID 0), but only 15 rows are transferred by each fetch (so you may need to visit the same blocks several times to transfer all the data).

Now let's tune it up to the maximum SQL*Plus array size of 5,000.

SYS@T11:12> set arraysize 5000

SYS@T11:12>select /*+ GATHER_PLAN_STATISTICS INDEX(TABLEA TABLEAI)  */ distinct FIELD1 , FIELD2 , FIELD3 , FIELD4

     from TABLEA

     where "MANDT" = '010' and "FIELD1" = 'AAAA' and "FIELD3" = 'CCCC' and "FIELD5" = 'EEEE';

Look closely at the buffer gets. The "SELECT DISTINCT" needs the same amount of buffer gets as before (with an array size of 15), because only 1 row needs to be returned to the client and this fits into one fetch call.

SYS@T11:12> set arraysize 5000

SYS@T11:12> select /*+ GATHER_PLAN_STATISTICS INDEX(TABLEA TABLEAI)  */ FIELD1 , FIELD2 , FIELD3 , FIELD4

     from TABLEA

     where "MANDT" = '010' and "FIELD1" = 'AAAA' and "FIELD3" = 'CCCC' and "FIELD5" = 'EEEE';

Look closely at the buffer gets now. The same SELECT (without DISTINCT) needs only 90 buffer gets now. It needed 1418 buffer gets with a much smaller array size.

I am pretty sure that this explains the (minimal) "performance" difference that you see. You need many more fetch calls, more SQL*Net traffic and more logical I/O to transfer the data (for the SQL without the DISTINCT). Just try your example once again with the max array size and you will see this.
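In SQL*Plus that would look roughly like this (sketch only; the bind variables would have to be defined with VARIABLE or replaced by literals):

-- Sketch only: repeat the original test with the maximum SQL*Plus array size
set arraysize 5000
set autotrace on
SELECT "FIELD1", "FIELD2", "FIELD3", "FIELD4"
  FROM "TABLEA"
 WHERE "MANDT" = :A0 AND "FIELD1" = :A1 AND "FIELD3" = :A2 AND "FIELD5" = :A3;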

> To be cynical, if I can tune this statement it will look good to the customer based on information from the SQL cache.

It is better to explain the background to the customer instead of doing useless implementations ... customers hire consultants to help them get their problems solved, not only to implement their own ideas, which are wrong from time to time.

> There is no CPU bottleneck on the system, so tuning this statement is not critical.

Well, your SQLs are already pretty efficient (roughly 7 rows per buffer get for the DISTINCT and 6.6 rows per buffer get for the bare SELECT), but as previously explained these values (especially the last one) are currently misleading, because the SAP DBSL uses a larger fetch size. Maybe you can reduce the buffer gets by eliminating the table access by rowid, or change the application logic to execute fewer SQLs (which is even better, if possible).
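For example, an index that also covers the selected columns would remove the table access by rowid completely (just a sketch with the placeholder names from above and a made-up index name; in an SAP system such an index would of course be created via the ABAP dictionary, and whether it pays off depends on how often the table is changed):

-- Sketch only: hypothetical covering index, so the query can be answered
-- from the index alone (no TABLE ACCESS BY INDEX ROWID step)
create index "TABLEA~ZC1" on "TABLEA"
  ("MANDT", "FIELD1", "FIELD3", "FIELD5", "FIELD2", "FIELD4");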

Regards

Stefan

P.S.: You can verify the client communication / fetch behavior with an OPI trace as well.

Former Member
0 Kudos

Nice, Stefan, you did it again. I had a very similar test case with a 16-block table, where the DISTINCT took 37 gets and the full select took 100 gets. Setting the arraysize to a high enough value results in 16 gets for the DISTINCT (as expected) and 17 gets for the full select.

So I repeat my statement: reducing the result set on the database server is usually the best option, especially when this leads to fewer round trips (which will be the case for larger tables).

Cheers Michael

Update: SAP Note 1164043 - DBSL (Oracle): Maximum buffer size for array operations

shaikhtabrez3
Explorer
0 Kudos

Can anyone help me resolve the performance issue with the code below?

IF it_vart04_art[] IS NOT INITIAL.

  SELECT DISTINCT * FROM makt INTO TABLE p_it_xmaktx
    FOR ALL ENTRIES IN it_vart04_art
    WHERE matnr = it_vart04_art-matnr
      AND spras = sy-langu.

  SORT p_it_xmaktx BY matnr.

ENDIF.


Thanks in advance

Regards


Tabrez

Former Member
0 Kudos

> There are quite a lot of comments that in ABAP code SELECT DISTINCT should be avoided and replaced by a SELECT into an internal table followed by SORT + DELETE ADJACENT DUPLICATES.

Can you provide any links to these recommendations?

peter_strauss
Participant
0 Kudos

Certainly, but nothing official; lack of official supporting documentation is what brought me here.

http://wiki.sdn.sap.com/wiki/display/Community/ABAP+Performance+tips

"Whenever it's possible avoid SELECT DISTINCT, instead select data into internal table, sort and use DELETE ADJACENT DUPLICATES."

Similar statements pop up here and there.

The only related official documentation I could find was to the effect that using DISTINCT bypasses the ABAP table buffer, from which I might assume that avoiding the DISTINCT clause could be a good idea for a buffered table. This is not enough to support a general statement like the one above.