cancel
Showing results for 
Search instead for 
Did you mean: 

Disastrous slowdown with “equal-operator” select from DBERCHZ1

Former Member
0 Kudos

Disastrous slowdown with u201Cequal-operatoru201D select from DBERCHZ1

Hello.

Please, advise.

We have migrated/cloned system to new powerful hardware. From the system start we experience noticeable slowdown with some programs. For example background batch billing jobs started from transaction EA38 lasts about 12 hours against 3 hours in previous older system. The background job uses standard ABAP-report RFKK_MASS_ACT_SINGLE_JOB. No significant customer-extensions are involved here. Using EA38 with debugging-mode (enter command u2018DBUGu2019) activates running report RFKK_MASS_ACT_SINGLE_JOB in dialog mode allowing tracing and debugging this process in detail. Observing this process weu2019ve noticed that most of the time it spends reading huge (1.3 billion records) table DBERCHZ1 with one single condition on key-field BELNR with operator u201Cequalu201D (u201C=u201D).

This is very important because next we have conducted some research running simple test program.

The test consists of preparing list of 5000 document numbers (BELNR), and then using this list in SQL-request to DBERCHZ1~BELNR (and some left joins) within FOR ALL ENTRIES. Also the ORACLE-hint (%_HINTS ORACLE '&max_in_blocking_factor ...&') was used to vary parameter rsdb\max_in_blocking_factor. When rsdb\max_in_blocking_factor is greater than 1, IN-operator used to split documents list. When rsdb\max_in_blocking_factor = 1, then EQUAL (u201C=u201D) operator used instead. Time of the SQL-request was measured several times in low-system-load period. Here is rsdb\max_in_blocking_factor VS time (approximate in seconds):

100	7,4 s
10	1,9 s
5	2,3 s
1	1400 s	(!)

As you can see the dropdown is terrible when using EQUAL (u201C=u201D) operator in SQL-request.

We were lucky to carry out the same test in old system as it was not shut down yet. Even it is not high-loaded now the results are extremely significant:

100	2,7 s
10	12 s
5	2,3 s
1	26 s	(!)

The execution plans are equal for both systems in case of rsdb\max_in_blocking_factor = 1. Yes, they varies in other cases, but it doesnu2019t have significant influence on the result. So older system is not as much affected from using EQUAL operator as new system is. This test shows that there are some problems in the new system.

We have spent long time discussing problem with our Oracle-basis-specialists, but they claim bad ABAP-code. Though tests show that problems are most likely in DB-side.

Can you advise/conjecture what settings could be involved in this problem? Why using EQUAL operator in SQL-request slows down the new system so much?

Accepted Solutions (0)

Answers (4)

Answers (4)

Former Member
0 Kudos

We were experiencing slowdowns with the program RFKK_MASS_ACT_SINGLE_JOB as well. We applied SAP Note 1587089 and it fixed our problems.

Mary Kathryn

Former Member
0 Kudos

Hi Mary,

Note 1587089 seems to be about some kind of dead locks when jobs get locked in wait status.

We have performance problem with specific SQL queries.

For now we bypassed problem implementing SAP Note 1538837. Actualy there is not so much in this Note - it just helps to avoid modifications of sandard functions providing enhacement spot ISU_BI_BILLING_SPOT. We used method ERCHZ_NO_JOIN from this spot to choose branch of the code (FM ISU_DB_ERCHZ_SELECT_BILL_NEW) containing SQL queries without JOIN statements joining results on the ABAP-side.

* >>>>> note 1538837
* Implementation of BAdI to change no_join without modification
  TRY.
      CALL BADI cl_isu_bi_exithandler=>isu_bi_cust_control->erchz_no_join
        CHANGING
          xy_no_join = no_join.
    CATCH cx_badi_initial_reference
         INTO l_badi_error.                             "#EC NO_HANDLER
  ENDTRY.
* <<<<< note 1538837

* >>>>> note 934374
  if no_join is initial.
    perform db_erchz_select_join using x_belnr
                              changing dbt_erchz
                                       dbt_erchz1.
  else.
    perform db_erchz_select using x_belnr
                         changing dbt_erchz
                                  dbt_erchz1.
  endif.
* <<<<< note 934374

Downgrading to such simple SQL query helps to avoid problems with database. On the other hand implementing JOIN-logic on the ABAP-side consumes more aplication server memory but we dont have problem with that.

But problem with specific SQL-queries is still there. It can be seen through my specific tests. Thereby I consider this topic is still unresolved. We bypassed one problem in one program but there is no guarantee that some other program could experience the same problems.

Edited by: Vyacheslav Cherchik on Feb 14, 2012 8:55 AM

stefan_koehler
Active Contributor
0 Kudos

Hello Vyacheslav,

1) Maybe the execution plan seems to be the same, but it is not (pushed down predicates, different access / filter predicates or join conditions)

2) Without any information about where the data come from (cache or disk) and its wait times it is not possible to provide any conclusion

3) Maybe the order of the data has changed due to sorted export (logical / physical I/O)

4) Maybe your OS is not configured probably (I/O scheduler for example) or your both operating systems where configured completely different (file system cache of OS was also used and so on).

Your case is very specific, it needs to take a look under the hood. I have specialized on such cases - check my profile / homepage if you need assistance.

Regards

Stefan

P.S.: Just out of curiosity .... why not using the default max_in_blocking_factor of 5?

Former Member
0 Kudos

Hello Mark,

Hello Stefan,

I appreciate your participation!

Iu2019ve gathered requested information.

Here is the new systemu2019s spfile.ora received recently from basis-support-team:

*._cursor_features_enabled=10
*._fix_control='5099019:ON','5705630:ON','5765456:3','6221403:ON','6329318:ON','6399597:ON','6430500:ON','6440977:ON','6626018:ON','6670551:ON','6972291:ON','7325597:ON','7692248:ON','7891471:ON','9196440:ON','9495669:ON','6055658:OFF'
*._optim_peek_user_binds=FALSE
*._optimizer_mjc_enabled=FALSE
*._second_spare_parameter=1
*._sort_elimination_cost_ratio=10
*.background_dump_dest='/oracle/RT2/saptrace/background'
*.compatible='10.2.0'
*.control_file_record_keep_time=30
*.control_files='/oracle/RT2/origlogA/cntrl/cntrlRT2.dbf','/oracle/RT2/origlogB/cntrl/cntrlRT2.dbf','/oracle/RT2/sapdata1/cntrl/cntrlRT2.dbf'
*.core_dump_dest='/oracle/RT2/saptrace/background'
*.db_block_size=8192
*.db_cache_size=429496729600
*.db_files=1022
*.db_flashback_retention_target=2880
*.db_name='RT2'
*.db_recovery_file_dest='/oracle/RT2/oraflash'
*.db_recovery_file_dest_size=214748364800
*.event='10027 trace name context forever, level 1','10028 trace name context forever, level 1','10142 trace name context forever, level 1','10183 trace name context forever, level 1','10191 trace name context forever, level 1','10411 trace name context forever, level 1','10629 trace name context forever, level 32','10753 trace name context forever, level 2','38068 trace name context forever, level 100','38085 trace name context forever, level 1','38087 trace name context forever, level 1','44951 trace name context forever, level 1024','14532 trace name context forever, level 1'
*.filesystemio_options='SETALL'
*.log_archive_dest_1='LOCATION=/oracle/RT2/oraarch/RT2arch'
*.log_archive_format='%t_%s_%r.dbf'
*.log_buffer=1048576
*.log_checkpoints_to_alert=TRUE
*.max_dump_file_size='20000'
*.open_cursors=800
*.optimizer_index_cost_adj=20
*.parallel_execution_message_size=16384
*.parallel_max_servers=1600
*.parallel_threads_per_cpu=1
*.pga_aggregate_target=85899345920
*.processes=5840
*.query_rewrite_enabled='FALSE'
*.recyclebin='OFF'
*.remote_os_authent=TRUE
*.replication_dependency_tracking=FALSE
*.sessions=11680
*.shared_pool_size=8589934592
*.star_transformation_enabled='TRUE'
*.undo_management='AUTO'
*.undo_retention=43200
*.undo_tablespace='PSAPUNDO'
*.user_dump_dest='/oracle/RT2/saptrace/usertrace'

They didnu2019t send spfile for old system but Iu2019ve looked in both systems in transaction

DB02 \ Performance \ Additional Functions \ Database Parameters

A believe those are the same settings that are in file. Iu2019ve compared settings sets from DB2 and found difference between systems only in memory/cache etc. Here are differences Iu2019ve found:

Parameter			NEW system	OLD system
db_cache_size			429496729600	161061273600
db_flashback_retention_target	2880		1440
parallel_max_servers		1600		640
pga_aggregate_target		85899345920	25769803776
processes			5840		4880
sessions			11680		9760
shared_pool_size		8589934592	4294967296

Here is server configuration:

Server FUJITSU PRIMERGY RX900 S2 (CPU 80 Core, RAM 1TB), Hi-End Disk Array FUJITSU ETERNEUS DX8700, server is connected with array via FC, file system is ext3, RAID10, OS SLES11 SP1 + actual updates, DB ORACLE 10.2.0.4.

1) Execution plans are equal in both systems. This is sure. They differ only in estimation time/counts. But tree, order, indexes, joins, predicates, accesses are the same.

2) First run of test program (at the start of the workday for example) gets data without cache, but following runs use cache. Test time measuring was performed with a lot of runs with later averaging excluding influence of the cache and run sequence. I donu2019t know about wait times. Where can I check it?

3) The unsorted export was performed (default SAP-installer setting) so data order was not changed.

4) O/S SLES11 SP1 is configured according to SAP recommendations

About rsdb/max_in_blocking_factor.

It was used ONLY in %_HINTS ORACLE expression just to compare IN and EQUAL operators! When rsdb/max_in_blocking_factor = 1 it is equivalent to using SELECT witin LOOP with EQUAL operator.

The system value of this parameter is default u201C5u201D as recommended. And by the way my tests show that indeed value u201C5u201D is OK for most of the cases. Also test shows that using FOR ALL ENTRIES is better than LOOP about 10 times (in our OLD system, from 2.3 seconds to 26 seconds). At the same time in NEW system performance decrease is dramatic (from 2.3 seconds to 1400 seconds). And that I believe reveals some problems with DBERCHZ1 table in our NEW system.

To brush any doubts about rsdb/max_in_blocking_factor I rewrote test program without using this parameter. Just a LOOP with SELECT with EQUAL operator. And the results are the same.

Stefan, thanks for your assistance offer. I will notify our experts and show them your contacts.

Former Member
0 Kudos

And that I believe reveals some problems with DBERCHZ1 table in our NEW system.

Would be interesting to compare the clustering factor of DBERCHZ1~0 (lower is better). This could have a huge impact, is quite likely as Stefan already mentioned.

Cheers Michael

Former Member
0 Kudos

Hi, Michael

I looked at u201CDB02 \ Space \ Segments \ Detailed Analysisu201D for DBERCHZ1~0 segment. Tab u201CStorageu201D:

			NEW		OLD		NEW/OLD
Space			
Allocated space (Kb)	54641344	70584320	0,77
Allocated blocks	6830168		8823040		0,77
Allocated extents	838		1100		0,76
Extent Structure			
Initial extent (Kb)	1740800		1740800	
Next extent (Kb)	0		0	
Min extents		1		1	
Max extents		2147483645	2147483645	
PCT increase (%)	0		0	
Block Structure			
Block size (Kb)		8		8	
PCT free (%)		10		10	
Transactions initial	2		2	
Transactions maximum	255		255	
Header minimum (Byte)	107		107	
Data maximum (Byte)	5926		5926	
Contents			
*Block levels		3		3	
*Leaf blocks		6829471		8818199		0,77
*Distinct keys		1388697000	1296882000	1,07
*Avg. leaf blocks per key	1	1	
*Avg. data blocks per key	1	1	
*Clustering factor	248633000	160852000	1,55
Freelist Stats			
Process freelists	0		0	
Freelist groups		0		0	
Degree (Parallel query)	1		1	
Instances (Parallel query)	1	1	
Other			
Index type		NORMAL		NORMAL	
Index status		VALID		VALID	
Uniqueness		UNIQUE		UNIQUE

There is difference between systems: 248633000 (NEW) against 160852000 (OLD). Could it affect so much on performance? What really is u201Cclusteringu201D? Is it some kind of file fragmentation? What arrangements should be made?

stefan_koehler
Active Contributor
0 Kudos

Hello Vyacheslav,

1) You only take a look at the ST05 execution plan via SAP. I would like to check the execution plans with DBMS_XPLAN and the options "ALL ALLSTATS OUTLINE PEEKED_BINDS LAST".

2) Perform a SQL trace with wait statistics to get the needed details or execute the SQL in SQL*Plus and perform an "AUTOTRACE TRACEONLY" to get an overview for each execution.

3) Ok, but even if you perform an unsorted exported the "physical" data order (can) change - as you can see in your posted index statistics. How many rows are in the table?

4) But you are talking about database performance and not SAP right now. Here is an article about the I/O scheduler for databases for example: http://www.redhat.com/magazine/008jun05/features/schedulers/

Regards

Stefan

Former Member
0 Kudos

Is it some kind of file fragmentation?

The clustering factor is the number of blocks i would have to read if i read all rows through an index. It represents the relation of the ordering of the table's rows compared to an index.

What arrangements should be made?

To get a better clustering factor one would have to reorg the table with that sort options. Regarding the size of your table, this is not yet an option.

The clustering factor is always larger than the number of the index leaf blocks (this would be best, and means the table data is sorted the same way as the index). The upper limit is the number of rows.

The table data order is relevant when doing index range scans.

See note [832343 - FAQ: Clustering factor|https://service.sap.com/sap/support/notes/832343] for a full explanation.

Could it affect so much on performance?

Well the clustering factor is higher than before, this certainly means more buffer gets somewhere around factor 1.5. But looking at your data, there could be other reasons adding to the problem.

Lets continue with the steps Stefan recommends and keep the clustering factor on the backside of our heads for the moment.

Cheers Michael

stefan_koehler
Active Contributor
0 Kudos

Hello Michael,

The clustering factor is always larger than the number of the index leaf blocks (this would be best, and means the table data is sorted the same way as the index). The upper limit is the number of rows.

i guess this is just a typo ... you mean if the clustering factor is the same as the amount of data blocks (of the corresponding table) - this would be the best )

Regards

Stefan

Former Member
0 Kudos

The clustering factor is always larger than the number of the index leaf blocks

The clustering factor is always larger than the number of the table blocks

Yes absolutely right, thanks for correcting that!

Cheers Michael

Former Member
0 Kudos

Hi,

what esle changed when you migrated?

Also what's your SAP ReleaseOracle version O/S etc

Did Oracle get patched Patching ?

Mark

Former Member
0 Kudos

Hereu2019s test ABAP-SQL:

  select
...< about 30 fields >
  from
    dberchz1
    left join dberchz2 on dberchz2~belnr    = dberchz1~belnr and
                          dberchz2~belzeile = dberchz1~belzeile
    left join dberchz3 on dberchz3~belnr    = dberchz1~belnr and
                          dberchz3~belzeile = dberchz1~belzeile
    left join dberchr  on dberchr~belnr     = dberchz1~belnr and
                          dberchr~belzeile  = dberchz1~belzeile
  into corresponding fields of table __erchs_st
  for all entries in __erchs u201C(5000 entries)
  where
    dberchz1~belnr = __erchs-belnr and
  %_HINTS ORACLE l_hint.

Hereu2019s test SQL-statement from trace-view (ST05) in case of u201Crsdb\max_in_blocking_factor = 1u201D:

SELECT
 < about 30 fields >
FROM
  "DBERCHZ1" T_00 LEFT OUTER JOIN "DBERCHZ2" T_01 ON T_01 . "MANDT" = :A0 AND T_01 . "BELNR" = T_00
  . "BELNR" AND T_01 . "BELZEILE" = T_00 . "BELZEILE" LEFT OUTER JOIN "DBERCHZ3" T_02 ON T_02 . "MAN
  DT" = :A1 AND T_02 . "BELNR" = T_00 . "BELNR" AND T_02 . "BELZEILE" = T_00 . "BELZEILE" LEFT OUTER
  JOIN "DBERCHR" T_03 ON T_03 . "MANDT" = :A2 AND T_03 . "BELNR" = T_00 . "BELNR" AND T_03 . "BELZE
  ILE" = T_00 . "BELZEILE"
WHERE
  T_00 . "MANDT" = :A3 AND T_00 . "BELNR" = :A4

Execution Plan for u201Crsdb\max_in_blocking_factor = 1u201D:

SELECT STATEMENT ( Estimated Costs = 6 , Estimated #Rows = 10 )

       11 HASH JOIN OUTER
          ( Estim. Costs = 6 , Estim. #Rows = 10 )
          Estim. CPU-Costs = 10.469.722 Estim. IO-Costs = 4
          Access Predicates

           8 HASH JOIN OUTER
             ( Estim. Costs = 4 , Estim. #Rows = 10 )
             Estim. CPU-Costs = 6.983.584 Estim. IO-Costs = 3
             Access Predicates

               5 HASH JOIN OUTER
                 ( Estim. Costs = 3 , Estim. #Rows = 10 )
                 Estim. CPU-Costs = 3.499.188 Estim. IO-Costs = 2
                 Access Predicates

                   2 TABLE ACCESS BY INDEX ROWID DBERCHZ1
                     ( Estim. Costs = 1 , Estim. #Rows = 10 )
                     Estim. CPU-Costs = 13.452 Estim. IO-Costs = 1

                       1 INDEX RANGE SCAN DBERCHZ1~0
                         ( Estim. Costs = 1 , Estim. #Rows = 10 )
                         Search Columns: 2
                         Estim. CPU-Costs = 6.137 Estim. IO-Costs = 1
                         Access Predicates

                   4 TABLE ACCESS BY INDEX ROWID DBERCHZ2
                     ( Estim. Costs = 1 , Estim. #Rows = 2 )
                     Estim. CPU-Costs = 7.727 Estim. IO-Costs = 1

                       3 INDEX RANGE SCAN DBERCHZ2~0
                         ( Estim. Costs = 1 , Estim. #Rows = 2 )
                         Search Columns: 2
                         Estim. CPU-Costs = 5.817 Estim. IO-Costs = 1
                         Access Predicates

               7 TABLE ACCESS BY INDEX ROWID DBERCHR
                 ( Estim. Costs = 1 , Estim. #Rows = 4 )
                 Estim. CPU-Costs = 6.387 Estim. IO-Costs = 1

                   6 INDEX RANGE SCAN DBERCHR~0
                     Search Columns: 2
                     Estim. CPU-Costs = 3.049 Estim. IO-Costs = 0
                     Access Predicates

          10 TABLE ACCESS BY INDEX ROWID DBERCHZ3
             ( Estim. Costs = 1 , Estim. #Rows = 8 )
             Estim. CPU-Costs = 8.129 Estim. IO-Costs = 1

              9 INDEX RANGE SCAN DBERCHZ3~0
                ( Estim. Costs = 1 , Estim. #Rows = 8 )
                Search Columns: 2
                Estim. CPU-Costs = 6.017 Estim. IO-Costs = 1
                Access Predicates

former_member188883
Active Contributor
0 Kudos

Hi Vyacheslav,

Can you run update database statistics for all tables. You can use brtools to run update database statistics.

Post this, check the results.

Regards,

Deepak Kori

Former Member
0 Kudos

Hi Mark,

Hi Deepak,

Basis-specialists contended that AS ABAP (CI+DI) settings of application servers are identical and settings of DB Oracle are identical. In newer system SGA and PGA are increased significantly.

Migration was performed by heterogeneous system copy from O/S Solaris to O/S Linux.

Patching OS Linux for DB Oracle was performed according to SAP recommendations.

Oracle version is 10.2.0.4.0. Patch level is the same in both systems.

After migration all statistics were updated: both system-statistics and all table-statistics.

Former Member
0 Kudos

Hi,

Is it possible to paste the spfile.ora or if you are not using spfile the init<SID>.ora

What's your disk array ?

Regards

Mark