on 01-24-2012 12:39 PM
Disastrous slowdown with u201Cequal-operatoru201D select from DBERCHZ1
Hello.
Please, advise.
We have migrated/cloned system to new powerful hardware. From the system start we experience noticeable slowdown with some programs. For example background batch billing jobs started from transaction EA38 lasts about 12 hours against 3 hours in previous older system. The background job uses standard ABAP-report RFKK_MASS_ACT_SINGLE_JOB. No significant customer-extensions are involved here. Using EA38 with debugging-mode (enter command u2018DBUGu2019) activates running report RFKK_MASS_ACT_SINGLE_JOB in dialog mode allowing tracing and debugging this process in detail. Observing this process weu2019ve noticed that most of the time it spends reading huge (1.3 billion records) table DBERCHZ1 with one single condition on key-field BELNR with operator u201Cequalu201D (u201C=u201D).
This is very important because next we have conducted some research running simple test program.
The test consists of preparing list of 5000 document numbers (BELNR), and then using this list in SQL-request to DBERCHZ1~BELNR (and some left joins) within FOR ALL ENTRIES. Also the ORACLE-hint (%_HINTS ORACLE '&max_in_blocking_factor ...&') was used to vary parameter rsdb\max_in_blocking_factor. When rsdb\max_in_blocking_factor is greater than 1, IN-operator used to split documents list. When rsdb\max_in_blocking_factor = 1, then EQUAL (u201C=u201D) operator used instead. Time of the SQL-request was measured several times in low-system-load period. Here is rsdb\max_in_blocking_factor VS time (approximate in seconds):
100 7,4 s
10 1,9 s
5 2,3 s
1 1400 s (!)
As you can see the dropdown is terrible when using EQUAL (u201C=u201D) operator in SQL-request.
We were lucky to carry out the same test in old system as it was not shut down yet. Even it is not high-loaded now the results are extremely significant:
100 2,7 s
10 12 s
5 2,3 s
1 26 s (!)
The execution plans are equal for both systems in case of rsdb\max_in_blocking_factor = 1. Yes, they varies in other cases, but it doesnu2019t have significant influence on the result. So older system is not as much affected from using EQUAL operator as new system is. This test shows that there are some problems in the new system.
We have spent long time discussing problem with our Oracle-basis-specialists, but they claim bad ABAP-code. Though tests show that problems are most likely in DB-side.
Can you advise/conjecture what settings could be involved in this problem? Why using EQUAL operator in SQL-request slows down the new system so much?
We were experiencing slowdowns with the program RFKK_MASS_ACT_SINGLE_JOB as well. We applied SAP Note 1587089 and it fixed our problems.
Mary Kathryn
You must be a registered user to add a comment. If you've already registered, sign in. Otherwise, register and sign in.
Hi Mary,
Note 1587089 seems to be about some kind of dead locks when jobs get locked in wait status.
We have performance problem with specific SQL queries.
For now we bypassed problem implementing SAP Note 1538837. Actualy there is not so much in this Note - it just helps to avoid modifications of sandard functions providing enhacement spot ISU_BI_BILLING_SPOT. We used method ERCHZ_NO_JOIN from this spot to choose branch of the code (FM ISU_DB_ERCHZ_SELECT_BILL_NEW) containing SQL queries without JOIN statements joining results on the ABAP-side.
* >>>>> note 1538837
* Implementation of BAdI to change no_join without modification
TRY.
CALL BADI cl_isu_bi_exithandler=>isu_bi_cust_control->erchz_no_join
CHANGING
xy_no_join = no_join.
CATCH cx_badi_initial_reference
INTO l_badi_error. "#EC NO_HANDLER
ENDTRY.
* <<<<< note 1538837
* >>>>> note 934374
if no_join is initial.
perform db_erchz_select_join using x_belnr
changing dbt_erchz
dbt_erchz1.
else.
perform db_erchz_select using x_belnr
changing dbt_erchz
dbt_erchz1.
endif.
* <<<<< note 934374
Downgrading to such simple SQL query helps to avoid problems with database. On the other hand implementing JOIN-logic on the ABAP-side consumes more aplication server memory but we dont have problem with that.
But problem with specific SQL-queries is still there. It can be seen through my specific tests. Thereby I consider this topic is still unresolved. We bypassed one problem in one program but there is no guarantee that some other program could experience the same problems.
Edited by: Vyacheslav Cherchik on Feb 14, 2012 8:55 AM
Hello Vyacheslav,
1) Maybe the execution plan seems to be the same, but it is not (pushed down predicates, different access / filter predicates or join conditions)
2) Without any information about where the data come from (cache or disk) and its wait times it is not possible to provide any conclusion
3) Maybe the order of the data has changed due to sorted export (logical / physical I/O)
4) Maybe your OS is not configured probably (I/O scheduler for example) or your both operating systems where configured completely different (file system cache of OS was also used and so on).
Your case is very specific, it needs to take a look under the hood. I have specialized on such cases - check my profile / homepage if you need assistance.
Regards
Stefan
P.S.: Just out of curiosity .... why not using the default max_in_blocking_factor of 5?
You must be a registered user to add a comment. If you've already registered, sign in. Otherwise, register and sign in.
Hello Mark,
Hello Stefan,
I appreciate your participation!
Iu2019ve gathered requested information.
Here is the new systemu2019s spfile.ora received recently from basis-support-team:
*._cursor_features_enabled=10
*._fix_control='5099019:ON','5705630:ON','5765456:3','6221403:ON','6329318:ON','6399597:ON','6430500:ON','6440977:ON','6626018:ON','6670551:ON','6972291:ON','7325597:ON','7692248:ON','7891471:ON','9196440:ON','9495669:ON','6055658:OFF'
*._optim_peek_user_binds=FALSE
*._optimizer_mjc_enabled=FALSE
*._second_spare_parameter=1
*._sort_elimination_cost_ratio=10
*.background_dump_dest='/oracle/RT2/saptrace/background'
*.compatible='10.2.0'
*.control_file_record_keep_time=30
*.control_files='/oracle/RT2/origlogA/cntrl/cntrlRT2.dbf','/oracle/RT2/origlogB/cntrl/cntrlRT2.dbf','/oracle/RT2/sapdata1/cntrl/cntrlRT2.dbf'
*.core_dump_dest='/oracle/RT2/saptrace/background'
*.db_block_size=8192
*.db_cache_size=429496729600
*.db_files=1022
*.db_flashback_retention_target=2880
*.db_name='RT2'
*.db_recovery_file_dest='/oracle/RT2/oraflash'
*.db_recovery_file_dest_size=214748364800
*.event='10027 trace name context forever, level 1','10028 trace name context forever, level 1','10142 trace name context forever, level 1','10183 trace name context forever, level 1','10191 trace name context forever, level 1','10411 trace name context forever, level 1','10629 trace name context forever, level 32','10753 trace name context forever, level 2','38068 trace name context forever, level 100','38085 trace name context forever, level 1','38087 trace name context forever, level 1','44951 trace name context forever, level 1024','14532 trace name context forever, level 1'
*.filesystemio_options='SETALL'
*.log_archive_dest_1='LOCATION=/oracle/RT2/oraarch/RT2arch'
*.log_archive_format='%t_%s_%r.dbf'
*.log_buffer=1048576
*.log_checkpoints_to_alert=TRUE
*.max_dump_file_size='20000'
*.open_cursors=800
*.optimizer_index_cost_adj=20
*.parallel_execution_message_size=16384
*.parallel_max_servers=1600
*.parallel_threads_per_cpu=1
*.pga_aggregate_target=85899345920
*.processes=5840
*.query_rewrite_enabled='FALSE'
*.recyclebin='OFF'
*.remote_os_authent=TRUE
*.replication_dependency_tracking=FALSE
*.sessions=11680
*.shared_pool_size=8589934592
*.star_transformation_enabled='TRUE'
*.undo_management='AUTO'
*.undo_retention=43200
*.undo_tablespace='PSAPUNDO'
*.user_dump_dest='/oracle/RT2/saptrace/usertrace'
They didnu2019t send spfile for old system but Iu2019ve looked in both systems in transaction
DB02 \ Performance \ Additional Functions \ Database Parameters
A believe those are the same settings that are in file. Iu2019ve compared settings sets from DB2 and found difference between systems only in memory/cache etc. Here are differences Iu2019ve found:
Parameter NEW system OLD system
db_cache_size 429496729600 161061273600
db_flashback_retention_target 2880 1440
parallel_max_servers 1600 640
pga_aggregate_target 85899345920 25769803776
processes 5840 4880
sessions 11680 9760
shared_pool_size 8589934592 4294967296
Here is server configuration:
Server FUJITSU PRIMERGY RX900 S2 (CPU 80 Core, RAM 1TB), Hi-End Disk Array FUJITSU ETERNEUS DX8700, server is connected with array via FC, file system is ext3, RAID10, OS SLES11 SP1 + actual updates, DB ORACLE 10.2.0.4.
1) Execution plans are equal in both systems. This is sure. They differ only in estimation time/counts. But tree, order, indexes, joins, predicates, accesses are the same.
2) First run of test program (at the start of the workday for example) gets data without cache, but following runs use cache. Test time measuring was performed with a lot of runs with later averaging excluding influence of the cache and run sequence. I donu2019t know about wait times. Where can I check it?
3) The unsorted export was performed (default SAP-installer setting) so data order was not changed.
4) O/S SLES11 SP1 is configured according to SAP recommendations
About rsdb/max_in_blocking_factor.
It was used ONLY in %_HINTS ORACLE expression just to compare IN and EQUAL operators! When rsdb/max_in_blocking_factor = 1 it is equivalent to using SELECT witin LOOP with EQUAL operator.
The system value of this parameter is default u201C5u201D as recommended. And by the way my tests show that indeed value u201C5u201D is OK for most of the cases. Also test shows that using FOR ALL ENTRIES is better than LOOP about 10 times (in our OLD system, from 2.3 seconds to 26 seconds). At the same time in NEW system performance decrease is dramatic (from 2.3 seconds to 1400 seconds). And that I believe reveals some problems with DBERCHZ1 table in our NEW system.
To brush any doubts about rsdb/max_in_blocking_factor I rewrote test program without using this parameter. Just a LOOP with SELECT with EQUAL operator. And the results are the same.
Stefan, thanks for your assistance offer. I will notify our experts and show them your contacts.
Hi, Michael
I looked at u201CDB02 \ Space \ Segments \ Detailed Analysisu201D for DBERCHZ1~0 segment. Tab u201CStorageu201D:
NEW OLD NEW/OLD
Space
Allocated space (Kb) 54641344 70584320 0,77
Allocated blocks 6830168 8823040 0,77
Allocated extents 838 1100 0,76
Extent Structure
Initial extent (Kb) 1740800 1740800
Next extent (Kb) 0 0
Min extents 1 1
Max extents 2147483645 2147483645
PCT increase (%) 0 0
Block Structure
Block size (Kb) 8 8
PCT free (%) 10 10
Transactions initial 2 2
Transactions maximum 255 255
Header minimum (Byte) 107 107
Data maximum (Byte) 5926 5926
Contents
*Block levels 3 3
*Leaf blocks 6829471 8818199 0,77
*Distinct keys 1388697000 1296882000 1,07
*Avg. leaf blocks per key 1 1
*Avg. data blocks per key 1 1
*Clustering factor 248633000 160852000 1,55
Freelist Stats
Process freelists 0 0
Freelist groups 0 0
Degree (Parallel query) 1 1
Instances (Parallel query) 1 1
Other
Index type NORMAL NORMAL
Index status VALID VALID
Uniqueness UNIQUE UNIQUE
There is difference between systems: 248633000 (NEW) against 160852000 (OLD). Could it affect so much on performance? What really is u201Cclusteringu201D? Is it some kind of file fragmentation? What arrangements should be made?
Hello Vyacheslav,
1) You only take a look at the ST05 execution plan via SAP. I would like to check the execution plans with DBMS_XPLAN and the options "ALL ALLSTATS OUTLINE PEEKED_BINDS LAST".
2) Perform a SQL trace with wait statistics to get the needed details or execute the SQL in SQL*Plus and perform an "AUTOTRACE TRACEONLY" to get an overview for each execution.
3) Ok, but even if you perform an unsorted exported the "physical" data order (can) change - as you can see in your posted index statistics. How many rows are in the table?
4) But you are talking about database performance and not SAP right now. Here is an article about the I/O scheduler for databases for example: http://www.redhat.com/magazine/008jun05/features/schedulers/
Regards
Stefan
Is it some kind of file fragmentation?
The clustering factor is the number of blocks i would have to read if i read all rows through an index. It represents the relation of the ordering of the table's rows compared to an index.
What arrangements should be made?
To get a better clustering factor one would have to reorg the table with that sort options. Regarding the size of your table, this is not yet an option.
The clustering factor is always larger than the number of the index leaf blocks (this would be best, and means the table data is sorted the same way as the index). The upper limit is the number of rows.
The table data order is relevant when doing index range scans.
See note [832343 - FAQ: Clustering factor|https://service.sap.com/sap/support/notes/832343] for a full explanation.
Could it affect so much on performance?
Well the clustering factor is higher than before, this certainly means more buffer gets somewhere around factor 1.5. But looking at your data, there could be other reasons adding to the problem.
Lets continue with the steps Stefan recommends and keep the clustering factor on the backside of our heads for the moment.
Cheers Michael
Hello Michael,
The clustering factor is always larger than the number of the index leaf blocks (this would be best, and means the table data is sorted the same way as the index). The upper limit is the number of rows.
i guess this is just a typo ... you mean if the clustering factor is the same as the amount of data blocks (of the corresponding table) - this would be the best )
Regards
Stefan
Hi,
what esle changed when you migrated?
Also what's your SAP ReleaseOracle version O/S etc
Did Oracle get patched Patching ?
Mark
You must be a registered user to add a comment. If you've already registered, sign in. Otherwise, register and sign in.
Hereu2019s test ABAP-SQL:
select
...< about 30 fields >
from
dberchz1
left join dberchz2 on dberchz2~belnr = dberchz1~belnr and
dberchz2~belzeile = dberchz1~belzeile
left join dberchz3 on dberchz3~belnr = dberchz1~belnr and
dberchz3~belzeile = dberchz1~belzeile
left join dberchr on dberchr~belnr = dberchz1~belnr and
dberchr~belzeile = dberchz1~belzeile
into corresponding fields of table __erchs_st
for all entries in __erchs u201C(5000 entries)
where
dberchz1~belnr = __erchs-belnr and
%_HINTS ORACLE l_hint.
Hereu2019s test SQL-statement from trace-view (ST05) in case of u201Crsdb\max_in_blocking_factor = 1u201D:
SELECT
< about 30 fields >
FROM
"DBERCHZ1" T_00 LEFT OUTER JOIN "DBERCHZ2" T_01 ON T_01 . "MANDT" = :A0 AND T_01 . "BELNR" = T_00
. "BELNR" AND T_01 . "BELZEILE" = T_00 . "BELZEILE" LEFT OUTER JOIN "DBERCHZ3" T_02 ON T_02 . "MAN
DT" = :A1 AND T_02 . "BELNR" = T_00 . "BELNR" AND T_02 . "BELZEILE" = T_00 . "BELZEILE" LEFT OUTER
JOIN "DBERCHR" T_03 ON T_03 . "MANDT" = :A2 AND T_03 . "BELNR" = T_00 . "BELNR" AND T_03 . "BELZE
ILE" = T_00 . "BELZEILE"
WHERE
T_00 . "MANDT" = :A3 AND T_00 . "BELNR" = :A4
Execution Plan for u201Crsdb\max_in_blocking_factor = 1u201D:
SELECT STATEMENT ( Estimated Costs = 6 , Estimated #Rows = 10 )
11 HASH JOIN OUTER
( Estim. Costs = 6 , Estim. #Rows = 10 )
Estim. CPU-Costs = 10.469.722 Estim. IO-Costs = 4
Access Predicates
8 HASH JOIN OUTER
( Estim. Costs = 4 , Estim. #Rows = 10 )
Estim. CPU-Costs = 6.983.584 Estim. IO-Costs = 3
Access Predicates
5 HASH JOIN OUTER
( Estim. Costs = 3 , Estim. #Rows = 10 )
Estim. CPU-Costs = 3.499.188 Estim. IO-Costs = 2
Access Predicates
2 TABLE ACCESS BY INDEX ROWID DBERCHZ1
( Estim. Costs = 1 , Estim. #Rows = 10 )
Estim. CPU-Costs = 13.452 Estim. IO-Costs = 1
1 INDEX RANGE SCAN DBERCHZ1~0
( Estim. Costs = 1 , Estim. #Rows = 10 )
Search Columns: 2
Estim. CPU-Costs = 6.137 Estim. IO-Costs = 1
Access Predicates
4 TABLE ACCESS BY INDEX ROWID DBERCHZ2
( Estim. Costs = 1 , Estim. #Rows = 2 )
Estim. CPU-Costs = 7.727 Estim. IO-Costs = 1
3 INDEX RANGE SCAN DBERCHZ2~0
( Estim. Costs = 1 , Estim. #Rows = 2 )
Search Columns: 2
Estim. CPU-Costs = 5.817 Estim. IO-Costs = 1
Access Predicates
7 TABLE ACCESS BY INDEX ROWID DBERCHR
( Estim. Costs = 1 , Estim. #Rows = 4 )
Estim. CPU-Costs = 6.387 Estim. IO-Costs = 1
6 INDEX RANGE SCAN DBERCHR~0
Search Columns: 2
Estim. CPU-Costs = 3.049 Estim. IO-Costs = 0
Access Predicates
10 TABLE ACCESS BY INDEX ROWID DBERCHZ3
( Estim. Costs = 1 , Estim. #Rows = 8 )
Estim. CPU-Costs = 8.129 Estim. IO-Costs = 1
9 INDEX RANGE SCAN DBERCHZ3~0
( Estim. Costs = 1 , Estim. #Rows = 8 )
Search Columns: 2
Estim. CPU-Costs = 6.017 Estim. IO-Costs = 1
Access Predicates
You must be a registered user to add a comment. If you've already registered, sign in. Otherwise, register and sign in.
Hi Mark,
Hi Deepak,
Basis-specialists contended that AS ABAP (CI+DI) settings of application servers are identical and settings of DB Oracle are identical. In newer system SGA and PGA are increased significantly.
Migration was performed by heterogeneous system copy from O/S Solaris to O/S Linux.
Patching OS Linux for DB Oracle was performed according to SAP recommendations.
Oracle version is 10.2.0.4.0. Patch level is the same in both systems.
After migration all statistics were updated: both system-statistics and all table-statistics.
User | Count |
---|---|
88 | |
10 | |
10 | |
9 | |
7 | |
7 | |
6 | |
5 | |
4 | |
4 |
You must be a registered user to add a comment. If you've already registered, sign in. Otherwise, register and sign in.