on 08-24-2009 1:20 PM
Hi all.
I have a very weird problem with a test SAP system after it was refreshed from a PROD copy. Many users are complaining that their transactions hang. And indeed, in SM66 I see their processes running for a long time and then timing out. Some users are calling Workflow transactions, which take ~50 min to respond. Others are running customized programs that run SELECT queries against Z tables.
However, ST06 shows the CPU continuously 100% idle (IBM p6 Series: 24 CPUs on the application server, 4 CPUs on the database server, in different LPARs). Total memory is 72 GB, and 50 GB is always free. It is amazing: it is such a powerful machine, and yet users are complaining.
I checked in ST03n and I see long wait times for RFC.
Could you please throw in an idea of where else I could search for hints? I am running out of ideas. I have checked all possible logs and transactions (no dumps, SM21 clean; on the O/S side, IBM AIX, wp_disp and dev_w* showed no errors).
Thanks in advance.
Loukas
Any hints in ST02? SAP or database memory parameters configured much too low perhaps? Where did you get your current parameters from?
What is your database by the way?
Thanks guys for the replies.
The database is Oracle 10.2, which by the way was patched/upgraded to 10.2.0.4 (Patch Set 3) after the refresh.
If the patch were the issue, the transactions would also be slow on the second test machine, where everything runs as expected.
ST03n shows :
~28.000ms response time for DIA
~7.000ms response time for RFC
~3.000ms response time for BTC/AutoABAP
The rest are about 1.000ms or less.
Long "Roll Wait Time" also for RFC ~6.300ms
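To put the RFC figures above in proportion, here is a minimal sketch that works out how much of the average RFC response time is roll wait. It only uses the numbers quoted in this post and assumes that "." is a thousands separator (so 7.000ms means 7 seconds); the variable names are made up for illustration.

```python
# Average times per task type as quoted from ST03n above, in milliseconds.
# Assumption: "." in the post is a thousands separator (28.000ms = 28 s).
response_ms = {"DIA": 28_000, "RFC": 7_000, "BTC/AutoABAP": 3_000}
rfc_roll_wait_ms = 6_300


def share(part_ms: float, total_ms: float) -> float:
    """Return part_ms as a fraction of total_ms."""
    return part_ms / total_ms


rfc_roll_wait_share = share(rfc_roll_wait_ms, response_ms["RFC"])
print(f"RFC roll wait share: {rfc_roll_wait_share:.0%}")
```

If the assumption about the separator holds, nearly all of the RFC response time is spent in roll wait, that is, waiting on something outside the work process rather than burning CPU, which would fit the idle CPU figures.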
Regarding indexes, I have compared the tables and indexes of the main impacted transactions between the 2 systems, which use the same PROD copy, and they are identical.
I have not run SQL traces yet. They are deactivated. I would have to activate them while the users are running the transactions and deactivate them afterwards. That is a bit difficult to synchronize, as the users are mainly in France and Spain and I am in Germany.
The parameters are the same as on our PROD SAP system, as the system was recently refreshed. The test system with the issue has the same hardware as the PROD one:
Appl Server 24CPUs / 72GB RAM
DB Server 4 CPUs / 26 GB RAM
DB statistics are running every day and they are successful. No missing indexes etc.
By the way, how could I check the compiling?
Thanks again,
Loukas
Edited by: Loukas Rougkalas on Aug 24, 2009 2:56 PM
I have done it already:
CPU User% Kern% Wait% Idle% Physc Entc
ALL 2.1 2.2 0.0 95.7 0.14 4.5
PAGING MEMORY
Faults 6 Real,MB 71680
Steals 0 % Comp 17
PgspIn 0 % Noncomp 16
PgspOut 0 % Client 16
PageIn 0
PageOut 0
Sios 0
PAGING SPACE
Size,MB 65536
% Used 0
% Free 100
Name PID CPU% PgSp Owner
disp+wor 2019352 0.9 29.1 st2adm
saposcol 1986642 0.2 3.9 st2adm
topas 2023494 0.1 3.0 st2adm
PatrolAg 618708 0.1 22.1 patrol
snmpmagt 770066 0.0 3.5 patrol
p_ctmag 2175074 0.0 0.4 root
java 475370 0.0 44.5 root
igspw_mt 2093082 0.0 16.1 st2adm
igsmux_m 1220840 0.0 10.4 st2adm
igspw_mt 2097178 0.0 16.1 st2adm
rt-fcpar 360596 0.0 0.4 root
gil 249978 0.0 0.9 root
rt-fcpar 471216 0.0 0.4 root
java 647370 0.0 33.4 st2adm
prole 352280 0.0 3.0 root
gwrd 745674 0.0 6.9 st2adm
sshd 2183354 0.0 0.7 lrougka
rtcmd 303136 0.0 1.1 root
sendmail 291014 0.0 1.0 root
nmon12e_ 2113560 0.0 4.8 root
Edited by: Loukas Rougkalas on Aug 24, 2009 3:20 PM
From topas:
Disk Busy% KBPS TPS KB-Read KB-Writ
hdisk1 2.0 22.0 5.0 0.0 22.0
hdisk0 1.0 22.0 5.0 0.0 22.0
dac1 0.0 0.0 0.0 0.0 0.0
dac1utm 0.0 0.0 0.0 0.0 0.0
dac2 0.0 0.0 0.0 0.0 0.0
dac2utm 0.0 0.0 0.0 0.0 0.0
dac3 0.0 0.0 0.0 0.0 0.0
dac3utm 0.0 0.0 0.0 0.0 0.0
hdisk2 0.0 0.0 0.0 0.0 0.0
hdisk3 0.0 0.0 0.0 0.0 0.0
hdisk4 0.0 0.0 0.0 0.0 0.0
hdisk5 0.0 0.0 0.0 0.0 0.0
hdisk6 0.0 0.0 0.0 0.0 0.0
hdisk7 0.0 0.0 0.0 0.0 0.0
hdisk8 0.0 0.0 0.0 0.0 0.0
hdisk9 0.0 0.0 0.0 0.0 0.0
hdisk10 0.0 0.0 0.0 0.0 0.0
hdisk11 0.0 0.0 0.0 0.0 0.0
ST06 is not giving any statistics on all our systems.
ST03n shows :
~28.000ms response time for DIA
~7.000ms response time for RFC
~3.000ms response time for BTC/AutoABAP
Judging by these, there must be something wrong with your settings... have you compared the DB settings against your PRD system?
Also, you can do an SQL trace via ST01 and see what sort of response you're getting from the statements.
Regards
Juan
Sorry, my mistake:
After the system refresh, only the application server parameters are the same as on PROD. The DB parameters remain the original ones of the test system.
However, we performed another system refresh on this test system back in June, and the system worked fine afterwards, with the application parameters overwritten by PROD and the DB parameters left as the original ones.
I ran a SQL trace in ST01, and I see some high values in the 3rd column, "Lasts (us)":
14:31:17:373 SQL 5861,540 ZAOPTD_INV_HDR Prog: ZAOP_CL_GDBL_BUFFER===========CP Row: 8,434 Ret.Value: 1,403
14:31:23:236 SQL 4 ZAOPTD_INV_HDR Prog: ZAOP_CL_GDBL_BUFFER===========CP Row: 8,434 Ret.Value: 0
14:31:23:236 SQL 5903,161 ZAOPTD_INV_HDR Prog: ZAOP_CL_GDBL_BUFFER===========CP Row: 8,434 Ret.Value: 1,403
14:31:29:140 SQL 4 ZAOPTD_INV_HDR Prog: ZAOP_CL_GDBL_BUFFER===========CP Row: 8,434 Ret.Value: 0
14:31:29:140 SQL 5873,093 ZAOPTD_INV_HDR Prog: ZAOP_CL_GDBL_BUFFER===========CP Row: 8,434 Ret.Value: 1,403
14:31:35:15 SQL 3 ZAOPTD_INV_HDR Prog: ZAOP_CL_GDBL_BUFFER===========CP Row: 8,434 Ret.Value: 0
14:31:35:15 SQL 5895,909 ZAOPTD_INV_HDR Prog: ZAOP_CL_GDBL_BUFFER===========CP Row: 8,434 Ret.Value: 1,403
14:31:40:913 SQL 4 ZAOPTD_INV_HDR Prog: ZAOP_CL_GDBL_BUFFER===========CP Row: 8,434 Ret.Value: 0
14:31:40:913 SQL 5865,997 ZAOPTD_INV_HDR Prog: ZAOP_CL_GDBL_BUFFER===========CP Row: 8,434 Ret.Value: 1,403
14:31:46:780 SQL 3 ZAOPTD_INV_HDR Prog: ZAOP_CL_GDBL_BUFFER===========CP Row: 1,739 Ret.Value: 0
14:31:46:780 SQL 562 ZAOPTD_INV_HDR Prog: ZAOP_CL_GDBL_BUFFER===========CP Row: 1,739 Ret.Value: 1,403
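For anyone who wants to total up a trace like the one above, here is a rough sketch that parses lines in this layout and sums the duration per table. It assumes the commas in the third column are digit-grouping separators, so that 5861,540 means 5,861,540 microseconds (about 5.9 seconds, which matches the gaps between the timestamps). The function names and the 1-second threshold for "slow" are arbitrary choices for illustration, and the regular expression only covers the layout pasted here, not every ST01 export format.

```python
import re
from collections import defaultdict

# A few lines copied from the trace above.
SAMPLE = """\
14:31:17:373 SQL 5861,540 ZAOPTD_INV_HDR Prog: ZAOP_CL_GDBL_BUFFER===========CP Row: 8,434 Ret.Value: 1,403
14:31:23:236 SQL 4 ZAOPTD_INV_HDR Prog: ZAOP_CL_GDBL_BUFFER===========CP Row: 8,434 Ret.Value: 0
14:31:23:236 SQL 5903,161 ZAOPTD_INV_HDR Prog: ZAOP_CL_GDBL_BUFFER===========CP Row: 8,434 Ret.Value: 1,403
14:31:46:780 SQL 562 ZAOPTD_INV_HDR Prog: ZAOP_CL_GDBL_BUFFER===========CP Row: 1,739 Ret.Value: 1,403
"""

# Matches only the line layout shown in this thread (an assumption).
LINE_RE = re.compile(
    r"^(?P<time>\d+:\d+:\d+:\d+)\s+SQL\s+(?P<dur>[\d,]+)\s+(?P<table>\S+)"
    r"\s+Prog:\s+(?P<prog>\S+)\s+Row:\s+(?P<row>[\d,]+)"
    r"\s+Ret\.Value:\s+(?P<rc>[\d,]+)"
)


def parse_trace(lines):
    """Turn trace lines into dicts; lines that do not match are skipped."""
    records = []
    for line in lines:
        m = LINE_RE.match(line.strip())
        if m is None:
            continue
        records.append({
            "table": m["table"],
            # Assumption: "," groups digits, so "5861,540" is 5861540 us.
            "duration_us": int(m["dur"].replace(",", "")),
            "ret": int(m["rc"].replace(",", "")),
        })
    return records


def summarize(records, slow_us=1_000_000):
    """Sum calls, total duration and slow-call count per table."""
    totals = defaultdict(lambda: {"calls": 0, "total_us": 0, "slow": 0})
    for rec in records:
        entry = totals[rec["table"]]
        entry["calls"] += 1
        entry["total_us"] += rec["duration_us"]
        if rec["duration_us"] >= slow_us:
            entry["slow"] += 1
    return dict(totals)


records = parse_trace(SAMPLE.splitlines())
summary = summarize(records)
for table, entry in summary.items():
    print(table, entry["calls"], "calls,",
          f"{entry['total_us'] / 1_000_000:.2f} s total,",
          entry["slow"], "slow")
```

Run against the full trace, a summary like this makes it easier to see whether the time is concentrated in a handful of multi-second statements on one table, as it appears to be here.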
-
Here is also the execution plan:
SELECT
DISTINCT "GUID"
FROM
"ZAOPTD_AST_HDR"
WHERE
"MANDT" = :A0 AND "OBJID" = :A1 AND "IS_DELETED" = :A2#
Execution Plan
SELECT STATEMENT ( Estimated Costs = 1,576 , Estimated #Rows = 1 )
5 3 HASH UNIQUE
( Estim. Costs = 1,576 , Estim. #Rows = 1 )
Estim. CPU-Costs = 86,771,521 Estim. IO-Costs = 1,561
5 2 TABLE ACCESS BY INDEX ROWID ZAOPTD_AST_HDR
( Estim. Costs = 1,575 , Estim. #Rows = 1 )
Estim. CPU-Costs = 80,717,708 Estim. IO-Costs = 1,561
1 INDEX RANGE SCAN ZAOPTD_AST_HDR~01
( Estim. Costs = 1,574 , Estim. #Rows = 1 )
Search Columns: 2
Estim. CPU-Costs = 80,716,218 Estim. IO-Costs = 1,561
Access Predicates Filter Predicates
Edited by: Loukas Rougkalas on Aug 24, 2009 4:44 PM
Edited by: Loukas Rougkalas on Aug 24, 2009 4:45 PM
Edited by: Loukas Rougkalas on Aug 24, 2009 4:48 PM
Your memory, CPU and disk values are nominal.
It looks like the main chunk is in the dialog response time.
In ST03, in order to separate whether the issue lies in SAP or in the database, list the DB response time that corresponds to the dialog response time of ~28000ms.
Also, look at the top dialog response times in ST03N, which can give an idea of which transactions are involved, along with their response time and network time.
A high roll-wait time indicates that the problem could be due to RFC communication between systems.
Is this system connected to any BI system, which can obviously put a lot of load on it?
Can you confirm whether the kernel, Oracle patch and Oracle client version are the same between your PROD and this test system?
My other suspected areas are invalid generation, SAPGUI and network.
Go to SGEN => Regenerate existing loads => Only generate objects with invalid load.
This will regenerate the objects whose loads are invalid.
Were there any changes in SAPGUI compared to the prior refresh?
Thanks Vijay for your response.
Regarding Oracle patches, Oracle client and kernel, the situation is as follows:
The kernel is identical to PROD.
The Oracle patch level is 10.2.0.2 on PROD vs 10.2.0.4 on TEST. I am planning to upgrade PROD as well in September.
Basically, the same patch (10.2.0.4) has been applied on another test system (using the same PROD copy as the problematic one), where the transactions are much, much faster.
Indeed, there was an issue with SAP GUI. We had several users on SAP GUI 6.20, some on 6.40, and some who had upgraded to 7.10.
Most of them have been advised to upgrade to 7.10, patch 13. But in what way would a SAP GUI version impact the running transactions/SQL queries? The slow responses have come from a mix of users, mainly the ones using 6.20 and 7.10.
By the way, the SAP systems PROD+TEST are 4.7 EE SR2.0.
Our O/S guys, together with the NW people, have ruled out any network problems between the application and database servers.
The last thing I am going to do is post the Oracle parameters from initSID.ora for TEST and PROD.
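A side-by-side comparison of the two parameter files is easier to read as a list of differences. Below is a minimal sketch, assuming plain-text init.ora files with one `name = value` pair per line and `#` comments. The parameter values in the example are invented purely to show the output and are not taken from either system; multi-line values and `#` inside quoted strings are not handled.

```python
def parse_init_ora(text):
    """Parse simple 'name = value' lines; '#' starts a comment."""
    params = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()
        if not line or "=" not in line:
            continue
        name, value = line.split("=", 1)
        name = name.strip().lower()  # parameter names are case-insensitive
        if name.startswith("*."):    # prefix seen in pfiles created from an spfile
            name = name[2:]
        params[name] = value.strip()
    return params


def diff_params(a, b):
    """Return {name: (value_in_a, value_in_b)} for every differing parameter."""
    diffs = {}
    for name in sorted(set(a) | set(b)):
        if a.get(name) != b.get(name):
            diffs[name] = (a.get(name), b.get(name))
    return diffs


# Hypothetical file contents, for illustration only.
test_text = """
# TEST (made-up values)
db_cache_size = 2147483648
shared_pool_size = 1073741824
"""
prod_text = """
# PROD (made-up values)
DB_CACHE_SIZE = 4294967296
shared_pool_size = 1073741824
filesystemio_options = setall
"""

diffs = diff_params(parse_init_ora(test_text), parse_init_ora(prod_text))
for name, (test_value, prod_value) in diffs.items():
    print(f"{name}: TEST={test_value} PROD={prod_value}")
```

In practice the two strings would be read from the actual initSID.ora files of TEST and PROD; a parameter that shows `None` on one side is simply not set in that file.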
I will also run SGEN to check for objects in an invalid state.
Thanks anyway for the moment.
Rgds,
Loukas
Hello Loukas,
One more question: you upgraded the test system to Oracle 10.2.0.4. Did you also apply all the interim patches listed in SAP Note 1137346?
Quite a few of them are designed to solve performance problems, especially Optimizer merge patch 8599814.
regards
Indeed very odd.
Have you done an SQL trace to see if the indexes are being used properly? Also, is it compiling? Have you checked the response times in ST03n? Have you run the statistics?
What version is your DB?
Regards
Juan