on 10-17-2007 10:30 AM
Hi
We had a problem with our production ERP ECC 5 system yesterday. We are running ECC 5 on Windows 2003 64-bit with 24 GB RAM, 8 CPUs and MS SQL 2000.
Yesterday afternoon we had an intermittent problem for about 1 hour, where SAP was freezing on the users' PCs in lots of transactions. I could see the system was slow from my own admin transactions: SM50 was showing lots of load programs and was generally very slow to update, with about 25 dialog processes showing 'running' and having an action/report against them. The UPD processes were also very slow at updating. Normally we have only a few dialog processes active and they change quickly, as performance is pretty good.
The system returned to normal after about an hour. Looking at ST03N, I have found very large WAITTIME per dialog step values at the time we had the issues. In expert mode under time profile, our usual average wait time per dialog step is about 0.5 ms; yesterday when we had the problem it was just over 2000 ms for that hour. Total wait time is normally about 7 s; yesterday during the problem it was just over 24,000 s. This then dropped back down to about 7 s the next hour, with average wait time per dialog step about 0.4 ms again, so back to normal. Average GUI time was also about 3 times the usual amount, but I guess this is a knock-on from the large WAITTIME.
In ST06 at the same time, CPU was about 90% idle, 8% user and 2% system. We have 8 CPUs, so they were not being hammered.
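As a rough sanity check on the ST03N figures above, dividing the total wait time by the average wait time per dialog step gives the implied number of dialog steps in the hour. This is only a sketch using the approximate values quoted in this post, and it assumes the "total" figure is the sum of wait time over all dialog steps; the helper name is hypothetical and not part of any SAP tool.

```python
# Rough arithmetic on the approximate ST03N figures quoted in this post.
# implied_dialog_steps is a hypothetical helper, not an SAP API.

def implied_dialog_steps(total_wait_s: float, avg_wait_ms: float) -> float:
    """Dialog steps implied by total wait time / average wait per step."""
    return total_wait_s * 1000.0 / avg_wait_ms

# Normal hour: about 7 s total wait, about 0.5 ms average per step.
normal_steps = implied_dialog_steps(total_wait_s=7, avg_wait_ms=0.5)
# Incident hour: about 24,000 s total wait, about 2000 ms average per step.
incident_steps = implied_dialog_steps(total_wait_s=24_000, avg_wait_ms=2000)
# Growth in average wait time per dialog step.
slowdown = 2000 / 0.5

print(f"normal hour:   ~{normal_steps:,.0f} dialog steps")
print(f"incident hour: ~{incident_steps:,.0f} dialog steps")
print(f"average wait grew by a factor of ~{slowdown:,.0f}")
```

If the quoted numbers are roughly right, the number of dialog steps in the incident hour was similar to a normal hour, which would point to steps queuing for longer rather than a surge in the number of steps.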
I have checked SM37 and there were no long running background jobs around the time that could have caused this.
A few months ago we went live in one of our big European countries, and concurrent users on the system have gone from 100 to about 180. Yesterday when I looked at SM04 we had 172 users active in 240 sessions.
On the system we currently have 30 Dialog, 7 UPD, 8 BGD, 2 SPO and 2 UP2 work processes. The SAP help in ST03 says large wait times are generally down to high CPU usage, but ours looked OK at the time. It also says the cause could be the number of work processes. That was my thought too, as the dispatcher was stalling trying to allocate work processes. I remember from the ADM100 course that SAP generally gave about 7 users to 1 dialog work process as a rule of thumb, so I guess with 30 Dialogs we may need to increase that.
Memory usage was good: EM used was about 5 GB with 5 GB free, and ST06 was showing free memory at about 17 GB.
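The 7-users-per-dialog-work-process rule of thumb mentioned above can be turned into a quick calculation. This is a minimal sketch only: the ratio is the one recalled from ADM100 in this post, the user counts are the ones quoted here, and the function name is hypothetical.

```python
import math

# Rule of thumb as recalled in this post: about 7 users per dialog work process.
USERS_PER_DIALOG_WP = 7

def dialog_wps_needed(concurrent_users: int,
                      users_per_wp: int = USERS_PER_DIALOG_WP) -> int:
    """Dialog work processes suggested by the users-per-WP rule of thumb."""
    return math.ceil(concurrent_users / users_per_wp)

# User counts quoted in this post: before go-live, seen in SM04, and current peak.
for users in (100, 172, 180):
    print(f"{users} users -> {dialog_wps_needed(users)} dialog work processes")
```

The rule only relates head count to work processes; it says nothing about how long each dialog step holds a work process, so it is a starting point rather than a sizing result.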
I spoke to our network team at the time and they said all was ok from their view.
Has anyone got any ideas what could cause this problem or an area I can look into further?
Thanks for any help.
Hello ecnrip,
Check which task type has a high value for "wait time".
In ST03, select the day on which you saw the values that you sent us.
Click on each task type button (dialog, background, RFC, etc.) and check the "wait time" value each time.
When you have found the task type that causes your high wait time, click on the "time profile" button.
The "wait time" column will show you when the problem appeared.
You can double-click to see what was running when the problem appeared.
It may be that you do not have enough WPs of type "dialog" or "background". You could also change the scheduling of some background jobs if that is possible, add more WPs if you have enough resources on your system, or reduce parallel processing.
Do you have any CPU resource problems?
Regards,
Jafer
Hi Shantanu
I have not changed the work processes yet. As this is a production instance, it is not easy to get downtime because we run 24*7 factories, so before I do anything I want to make sure it is correct (as much as you can !!!!)
I found this documentation from SAP :
http://help.sap.com/saphelp_nw04/helpdata/en/02/962817538111d1891b0000e8322f96/content.htm
It states that for a 4 CPU Windows platform, which we have, you should have 20-25 work processes. I am more used to Unix, where the rule of thumb was to use your concurrent users as a measure for the number of dialog work processes, around 7 users to 1 work process.
We have dual-core multithreaded processors, so SAP/Windows actually recognise 8 CPUs, whereas we have 4 physical CPUs on the server, so I'm not sure what the SAP recommendation for that is.
Dear encrip,
SAP collects the hardware-related information using SAPOSCOL, which simply collects data from the operating system. As far as memory or CPU is concerned, this is where the information comes from.
Hence, I think that if your OS recognizes 8 CPUs then SAP will function accordingly, and you should configure the number of work processes based on that.
As I said, the switch in work process numbers between dialog and background can also be done using Operation Modes and you do not need to restart the system for that.
I was wondering if you could try that.
Here is some info on that
http://help.sap.com/saphelp_nw70/helpdata/en/c4/3a5e4f505211d189550000e829fbbd/frameset.htm
Regards
Shantanu
Hi Shantanu
I have OP modes set up already, reducing dialogs and increasing BGD for the evening. I cannot really increase the DIA during the day, as the BGD work processes are frequently used as they are set up. We run quite a lot of frequent jobs as part of our daily operation, so I do not want to create a bottleneck in job processing by leaving too few BGD work processes.
We have had no repeat of the problem since Tuesday, so I am going to monitor performance over the next week. WAITTIME response has been between 0.2 and 0.8 per hour average, so it is good.
When I look at the CPU times in SM50, the last 4 dialog processes all have times below 2 minutes. The system has been up since the start of September, so if there was a lack of dialogs I would expect their utilisation numbers to be higher, as they would be used frequently.
I agree about the OS Collector. I just never realised that on Windows the number of work processes was linked to the number of CPUs, rather than the user base/memory, which is more significant on Unix.
Let me clarify the requirement of dialog work processes in the system;
Normally, when you size for the first time, I would recommend choosing the number of dialog processes = 4 * number of CPU cores. In your case = 4 * 8 = 32.
The dialog process bottleneck that you have seen can happen for many reasons. First I would look at the 'User' and 'type' of the running processes. Most of the time it is caused by transactional RFCs.
In this case, you may look at your RFC resource configuration and adjust the number of dialog processes assigned for RFCs.
The average wait time over at least a month of workload is a deciding factor for adjusting the number of work processes. Still, the question is whether you can increase the number of processes in the same system or whether you need an additional application server. The deciding factor here is the average CPU time for dialog tasks.
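The starting-point formula above is simple enough to check by hand, but a small sketch makes the logical-versus-physical CPU question from earlier in the thread concrete. The function name is hypothetical; the factor of 4 and the CPU counts are the ones given in this thread.

```python
def initial_dialog_wps(cpu_cores: int, per_core: int = 4) -> int:
    """Starting number of dialog work processes: per_core * CPU cores."""
    return per_core * cpu_cores

# 8 CPUs as recognised by Windows/SAP on this server.
print("8 CPUs seen by the OS ->", initial_dialog_wps(8))
# 4 physical CPUs, for comparison only.
print("4 physical CPUs       ->", initial_dialog_wps(4))
```

Which count to plug in is the open question in this thread; the sketch only shows how far apart the two answers are.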
/Manoj
Hello encrip,
Also check whether some dialog processes had gone into PRIV mode at that time, as a result of some transaction accessing large amounts of data from the database.
Those work processes would then not be able to multiplex, so the effective number of available work processes is reduced, causing high wait time.
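To illustrate the effect, here is a minimal sketch of how work processes stuck in PRIV mode shrink the pool available to everyone else. The PRIV counts are hypothetical values chosen for illustration; only the 30 dialog work processes and 172 active users come from this thread.

```python
def effective_dialog_wps(configured: int, in_priv: int) -> int:
    """Dialog work processes still free to serve other users."""
    if not 0 <= in_priv <= configured:
        raise ValueError("in_priv must be between 0 and configured")
    return configured - in_priv

CONFIGURED_DIA = 30   # from this thread
ACTIVE_USERS = 172    # from this thread (SM04)

# Hypothetical numbers of work processes held in PRIV mode.
for in_priv in (0, 5, 10):
    free = effective_dialog_wps(CONFIGURED_DIA, in_priv)
    print(f"{in_priv:2d} in PRIV -> {free} usable, "
          f"{ACTIVE_USERS / free:.1f} users per usable work process")
```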
Regards
Shantanu
Hey,
In that case, my view is that high wait time can only be caused by an insufficient number of work processes. Try increasing the number of work processes in the instance profile if you can afford to reboot the box. Otherwise, please try to configure operation modes so that during the day the number of BG processes is reduced to 4 and the number of dialog WPs is increased by 4. Remember, though, that the total number of WPs should remain constant.
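The operation mode suggestion above amounts to moving work processes between types while the total stays fixed. A minimal sketch, using the work process counts posted earlier in this thread; the function and dictionary keys are hypothetical and do not correspond to any SAP interface:

```python
def shift_bgd_to_dia(wps: dict, count: int) -> dict:
    """Return a new distribution with `count` BGD work processes moved to DIA."""
    if count < 0 or count > wps["BGD"]:
        raise ValueError("cannot move more BGD work processes than exist")
    new = dict(wps)
    new["BGD"] -= count
    new["DIA"] += count
    # An operation mode switch redistributes work processes; the total is unchanged.
    assert sum(new.values()) == sum(wps.values())
    return new

# Current distribution as posted in this thread.
current = {"DIA": 30, "UPD": 7, "BGD": 8, "SPO": 2, "UP2": 2}
day_mode = shift_bgd_to_dia(current, 4)

print("current :", current, "total", sum(current.values()))
print("day mode:", day_mode, "total", sum(day_mode.values()))
```

Whether 4 background work processes are enough during the day depends on the job load, which this sketch does not model.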
Hope this helps
Regards
Shantanu A Sardeshmukh