Production stalling, very large wait times
We had a problem with our production ERP ECC 5 system yesterday. We are running ECC 5 on Windows 2003 64-bit with 24 GB RAM, 8 CPUs and MS SQL 2000.
Yesterday afternoon we had an intermittent problem for about an hour, where SAP was freezing on the users' PCs in lots of transactions. I could see the system was slow from my own admin transactions. SM50 was showing lots of load programs and was generally very slow to update, with about 25 dialog processes showing 'running', each with an action/report against it. The UPD processes were also very slow at updating. Normally we have only a few dialog processes active and they change quickly, as performance is pretty good.
The system returned to normal after about an hour. Looking at ST03N in expert mode under the time profile, I found very large wait times per dialog step for the period when we had the issues. Our usual average wait time per dialog step is about 0.5 ms; yesterday during the problem it was just over 2,000 ms for that hour. Total wait time is normally about 7 s; during the problem hour it was just over 24,000 s. It then dropped back to about 7 s the next hour, with average wait time per dialog step at about 0.4 ms, so back to normal. Average GUI time was also about 3 times the usual amount, but I guess this is a knock-on effect of the large wait time.
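As a rough sanity check on these ST03N figures, the short Python sketch below compares the problem hour with a normal hour. It assumes that total wait time is roughly the average wait time multiplied by the number of dialog steps, which is an assumption about how the time profile is calculated rather than something confirmed by SAP; the function names are made up for illustration.

```python
def slowdown_factor(normal, observed):
    """How many times larger the observed value is than the normal one."""
    return observed / normal


def implied_dialog_steps(total_wait_s, avg_wait_ms):
    """Estimate dialog steps, ASSUMING total wait = avg wait x steps."""
    return total_wait_s * 1000 / avg_wait_ms


# Figures quoted from the ST03N time profile (all approximate).
avg_wait_normal_ms = 0.5
avg_wait_problem_ms = 2000
total_wait_normal_s = 7
total_wait_problem_s = 24000

avg_factor = slowdown_factor(avg_wait_normal_ms, avg_wait_problem_ms)
total_factor = slowdown_factor(total_wait_normal_s, total_wait_problem_s)
steps_normal = implied_dialog_steps(total_wait_normal_s, avg_wait_normal_ms)
steps_problem = implied_dialog_steps(total_wait_problem_s, avg_wait_problem_ms)

print(f"Average wait grew by a factor of about {avg_factor:.0f}")
print(f"Total wait grew by a factor of about {total_factor:.0f}")
print(f"Implied dialog steps, normal hour:  {steps_normal:.0f}")
print(f"Implied dialog steps, problem hour: {steps_problem:.0f}")
```

On these numbers the implied step count for the problem hour (about 12,000) is close to that of a normal hour (about 14,000), which would suggest the wait time grew without a matching jump in workload.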
In ST06 at the same time, CPU was about 90% idle, 8% user and 2% system. We have 8 CPUs, so they were not being hammered.
I have checked SM37 and there were no long running background jobs around the time that could have caused this.
A few months ago we went live in one of our big European countries, and concurrent users on the system have gone from about 100 to about 180. Yesterday when I looked at SM04 we had 172 users active in 240 sessions.
On the system we currently have 30 dialog work processes, 7 UPD, 8 BGD, 2 SPO and 2 UP2. The SAP help in ST03 says large wait times are generally down to high CPU usage, but ours looked OK at the time. It also says the cause could be the number of work processes, which was my thought too, as the dispatcher seemed to be stalling while trying to allocate work processes. I remember from the ADM100 course that SAP gave about 7 users to 1 dialog work process as a rule of thumb, so I guess with 30 dialog processes we may need to increase that.
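To put the rule of thumb into numbers, here is a minimal Python sketch. It assumes the ratio of 7 users per dialog work process recalled from ADM100, which is only a rough guideline and not a sizing formula, and the helper name is hypothetical.

```python
import math


def dialog_wps_suggested(users, users_per_wp=7):
    """Dialog work processes suggested by the rough 7:1 rule of thumb."""
    return math.ceil(users / users_per_wp)


current_dialog_wps = 30  # dialog work processes configured today

# User counts taken from the post: before go-live, after go-live, SM04 snapshot.
for users in (100, 180, 172):
    suggested = dialog_wps_suggested(users)
    per_wp = users / current_dialog_wps
    print(f"{users} users: rule of thumb suggests {suggested} dialog WPs, "
          f"currently {per_wp:.1f} users per dialog WP")
```

With 180 concurrent users the guideline gives about 26 dialog work processes, so the current 30 sits only a little above it and leaves limited headroom if many users hit long-running steps at once.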
Memory usage was good: EM used was about 5 GB with 5 GB free, and ST06 was showing free memory at about 17 GB.
I spoke to our network team at the time and they said all was OK from their side.
Has anyone got any ideas what could cause this problem or an area I can look into further?
Thanks for any help.