on 11-21-2013 2:36 PM
Hello,
The problem described in this thread is different from the one discussed here:
http://scn.sap.com/thread/3165911
So I created a dedicated thread.
The consequences are the same, though: the Diagnostic Agents are eating all the CPU.
In our case the SMD agents do not appear among the top CPU processes, but the system CPU utilization is huge.
We had the problem for months and thought it was a hardware issue, until today, when I switched off the Diagnostic Agents.
We are running SAP Solution Manager 7.1 SPS6 alongside several other SAP production systems on our HP-UX Superdome (B.11.31 U ia64). The SAP Diagnostic Agent "On The Fly" feature has been activated.
We are dealing with huge CPU performance issues: on average, 30% of the system CPU was being used by "something".
We have opened SAP customer calls and HP-UX calls, and performed a firmware update, without any tangible result.
I did not think the DIA Agents were responsible, since they were not listed among the top CPU processes in transaction ST06, but I am starting to think that system CPU usage is not reflected there.
Anyway, after reading that thread I decided to try stopping the Diagnostic Agents:
Before Stopping the Diagnostic Agent
After Stopping the diagnostic Agent
I think the screenshots speak for themselves: the bloody Diagnostic Agents were eating all our CPU (both user and system CPU).
Bravo to me, I have solved our CPU bottleneck ... but our production systems are no longer monitored. Anything can happen now (DB, SAP, Unix, application) and we won't be alerted, which might be a problem to explain to the business.
Our SAP Host Agent patch number has been updated to 168.
Our SOL LM-Service patch level is SPS6 patch 2. SAP asked us to patch it to level 3; I will do that, but I am quite certain it will solve nothing, based on what is said in the previous post.
Anyone from SAP reading this thread?
Thanks and Regards
Hello Raoul,
I have encountered this kind of unusually high CPU consumption caused by Diagnostics Agents from time to time, probably due to the jstart.exe process.
The very first thing I would do: update LMSERVICE06P to the latest patch level.
NB : patch 5 is available.
Most of the issues I have experienced with RCA/MAI on different SP levels were finally solved by an LMSERVICE or Wily patch.
The only problem is that sometimes I had to wait many weeks for the proper patch 😉
Worth a try at least...
Sebastien
We upgraded to the very latest LM-Service SP3
Have you checked the ulimit settings for the agent user ID (such as smdadm)? SAP suggested these:
http://wiki.scn.sap.com/wiki/display/SMSETUP/Diagnostics+Agent+Troubleshooting
(look for ulimit)
We tried this on our test server, which was running high CPU for the agents, and it's behaving better now. We are currently making the same change on our assurance box.
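For anyone who wants to check this quickly, here is a minimal sketch that prints the resource limits of the current process. It assumes a Python 3 interpreter is available on the host (not a given on every system) and that you run it while logged in as the agent user (for example smdadm), so that it inherits the same limits as the agent. The choice of limits shown is my own; compare the output with whatever values the wiki page recommends.

```python
import resource

# Limits to report. The constants are standard names from Python's
# "resource" module; which ones matter for the agent is an assumption.
LIMITS = {
    "open files (nofile)": resource.RLIMIT_NOFILE,
    "data segment (data)": resource.RLIMIT_DATA,
    "stack size (stack)": resource.RLIMIT_STACK,
    "core file size (core)": resource.RLIMIT_CORE,
}


def fmt(value):
    """Render a limit value, showing RLIM_INFINITY as 'unlimited'."""
    return "unlimited" if value == resource.RLIM_INFINITY else str(value)


def current_limits():
    """Return {label: (soft, hard)} for the limits listed above."""
    report = {}
    for label, res in LIMITS.items():
        soft, hard = resource.getrlimit(res)
        report[label] = (fmt(soft), fmt(hard))
    return report


if __name__ == "__main__":
    for label, (soft, hard) in current_limits().items():
        print(f"{label:24} soft={soft:>12} hard={hard:>12}")
```

The soft value is the one that actually applies to the process; the hard value is the ceiling the user may raise it to.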
Hello,
I got authorization from the business team to restart the DIA Agents this week.
These are the changes I made beforehand:
We noticed some improvement after restarting the DIA Agents:
When the SMD agents are down, CPU idle is around 40-60%; when the SMD agents are up and running, CPU idle is now around 15-35%.
This means that we can let the DIA Agents run during periods of low to moderate business activity, but we will have to shut them down during month end.
I am obviously still trying to improve things with SAP Support, but the problem seems complex:
when looking at the top CPU processes (glance, top, ST06), the da1adm processes are not displayed as top consumers, which would suggest that the high CPU usage is caused not by the agents directly but by the calls that they make.
FYI, here is a screenshot of table E2E_RESOURCES
Thanks
Hello All,
FYI.
SAP Support has analyzed the latest thread dumps and suggests the following:
In order to isolate the problem, I kindly ask you to temporarily disable the e2edcc dbinfo job.
You can do this by updating the following property from Agent
Administration, application com.sap.smd.agent.application.e2edcc, for
the concerned agents (main agent + agents on-the-fly):
job.dbinfo.disable = true
I am still waiting for the Business Team to give me the approval to restart the Diagnostic Agents, so that I can test the new settings. I will let you know
Have a nice day
Hello Raoul,
We had a similar situation after a Solution Manager upgrade; for us, the problem was the configuration of table E2E_RESOURCES.
Can you show us the content of that table?
Hi Raoul
You should really go forward through SAP support for this. It's good that you notify the community of potential problems. Based on this discussion for example, I've asked infrastructure to check if we see strange behaviour like this here but I'm not so sure because it can be very case specific - bound to combinations of OS / params / software / ...
I assume that this is not general behaviour though. SAP cannot possibly test every combination that exists because there are too many elements involved, too many combinations that can be made.
If you're not getting anywhere with SAP support in the end, I can try to get hold of the right people who would be interested in this, but normally you should get there through SAP support as well.
Best regards
Tom
Hello Tom,
I did indeed open an SAP customer call last week.
SAP Support asked me to generate a thread dump of the DIA Agent, and I sent them the results.
Here is another BEFORE/AFTER comparison from last week, when I started the SMD agent in order to generate the dumps. It is quite impressive.
BEFORE :
Moderate User CPU utilization
Very Low System CPU utilization
CPU Idle is High
AFTER: less than 15 minutes after the first screenshot
Very High User CPU utilization
High System CPU utilization
CPU Idle is almost zero
The phone is ringing, users are yelling: "SAP is slow!"
Each one of the 10 CPUs on the server is impacted, and we only have 7 Diagnostic Agents.
Hi Raoul
I've received feedback from our infrastructure support team, who have been monitoring the DIA agents since I told them about this thread, and we don't see the same behaviour. CPU usage of the DIA agents is very low here.
We also have the DIA agents on-the-fly mechanism set up and running. If you want, I can get the technical details of the combination in use.
Best regards
Tom
Please collect CPU utilization per core with any OS tool (like sar). At the same time, also collect the output of the vmstat command. Attach the results to the message.
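If it helps to summarize the vmstat output before attaching it, here is a rough sketch that averages the user, system and idle CPU columns from a saved capture. It assumes the column header line contains the consecutive names us, sy, id (the exact layout varies between platforms and versions, so check it against your own output). The SAMPLE text below is made up purely to illustrate the format; it is not data from this system.

```python
# Illustrative capture only: the numbers are invented, not measured.
SAMPLE = """\
         procs           memory                   page                              faults       cpu
    r     b     w      avm    free   re   at    pi   po    fr   de    sr     in     sy    cs  us sy id
    4     0     0   900000  120000    0    0     0    0     0    0     0   5000  90000  8000  50 40 10
    6     0     0   910000  118000    0    0     0    0     0    0     0   5200  95000  8400  60 30 10
"""


def parse_vmstat(text):
    """Return a list of (us, sy, id) integer tuples from vmstat text."""
    cpu_start = None
    samples = []
    for line in text.splitlines():
        tokens = line.split()
        if "us" in tokens:
            # Column header line: remember where the CPU columns begin.
            i = tokens.index("us")
            if tokens[i:i + 3] == ["us", "sy", "id"]:
                cpu_start = i
            continue
        if cpu_start is None or len(tokens) < cpu_start + 3:
            continue  # group header, blank line, or truncated row
        try:
            us, sy, idle = (int(t) for t in tokens[cpu_start:cpu_start + 3])
        except ValueError:
            continue  # not a data row
        samples.append((us, sy, idle))
    return samples


def averages(samples):
    """Return the mean (us, sy, id) over all samples."""
    if not samples:
        raise ValueError("no vmstat samples found")
    n = len(samples)
    return tuple(sum(s[k] for s in samples) / n for k in range(3))


if __name__ == "__main__":
    us, sy, idle = averages(parse_vmstat(SAMPLE))
    print(f"avg user={us:.1f}% system={sy:.1f}% idle={idle:.1f}%")
```

To use it on a real capture, read the saved file into a string and pass it to parse_vmstat instead of SAMPLE. Comparing the averages with the agents stopped and running gives a number to put next to the screenshots.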