on 07-09-2013 11:19 PM
OK, first the details. Environment is Sun Solaris, Oracle, SM7.1 SP7, EM 9.1, latest SMD agents. EM has been installed and running for several months; this is a new installation. SolMan has been running for years, on SP7 for a few months now.
We can call up Introscope Webview and log in with standard ID, interface works. Enterprise Manager starts and runs with no issues.
Problem - for some reason last week EM disconnected itself from SolMan, and now we cannot get SolMan to see the running instance of EM again. Managed System config now fails because EM is not seen, so we can't finish configuring some systems. EWA reports are coming in gray because of missing Introscope metrics. We currently have 16 systems configured through Managed Systems Config. All were reporting in quite nicely until noon Friday. No system outages occurred that day; we had restarted SolMan 2 days prior to bump up the Shared Memory setting to 300m to avoid short dumps caused by monitoring activity. Xmx and Xms are set to 2048 in the lax file (they were 1024; I increased them to see if the error would go away. It did not).
I've stopped/started EM several times and stopped/started the SMD agents. Short of rebooting SolMan (production system, not easy to get a time slice to do that), I'm not sure where to look any more.
Not really seeing any errors in EM logs. As far as they're concerned, it's running fine. There are SOME metrics being reported in to Introscope, not sure why some get there and others not. EM Self Monitoring screen in Introscope shows 137 agents connected (about right), 2,626 metrics (seems low). So the agents ARE getting there, but something is still blocking the connection.
Seeing "failed to bind to server socket... Address already in use" in the Introscope log. I checked those ports and nothing else seems to be using them (8081:6001). When I stop EM, the 6001 entry goes away.
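One way to double-check the "Address already in use" message is to try binding to the port yourself while EM is stopped. The following is only a minimal sketch in Python, not anything shipped with Introscope; the function name `can_bind` is made up for illustration, and it assumes you run it on the EM host itself.

```python
import errno
import socket


def can_bind(port, host="0.0.0.0"):
    """Return True if a new socket can bind to (host, port), False if it is taken."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((host, port))
        except OSError as exc:
            if exc.errno == errno.EADDRINUSE:
                return False  # some process already holds this address/port
            raise  # anything else (e.g. permissions) is a different problem
        return True
```

With EM stopped, `can_bind(6001)` returning False would suggest another process still holds the port; True would suggest the bind error comes from somewhere else, for example EM trying to open the same port twice.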
Also seeing "Error accessing to Enterprise Manager (socketTest) ... java.net.ConnectException: Connection refused" on the app server that EM is running on, but that's a symptom of the real problem and not really useful (to me, anyway).
I've gone through a lot of the posts here already, tried various things, getting close to reboot time. Any suggestions to try before I have to go that route?
Thanks.
Bernie
Unbelievable. Today port 6001 mysteriously decided to start working again and EM is getting metrics ... and of course "no one did anything".
400 agents and 355k+ metrics. Looks like we're back to "normal". Thanks all for all the suggestions - next time I'll go corral a network guy first.
Can you attach full log files with errors (from SolMan and EM)?
Zip file attached - the seven10.txt file is the Introscope log from July 10. The other is the app server log. I had to delete the first portion of the trc file to get the upload to accept the size.
Tried telnetting to the server on 6001 and also got an error - connection refused. Yet nothing seems to be using that port.
Found this in the defaultTrace.18 zip, is this the SMD agent for SolMan?
#1.5 #00144F879C2400600000864A000016550004E12A3C3D6B59#1373471422376#com.sap.engine.services.httpserver.server.Log##com.sap.engine.services.httpserver.server.Log#J2EE_GUEST#0##6DE757BFE97811E2C55800000018FB1E#6de757bfe97811e2c55800000018fb1e-0#6de757bfe97811e2c55800000018fb1e#SAPEngine_Application_Thread[impl:3]_22##0#0#Info#1#/System/HttpAccess/Access#Plain###10.70.71.161 : POST /GRMGHeartBeat/EntryPoint HTTP/1.1 200 3927 [97] d[100] c[16192256]#
#1.5 #00144F879C240030000083A1000016550004E12A3D2246B9#1373471437375#com.sap.smd.server##com.sap.smd.server#SMD_ADMIN#27498##E382A6F7E97711E2C0E900000018FB1E#e382a6f7e97711e2c0e900000018fb1e-0#e382a6f7e97711e2c0e900000018fb1e#SAPEngine_Application_Thread[impl:3]_1##0#0#Error##Plain###[SMDManager.registerPendingAgent] Receive registration for an already existing entry. Registration REJECTED
Existing entry :
AgentHandleEntry: com.sap.smd.SMDManager$AgentHandleEntry@1c33f25e
JNDI key : fssprmap03_SMD_SMDA97@2013.07.10-10.43.01.913
Server Name : fssprmap03
ID : fssprmap03_SMD_SMDA97
CanonicalHostName: fssprmap03
HostName : fssprmap03
Address : 10.70.74.150
AgentHandleWrap : com.sap.smd.local.AgentHandleWrapper@74a508f7
Incoming entry:
AgentHandleEntry: com.sap.smd.SMDManager$AgentHandleEntry@7a3c43cd
JNDI key : null
Server Name : fssprmap03
ID : fsssx4050_SMD_SMDA97
CanonicalHostName: fsssx4050
HostName : fsssx4050
Address : 10.70.74.142
AgentHandleWrap : com.sap.smd.local.AgentHandleWrapper@4a9225d6#
#1.5 #00144F879C240030000083A2000016550004E12A3D224BBD#1373471437376#com.sap.smd.server##com.sap.smd.server#SMD_ADMIN#27498##E382A6F7E97711E2C0E900000018FB1E#e382a6f7e97711e2c0e900000018fb1e-0#e382a6f7e97711e2c0e900000018fb1e#SAPEngine_Application_Thread[impl:3]_1##0#0#Error##Java###Agent Registration failed
[EXCEPTION]
{0}#1#com.sap.smd.server.manager.SMDException: [SMDManager.registerPendingAgent] Receive registration for an already existing entry. Registration REJECTED
Existing entry :
AgentHandleEntry: com.sap.smd.SMDManager$AgentHandleEntry@1c33f25e
JNDI key : fssprmap03_SMD_SMDA97@2013.07.10-10.43.01.913
Server Name : fssprmap03
ID : fssprmap03_SMD_SMDA97
CanonicalHostName: fssprmap03
HostName : fssprmap03
Address : 10.70.74.150
AgentHandleWrap : com.sap.smd.local.AgentHandleWrapper@74a508f7
Incoming entry:
AgentHandleEntry: com.sap.smd.SMDManager$AgentHandleEntry@7a3c43cd
JNDI key : null
Server Name : fssprmap03
ID : fsssx4050_SMD_SMDA97
CanonicalHostName: fsssx4050
HostName : fsssx4050
Address : 10.70.74.142
AgentHandleWrap : com.sap.smd.local.AgentHandleWrapper@4a9225d6
As for your seven10 file, the only thing that stands out, from what I saw, is the dashboard errors, which could just mean you have missing files.
Have you tried just restarting the Introscope service on the server?
Does 'telnet IP.add.re.ss 6001' provide any useful information? If successful, you should just get a blank terminal and, after about 10 secs, a connection lost message.
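If telnet isn't available on the box, the same check can be scripted. This is a rough sketch only; `probe_port` is a hypothetical helper name, and the host and port you pass in are whatever your EM actually uses (6001 in this thread).

```python
import socket


def probe_port(host, port, timeout=5.0):
    """Attempt a TCP connect and report 'open', 'refused', 'timeout' or 'error: ...'."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except ConnectionRefusedError:
        return "refused"  # host answered, but nothing accepted the connection
    except socket.timeout:
        return "timeout"  # no answer at all within the timeout
    except OSError as exc:
        return f"error: {exc}"
```

Roughly speaking, "refused" means the host was reached but either nothing is listening on that port or something in between is actively rejecting the connection, while "timeout" usually points to packets being dropped along the way. Running it both from the EM host itself and from a remote agent host can help tell a local problem from a network one.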
What versions are the LM* components on your Java side currently at?
Have you tried just restarting the java side?
No, the agent listed there is for a different system. Someone installed it with the virtual host name instead of the VIP name; now it's running on a different host and causing problems. Grrr...
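For anyone hunting the same kind of mix-up in a defaultTrace file, the "Registration REJECTED" block can be compared field by field. The sketch below is an assumption-laden illustration: the helper names are invented, and it assumes the agent ID follows the `<host>_SMD_<instance>` pattern seen in the two entries posted above, which may not hold everywhere.

```python
def parse_entries(text):
    """Collect the 'key : value' lines under 'Existing entry' and 'Incoming entry'."""
    entries = {}
    current = None
    for line in text.splitlines():
        stripped = line.strip()
        if stripped.startswith("Existing entry"):
            current = entries.setdefault("existing", {})
        elif stripped.startswith("Incoming entry"):
            current = entries.setdefault("incoming", {})
        elif current is not None:
            key, sep, value = stripped.partition(":")
            if sep:
                # drop the trailing '#' that ends a trace record
                current[key.strip()] = value.strip().rstrip("#")
    return entries


def id_host_mismatch(entry):
    """True if the host part of the agent ID differs from the reported Server Name."""
    # Assumption: the ID looks like '<host>_SMD_<instance>', as in the posted log.
    agent_host = entry.get("ID", "").split("_SMD_")[0]
    return agent_host != entry.get("Server Name", "")
```

On the excerpt above, the incoming entry has ID fsssx4050_SMD_SMDA97 but Server Name fssprmap03, so `id_host_mismatch` would flag it, while the existing entry is consistent.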
Netstat 6001 gives "connection refused". Netstat 8081 connects just fine.
Have not tried just restarting Java, may give that a shot.
LM Tools 7.02 SP11 (1000.7.02.11.0.20120216212322)
LM Services 7.10 SP7 (1000.7.10.7.1.20121219150800) as of 02/13/13
LM Tools has SP13 patch 2
LM Service has SP08
However, knowing that 6001 doesn't work in your system, and that someone else installed the EM, at this point it may be in your interest to reinstall the EM. 6001 is the port the SMD agents use to send data/stats in; 8081 is the interface used to view the data.
I can imagine that your system is going to be grumpy if you try to remove the EM. You may have to install a temporary EM on different ports, go into SolMan and reassign all of the SMD agents to that installation, then uninstall and reinstall the original EM to fix the 6001 issue, reassign every agent back to it, and finally uninstall the temporary EM setup.