J2EE systems suddenly stopped talking to SLD
Suddenly, on April 16, early in the AM, J2EE communication from multiple servers to a central SLD passed away of unknown causes... sniff..
OK, not so dramatic. However, we've had multiple J2EE systems (55+, dual and single stack) reporting in to a central SLD for a couple of years now, so we can probably assume that configuration isn't the issue.
All ABAP systems (dual or single stack) continue to successfully update SLD. Three J2EE systems continue to work as well.
Nothing was done by infrastructure or Solman config teams on that date (yes, "nothing was done" is an overused term but seems to be true in this case).
Firewall rules weren't changed, and data still flows on that port between the managed systems and SM.
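For anyone who wants to repeat the port check from each managed host, here is a minimal sketch in Python. It only shows that a TCP connection to the SLD port can be opened, not that the SLD data supplier itself works. The function name `can_connect` and the host and port in the usage comment are placeholders, not our real values.

```python
import socket

def can_connect(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port can be opened within timeout."""
    try:
        # create_connection resolves the name and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False

# Hypothetical usage (host and port are placeholders):
#   can_connect("solman.example.com", 50000)
```

A True result only rules out the network path and firewall. A data supplier push could still fail at the HTTP or application level.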
The servers all have a lot of ports in CLOSE_WAIT state, pointing at SM:port. I've run stopsap/startsap on several of the constipated servers, but this has had no effect on anything getting through. Not sure if this is relevant or just another red herring.
Oh, and because we're on Solaris, there's no handy dandy command to kill CLOSE_WAIT port statuses (stati?)
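To put a number on the CLOSE_WAIT pile-up, something like the following could tally those sockets per remote endpoint from saved `netstat -an` output. This is a rough sketch. It assumes the local address is the first column, the remote address the second, and the state the last, as in Solaris-style TCP output. The sample addresses and the function name `count_close_wait` are made up for illustration.

```python
from collections import Counter

def count_close_wait(netstat_text):
    """Count CLOSE_WAIT sockets per remote endpoint in `netstat -an` style text."""
    counts = Counter()
    for line in netstat_text.splitlines():
        fields = line.split()
        # Assumed layout: local address first, remote address second, state last.
        if len(fields) >= 3 and fields[-1] == "CLOSE_WAIT":
            counts[fields[1]] += 1
    return counts

# Made-up sample lines in the assumed layout (address.port, state in last column).
SAMPLE = """\
10.1.1.20.33501  10.1.1.5.50000  49640  0 49640  0 CLOSE_WAIT
10.1.1.20.33502  10.1.1.5.50000  49640  0 49640  0 CLOSE_WAIT
10.1.1.20.22     10.1.1.77.51822 49640  0 49640  0 ESTABLISHED
"""

if __name__ == "__main__":
    for remote, n in count_close_wait(SAMPLE).most_common():
        print(remote, n)
```

Comparing the counts before and after a stopsap/startsap would at least show whether the restart clears the stuck sockets.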
The PSAPTEMP tablespace on SM had filled up at some point, and we were receiving multiple "queue has reached its limit" errors on SLD. We added 20 GB to PSAPTEMP. I'm wondering if this may have inelegantly killed a lot of threads which are now in CLOSE_WAIT status.
We can log on to the Visual Admin console on those servers and push an update. It gives us a successful completion message, but no updates get through to SM SLD.
We don't see any real hints in the logs. SM doesn't show anything, because none of the attempts seem to be getting that far.
All of the systems are up and running, active, everything seems to be working other than the SLD update.
I've seen several messages here on SCN about similar problems, but they all seemed to be configuration related - since ours have been working for the better part of 2 years (or longer), I don't see that as an issue.
Short of rebooting servers all the way up to SolMan (probably close to 70 VMs!!!), any ideas on where to look? This is starting to become an issue. We have upgrades going on, and without SLD updates, LMDB and SMSY are getting stale and MOPZ is having issues.