Solved: J2EE nodes restart due to missed broadcast

Former Member · 06-09-2011

We have three XI application instances, each with one J2EE server node. Two instances run on the same server (server A) and one runs on a different server (server B). The SCS, DB, etc. are on a third server (server C).

The J2EE nodes occasionally restart because of a missed message broadcast.

[Framework -> criticalShutdown] Missed broadcast due to network problems. Restart is required (advanced Reconnect)
Jun 8, 2011 1:32:07 PM              com.sap.engine.core.Framework [Thread[Thread-559731,5,SAPEngine_System_Thread[impl:5]_Group]] Fatal: Critical shutdown was invoked.
Reason is: Missed broadcast due to network problems. Restart is required (advanced Reconnect)
[Framework -> criticalShutdown] Exiting Listener Loop. This requires a restart of the node. Possible reason is an interrupted reconnect session to the message server.
Jun 8, 2011 1:32:07 PM              com.sap.engine.core.Framework [SAP J2EE Engine|MS Socket Listener] Fatal: Critical shutdown was invoked. Reason is: Exiting Listener Loop. This requires a restart of the node. Possible reason is an interrupted reconnect session to the message server.

dev_server0 shows

[Thr 109283] JLaunchIExitJava: exit hook is called (rc=-333)
[Thr 109283] **********************************************************************
*** ERROR => The Java VM terminated with a non-zero exit code.
*** Please see SAP Note 940893 , section 'J2EE Engine exit codes'
*** for additional information and trouble shooting.
**********************************************************************
[Thr 109283] SigISetIgnoreAction : SIG_IGN for signal 20
[Thr 109283] JLaunchCloseProgram: good bye (exitcode=-333)

SCS dev_ms shows

[Thr  1] Wed Jun  8 13:32:02 2011
[Thr  1] ***LOG Q0I=> NiPWrite: writev (32: Broken pipe) [niuxi_mt.c 1174]
[Thr  1] *** ERROR => MsSRead: NiBufReceive (rc=NIECONN_BROKEN) [msxxserv_mt. 9334]

[Thr  1] Wed Jun  8 13:32:03 2011
[Thr  1] *** ERROR => MsSClientHandle: MsSRead C3 (J2EE3233250), MSEINTERN [msxxserv_mt. 3814]
[Thr  1] MsJ2EE_AddDisconnectedNode: add node [3233250] into disconnect list
[Thr  1] MsSSuspendCheck: C7 (J2EE603656750), no write within 40 secs, disconnect now
[Thr  1] MsJ2EE_AddDisconnectedNode: add node [603656750] into disconnect list
[Thr  1] MsJ2EE_CleanDisconnectedNodes: clean disconnect list
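To line up these message server disconnects with the node restarts, the dev_ms trace can be scanned for disconnect events. The following is only a minimal sketch: the regular expressions are derived solely from the excerpt above, the function name summarize_disconnects is hypothetical, and other dev_ms versions may format these lines differently.

```python
import re

# Line shapes assumed from the dev_ms excerpt in this post only.
TIMESTAMP = re.compile(
    r"^\[Thr\s+\d+\]\s+(\w{3} \w{3}\s+\d+ \d{2}:\d{2}:\d{2} \d{4})\s*$"
)
SUSPEND = re.compile(
    r"MsSSuspendCheck: (\w+) \(J2EE(\d+)\), no write within (\d+) secs"
)
DISCONNECT = re.compile(r"MsJ2EE_AddDisconnectedNode: add node \[(\d+)\]")


def summarize_disconnects(lines):
    """Return one dict per disconnected node, tagged with the last timestamp seen."""
    current_time = None
    idle_secs = {}  # node id -> seconds without a write, if a suspend check fired
    events = []
    for line in lines:
        match = TIMESTAMP.match(line)
        if match:
            current_time = match.group(1)
            continue
        match = SUSPEND.search(line)
        if match:
            idle_secs[match.group(2)] = int(match.group(3))
            continue
        match = DISCONNECT.search(line)
        if match:
            node = match.group(1)
            events.append(
                {"time": current_time, "node": node, "idle_secs": idle_secs.get(node)}
            )
    return events


# Sample input copied from the dev_ms excerpt above.
SAMPLE = """
[Thr  1] Wed Jun  8 13:32:02 2011
[Thr  1] ***LOG Q0I=> NiPWrite: writev (32: Broken pipe) [niuxi_mt.c 1174]
[Thr  1] *** ERROR => MsSRead: NiBufReceive (rc=NIECONN_BROKEN) [msxxserv_mt. 9334]

[Thr  1] Wed Jun  8 13:32:03 2011
[Thr  1] *** ERROR => MsSClientHandle: MsSRead C3 (J2EE3233250), MSEINTERN [msxxserv_mt. 3814]
[Thr  1] MsJ2EE_AddDisconnectedNode: add node [3233250] into disconnect list
[Thr  1] MsSSuspendCheck: C7 (J2EE603656750), no write within 40 secs, disconnect now
[Thr  1] MsJ2EE_AddDisconnectedNode: add node [603656750] into disconnect list
[Thr  1] MsJ2EE_CleanDisconnectedNodes: clean disconnect list
"""

if __name__ == "__main__":
    for event in summarize_disconnects(SAMPLE.strip().splitlines()):
        print(event)
```

Running the same function over the full dev_ms file gives a list of disconnect times per node that can be compared with the critical shutdown times in the J2EE logs.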

We have checked with the network team, and they found no network problem. The load is normal and no abnormally large XI messages are being processed. Can anyone help us prevent the problem from recurring?

regards

AKD

Former Member · 03-16-2013

We have since upgraded to PI 7.3.

PI 7.3 seems to be much better in this regard.

Former Member · 10-11-2011

We updated to the latest release of the IBM JDK. The problem did not occur for 2-3 months and has now reappeared.

We did not find any long-running GC or paging. The timeouts are already set to 1 hour.

It seems that the broadcast is incomplete or not being correctly read:

#1.#001A64A7C244051000000001001790D20004AEA438966CCC#1317920254094#com.sap.engine.core.cluster.impl6.ms.MSRawConnection##com.sap.engine.core.cluster.impl6.ms.MSRawConnection.receiveRawMessage()####n/a##496143d0f03c11e0a403001a64a7c244#SAP J2EE Engine|MS Socket Listener##0#0#Error##Plain###java.io.IOException: Error reading length field of message. EOF has been reached.
        at com.sap.engine.core.cluster.impl6.ms.MSMessageHeader.read(MSMessageHeader.java:442)
        at com.sap.engine.core.cluster.impl6.ms.MSMessageObjectImpl.readHeader(MSMessageObjectImpl.java:142)
        at com.sap.engine.core.cluster.impl6.ms.MSRawConnection.receiveRawMessage(MSRawConnection.java:1660)
        at com.sap.engine.core.cluster.impl6.ms.MSListener.run(MSListener.java:86)
        at com.sap.engine.frame.core.thread.Task.run(Task.java:64)
        at com.sap.engine.core.thread.impl5.SingleThread.execute(SingleThread.java:81)
        at com.sap.engine.core.thread.impl5.SingleThread.run(SingleThread.java:152)
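For anyone reading the trace: an "EOF while reading the length field" error is what a reader of length-prefixed messages typically reports when the peer closes the connection between messages or in the middle of a header. The sketch below illustrates that general pattern only. The message server wire format is not shown anywhere in this thread, so the 4-byte big-endian length prefix and the names read_exact and read_message are assumptions for illustration, not SAP's implementation.

```python
import io
import struct


def read_exact(stream, n):
    """Read exactly n bytes, or raise EOFError if the stream ends first."""
    buf = b""
    while len(buf) < n:
        chunk = stream.read(n - len(buf))
        if not chunk:
            raise EOFError(f"EOF after {len(buf)} of {n} bytes")
        buf += chunk
    return buf


def read_message(stream):
    """Read one frame: assumed 4-byte big-endian length, then the payload."""
    try:
        header = read_exact(stream, 4)
    except EOFError as exc:
        # The connection ended before a complete length field arrived.
        raise EOFError("Error reading length field of message: " + str(exc)) from exc
    (length,) = struct.unpack(">I", header)
    return read_exact(stream, length)


if __name__ == "__main__":
    complete = struct.pack(">I", 5) + b"hello"
    print(read_message(io.BytesIO(complete)))

    truncated = b"\x00\x00"  # peer closed after only two header bytes
    try:
        read_message(io.BytesIO(truncated))
    except EOFError as exc:
        print("reader failed:", exc)
```

In this pattern the reader cannot tell a clean close from a dropped connection, so the error on the J2EE side would be consistent with the message server having already disconnected the node, as the dev_ms entries suggest.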

Former Member · 06-14-2011

Could you show what happens in the default trace before the broadcast error?

Is there any long-running GC? Is your system paging?
