J2EE nodes restart due to missed broadcast

Former Member

We have 3 XI application instances, each with one J2EE server node. Two are on the same server (say server A), one is on a different server (say server B), and the SCS, DB, etc. are on a third server (say C).

We have an issue with J2EE nodes restarting occasionally due to missed message broadcast.

[Framework -> criticalShutdown] Missed broadcast due to network problems. Restart is required (advanced Reconnect)
Jun 8, 2011 1:32:07 PM              com.sap.engine.core.Framework [Thread[Thread-559731,5,SAPEngine_System_Thread[impl:5]_Group]] Fatal: Critical shutdown was invoked.
Reason is: Missed broadcast due to network problems. Restart is required (advanced Reconnect)
[Framework -> criticalShutdown] Exiting Listener Loop. This requires a restart of the node. Possible reason is an interrupted reconnect session to the message server.
Jun 8, 2011 1:32:07 PM              com.sap.engine.core.Framework [SAP J2EE Engine|MS Socket Listener] Fatal: Critical shutdown was invoked. Reason is: Exiting Listener Loop. This requires a restart of the node. Possible reason is an interrupted reconnect session to the message server.

dev_server0 shows:

[Thr 109283] JLaunchIExitJava: exit hook is called (rc=-333)
[Thr 109283] **********************************************************************
*** ERROR => The Java VM terminated with a non-zero exit code.
*** Please see SAP Note 940893 , section 'J2EE Engine exit codes'
*** for additional information and trouble shooting.
**********************************************************************
[Thr 109283] SigISetIgnoreAction : SIG_IGN for signal 20
[Thr 109283] JLaunchCloseProgram: good bye (exitcode=-333)

The SCS dev_ms shows:

[Thr  1] Wed Jun  8 13:32:02 2011
[Thr  1] ***LOG Q0I=> NiPWrite: writev (32: Broken pipe) [niuxi_mt.c 1174]
[Thr  1] *** ERROR => MsSRead: NiBufReceive (rc=NIECONN_BROKEN) [msxxserv_mt. 9334]

[Thr  1] Wed Jun  8 13:32:03 2011
[Thr  1] *** ERROR => MsSClientHandle: MsSRead C3 (J2EE3233250), MSEINTERN [msxxserv_mt. 3814]
[Thr  1] MsJ2EE_AddDisconnectedNode: add node [3233250] into disconnect list
[Thr  1] MsSSuspendCheck: C7 (J2EE603656750), no write within 40 secs, disconnect now
[Thr  1] MsJ2EE_AddDisconnectedNode: add node [603656750] into disconnect list
[Thr  1] MsJ2EE_CleanDisconnectedNodes: clean disconnect list
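
To line up the two logs, a small script like the one below can pull the disconnected node IDs out of dev_ms and compare them with the time of the J2EE critical shutdown. This is only a sketch: the regular expressions are written against the excerpts quoted above and assume dev_ms keeps this layout. The SCS (server C) and the J2EE nodes (servers A and B) also run on different hosts, so any clock skew between them shifts the computed gap.

```python
import re
from datetime import datetime

# Excerpt copied from the dev_ms log above.
DEV_MS = """\
[Thr  1] Wed Jun  8 13:32:02 2011
[Thr  1] ***LOG Q0I=> NiPWrite: writev (32: Broken pipe) [niuxi_mt.c 1174]
[Thr  1] *** ERROR => MsSRead: NiBufReceive (rc=NIECONN_BROKEN) [msxxserv_mt. 9334]

[Thr  1] Wed Jun  8 13:32:03 2011
[Thr  1] *** ERROR => MsSClientHandle: MsSRead C3 (J2EE3233250), MSEINTERN [msxxserv_mt. 3814]
[Thr  1] MsJ2EE_AddDisconnectedNode: add node [3233250] into disconnect list
[Thr  1] MsSSuspendCheck: C7 (J2EE603656750), no write within 40 secs, disconnect now
[Thr  1] MsJ2EE_AddDisconnectedNode: add node [603656750] into disconnect list
[Thr  1] MsJ2EE_CleanDisconnectedNodes: clean disconnect list
"""

# Timestamp of the J2EE "Critical shutdown was invoked" entry.
SHUTDOWN = "Jun 8, 2011 1:32:07 PM"

# Assumed layout of a dev_ms timestamp line: "[Thr  1] Wed Jun  8 13:32:02 2011".
STAMP_RE = re.compile(r"^\[Thr\s+\d+\]\s+(\w{3} \w{3}\s+\d{1,2} \d{2}:\d{2}:\d{2} \d{4})\s*$")
NODE_RE = re.compile(r"add node \[(\d+)\] into disconnect list")


def disconnect_events(text):
    """Yield (timestamp, node_id) for each node the message server put on its disconnect list."""
    current = None
    for line in text.splitlines():
        stamp = STAMP_RE.match(line)
        if stamp:
            # Whitespace in a strptime format matches one or more spaces, so "Jun  8" parses.
            current = datetime.strptime(stamp.group(1), "%a %b %d %H:%M:%S %Y")
            continue
        node = NODE_RE.search(line)
        if node and current is not None:
            yield current, node.group(1)


shutdown_at = datetime.strptime(SHUTDOWN, "%b %d, %Y %I:%M:%S %p")
for when, node_id in disconnect_events(DEV_MS):
    lag = (shutdown_at - when).total_seconds()
    print(f"node {node_id} disconnected at {when:%H:%M:%S}, {lag:.0f}s before the shutdown")
```

On this excerpt both nodes land on the disconnect list about 4 seconds before the J2EE node logs its critical shutdown, which suggests the message server dropped the connection first.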

We have checked with the network team, and there is no network problem. The load is normal and no abnormally large XI messages are being processed. Can anyone help us prevent the problem from recurring?

regards

AKD

Accepted Solutions (1)


Former Member

We have since upgraded to PI 7.3.

PI 7.3 seems to be much better in this regard.

Answers (2)


Former Member

We updated to the latest release of IBM JDK. The problem did not occur for 2-3 months and has now reappeared.

We did not find any long-running GC or paging. The timeouts are already set to 1 hour.

It seems that the broadcast is incomplete or not being correctly read:

#1.#001A64A7C244051000000001001790D20004AEA438966CCC#1317920254094#com.sap.engine.core.cluster.impl6.ms.MSRawConnection##com.sap.engine.core.cluster.impl6.ms.MSRawConnection.receiveRawMessage()####n/a##496143d0f03c11e0a403001a64a7c244#SAP J2EE Engine|MS Socket Listener##0#0#Error##Plain###java.io.IOException: Error reading length field of message. EOF has been reached.
        at com.sap.engine.core.cluster.impl6.ms.MSMessageHeader.read(MSMessageHeader.java:442)
        at com.sap.engine.core.cluster.impl6.ms.MSMessageObjectImpl.readHeader(MSMessageObjectImpl.java:142)
        at com.sap.engine.core.cluster.impl6.ms.MSRawConnection.receiveRawMessage(MSRawConnection.java:1660)
        at com.sap.engine.core.cluster.impl6.ms.MSListener.run(MSListener.java:86)
        at com.sap.engine.frame.core.thread.Task.run(Task.java:64)
        at com.sap.engine.core.thread.impl5.SingleThread.execute(SingleThread.java:81)
        at com.sap.engine.core.thread.impl5.SingleThread.run(SingleThread.java:152)
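
Note from the stack trace that the exception is raised in MSMessageHeader.read, i.e. while the MS Socket Listener is still reading the length field of the next message header, not while parsing a broadcast body. The sketch below shows how a length-prefixed reader ends up in that state. The 4-byte big-endian length field and the function names are assumptions for illustration only, not SAP's actual message server wire format. On a socket stream, EOF generally means the peer closed the connection, which would fit the "Broken pipe" / NIECONN_BROKEN entries in dev_ms.

```python
import io
import struct


def read_exact(stream, n):
    """Read exactly n bytes, raising EOFError if the stream ends first."""
    buf = b""
    while len(buf) < n:
        chunk = stream.read(n - len(buf))
        if not chunk:  # an empty read means end of stream (peer closed the connection)
            raise EOFError(f"EOF after {len(buf)} of {n} bytes")
        buf += chunk
    return buf


def read_message(stream):
    """Read one frame: an assumed 4-byte big-endian length, then that many payload bytes."""
    try:
        header = read_exact(stream, 4)
    except EOFError as exc:
        raise OSError(f"Error reading length field of message. {exc}") from exc
    (length,) = struct.unpack(">I", header)
    return read_exact(stream, length)


def frame(payload):
    """Build a frame in the same assumed format."""
    return struct.pack(">I", len(payload)) + payload


# One complete frame, then a connection that drops after 2 of the 4 header bytes.
stream = io.BytesIO(frame(b"broadcast-1") + b"\x00\x00")
print(read_message(stream))  # b'broadcast-1'
try:
    read_message(stream)
except OSError as err:
    print(err)  # Error reading length field of message. EOF after 2 of 4 bytes
```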

Former Member

Could you show what happens in the default trace before the broadcast error?

Is there any long-running GC? Is your system paging?
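
One way to answer the GC question is to scan the server node's std_server<n>.out for verbose GC output and list the long pauses. The sketch assumes HotSpot-style -verbose:gc lines such as "[Full GC 267628K->83769K(776768K), 1.8479984 secs]"; the IBM JDK used here writes its verbose GC output as XML, so the pattern would have to be adapted for it. The threshold is a free parameter; a pause approaching the 40 seconds mentioned in the dev_ms MsSSuspendCheck entry would be worth a closer look.

```python
import re

# Assumed HotSpot -verbose:gc format; adapt for IBM JDK XML verbose GC output.
GC_RE = re.compile(r"\[(Full GC|GC)\b.*?,\s*([\d.]+) secs\]")


def long_gc_pauses(lines, threshold_s):
    """Return (kind, seconds) for every GC pause at or above threshold_s."""
    hits = []
    for line in lines:
        m = GC_RE.search(line)
        if m and float(m.group(2)) >= threshold_s:
            hits.append((m.group(1), float(m.group(2))))
    return hits


# Made-up sample lines; in practice iterate over open("std_server0.out").
sample = [
    "[GC 325407K->83000K(776768K), 0.2300771 secs]",
    "[Full GC 267628K->83769K(776768K), 41.8479984 secs]",
]
print(long_gc_pauses(sample, threshold_s=10.0))  # [('Full GC', 41.8479984)]
```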


Hi all,

In addition: a "Missed broadcast" error is usually caused either by memory issues or by network and message server issues.

Memory issues:

- Check whether the heap memory is being paged to disk.

- Check whether the server has enough physical memory to hold all the heaps. Refer to SAP Note 723909 (Java VM settings for J2EE 6.40/7.0).

- Finally, check in the std_server<n>.out files whether the server nodes are taking too long to perform GCs. Attach the last std_server<n>.out files from before the crash here.

- If you are facing OutOfMemory errors, trigger heap dumps at the time the node crashes with OOM (see SAP Note 1004255 for details). The heap dumps can then be analyzed with the Eclipse Memory Analyzer (MAT).
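
The physical-memory check is arithmetic per host: the worst-case footprint of every JVM on a host has to fit in RAM alongside the OS. With the layout from the question (two server nodes on server A, one on server B), server A needs to be checked separately from server B. A sketch with made-up sizes follows; the heap values, RAM sizes, non-heap allowance and OS reserve are all hypothetical placeholders, and the real non-heap overhead depends on the JVM vendor and settings (see SAP Note 723909).

```python
# All sizes in MB. Every number here is a hypothetical placeholder: use the
# heap settings from each node's instance profile and each host's real RAM.
NODE_HEAPS = {
    "serverA": [2048, 2048],  # two server nodes share server A
    "serverB": [2048],        # one server node on server B
}
PHYSICAL_RAM = {"serverA": 6144, "serverB": 8192}
NON_HEAP_PER_JVM = 1024  # assumed allowance for class space, thread stacks, native memory
OS_RESERVE = 1024        # assumed reserve for the OS and other processes


def headroom_mb(host):
    """RAM left after the OS reserve and every JVM's worst-case footprint on that host."""
    jvm_total = sum(heap + NON_HEAP_PER_JVM for heap in NODE_HEAPS[host])
    return PHYSICAL_RAM[host] - OS_RESERVE - jvm_total


for host in NODE_HEAPS:
    left = headroom_mb(host)
    status = "OK" if left >= 0 else "heaps cannot all stay in RAM, expect paging"
    print(f"{host}: headroom {left} MB -> {status}")
```

With these placeholder numbers server A comes out 1024 MB short while server B has 4096 MB to spare, which shows why the host carrying two nodes is the one to check first.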

Network and message server issues:

If there are no problems with memory, we recommend increasing the amount of time the J2EE system waits for a response from the message server, which helps avoid this issue.

You can edit the parameter in the configTool:

GlobalServer -> Manager -> ClusterManager -> ms.confirmation.timeout.

Set ms.reconnect.timeout to 600000.
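
Why a longer timeout helps can be seen from a generic request/acknowledgement wait. This is not SAP's ClusterManager code: the simulated responder, the delays and the function name are hypothetical. The sketch only shows that a reply arriving after the deadline is treated as a failure, while a longer deadline tolerates a slow reply at the cost of noticing a genuinely dead peer later.

```python
import queue
import threading


def wait_for_confirmation(reply_delay_s, timeout_s):
    """Wait up to timeout_s for a simulated peer that replies after reply_delay_s."""
    replies = queue.Queue()
    # Hypothetical stand-in for the message server: puts "ack" after a delay.
    responder = threading.Timer(reply_delay_s, replies.put, args=("ack",))
    responder.start()
    try:
        return replies.get(timeout=timeout_s)
    except queue.Empty:
        return None  # no confirmation in time: the caller would treat the peer as lost
    finally:
        responder.cancel()  # no-op if the timer already fired


print(wait_for_confirmation(reply_delay_s=0.5, timeout_s=0.1))  # None
print(wait_for_confirmation(reply_delay_s=0.5, timeout_s=2.0))  # ack
```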

Kindly refer to the following web page from help.sap.com:

http://help.sap.com/saphelp_nwce10/helpdata/en/56/0d09bb8df2f848b3003bf84df5a9a/frameset.htm

Thanks and g'day.