
Application servers went down when the CI went down

Former Member

Hello Experts,

We had a situation where our PE2 CI server hung, and the server team
rebooted it (so the CI went down). There are 40 application servers
attached to this CI, and all 40 of them went down while the CI was down.
We want to know why. Ideally that should not be the case.

Can you help us analyse this?

The dev_disp file on one of the application servers shows the following error messages:

Sun Feb 16 19:26:50 2014

*** ERROR => DpEnvCheck: Waiting for answer from msg server since 300 secs [dpxxdisp.c   8362]

*** ERROR => DpEnvCheck: Connection to msg server will be closed [dpxxdisp.c   8364]

***LOG Q0M=> DpMsDetach, ms_detach () [dpxxdisp.c   13051]

MBUF state OFF

MBUF component DOWN

Sun Feb 16 19:30:53 2014

***LOG Q0I=> NiIRead: P=10.197.4.52:3910; L=10.197.5.9:60032: recv (104: Connection reset by peer) [nixxi.cpp 5087]

*** ERROR => NiIRead: SiRecv failed for hdl 18/sock 11

    (SI_ECONN_BROKEN/104; I4; ST; P=10.197.4.52:3910; L=10.197.5.9:60032) [nixxi.cpp    5087]

*** ERROR => MsINiRead: NiBufReceive failed (NIECONN_BROKEN) [msxxi.c      2829]

*** ERROR => MsIReadFromHdl: NiRead (rc=NIECONN_BROKEN) [msxxi.c      1867]

***LOG Q1K=> MsIAttachEx: StoC check failed, Kernel not compatible with system (rc=-100) [msxxi.c      820]

*** ERROR => Kernel incompatible to already connected instances (see dev_ms for details) [dpxxdisp.c   12599]

DpHalt: shutdown server >pe2app01_PE2_10                         < (normal)

DpHalt: stop work processes

Sun Feb 16 19:22:22 2014

*** WARNING => DpEnvCheck: no answer from msg server since 34 secs, but dp_ms_keepalive_timeout(300 secs) not reached [dpxxdisp.c   8383]
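As posted, the excerpt lists the 19:22:22 block last, after the 19:26:50 and 19:30:53 blocks, which makes the sequence of events harder to follow. One way to line the blocks up chronologically is a small script like the following. It is a minimal sketch, not an SAP tool: `group_by_timestamp` is a hypothetical helper, and it assumes that each block starts with a bare header line in the format shown above (e.g. `Sun Feb 16 19:26:50 2014`) and that Python runs in the default C/English locale, because `%a` and `%b` are locale-dependent.

```python
from datetime import datetime

# Header format of the timestamp lines in the dev_disp excerpt,
# e.g. "Sun Feb 16 19:26:50 2014" (assumes the default C/English locale).
TS_FORMAT = "%a %b %d %H:%M:%S %Y"


def group_by_timestamp(lines):
    """Split dev_disp lines into (timestamp, [entries]) blocks sorted by time.

    A line that parses as a timestamp starts a new block; every other
    non-empty line is attached to the most recent block. Lines that appear
    before the first timestamp are ignored.
    """
    blocks = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        try:
            ts = datetime.strptime(line, TS_FORMAT)
        except ValueError:
            # Not a timestamp header: treat it as a log entry.
            if blocks:
                blocks[-1][1].append(line)
            continue
        blocks.append((ts, []))
    # sorted() is stable, so blocks with equal timestamps keep file order.
    return sorted(blocks, key=lambda block: block[0])


if __name__ == "__main__":
    excerpt = """Sun Feb 16 19:26:50 2014
*** ERROR => DpEnvCheck: Waiting for answer from msg server since 300 secs
Sun Feb 16 19:30:53 2014
*** ERROR => Kernel incompatible to already connected instances
Sun Feb 16 19:22:22 2014
*** WARNING => DpEnvCheck: no answer from msg server since 34 secs
"""
    for ts, entries in group_by_timestamp(excerpt.splitlines()):
        print(ts, entries[0])
```

Run against the shortened excerpt, it prints the 19:22:22 warning first, then the 19:26:50 timeout and finally the 19:30:53 kernel-incompatibility error.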

Did this happen because of the server reboot?

A quick response would be much appreciated.

Best Regards

Sachin Bhatt

Accepted Solutions (0)

Answers (1)

Former Member


Hi Sachin,

This is a very basic concept. The CI hosts the message server, which handles communication between all instances. If the CI goes down, the application servers cannot run on their own and will go down as well. This is expected. In the log file you will see entries saying that the instance was waiting for a response from the message server but, because the CI was down, could not connect to it, and so it went down.

Hope this clarifies your question.

Thanks,

Arindam

Former Member

Hi Arindam,

Thanks for the input.

As far as I recall, the work processes on an application server should remain in reconnect mode until they re-establish their connection to the CI (message server).

Best Regards

Sachin Bhatt

Former Member

You are partly correct. The application server waits for the message server to come back up, but once the timer expires, it goes down. In your case the timer was set to 300 secs (5 mins), as the following part of the log shows:

*** ERROR => DpEnvCheck: Waiting for answer from msg server since 300 secs [dpxxdisp.c   8362]

*** ERROR => DpEnvCheck: Connection to msg server will be closed [dpxxdisp.c   8364]

The parameter ms/conn_timeout controls this. By default it is set to 300 secs.
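To make the timer described above concrete, here is a simplified Python model of a periodic "is the message server still answering?" check. It is an illustration under stated assumptions, not SAP kernel code: `check_ms_connection`, `simulate`, `Action` and the `warn_after` threshold are hypothetical, and only the 300 s limit comes from the posted log (whose warning line refers to it as `dp_ms_keepalive_timeout`). The model warns while the silence is below the limit and decides to close the connection once the limit is reached, matching the WARNING and ERROR lines in the log.

```python
from enum import Enum


class Action(Enum):
    OK = "ok"
    WARN = "warn"      # cf. "no answer from msg server ... timeout not reached"
    DETACH = "detach"  # cf. "Connection to msg server will be closed"


def check_ms_connection(seconds_since_answer, timeout=300, warn_after=30):
    """Hypothetical model of one periodic message-server check.

    timeout mirrors the 300 s seen in the log; warn_after is an assumed
    value chosen only for illustration.
    """
    if seconds_since_answer >= timeout:
        return Action.DETACH
    if seconds_since_answer >= warn_after:
        return Action.WARN
    return Action.OK


def simulate(silence_seconds, step=10, timeout=300, warn_after=30):
    """Run the check every `step` seconds while the message server is silent.

    Returns the elapsed time at which each action was first reached and
    stops as soon as the model decides to detach.
    """
    first_seen = {}
    for elapsed in range(0, silence_seconds + 1, step):
        action = check_ms_connection(elapsed, timeout, warn_after)
        first_seen.setdefault(action, elapsed)
        if action is Action.DETACH:
            break
    return first_seen
```

With a 400-second outage the model starts warning at 30 s and detaches at 300 s. With a 200-second outage it only warns, which is why a CI that comes back before the limit expires would not trigger the detach.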

Thanks,

Arindam