
ICM Trace File (dev_icm)

Former Member
0 Kudos

Hi guys

I'm picking up watchdog errors in our ICM Monitor:

ERROR => IcmMpiWatchDogThread: Error 14 reading wakeup OOB [icxxthr_mt.c 1396]

ERROR => IcmMpiWatchDogWakeUp: MpiWriteOOBData failed (rc=14) [icxxthr_mt.c 2255]

Release 720

Any idea what could be causing this?

This seems to be intermittent, as it does not happen all the time.

Shiva

Accepted Solutions (0)

Answers (4)

Former Member
0 Kudos

Hi Shiva,

From the log, it looks like network traffic is causing the problem.

Yes, you're right, it's not happening all the time, as you can see in the log:

[Thr 140476589897472] Mon Oct 14 09:42:27 2013

[Thr 140476589897472] *** WARNING => Connection request from (30/31/0) to host: #WEZAISAS0024.bweafrica.com, service: 25 failed (NIE

[Thr 140476589897472]  {002a1b79} [icxxconn_mt.c 2271]

[Thr 140476588840704] *** WARNING => Connection request from (30/31/0) to host: #WEZAISAS0024.bweafrica.com, service: 25 failed (NIE

[Thr 140476588840704]  {002a1b7a} [icxxconn_mt.c 2271]

[Thr 140477069756256] Mon Oct 14 13:12:38 2013

[Thr 140477069756256] NiSelISelectInt: socketNo 111, selHandle 741

[Thr 140476587783936] Mon Oct 14 13:45:26 2013

[Thr 140476587783936] *** ERROR => HttpPlugInHandleNetData: server: premature EOS (0/0) in request [http_plg_mt. 2005]

[Thr 140476587783936] Mon Oct 14 14:27:29 2013

[Thr 140476587783936] *** WARNING => Connection request from (28/29/0) to host: #WEZAISAS0024.bweafrica.com, service: 25 failed (NIE

[Thr 140476587783936]  {00301fc1} [icxxconn_mt.c 2271]

[Thr 140476577806080] Mon Oct 14 14:42:34 2013

[Thr 140476577806080] IcmCreateWorkerThreads: created worker thread 10

[Thr 140476458063616] Mon Oct 14 14:47:46 2013

[Thr 140476458063616] IcmWorkerThread: end worker thread 10

[Thr 140476587783936] Mon Oct 14 15:12:29 2013

[Thr 140476587783936] *** WARNING => Connection request from (28/29/0) to host: #WEZAISAS0024.bweafrica.com, service: 25 failed (NIE

[Thr 140476587783936]  {00312080} [icxxconn_mt.c 2271]

[Thr 140476590954240] Mon Oct 14 15:27:29 2013

[Thr 140476590954240] *** WARNING => Connection request from (32/33/0) to host: #WEZAISAS0024.bweafrica.com, service: 25 failed (NIE

[Thr 140476590954240]  {003320b9} [icxxconn_mt.c 2271]

[Thr 140476577806080] Mon Oct 14 16:22:31 2013

[Thr 140476577806080] {00362155} Traffic Control: Nettimeout (30) exceeded by peer: 10.45.248.148:52930 [icxxthr_mt.c 4303]

[Thr 140476577806080] CONNECTION (id=54/8533):
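To see which hosts and times are affected, an excerpt like the one above can be tallied with a short script. This is only a rough sketch: the function name `tally_failed_connections` is made up for illustration, the regular expressions are derived purely from the lines quoted in this thread, and it assumes that each message belongs to the most recent timestamp line, whichever thread wrote it.

```python
import re
from collections import defaultdict

# Timestamp lines look like: [Thr 140476589897472] Mon Oct 14 09:42:27 2013
TIMESTAMP = re.compile(
    r"^\[Thr \d+\]\s+(\w{3} \w{3} +\d+ \d{2}:\d{2}:\d{2} \d{4})\s*$"
)
# Failed connection warnings name the target host and the service number.
FAILED = re.compile(
    r"Connection request from \S+ to host: #?([^,\s]+), service: (\d+) failed"
)


def tally_failed_connections(lines):
    """Map (host, service) to the timestamps at which a connection request failed."""
    failures = defaultdict(list)
    current = None  # most recent timestamp line seen (assumption: applies to all threads)
    for line in lines:
        line = line.strip()
        stamp = TIMESTAMP.match(line)
        if stamp:
            current = stamp.group(1)
            continue
        hit = FAILED.search(line)
        if hit:
            failures[(hit.group(1), hit.group(2))].append(current)
    return dict(failures)


# A few lines copied from the trace excerpt in this thread.
sample = [
    "[Thr 140476589897472] Mon Oct 14 09:42:27 2013",
    "[Thr 140476589897472] *** WARNING => Connection request from (30/31/0) "
    "to host: #WEZAISAS0024.bweafrica.com, service: 25 failed (NIE",
    "[Thr 140476587783936] Mon Oct 14 14:27:29 2013",
    "[Thr 140476587783936] *** WARNING => Connection request from (28/29/0) "
    "to host: #WEZAISAS0024.bweafrica.com, service: 25 failed (NIE",
]

for (host, service), stamps in tally_failed_connections(sample).items():
    print(host, service, len(stamps), stamps)
```

If every failure ends up under a single (host, service) key, that points at one remote system rather than at the ICM in general.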

On Oct 14 the warnings were logged at different times of the day (for example 09:42, 14:27, 15:12 and 15:27), and note that the NIE error occurs for only one host. Check whether that host went down, or whether there are network issues between your local server and that remote server.
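One way to test reachability from the application server is a plain TCP connect to the host and service number shown in the trace (25 is the standard SMTP port). The sketch below is an illustration only; the helper name `can_connect` and the 5-second timeout are assumptions, not something taken from the ICM.

```python
import socket


def can_connect(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port opens within `timeout` seconds."""
    try:
        # create_connection resolves the name and connects; the socket is closed on exit.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name-resolution failures.
        return False
```

For example, `can_connect("WEZAISAS0024.bweafrica.com", 25)` run on the server where the ICM is running. Since the problem is intermittent, a single successful connect proves little; repeating the check over time and comparing the failures with the trace timestamps is more telling.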

Hope this helps!

regards,

keerthi

0 Kudos

Return code 14 means the handle is outdated or the GUID is too old.

This comes from the communication between the ICM and the work processes.

I believe the MPI communication with a work process ended up with an inconsistent structure. MPI is a two-sided communication mechanism, and a failure in synchronization can cause this problem.

This can happen if a work process crashes, restarts, or fails.

I believe you should concentrate on finding the reason for the error by checking the work processes.

A restart can help to solve this problem; however, the root cause should be investigated on the work process side.

Regards

Clebio

AtulKumarJain
Active Contributor
0 Kudos
Former Member
0 Kudos

Atul,

We've maintained the parameter settings as per the note in all our environments, but we are still getting these errors in dev.

Shiva

Former Member
0 Kudos

Hi,

I think you should move to a higher patch level, as this note should fix your issue:

1875752 - System suffers from performance degradation

It looks like you are at patch level 433, and the issue is fixed in 441.

Thanks

Rishi abrol