
IPC Send timeout detected. Sender: ospid 8949 [oracle@<hosts>

Former Member

Hi Friends,

We are running SAP BIW 7.10 on Linux and Oracle 11g.

We are facing an issue with our ODS jobs: they frequently hang and never complete. We then have to terminate the jobs manually and start them again. Our Oracle RAC has 4 database nodes, and we have 8 SAP application instances.

We have checked the trace log and found this alert many times: "IPC Send timeout detected. Sender: ospid 8949 [oracle@<DB hosts>"

Regards

Ganesh Tiwari

Accepted Solutions (0)

Answers (1)

Answers (1)

stefan_koehler
Active Contributor

Hi Ganesh,

I cannot imagine that this is the only corresponding message in the ASM, cluster and RDBMS (alert) log files.

Your description sounds like one RAC instance is hanging for some reason and the healthy node(s) are requesting a RAC member kill escalation. Please provide more detailed information.

Regards

Stefan

Former Member


Hi Stefan,

Thanks for your reply.

This message is from the RDBMS alert log.

Regards

Ganesh Tiwari

stefan_koehler
Active Contributor

Hi Ganesh,

Yes, but from the healthy instances or from the crashing instance? It is Oracle RAC, so you need to cross-check many more components, such as ASM, CSSD, etc., and not just one alert log file.

Getting a system state dump from the hanging instance is the trickier part if Oracle does not create it automatically.

Regards

Stefan
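A cross-check like the one suggested above can start with a simple scan of every collected log (RDBMS alert, ASM alert, clusterware/CSSD logs) for the messages that appear in this thread. This is a minimal sketch, assuming the log files have already been copied to one machine; the file names passed on the command line and the helper names `scan_text`/`scan_files` are hypothetical, and the pattern list only covers message codes quoted in this thread, not every relevant Oracle message.

```python
import re
import sys
from pathlib import Path

# Patterns limited to messages quoted in this thread (assumption: not exhaustive).
PATTERNS = {
    "ipc_timeout": re.compile(r"IPC Send timeout detected"),
    "css_missing": re.compile(r"CRS-161[012]"),   # network heartbeat missing 50/75/90%
    "node_removed": re.compile(r"CRS-1632"),      # node removed from cluster
    "reconfig": re.compile(r"CRS-1601"),          # CSSD reconfiguration complete
    "clock_skew": re.compile(r"CRS-2409"),        # clock not synchronous with cluster
}


def scan_text(name, lines):
    """Return (file name, line number, label, line) for each matching line."""
    hits = []
    for no, line in enumerate(lines, start=1):
        for label, pattern in PATTERNS.items():
            if pattern.search(line):
                hits.append((name, no, label, line.rstrip()))
    return hits


def scan_files(paths):
    """Scan several log files; undecodable bytes are replaced, not fatal."""
    hits = []
    for p in paths:
        path = Path(p)
        with path.open(errors="replace") as fh:
            hits.extend(scan_text(path.name, fh))
    return hits


if __name__ == "__main__":
    # Hypothetical usage: python scan_logs.py alert_node1.log cssd_node1.log ...
    for name, no, label, line in scan_files(sys.argv[1:]):
        print(f"{name}:{no}: [{label}] {line}")
```

Sorting the hits by the timestamps printed near each match, across all nodes, is what makes the comparison between the healthy and the hanging instance possible.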

Former Member

Hi Stefan,

Thanks for your reply.

We have collected the cluster logs from all the nodes:

[ctssd(3383)]CRS-2409:The clock on host XXXXXXXX is not synchronous with the mean cluster time. No action has been taken as the Cluster Time Synchronization Service is running in observer mode.

2014-10-02 00:46:25.879:

[ctssd(3383)]CRS-2409:The clock on host XXXXXXXX is not synchronous with the mean cluster time. No action has been taken as the Cluster Time Synchronization Service is running in observer mode.

2014-10-02 01:16:26.581:

[ctssd(3383)]CRS-2409:The clock on host XXXXXXX is not synchronous with the mean cluster time. No action has been taken as the Cluster Time Synchronization Service is running in observer mode.

2014-10-02 01:46:27.287

2014-10-02 22:52:11.204:

[cssd(3115)]CRS-1612:Network communication with node xxxxxxxx (3) missing for 50% of timeout interval.  Removal of this node from cluster in 14.850 seconds

2014-10-02 22:52:19.218:

[cssd(3115)]CRS-1611:Network communication with node xxxxxxxxxxx (3) missing for 75% of timeout interval.  Removal of this node from cluster in 6.840 seconds

2014-10-02 22:52:23.220:

[cssd(3115)]CRS-1610:Network communication with node xxxxxxxxx (3) missing for 90% of timeout interval.  Removal of this node from cluster in 2.840 seconds

2014-10-02 22:52:26.060:

[cssd(3115)]CRS-1632:Node xxxxxxxx is being removed from the cluster in cluster incarnation 306673017

2014-10-02 22:52:26.082:

[cssd(3115)]CRS-1601:CSSD Reconfiguration complete. Active nodes are xxxxxxxxx, xxxxxxxxx,xxxxxxxx

Regards

Ganesh Tiwari
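The three CSSD warnings in the log above can be checked against the CRS-1632 eviction message: adding each "Removal of this node from cluster in N seconds" countdown to its timestamp should land on the same eviction time. This is a minimal sketch using the values copied from that log. The helper names are hypothetical, and the 30-second CSS misscount used to estimate when node 3 stopped answering network heartbeats is an assumption; the real value on the cluster can be read with `crsctl get css misscount`.

```python
from datetime import datetime, timedelta

FMT = "%Y-%m-%d %H:%M:%S.%f"

# Assumption: CSS misscount of 30 s; verify with `crsctl get css misscount`.
MISSCOUNT_S = 30.0

# (log timestamp, % of timeout interval elapsed, seconds until removal),
# copied from the CRS-1612/1611/1610 lines in this thread.
WARNINGS = [
    ("2014-10-02 22:52:11.204", 50, 14.850),
    ("2014-10-02 22:52:19.218", 75, 6.840),
    ("2014-10-02 22:52:23.220", 90, 2.840),
]


def projected_removal(ts: str, remaining_s: float) -> datetime:
    """Eviction time implied by a single warning line."""
    return datetime.strptime(ts, FMT) + timedelta(seconds=remaining_s)


def timeline(warnings, misscount_s=MISSCOUNT_S):
    """Return projected removal times, the latest one, and the estimated heartbeat loss."""
    removals = [projected_removal(ts, rem) for ts, _, rem in warnings]
    eviction = max(removals)
    # Estimate only: valid if the timeout interval equals the assumed misscount.
    heartbeat_lost = eviction - timedelta(seconds=misscount_s)
    return removals, eviction, heartbeat_lost


if __name__ == "__main__":
    removals, eviction, lost = timeline(WARNINGS)
    for (ts, pct, _), r in zip(WARNINGS, removals):
        print(f"{pct}% warning at {ts} -> removal expected at {r.time()}")
    print("projected eviction:", eviction.time())
    print("heartbeats likely lost around:", lost.time(), f"(assumes misscount = {MISSCOUNT_S} s)")
```

All three projections fall within a few milliseconds of the CRS-1632 entry at 22:52:26.060, so the warnings describe one continuous loss of network heartbeat from node 3. Under the misscount assumption, that loss started about half a minute earlier, which is the window to examine in the OS, interconnect and database logs of that node.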

Former Member


Hi Friends

Please reply. We still have the issue of ODS jobs failing in our BW system.

stefan_koehler
Active Contributor

Hi Ganesh,

You have already found the issue: one node was removed from your 4-node RAC cluster because it was not reachable. You have to figure out why that node stops responding. There are many possible reasons, ranging from hardware and the operating system to the GI stack and interconnect issues.

How should we assist you in this case?

Regards

Stefan