Having a problem with NetWeaver systems on VMware when VMotion happens

We have started to migrate systems to VMware as we migrate to 64-bit. But we are running into issues with network disconnects and short dumps that occur at the same time VMotions happen. Oddly, we don't see issues with every VMotion, just some of them.

We are running VMware ESX 3.5x. The guests are all Windows Server 2003 64-bit. We have BW 3.5 and Solution Manager 7 landscapes. The database servers are MS SQL Server 2005 64-bit on separate physical hardware (distributed SAP install). The BW install consists of a CI with 6 additional application servers. The Solution Manager systems are all single application/CI server systems.

The types of errors we see are:

>SYSBATCH R49 Communication error, CPIC return code 020, SAP return code 223

>SYSBATCH R5A > Conversation ID: 20680799

>SYSBATCH R64 > CPI-C function: CMRCV

> Q0I Operating system call recv failed (error no. 10054)

> Q0U Client <appserver>_PRD_00 is not known to the message server

> Q0I Operating system call SiPeekPendConn failed (error no. 10061)

and

> Unable to reach central lock handler

We've also seen short dumps if the database is being accessed during VMotion.

From the errors, everything appears to be related to a network disconnect. But the disconnect during VMotion is supposed to last only a few milliseconds. Would that be enough to cause these types of issues?

Any help or advice is appreciated. I would also like to hear if anyone else has had similar experiences.
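For reference, the two operating system error numbers in the log excerpts above are standard Winsock codes, and both describe a TCP connection being dropped or rejected. The lookup below is only a minimal sketch: the table is written out by hand for these two codes, and the function name is made up for illustration, not taken from any SAP tool.

```python
# Standard Winsock error numbers that appear in the log excerpts above.
WINSOCK_ERRORS = {
    10054: ("WSAECONNRESET", "connection reset by peer"),
    10061: ("WSAECONNREFUSED", "connection refused"),
}


def describe_winsock_error(code: int) -> str:
    """Return a readable description for a Winsock error number."""
    name, meaning = WINSOCK_ERRORS.get(code, ("UNKNOWN", "not in this table"))
    return f"{code} {name}: {meaning}"


if __name__ == "__main__":
    for code in (10054, 10061):
        print(describe_winsock_error(code))
```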

Thanks,

Brian Fitzgerald

Accepted Solutions (0)

Answers (1)

Former Member

1) Operating system call SiPeekPendConn failed (error no. 10061):

"Connection failed" means that, from a network point of view, it is not possible to connect to the service. Either the host cannot be reached or the service on the specified TCP/IP port cannot be reached.

Please check both with the ping and telnet commands.

If the port is open, telnet shows an empty screen, which means the connection is fine. If you get an error, you need to fix its cause: usually the port is not open, the wrong port is specified, or the entry is missing from the services file.

Also check the file dev_rd for more clues. For further reference, see Note 148832.
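The telnet check in item 1 can also be scripted, which makes it easier to repeat against several application servers. This is a rough sketch only: the function name, host name and port number are placeholders, so substitute the host and the port of the SAP service you actually want to test.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections (Winsock 10061), timeouts,
        # and host names that cannot be resolved.
        return False


if __name__ == "__main__":
    # Hypothetical host and port, for illustration only.
    print(port_is_open("appserver.example.com", 3600))
```

A result of True corresponds to the empty telnet screen; False means the host or the port could not be reached.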

2) Operating system call recv failed (error no. 10054)

Ask your network specialists to run an extended, detailed analysis of the WAN/LAN. They should monitor the packets sent and received at the network protocol level when the errors occur, to find out what was happening and to identify a fix.

Note 155147 - WinNT: Connection reset by peer

Note 545177 - FAQ: Preliminary steps in analyzing RFC connections

Note 500235 - Network Diagnosis with NIPING
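Alongside the network analysis and NIPING suggested in item 2, a small probe can timestamp when a TCP endpoint stops answering, so the gaps can be compared with the times at which VMotion events were recorded. The following is a sketch under assumptions, not a replacement for NIPING: all function names, the host, the port and the probe interval are made up for illustration.

```python
import socket
import time
from datetime import datetime


def probe_once(host, port, timeout=1.0):
    """Return True if a new TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def find_outages(samples):
    """Collapse (timestamp, ok) samples into (start, end) outage windows.

    start is the first failed sample of a run; end is the first successful
    sample after it, or the last sample if the run never recovered.
    """
    outages = []
    start = None
    last = None
    for ts, ok in samples:
        if not ok and start is None:
            start = ts
        elif ok and start is not None:
            outages.append((start, ts))
            start = None
        last = ts
    if start is not None:
        outages.append((start, last))
    return outages


def monitor(host, port, duration_s=60.0, interval_s=0.2):
    """Probe host:port repeatedly and return the outage windows observed."""
    samples = []
    deadline = time.monotonic() + duration_s
    while time.monotonic() < deadline:
        samples.append((datetime.now(), probe_once(host, port)))
        time.sleep(interval_s)
    return find_outages(samples)


if __name__ == "__main__":
    # Hypothetical host and port, for illustration only.
    for start, end in monitor("appserver.example.com", 3600, duration_s=30.0):
        print(f"unreachable from {start} until {end}")
```

Note that this only measures whether new connections can be opened. It does not show whether an already established connection was reset, which is what error 10054 reports, so treat the output as a hint about timing rather than proof of the cause.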

3) Unable to reach central lock handler

This error can have various causes, and you need to find out the exact one in your case.

http://help.sap.com/saphelp_nw70/helpdata/en/f0/b57338788f4d72e10000009b38f8cf/content.htm

Thank you,

Shyam


Thanks, Shyam, for your response. Unfortunately, we have already investigated the items in your response and did not find anything significant. That is why we are wondering about the interactions between SAP on Windows in a VMware guest, the guest's interaction with the host, and the host's interaction with the network. We are not seeing anything that points to a physical network problem, so we are looking for any insights into the virtualization stack and how it could contribute to this type of error.

Thanks again,

Brian