Troubleshooting SAP HANA System Replication

System Replication is NOT Host Auto-Failover.

System Replication replicates your SAP HANA database data from one system to another to compensate for system failures.

It is set up so that a secondary standby system is configured as an exact copy of the active primary system.

The secondary system can be located near the primary system, or it can be installed at a remote site.

Important Files

When troubleshooting System Replication issues, the following files are the most relevant ones.

–nameserver_<hostname> – hdbnsutil log file

     Contains information on each execution of hdbnsutil

–nameserver_<hostname> – nameserver trace file

     Documents registration of sites as well as takeover process

–indexserver_<hostname> – indexserver trace file

     Documents individual takeover process -> important for log shipping issues

–daemon_<hostname> – daemon trace file

     Shows quick overview of starting/stopping of processes and connections to other site

Network Issue

There are two requirements for successful networking for HANA Disaster Recovery:

–1. “Throughput”: It must be possible to transport the size of the persistently stored data within one day from the primary to the secondary.

–2. “Latency”: The redo log shipping wait time for 4 KB log buffers must be less than a millisecond or in a low single-digit millisecond range – depending on the application requirements (relevant for synchronous replication only).
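As a rough illustration of the throughput requirement, the sketch below computes the minimum sustained bandwidth needed to ship a given amount of persistently stored data within one day. The function name and the 2 TB example size are hypothetical, decimal units are assumed (1 GB = 10^9 bytes), and protocol overhead and ongoing log shipping are ignored, so the result is a lower bound and not a sizing rule.

```python
def min_bandwidth_mbit_per_s(data_size_gb: float, window_hours: float = 24.0) -> float:
    """Minimum sustained bandwidth (Mbit/s) to move data_size_gb within window_hours.

    Assumes decimal units: 1 GB = 10**9 bytes, 1 Mbit = 10**6 bits.
    """
    if data_size_gb < 0 or window_hours <= 0:
        raise ValueError("data size must be >= 0 and window must be > 0")
    bits = data_size_gb * 1e9 * 8      # total payload in bits
    seconds = window_hours * 3600      # transfer window in seconds
    return bits / seconds / 1e6


if __name__ == "__main__":
    # Hypothetical example: 2 TB (2000 GB) of persistently stored data.
    print(f"{min_bandwidth_mbit_per_s(2000):.1f} Mbit/s")
```

If the link between the two sites cannot sustain at least this rate, the initial data shipping is unlikely to finish within a day.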

Symptoms of insufficient network capabilities can manifest like this:

–Replication status is “Initializing” for a very long time

–Multiple log file entries such as the following:

[43349]{-1}[-1/-1] 2015-01-28 23:56:15.149841 e Stream NetworkChannelCompletion.cpp(00626) : NetworkChannelCompletionThread #2 NetworkChannel FD 24 [0x00007ffb67ca5a00]  {refCnt=3, idx=2} (invalid)-> Connected,[r---]

: Error in asynchronous stream event: exception  1: no.2110001 (Basis/IO/Stream/impl/NetworkChannelCompletion.cpp:546)

Generic stream error: getsockopt, Event=EPOLLERR - , rc=110: Connection timed out

[76620]{-1}[-1/-1] 2015-01-09 08:52:54.642605 e sr_dataaccess DisasterRecoveryPrimaryImpl.cpp(00369) : Closing connection to siteID 2. LogShipping was waiting for 80 seconds (logshipping_timeout = 60)!

[225893]{-1}[-1/-1] 2015-01-09 08:58:57.893574 e Stream NetworkChannelCompletion.cpp(00626) : NetworkChannelCompletionThread #0 NetworkChannel FD 2206 [0x00007f955dcfa700]  {refCnt=6, idx=0}> Connected,[-w--]

: Error in asynchronous stream event: exception  1: no.2110001 (Basis/IO/Stream/impl/NetworkChannelCompletion.cpp:546)

Generic stream error: getsockopt, Event=EPOLLERR - , rc=32: Broken pipe
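When a trace file contains many such entries, counting them by error code can show whether timeouts or broken connections dominate. The following is a minimal sketch, assuming the trace lines look like the samples above; the function name is hypothetical and the regular expressions are derived only from those samples, so other HANA revisions may need adjusted patterns.

```python
import re
from collections import Counter

# Patterns derived from the sample trace lines shown above (an assumption,
# not an official trace format description).
RC_PATTERN = re.compile(r"Generic stream error: .*?rc=(\d+): (.+)")
TIMEOUT_PATTERN = re.compile(
    r"LogShipping was waiting for (\d+) seconds \(logshipping_timeout = (\d+)\)"
)


def summarize_network_errors(lines):
    """Count stream errors by (rc, message) and collect log shipping timeouts."""
    stream_errors = Counter()
    timeouts = []  # list of (waited_seconds, configured_timeout)
    for line in lines:
        rc_match = RC_PATTERN.search(line)
        if rc_match:
            key = (int(rc_match.group(1)), rc_match.group(2).strip())
            stream_errors[key] += 1
        timeout_match = TIMEOUT_PATTERN.search(line)
        if timeout_match:
            timeouts.append((int(timeout_match.group(1)), int(timeout_match.group(2))))
    return stream_errors, timeouts
```

Pass it any iterable of lines, for example an open trace file. A high count for rc=110 (connection timed out) together with waits above logshipping_timeout points towards the network rather than the database.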


If the issue occurs while you are setting up replication, check the nameserver_<hostname> – hdbnsutil log file on both sites and ensure that you are running the right command on the right system.

The following link details which command should be run on which system:

SAP HANA Administration Guide - SAP Library

Known Issues

A known bug in the interaction between savepoints and log shipping can leave the secondary site with inconsistent savepoint versions.

Crash Stack:

exception 1: no.1000000 (DataAccess/impl/DisasterRecoveryProtocol.hpp:627)

Assertion failed: 0 == (oldState & state)

exception throw location:

1: 0x00007f08f077e0e1 in DataAccess::ReplicationProtocolSecondaryHandler::preloadTables

2: 0x00007f08f079583d in DataAccess::DisasterRecoverySecondaryHandlerImpl::setupDataShippingHandler

3: 0x00007f08f07959cc in DataAccess::DisasterRecoverySecondaryHandlerImpl::resetDataShippingHandler

4: 0x00007f08f0795d16 in DataAccess::DisasterRecoverySecondaryHandlerImpl::reconnectDataHandler

etc.

The solution is to re-register the secondary system and run a full replication again so that both sites are consistent with each other:

hdbnsutil -sr_register --force_full_replica

More information: 2075771 - SAP HANA DB: System Replication - Possible persistence corruption on secondary site
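To check quickly whether a crash dump or trace matches this known issue, a plain text search for the assertion and the top stack frame may be enough. The helper below is only a sketch: its name is hypothetical, and it looks for exactly the two strings that appear in the crash stack above, which other revisions may print differently.

```python
# Signature strings copied from the crash stack shown above.
SIGNATURES = (
    "Assertion failed: 0 == (oldState & state)",
    "ReplicationProtocolSecondaryHandler::preloadTables",
)


def matches_known_crash(trace_text: str) -> bool:
    """Return True only if every signature string occurs in the trace text."""
    return all(signature in trace_text for signature in SIGNATURES)
```

A match is a hint to read SAP Note 2075771 and is not proof of the bug on its own.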

Useful notes and blogs

SCN Blog “HANA System Replication - Take-over process”

SCN Blog “Registering a Secondary system for System Replication – Troubleshooting”

System Replication Configuration Parameters of global.ini file

SCN Document “How to Perform System Replication for SAP HANA”

SCN Document “Network Recommendations for SAP HANA System Replication”

Video - SAP HANA Academy - Administration: System Replication in SAP HANA Studio

2057595     FAQ: SAP HANA High Availability

1999880     FAQ: SAP HANA System Replication

2012564     HANA Support for VLAN Trunking Protocol (VTP) based on IEEE 802.1Q

2063657     HANA System Replication takeover decision guideline

2033624     System replication: Secondary system hangs during takeover

2053504     System replication: Hanging client processes after a takeover

2105185     System Replication Stopped On Statistics Server Due To OOM On Primary System

2053629     HANA System Replication stops after restart

2050830     Registering a secondary system via HANA Studio fails with error 'remoteHost doe

1984882     Using HANA System Replication for Hardware Exchange with minimum/zero Downtime

1876398     Network configuration for System Replication in HANA SP6

1995412     Secondary site of System Replication runs out of disk space due to closed data

1945676     Correct usage of hdbnsutil -sr_unregister

2081563     secondary system's replication mode and replication status changed to "UNKNOWN"

1834153     HANA high availability disaster tolerance config

