Troubleshooting SAP HANA System Replication
System Replication is NOT Host Auto-Failover.
System Replication replicates your SAP HANA database data from one system to another to compensate for system failures.
System Replication is set up so that a secondary standby system is configured as an exact copy of the active primary system.
The secondary system can be located near the primary system, or it can be installed at a remote site.
Important Files
When troubleshooting System Replication issues, the following files are the most relevant:
–nameserver_<hostname>.00000.xxx.trc – hdbnsutil log file
Contains information on each execution of hdbnsutil
–nameserver_<hostname>.3xx01.xxx.trc – nameserver trace file
Documents registration of sites as well as takeover process
–indexserver_<hostname>.3xx03.xxx.trc – indexserver trace file
Documents individual takeover process -> important for log shipping issues
–daemon_<hostname>.3xx00.xxx.trc – daemon trace file
Shows quick overview of starting/stopping of processes and connections to other site
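When a trace directory holds many files, it can help to sort them by the roles listed above. The following is a minimal Python sketch of that idea, not an SAP tool. The function name, the sample host name and the assumption that the trailing "xxx" part of the name is a numeric counter are all illustrative.

```python
import re

# Naming scheme taken from the list above:
#   <service>_<hostname>.<5-digit port>.<counter>.trc
# Assumption: the counter ("xxx" in the list) consists of digits only.
TRACE_NAME = re.compile(
    r"^(?P<service>nameserver|indexserver|daemon)_(?P<host>.+)"
    r"\.(?P<port>\d{5})\.(?P<counter>\d+)\.trc$"
)

# The last two digits of a 3xxNN port identify the service (xx = instance number).
ROLE_BY_SERVICE_AND_SUFFIX = {
    ("daemon", "00"): "daemon trace: process start/stop, connections to other site",
    ("nameserver", "01"): "nameserver trace: site registration and takeover",
    ("indexserver", "03"): "indexserver trace: individual takeover, log shipping",
}


def classify_trace_file(filename):
    """Return (host, role) for a known trace file name, or None otherwise."""
    match = TRACE_NAME.match(filename)
    if match is None:
        return None
    service, host, port = match["service"], match["host"], match["port"]
    # The hdbnsutil log is written under the nameserver prefix with port 00000.
    if service == "nameserver" and port == "00000":
        return host, "hdbnsutil log: one entry per hdbnsutil execution"
    if port.startswith("3"):
        role = ROLE_BY_SERVICE_AND_SUFFIX.get((service, port[3:]))
        if role is not None:
            return host, role
    return None
```

For example, classify_trace_file("indexserver_hana01.30003.000.trc") returns the host "hana01" together with the indexserver role, while a name that does not follow the scheme returns None.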
Network Issue
There are two requirements for successful networking for HANA Disaster Recovery:
–1. “Throughput”: It must be possible to transport the size of the persistently stored data within one day from the primary to the secondary.
–2. “Latency”: The redo log shipping wait time for 4 KB log buffers must be less than a millisecond or in a low single-digit millisecond range – depending on the application requirements (relevant for synchronous replication only).
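The throughput requirement translates into simple arithmetic: the persistently stored data size divided by the number of seconds in a day gives the minimum average rate the link has to sustain. The sketch below works this out in Python. The function names and the 1 TB example size are assumptions for illustration, and the result is a lower bound without any headroom for peaks or other traffic.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def required_throughput_mbit_per_s(data_size_bytes):
    """Average rate needed to ship data_size_bytes within one day, in Mbit/s."""
    bytes_per_second = data_size_bytes / SECONDS_PER_DAY
    return bytes_per_second * 8 / 1_000_000


def meets_throughput_requirement(data_size_bytes, measured_mbit_per_s):
    """True if a measured link rate covers the one-day shipping requirement."""
    return measured_mbit_per_s >= required_throughput_mbit_per_s(data_size_bytes)


if __name__ == "__main__":
    one_terabyte = 10**12  # hypothetical size of the persistently stored data
    print(f"{required_throughput_mbit_per_s(one_terabyte):.1f} Mbit/s")  # 92.6 Mbit/s
```

In this example, 1 TB of persisted data needs roughly 92.6 Mbit/s on average, so a 100 Mbit/s link would only just satisfy the requirement.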
Symptoms of insufficient network capabilities can manifest like this:
–Replication status is “Initializing” for a very long time
–Multiple log file entries such as the following:
o[43349]{-1}[-1/-1] 2015-01-28 23:56:15.149841 e Stream NetworkChannelCompletion.cpp(00626) : NetworkChannelCompletionThread #2 NetworkChannel FD 24 [0x00007ffb67ca5a00] {refCnt=3, idx=2} (invalid)->192.168.92.5/30103_tcp Connected,[r---]
: Error in asynchronous stream event: exception 1: no.2110001 (Basis/IO/Stream/impl/NetworkChannelCompletion.cpp:546)
Generic stream error: getsockopt, Event=EPOLLERR - , rc=110: Connection timed out
o[76620]{-1}[-1/-1] 2015-01-09 08:52:54.642605 e sr_dataaccess DisasterRecoveryPrimaryImpl.cpp(00369) : Closing connection to siteID 2. LogShipping was waiting for 80 seconds (logshipping_timeout = 60)!
o[225893]{-1}[-1/-1] 2015-01-09 08:58:57.893574 e Stream NetworkChannelCompletion.cpp(00626) : NetworkChannelCompletionThread #0 NetworkChannel FD 2206 [0x00007f955dcfa700] {refCnt=6, idx=0} 10.7.5.130/38103_tcp->10.122.211.130/48499_tcp Connected,[-w--]
: Error in asynchronous stream event: exception 1: no.2110001 (Basis/IO/Stream/impl/NetworkChannelCompletion.cpp:546)
Generic stream error: getsockopt, Event=EPOLLERR - , rc=32: Broken pipe
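Entries like the ones above can be counted quickly to see how often the network problems occur. The following Python sketch scans trace lines for the two patterns visible in these excerpts: the socket return code ("rc=110: Connection timed out", "rc=32: Broken pipe") and the log shipping timeout message. The regular expressions are derived only from the sample lines shown here and may need adjusting for other revisions. The function name is illustrative.

```python
import re
from collections import Counter

# Matches the tail of a "Generic stream error" line, e.g. "rc=110: Connection timed out".
SOCKET_ERROR = re.compile(r"rc=(?P<rc>\d+): (?P<text>.+)$")

# Matches the log shipping timeout message written by the primary.
LOGSHIP_TIMEOUT = re.compile(
    r"Closing connection to siteID (?P<site>\d+)\. "
    r"LogShipping was waiting for (?P<waited>\d+) seconds "
    r"\(logshipping_timeout = (?P<limit>\d+)\)"
)


def summarize_network_errors(lines):
    """Return (Counter of (rc, text) socket errors, list of (site, waited, limit))."""
    socket_errors = Counter()
    timeouts = []
    for line in lines:
        line = line.rstrip()
        match = LOGSHIP_TIMEOUT.search(line)
        if match:
            timeouts.append(
                (int(match["site"]), int(match["waited"]), int(match["limit"]))
            )
            continue
        match = SOCKET_ERROR.search(line)
        if match:
            socket_errors[(int(match["rc"]), match["text"])] += 1
    return socket_errors, timeouts
```

Passing an open trace file to summarize_network_errors gives a tally per return code plus every log shipping timeout with the affected site ID, the time waited and the configured limit.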
Tips
If the issue occurs while you are setting up replication, check the hdbnsutil log file (nameserver_<hostname>.00000.xxx.trc) on both sites and make sure that you are running the right command on the right system.
The SAP HANA Administration Guide (SAP Library) details which command should be run on which system.
Known Issues
A known bug in the way savepoints and log shipping interact can result in inconsistent savepoint versions.
Crash Stack:
exception 1: no.1000000 (DataAccess/impl/DisasterRecoveryProtocol.hpp:627)
Assertion failed: 0 == (oldState & state)
exception throw location:
1: 0x00007f08f077e0e1 in DataAccess::ReplicationProtocolSecondaryHandler::preloadTables
2: 0x00007f08f079583d in DataAccess::DisasterRecoverySecondaryHandlerImpl::setupDataShippingHandler
3: 0x00007f08f07959cc in DataAccess::DisasterRecoverySecondaryHandlerImpl::resetDataShippingHandler
4: 0x00007f08f0795d16 in DataAccess::DisasterRecoverySecondaryHandlerImpl::reconnectDataHandler
etc.
The solution is to re-register the secondary and run a full replication again so that both sites are consistent with each other:
hdbnsutil -sr_register --force_full_replica
More information: 2075771 - SAP HANA DB: System Replication - Possible persistence corruption on secondary site
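To check whether a trace file contains this particular crash, you can search for the two most distinctive strings of the stack shown above. The Python sketch below does only that. The function name is illustrative, and the source line number (627) is deliberately left out of the match on the assumption that it may differ between revisions.

```python
# Distinctive fragments taken from the crash stack above.
SIGNATURE_PARTS = (
    "DisasterRecoveryProtocol.hpp",
    "Assertion failed: 0 == (oldState & state)",
)


def has_savepoint_crash_signature(trace_text):
    """True if the text contains both fragments of the known crash stack."""
    return all(part in trace_text for part in SIGNATURE_PARTS)
```

A match is only a hint that the known issue applies. The referenced SAP Note remains the place to confirm affected revisions.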
Useful notes and blogs
SCN Blog “HANA System Replication - Take-over process”: http://scn.sap.com/docs/DOC-52345
SCN Blog “Registering a Secondary system for System Replication – Troubleshooting”
System Replication Configuration Parameters of global.ini file: http://help.sap.com/saphelp_hanaplatform/helpdata/en/0c/d257970d514abd8ddf9ee1f45f3bca/content.htm
SCN Document “How to Perform System Replication for SAP HANA”: http://scn.sap.com/docs/DOC-47702
SCN Document “Network Recommendations for SAP HANA System Replication”: http://scn.sap.com/docs/DOC-56044
Video - SAP HANA Academy - Administration: System Replication in SAP HANA Studio - https://www.youtube.com/watch?v=oBUiWMjARpc
2057595 FAQ: SAP HANA High Availability
1999880 FAQ: SAP HANA System Replication
2012564 HANA Support for VLAN Trunking Protocol (VTP) based on IEEE 802.1Q
2063657 HANA System Replication takeover decision guideline
2033624 System replication: Secondary system hangs during takeover
2053504 System replication: Hanging client processes after a takeover
2105185 System Replication Stopped On Statistics Server Due To OOM On Primary System
2053629 HANA System Replication stops after restart
2050830 Registering a secondary system via HANA Studio fails with error 'remoteHost doe
1984882 Using HANA System Replication for Hardware Exchange with minimum/zero Downtime
1876398 Network configuration for System Replication in HANA SP6
1995412 Secondary site of System Replication runs out of disk space due to closed data
1945676 Correct usage of hdbnsutil -sr_unregister
2081563 secondary system's replication mode and replication status changed to "UNKNOWN"
1834153 HANA high availability disaster tolerance config