HANA System Replication - Take-over process

Purpose

This blog will focus on practical aspects of the take-over process of SAP HANA System Replication (HSR).

Pre-reading

Plenty of material already gives a good overview of HSR, and this blog does not aim to replace it. Read through that material first so that you have a solid understanding of HSR before you continue with this blog.

Un-confusion

To reduce the Babylonian confusion, some short notes on the terminology used:

  • System Replication is NOT Host Auto-Failover
  • System Replication is NOT Scale Out
  • System Replication is Disaster Tolerance (DT) / Disaster Recovery (DR)
  • System Replication synchronizes data between two data centers (Site A and Site B)
  • There is always one (logical) primary and one secondary system, e.g. site A is primary and site B is secondary. After a takeover, site B is the (logical) primary system. Thus the primary and secondary roles change, whereas site A and site B always refer to a physical instance.
  • A takeover makes a secondary system function as the primary system. Note that this explicitly does not include changing the state of the primary: in exceptional or disaster situations, the secondary must not depend on access to the primary site in order to change its own state.
  • Failback: a return to the original setup, e.g. a takeover from the backup site back to the preferred site. The preferred site may have better internet connectivity, be easier for clients to reach, etc.

Introduction

SAP HANA tries to reduce dependencies on a specific environment as far as possible, i.e. it makes minimal assumptions about the technology used in the data center and about how and where the decision to start a takeover is made.

Apart from planned hardware maintenance or software upgrades, situations requiring a takeover are usually exceptional, e.g. losing the network connection or having hardware problems. Hence, a takeover mechanism should make the fewest possible assumptions about the design, behavior, or availability of external resources. Three of the most important consequences are:

  • No automated takeover: HANA has no built-in mechanism for an automated takeover; whether and how a takeover takes place depends on the environment. Also, site B may not have all the information needed to make such a decision (site B may be unable to contact site A even though site A is not affected at all).
  • If site B takes over, HANA will not make any attempt to stop or change the state of site A. How site A can be made unavailable to clients to avoid a "split brain" depends on the data center infrastructure and on the exception that caused the takeover.
  • Whether and how backups are synced between data centers is the responsibility of the administrators. Multiple approaches are supported, and each comes with its own considerations.

Prerequisites

All services on the primary site must be reachable from the secondary site: every daemon on the secondary site connects to its counterpart on the primary site.

Make sure your daemons are listening on the sockets the secondary site connects to (check with e.g. netstat -ptl | grep hdb). If some services are listening only on localhost, go (in HANA studio) to Configuration -> global.ini -> communication -> listeninterface and set it to .global, or set it to .internal, where you can also define the addresses the server should bind to. See section 10.2 of the HANA Admin Guide for more details.
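The listener check can be narrowed down to the problematic entries. The following is a minimal sketch, not part of the original material: the function name loopback_only_listeners is made up for illustration, and it assumes the column layout of net-tools netstat for TCP listeners (local address in column 4, PID/program name in column 7).

```shell
#!/usr/bin/env bash
# Hypothetical helper: reads netstat-style listener lines on stdin and prints
# the local address and process of every listener bound to loopback only.
# Assumes net-tools netstat columns: $4 = local address, $7 = PID/program.
loopback_only_listeners() {
    awk '$4 ~ /^(127\.0\.0\.1|localhost|::1):/ { print $4, $7 }'
}
```

A possible call is netstat -ptln | grep hdb | loopback_only_listeners (the extra -n only forces numeric addresses). Empty output suggests that no HANA service is restricted to localhost; any line printed points to a service the secondary site will probably not be able to reach.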

In a scale-out system, run the hdbnsutil commands only on the server that is acting as master. This is not necessarily the first node, e.g. after a host auto-failover of the first node.

The tool to use is hdbnsutil. Always run it as the <sid>adm user. Several of its commands take a "sitename" parameter, a logical name that identifies each of the two sites. The only requirement is that the two sites must not be given the same site name.
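Both rules are easy to check before any hdbnsutil call. The sketch below is only an illustration with made-up function names; it assumes the usual naming scheme in which <sid>adm is the lowercase three-character SID followed by "adm".

```shell
#!/usr/bin/env bash
# Hypothetical guards, not part of hdbnsutil.

# Succeeds if the given user name looks like <sid>adm
# (assumption: lowercase SID, a letter followed by two letters or digits).
is_sidadm_user() {
    [[ "$1" =~ ^[a-z][a-z0-9]{2}adm$ ]]
}

# Succeeds only if both site names are non-empty and different.
site_names_differ() {
    [ -n "$1" ] && [ -n "$2" ] && [ "$1" != "$2" ]
}
```

For example, is_sidadm_user "$(id -un)" || exit 1 at the top of a wrapper script stops it early when it is started as the wrong user.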

Basic run through

1) Enable system replication

site A has to be started and in mode "none" (you can check your SR state with "hdbnsutil -sr_state")

site B has to be stopped and in mode "none"

on site A, run

siteA # hdbnsutil -sr_enable --name=<sitename site A>

on site B, run

siteB # hdbnsutil -sr_register --remoteHost=<IP, DN or host name of site A> --remoteInstance=<instance ID> --mode=<sync|syncmem|async> --name=<site name site B>

if successful, start HANA

siteB # sapcontrol -nr <instance ID> -function StartSystem

This starts HANA on site B. It connects to site A and starts to sync, i.e. it copies the first sync point from site A to site B and then receives logs.
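Because the register call takes four values that are easy to mix up, it can help to assemble the command line first and review it before running it. The following sketch is an assumption-laden illustration: sr_register_cmd is a hypothetical helper that only prints the command, with every option written with a leading "--" and the mode restricted to the three values named above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: prints the sr_register command line instead of
# executing it, so it can be reviewed (or piped to a shell) afterwards.
# Arguments: <remote host> <instance ID> <mode> <own site name>
sr_register_cmd() {
    local remote_host=$1 instance=$2 mode=$3 site_name=$4
    case "$mode" in
        sync|syncmem|async) ;;
        *) echo "unknown replication mode: $mode" >&2; return 1 ;;
    esac
    printf 'hdbnsutil -sr_register --remoteHost=%s --remoteInstance=%s --mode=%s --name=%s\n' \
        "$remote_host" "$instance" "$mode" "$site_name"
}
```

Called on site B as sr_register_cmd hostA 00 sync SITEB, it prints the line to run as <sid>adm; the host name, instance ID, and site name here are placeholders.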

2) Takeover

Site A is running as primary, site B as secondary. As noted, executing a takeover on site B will not influence site A. Hence, you are responsible for performing a takeover only if all transactions executed on site A have been replicated to site B and site A has been made unavailable to clients (e.g. by re-assigning the virtual IP of host A).

2.1 Make sure primary on Site A is completely stopped

2.2 Unbind virtual IP addresses (or change DNS entry)

2.3 On site B, run

siteB # hdbnsutil -sr_takeover

2.4 Bind virtual IP addresses

2.5 Make sure backup is running on new primary
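Steps 2.1 to 2.4 can be chained so that the takeover only happens when the earlier steps succeeded. This is a rough sketch under several assumptions: <sid>adm can ssh to the primary host, the primary is still reachable, and unbind_virtual_ip / bind_virtual_ip are hypothetical placeholders for whatever your data center uses (virtual IP, DNS change, load balancer). By default it runs in dry-run mode and only prints each step.

```shell
#!/usr/bin/env bash
# Dry run by default: set DRY_RUN=0 to really execute the steps.
DRY_RUN=${DRY_RUN:-1}

# Prints a step and, unless in dry-run mode, executes it.
run_step() {
    echo "+ $*"
    [ "$DRY_RUN" = "1" ] || "$@"
}

# Hypothetical placeholders: replace with your own network switch-over.
unbind_virtual_ip() { echo "unbind_virtual_ip: not implemented" >&2; return 1; }
bind_virtual_ip()   { echo "bind_virtual_ip: not implemented" >&2; return 1; }

# Run on site B. Arguments: <primary host> <instance ID>
takeover() {
    local primary_host=$1 instance=$2
    run_step ssh "$primary_host" sapcontrol -nr "$instance" -function StopSystem &&
    run_step ssh "$primary_host" sapcontrol -nr "$instance" -function WaitforStopped 999 0 &&
    run_step unbind_virtual_ip &&
    run_step hdbnsutil -sr_takeover &&
    run_step bind_virtual_ip
    # Step 2.5 (backup on the new primary) is left to your backup tooling.
}
```

If site A cannot be reached at all, the first step fails and the chain stops; in that case you have to make sure by other means that site A is down before running the takeover by hand.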

3) Re-enable system replication

After a takeover, site A has to be shut down and any existing problems with hardware etc. have to be fixed. Once site A is ready to run again, it is configured as the secondary system, i.e. as the replication site for site B, using a command similar to the one in step 1.

On site A, run

siteA # hdbnsutil -sr_register --remoteHost=<IP, DN or host name of site B> --remoteInstance=<instance ID> --mode=<sync|syncmem|async> --name=<site name site A>

siteA # sapcontrol -nr <instance ID> -function StartSystem

As in step 1, start HANA, and site A will start to get in sync with site B.

4) Failback

If site A and site B are equal and there is no preferred site, the system may run in this configuration (site A as secondary, site B as primary). If site A is the preferred site, one may fail back to site A. For this, the same commands are executed.

Pre-condition: site B is running as primary, site A as secondary, SR is active and the systems are in sync.
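The role of each site can be read from hdbnsutil -sr_state before going on. The helper below is a small sketch and rests on one assumption you should verify on your release: that the output contains a line of the form "mode: <value>". The function name sr_mode is made up for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helper: extracts the value of the first "mode: ..." line
# from hdbnsutil -sr_state output read on stdin.
# Assumption: the output contains a line of the form "mode: <value>".
sr_mode() {
    awk -F': *' '$1 == "mode" { print $2; exit }'
}
```

On site B, hdbnsutil -sr_state | sr_mode would then be expected to print primary before the failback starts; compare the value printed on site A against what your release reports for a registered secondary.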

First, site B is made unavailable to clients (e.g. by re-assigning the virtual IP of host B and shutting down HANA on site B).

then, execute on host A:

siteA # hdbnsutil -sr_takeover

Once HANA on host B is shut down, execute on host B:

siteB # hdbnsutil -sr_register --remoteHost=<IP, DN or host name of site A> --remoteInstance=<instance ID> --mode=<sync|syncmem|async> --name=<site name site B>

and start HANA again

siteB # sapcontrol -nr <instance ID> -function StartSystem

5) Disable system replication

Especially if you want to get familiar with the SR commands, you may want to disable system replication and get two independent systems in sr_state none again. Assuming site A is primary and site B is secondary, check that the primary HANA is up and the secondary is down. Then execute on the secondary:

siteB # hdbnsutil -sr_unregister

Before SPS07, the primary needed to be shut down to disable system replication. Thus, if you are running a HANA release before SPS07, execute

siteA # sapcontrol -nr <instance ID> -function StopSystem

siteA # sapcontrol -nr <instance ID> -function WaitforStopped 999 0

then, for any version, execute

siteA # hdbnsutil -sr_disable

Then you can start HANA again (you do not need to restart site A if you did not shut it down).
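The order of the commands, including the extra stop and start before SPS07, can be written down as a small plan generator. This is only a sketch: disable_plan is a hypothetical name, the function prints the commands with the site they belong to rather than executing anything, and the flag for "before SPS07" has to be supplied by you.

```shell
#!/usr/bin/env bash
# Hypothetical helper: prints the commands for disabling system replication,
# prefixed with the site they have to be run on. Nothing is executed.
# Arguments: <instance ID> <yes|no: release is older than SPS07>
disable_plan() {
    local instance=$1 pre_sps07=$2
    echo "siteB: hdbnsutil -sr_unregister"
    if [ "$pre_sps07" = "yes" ]; then
        # Older releases need the primary to be down for sr_disable.
        echo "siteA: sapcontrol -nr $instance -function StopSystem"
        echo "siteA: sapcontrol -nr $instance -function WaitforStopped 999 0"
    fi
    echo "siteA: hdbnsutil -sr_disable"
    if [ "$pre_sps07" = "yes" ]; then
        echo "siteA: sapcontrol -nr $instance -function StartSystem"
    fi
}
```

disable_plan 00 no prints just the unregister and disable steps, while disable_plan 00 yes wraps the disable step in a stop and start of site A.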

Demo scripts

Attached to this blog, you will find an archive containing some bash scripts. They are not intended for production use, but should give you an idea of how to run through a scenario as sketched above. Copy the scripts to the master node of each site and adapt settings.sh accordingly. It is furthermore assumed that <sid>adm can ssh to the other site. To run through the scenario as sketched above, run

1) Enable system replication

siteA # ./sr_enable.sh

siteB # ./sr_register.sh

2) Takeover

siteB # ./sr_takeover.sh

3) Re-enable system replication

siteA # ./sr_register.sh

4) Failback

siteA # ./sr_takeover.sh

siteB # ./sr_register.sh

5) Disable system replication

(on site A or site B:)

./sr_unregister.sh

Scripts are zipped here: http://www.sdn.sap.com/irj/scn/index?rid=/library/uuid/80219c3c-eee7-3110-c7af-ba5bb697ced5

Authors

Former Member

Former Member

Former Member