
HOW TO SET UP SAPHanaSR IN THE COST OPTIMIZED SAP HANA SR SCENARIO - PART I


Besides this SCN document, a more detailed best practice is now available online at Best Practices - Resource Library | SAP Applications | SUSE.


1. How to use this Guide


To make this article a bit easier to edit, I have split it into three parts. Please note the following points:

1) The customer has to describe their expectations exactly. In particular, they need to understand the relationship between the productive and the non-productive system in takeover scenarios

2) Tests for the scenario have to be developed together with the customer to meet the expectations

3) Check SAP documentation for the cost optimized scenario

4) SUSE supports this scenario but would like to be informed about implementation projects

5) This scenario is limited to a two node cluster

6) This scenario is known to work with SAPHanaSR 0.151+ and SAP HANA SPS9+

7) In the following we name the productive system SLE and the non-productive system QAS

SAP HANA in system replication mode plus a non-productive SAP HANA on the secondary (failover) node is also known as the cost optimized scenario.

Please read the corresponding SAP documentation, for example [SAP_HANA_Master_Guide_en.pdf | http://help.sap.com/hana/SAP_HANA_Master_Guide_en.pdf], section "Using Secondary Servers for Non-Productive systems", p. 46.

2. The concept

The SAP HANA productive database (SLE) is running on node 1 (suse01) of the cluster and is in SAP HANA system replication with the SAP HANA database (SLE) on node 2 (suse02) of the cluster.

SAP allows running a non-productive instance of SAP HANA (QAS) on the system replication site on node 2.

In case of a failure of the primary SAP HANA on node 1, the cluster first tries to restart SAP HANA locally on this node. If the restart is not possible or if node 1 has crashed completely, the takeover process will be triggered.

In case of a takeover, the secondary (replica) of this SAP HANA on node 2 is started after the shutdown of the non-productive SAP HANA.


Alternatively, you could configure a different resource handling procedure. However, we recommend trying to restart SAP HANA locally first: a takeover with non-preloaded tables can take a long time, and the necessary stop of the QAS system takes additional time as well. Thus, in many environments the local restart will be faster.

To automate this resource handling process, we can utilize the SAP HANA resource agents included in SAPHanaSR: system replication of the productive database is handled by SAPHana and SAPHanaTopology, while the non-productive database is handled by the well-known SAPDatabase resource agent. The idea behind this is the following:

The SUSE pacemaker/openais framework is configured to run and monitor the productive SAP HANA database in its system replication configuration, as described in the [setup guide | https://www.suse.com/promo/sap/hana/replication.html] (you need to complete a form to read the document) that comes with the SAPHanaSR resource agents. The non-productive SAP HANA instance is operated by the SAPDatabase resource agent, because it is a single database without replication.

The automatic shutdown of the non-productive SAP HANA database (QAS) is achieved by cluster rules (anti-collocation of SAP HANA prod vs. SAP HANA non-prod): if the primary SAP HANA system (SLE) fails, the anti-collocation rules for SAP HANA non-prod (QAS) are triggered and the SAPDatabase resource agent shuts down the non-productive SAP HANA database. The takeover to node 2 takes a longer period of time, because the non-productive database needs to be stopped gracefully before the productive database can be taken over. This prolonged takeover time is the main disadvantage of the cost optimized scenario.
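For illustration, such an anti-collocation rule could look like the following crm shell constraint. This is only a sketch with hypothetical resource names (rsc_SAP_QAS_HDB10 for the QAS database, msl_SAPHana_SLE_HDB00 for the productive master/slave resource); the complete cluster configuration is covered in part 2 of this article:

# never run the QAS database where the productive SAPHana master is promoted
colocation col_QAS_never_with_SLEmaster -inf: rsc_SAP_QAS_HDB10:Started msl_SAPHana_SLE_HDB00:Master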

3. Setup of the SAP HANA Database systems

3.1 Install and configure the OS

    1. Register your systems to get the latest packages of the needed resource agents and HA components (an installation sketch follows this list)
    2. Install both nodes (here suse01 and suse02) with SUSE Linux Enterprise Server for SAP Applications 11 SP3 or any newer version of this product validated for SAP HANA.
    3. Follow [SAP Note 1310037 | http://service.sap.com/sap/support/notes/1310037] and all other SAP HANA specific SAP notes.
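For example, the HA components and the SAPHanaSR resource agents can be installed with zypper. This is a sketch for SUSE Linux Enterprise Server for SAP Applications 11 SP3; pattern and package names may differ on other product versions:

suse01:~# zypper in -t pattern ha_sles
suse01:~# zypper in SAPHanaSR SAPHanaSR-doc

Run the same commands on the second node (suse02).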

3.2 Install SAP HANA productive database on node 1

For the installation of SAP HANA, please follow the instructions of the SAP HANA Master Guide and the SAP HANA installation guide, both downloadable from the SAP Service Marketplace (http://service.sap.com).

Make sure that you have installed a current version of the SAPHOSTAGENT.
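You can check the installed version of the SAP host agent, for example, by querying the saphostexec binary:

suse01:~# /usr/sap/hostctrl/exe/saphostexec -version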


3.3 Install SAP HANA productive database on node 2

The installation of the SAP HANA secondary system on node 2 is done in the same way as on node 1, but some post-installation tasks need to be done.

Please keep in mind that the parameters for the memory usage of SAP HANA must be set according to the SAP HANA documentation. A very helpful SAP SCN document can be found at: http://scn.sap.com/docs/DOC-47702

3.4 Enable System Replication (SR)

Enable system replication between SAP HANA on node 1 and SAP HANA on node 2.

The concrete procedure to set up system replication is described by SAP. Here we provide the major steps (a command sketch follows the list):

    1. Backup the SAP HANA database on the first node (the primary must be started for the backup)
    2. Enable the system replication source at the first node's instance (the primary must be started for enabling system replication)
    3. Register for system replication at the second node (the secondary must be down to register)
    4. Reduce the memory parameters in global.ini for the secondary instance, so that the system resources are sufficient for both the SLE and the QAS database (see the hints in this section below)
    5. Check if the SAP HANA database instances reach an ACTIVE sync status after starting both sides.
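A sketch of these steps using hdbnsutil, run as the <sid>adm user (here sleadm) and assuming instance number 00 and the site names WDF1 and ROT1; the exact options can differ between SAP HANA revisions:

suse01:~> hdbsql -u system -i 00 "BACKUP DATA USING FILE ('backup')"
suse01:~> hdbnsutil -sr_enable --name=WDF1
suse02:~> HDB stop
suse02:~> hdbnsutil -sr_register --remoteHost=suse01 --remoteInstance=00 --mode=sync --name=ROT1
suse02:~> HDB start
suse01:~> hdbnsutil -sr_state

Adapt global.ini (step 4) before starting the secondary.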

Please keep in mind that the parameters for memory allocation and table preload of SAP HANA must be set according to the SAP HANA documentation.

For example, you need to adapt or insert the following parameters in the file global.ini:


[ha_dr_provider_srTakeover]
provider = srTakeover
path = /hana/shared/srHook
execution_order = 1

[memorymanager]
global_allocation_limit = <size-in-GB>

[system_replication]
preload_column_tables = false

3.5 Implementing the srTakeover-Hook

The parameters added to global.ini imply that a srTakeover hook should be installed on the second node. We provide sample code which needs to be adapted for your environment. Currently you need to provide a user name / password combination inside the hook; we are trying to improve that in the future together with SAP at the LinuxLab.

The srTakeover-Hook is based on sample code from SAP as well as on the hook provided by DELL in [SAP Note 2196941 | http://service.sap.com/sap/support/notes/2196941].

This sample hook is given "as-is" without any warranty. It must be installed on node 2 as /hana/shared/srHook/srTakeover.py to undo the changes to global_allocation_limit and preload_column_tables in case of a takeover. In your installation you should use a dedicated database user for the hook's SQL queries, and you really should avoid using a powerful database user.

"""

Sample for a HA/DR hook provider.

When using your own code in here, please copy this file to location on /hana/shared outside the HANA installation.

This file will be overwritten with each hdbupd call! To configure your own changed version of this file, please add

to your global.ini lines similar to this:

    [ha_dr_provider_<className>]

    provider = <className>

    path = /hana/shared/haHook

    execution_order = 1

For all hooks, 0 must be returned in case of success.

Set the following variables : dbinst Instance Number [e.g. 00 - 99 ]

                              dbuser Username [ e.g. SYSTEM ]

                              dbpwd  user password [ e.g. SLES4sap ]

                              dbport port where db listens for sql [e.g 30013 or 30015]

"""

dbuser="SYSTEM"

dbpwd="manager"

dbinst="00"

dbport="30013"

stmnt1 = "ALTER SYSTEM ALTER CONFIGURATION ('global.ini','SYSTEM') UNSET ('memorymanager','global_allocation_limit') WITH RECONFIGURE"

stmnt2 = "ALTER SYSTEM ALTER CONFIGURATION ('global.ini','SYSTEM') UNSET ('system_replication','preload_column_tables') WITH RECONFIGURE"

from hdb_ha_dr.client import HADRBase, Helper

import os, time, dbapi

class srTakeover(HADRBase):

    def __init__(self, *args, **kwargs):

        # delegate construction to base class

        super(srTakeover, self).__init__(*args, **kwargs)

    def about(self):

        return {"provider_company" :      "SUSE",

                "provider_name" :          "srTakeover", # provider name = class name

                "provider_description" :  "Replication takeover script to set parameters to default.",

                "provider_version" :      "1.0"}

    def startup(self, hostname, storage_partition, system_replication_mode, **kwargs):

        self.tracer.debug("enter startup hook; %s" % locals())

        self.tracer.debug(self.config.toString())

        self.tracer.info("leave startup hook")

        return 0

    def shutdown(self, hostname, storage_partition, system_replication_mode, **kwargs):

        self.tracer.debug("enter shutdown hook; %s" % locals())

        self.tracer.debug(self.config.toString())

        self.tracer.info("leave shutdown hook")

        return 0

    def failover(self, hostname, storage_partition, system_replication_mode, **kwargs):

        self.tracer.debug("enter failover hook; %s" % locals())

        self.tracer.debug(self.config.toString())

        self.tracer.info("leave failover hook")

        return 0

    def stonith(self, failingHost, **kwargs):

        self.tracer.debug("enter stonith hook; %s" % locals())

        self.tracer.debug(self.config.toString())

        # e.g. stonith of params["failed_host"]

        # e-g- set vIP active

        self.tracer.info("leave stonith hook")

        return 0

    def preTakeover(self, isForce, **kwargs):

        """Pre takeover hook."""

        self.tracer.info("%s.preTakeover method called with isForce=%s" % (self.__class__.__name__, isForce))

        if not isForce:

            # run pre takeover code

            # run pre-check, return != 0 in case of error => will abort takeover

            return 0

        else:

            # possible force-takeover only code

            # usually nothing to do here

            return 0

    def postTakeover(self, rc, **kwargs):

        """Post takeover hook."""

        self.tracer.info("%s.postTakeover method called with rc=%s" % (self.__class__.__name__, rc))

        if rc == 0:

            # normal takeover succeeded

        conn = dbapi.connect('localhost',dbport,dbuser,dbpwd)

            cursor = conn.cursor()

            cursor.execute(stmnt1)

            cursor.execute(stmnt2)

            return 0

        elif rc == 1:

            # waiting for force takeover

        conn = dbapi.connect('localhost',30013,'SYSTEM','manager')

            cursor = conn.cursor()

            stmnt = "ALTER SYSTEM ALTER CONFIGURATION ('global.ini','SYSTEM') UNSET ('memorymanager','global_allocation_limit') WITH RECONFIGURE"

            cursor.execute(stmnt)

            stmnt = "ALTER SYSTEM ALTER CONFIGURATION ('global.ini','SYSTEM') UNSET ('system_replication','preload_column_tables') WITH RECONFIGURE"

            cursor.execute(stmnt)

            return 0

        elif rc == 2:

            # error, something went wrong

            return 0

In the same directory (/hana/shared/srHook) you need to install some Python files from the SAP HANA client software to enable the hook to open the database connection and run the SQL queries:

dbapi.py, __init__.py, resultrow.py
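A sketch for copying these files, assuming the SAP HANA client is installed under /usr/sap/hdbclient (adjust the path to your client installation):

suse02:~# mkdir -p /hana/shared/srHook
suse02:~# cd /usr/sap/hdbclient/hdbcli
suse02:~# cp dbapi.py __init__.py resultrow.py /hana/shared/srHook/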

After changing global.ini on node 2 and implementing the srTakeover hook, you should start the secondary of the productive HANA database and check whether the parameters are effective.
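Note that a system replication secondary does not accept SQL connections, so check the configuration file directly, for example (path assuming the SID SLE):

suse02:~> grep -A1 -e memorymanager -e system_replication /hana/shared/SLE/global/hdb/custom/config/global.ini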

3.6 Manual SAP HANA takeover test

Test if the takeover can be achieved by administrator interaction (i.e. manually). After the sr_takeover, check whether the global.ini parameters on node 2 have been changed by the srTakeover hook call (for example with the SAP HANA Administration Console).
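A sketch of this test, assuming instance number 00 and the <sid>adm user sleadm; after the takeover, node 2 accepts SQL connections again, so the parameters can also be checked with hdbsql:

suse02:~> su - sleadm -c "hdbnsutil -sr_takeover"
suse02:~> hdbsql -u system -i 00 "SELECT * FROM M_INIFILE_CONTENTS WHERE KEY IN ('global_allocation_limit','preload_column_tables')"

After a successful hook run, the two parameters should no longer appear on the SYSTEM layer.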

3.7 Manual re-establish SAP HANA SR to original state

Bring the systems back to the original state (a command sketch follows the list):

    1. Take over SLE to node 1
    2. Wait till the sync state is ACTIVE
    3. Stop SLE on node 2
    4. Re-register node 2 as secondary
    5. Reconfigure global.ini
    6. Start SLE on node 2
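A sketch of these steps, again assuming instance number 00, the site name ROT1, and the <sid>adm user sleadm:

suse01:~> hdbnsutil -sr_takeover
suse02:~> HDB stop
suse02:~> hdbnsutil -sr_register --remoteHost=suse01 --remoteInstance=00 --mode=sync --name=ROT1
suse02:~> HDB start

Between the re-registration and the start, re-add the parameters from section 3.4 to global.ini.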

Please note that while you are switching back to the original system replication direction, you must also change global.ini again, because the srTakeover hook will have deleted the parameters for memory limitation and table preload.

3.8 Install the non-productive SAP HANA QAS

Install the non-productive SAP HANA QAS database on node 2. Please keep in mind that the parameters for the memory usage of SAP HANA must be set according to the SAP HANA documentation. Please provide separate storage for the non-productive SAP HANA system.

See also: [http://help.sap.com/hana/SAP_HANA_Administration_Guide_en.pdf | http://help.sap.com/hana/SAP_HANA_Administration_Guide_en.pdf] (Chapter 5).

A feasible starting point for tuning the resource limitation could be to let the secondary on node 2 hold 10% of the resources, while the resources for the QAS system must also be limited so that they do not conflict with the secondary of the productive (SLE) system. Exact values for resource optimization depend heavily on the customer's workload and project situation and should be discussed together with SAP.

This sizing is discussed in an SAP document available at http://scn.sap.com/docs/DOC-47702, pages 7ff.

Before installing the QAS system, we recommend stopping the productive secondary to avoid resource bottlenecks, because the memory consumption of the QAS system can only be reduced in a post-installation task.

Post-installation tasks for the QAS database:

suse02:~> hdbsql -u system -i 10 -nlocalhost:31013 'CREATE USER SC PASSWORD L1nuxLab'

suse02:~> hdbsql -u system -i 10 -nlocalhost:31013 'GRANT MONITORING TO SC'

suse02:~> hdbsql -u system -i 10 -nlocalhost:31013 'ALTER USER SC DISABLE PASSWORD LIFETIME'

suse02:~> hdbsql -u sc -i 10 -nlocalhost:31013 'SELECT * FROM DUMMY'

suse02:~> hdbsql -u sc -p L1nuxLab -i 10 -nlocalhost:31013 'SELECT * FROM DUMMY'

suse02:~> hdbuserstore SET QASSAPDBCTRL localhost:31013 SC L1nuxLab

suse02:~> hdbsql -U QASSAPDBCTRL 'select * from Dummy'

  • Adapt global.ini to limit memory resources

[memorymanager]
global_allocation_limit = <size-in-GB>

  • Start QAS and secondary SAP HANA database

Check if everything works as configured and expected.

3.9 Shutdown SAP HANA database systems

To stop the SAP HANA database systems, log in on node 1 as <sid-of-prd>adm and call "HDB stop". On node 2 we need to stop both the productive and the QAS system, so you need to log in and run "HDB stop" as <sid-of-prd>adm and as <sid-of-qas>adm. In our setup we have SLE for the productive and QAS for the non-productive database system.


suse01:~# su - sleadm -c "HDB stop"

suse02:~# su - sleadm -c "HDB stop"

suse02:~# su - qasadm -c "HDB stop"

Please follow up with part 2 of this article to set up the cluster and to integrate SAP HANA into the cluster  (http://scn.sap.com/docs/DOC-68633).
