SAP Business Suite Powered by SAP HANA High Availability with NEC EXPRESSCLUSTER
Cloud environments are now being used by the majority of companies, an increasing number of which are deploying SAP HANA on their cloud infrastructure services. Companies are using SAP HANA not only for fast analysis of big data but also for their mission-critical systems. This has led to a growing need to improve the availability of SAP HANA running on cloud infrastructure services.
Although SAP HANA has built-in high availability (HA) functionality, servers must still be switched manually when a failure occurs. Operations are interrupted from the time the failure is detected until server failover completes, which can lead to lost business opportunities.
EXPRESSCLUSTER, NEC’s high availability infrastructure software, automatically detects failures in a system that uses SAP HANA running on Amazon Web Services (AWS) and switches to a standby server (performs failover). NEC wished to verify whether EXPRESSCLUSTER could shorten operational downtime and boost operational efficiency by cooperating with SAP HANA.
This article focuses on AWS deployments; however, EXPRESSCLUSTER can also provide HA for on-premises SAP HANA installations.
NEC created a SAP HANA cluster environment on AWS using EXPRESSCLUSTER.
NEC simulated various types of failures in this environment and verified two things. First, the cluster system could be restored by using the EXPRESSCLUSTER automatic failover function together with the SAP HANA system replication function, which synchronizes the data. Second, operations continued without interruption; that is, the SAP ERP application server automatically reconnected to SAP HANA and kept running.
The system configuration used in this verification is shown in the figure below.
In this configuration, EXPRESSCLUSTER monitors for failures and switches operations, and SAP HANA synchronizes the data.
- Availability on AWS
AWS has multiple data centers called Availability Zones in locations such as Tokyo and Singapore. Customers can select the Availability Zone that they want to use and freely determine the Availability Zone in which to allocate an EC2 instance. Availability Zones are connected via high-speed dedicated lines. A system can be created across multiple Availability Zones. To realize the high availability required by mission-critical systems, the two instances composing a cluster must be allocated to different Availability Zones.
- Failover on AWS
In a cluster configuration, clients must be able to switch transparently to whichever node is currently active. In an AWS virtual private cloud (VPC), network routing is defined in a Route Table, which can be modified through an application programming interface (API). Connection destinations are switched by using this API to route a virtual IP address (virtual IP in the above figure) to the elastic network interface (ENI) of the active instance.
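The routing switch described above can be sketched in Python. This is only an illustration, not EXPRESSCLUSTER's own implementation: the function name and its arguments are hypothetical, and it assumes the virtual IP is modeled as a /32 route in the VPC Route Table. `ec2_client` is expected to be a boto3 EC2 client (or any object exposing the same `replace_route` call).

```python
from typing import Any, Dict


def point_virtual_ip_at_eni(ec2_client: Any, route_table_id: str,
                            virtual_ip: str, eni_id: str) -> Dict[str, str]:
    """Route a virtual IP (as a /32 destination) to the given ENI.

    Hypothetical helper for illustration; names are not from the article.
    """
    params = {
        "RouteTableId": route_table_id,
        "DestinationCidrBlock": f"{virtual_ip}/32",
        "NetworkInterfaceId": eni_id,
    }
    # replace_route overwrites the existing route for this destination,
    # so traffic to the virtual IP is delivered to the new instance's ENI.
    ec2_client.replace_route(**params)
    return params
```

With boto3 installed, a call might look like `point_virtual_ip_at_eni(boto3.client("ec2"), route_table_id, virtual_ip, eni_id)`, where the route table ID, virtual IP, and ENI ID come from your own environment.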
- Data synchronization (system replication)
With the SAP HANA system replication function, data can be lost when an actual failure occurs, even in synchronous mode. SAP Note 2063657, "HANA System Replication takeover decision guideline," provides criteria for deciding whether to take over. The operator must check these criteria before executing a takeover.
NEC adopted the full sync option in synchronous mode. Using the full sync option together with EXPRESSCLUSTER eliminates the possibility of data loss. NEC recommends this setting.
The following figure shows an illustration of the system when Server 1 is running as the primary server and Server 2 is running as the secondary server. The SAP ERP application server is connected to the SAP HANA server by accessing a virtual IP address.
The next figure shows a failure on the primary server. In this case, EXPRESSCLUSTER stops SAP HANA on Server 1 and promotes SAP HANA on Server 2 from the secondary server (which has been running in sync mode until now) to the primary server, allowing SAP HANA operations to continue. In addition, EXPRESSCLUSTER switches the virtual IP address from Server 1 to Server 2. The SAP ERP application server reaches the new primary SAP HANA server through the same virtual IP address.
The following illustration shows a failure on the secondary server. In this case, EXPRESSCLUSTER stops SAP HANA on Server 2 and disables the full sync option on Server 1, allowing SAP HANA operations to continue.
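The two failure cases above boil down to a small decision rule. The following Python sketch only restates that rule in code; the `Role` type, the `plan_recovery` function, and the action strings are hypothetical names chosen for illustration and do not correspond to any EXPRESSCLUSTER or SAP HANA interface.

```python
from enum import Enum
from typing import List


class Role(Enum):
    PRIMARY = "primary"
    SECONDARY = "secondary"


def plan_recovery(failed_role: Role) -> List[str]:
    """Return the ordered recovery actions described in the text."""
    if failed_role is Role.PRIMARY:
        # Primary failure: fail over to the standby server.
        return [
            "stop SAP HANA on the failed primary",
            "promote the secondary to primary (takeover)",
            "move the virtual IP to the new primary",
        ]
    # Secondary failure: no failover; the primary must stop waiting
    # for the unreachable secondary, so full sync is disabled.
    return [
        "stop SAP HANA on the failed secondary",
        "disable the full sync option on the primary",
    ]
```

Keeping the rule as a pure function makes the asymmetry explicit: only a primary failure moves the virtual IP, while a secondary failure changes the replication setting and leaves routing untouched.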
Supported scenarios and requirements
SAP HANA and EXPRESSCLUSTER are supported together only in the scenarios and with the parameters listed below. For general system replication requirements, please follow the SAP guidelines.
1. Two-node cluster consisting of two scale-up (single-node) systems
2. Both nodes must belong to the same network segment.
3. Both nodes must run as a single instance. No quality assurance or development system may run on either node.
4. SAP HANA SPS09 (revision 90) or later
5. The automatic startup attribute of SAP HANA must be set to “off.” (SAP HANA startup is managed by EXPRESSCLUSTER.)
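Requirement 5 can be checked with a few lines of Python. This sketch assumes the automatic startup attribute is stored as an `Autostart = 0` or `Autostart = 1` line in the SAP HANA instance profile; that storage format, the function name, and the treatment of a missing entry are assumptions made for illustration and should be confirmed against the SAP HANA Administration Guide for your revision.

```python
def autostart_disabled(profile_text: str) -> bool:
    """Return True if the profile text does not enable automatic startup.

    Hypothetical helper; assumes an 'Autostart = 0|1' profile entry.
    """
    for raw in profile_text.splitlines():
        line = raw.strip()
        # Skip blank lines and comment lines.
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if sep and key.strip().lower() == "autostart":
            # The first Autostart entry found decides the result.
            return value.strip() == "0"
    # Assumption: a missing entry means automatic startup is not enabled.
    return True
```

For example, passing the contents of the instance profile read from disk returns `False` when the profile contains `Autostart = 1`, which signals that the setting must be changed before EXPRESSCLUSTER takes over startup control.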
NEC tested the availability of the SAP HANA cluster configuration running on AWS using EXPRESSCLUSTER when the following failures occurred:
|Failure type||Server||Component||Failure||Desired action||Result|
|Hardware failure||Primary||Server||Server down||Failover (to a standby server)||✓|
|Hardware failure||Primary||Network||Network down||Failover (to a standby server)||✓|
|Hardware failure||Secondary||Server||Server down||No failover (disable full sync option)||✓|
|Hardware failure||Secondary||Network||Network down||No failover (disable full sync option)||✓|
|Software failure||Primary||OS||OS hung||Failover (to a standby server)||✓|
|Software failure||Primary||SAP HANA DB||Service down||Failover (to a standby server)||✓|
|Software failure||Primary||SAP HANA DB||Process down||Failover (to a standby server)||✓|
|Software failure||Secondary||OS||OS hung||No failover (disable full sync option)||✓|
|Software failure||Secondary||SAP HANA DB||Service down||No failover (disable full sync option)||✓|
|Software failure||Secondary||SAP HANA DB||Process down||No failover (disable full sync option)||✓|
|Cloud failure||Primary||Availability Zone||Zone down||Failover (to a standby server)||✓|
|Cloud failure||Secondary||Availability Zone||Zone down||No failover (disable full sync option)||✓|
For each of the failures listed above, NEC verified the following:
- EXPRESSCLUSTER detected the failure and failed over SAP HANA.
- The connection from SAP ERP remained available, and operations could continue. (Data could be updated and referenced.)
With SAP HANA's stock system replication settings, servers must be switched manually when a failure occurs. When SAP HANA is combined with EXPRESSCLUSTER, EXPRESSCLUSTER automatically executes all operations from failure detection to failover.
NEC has also verified that the potential for data loss can be eliminated by using the full sync option, and that operations can continue without stopping because EXPRESSCLUSTER automatically disables the full sync option when a failure occurs on the secondary server.
Whitepaper on SAP Business Suite Powered by SAP HANA HA with EXPRESSCLUSTER
EXPRESSCLUSTER 3.3 new features
SAP HANA Server Installation and Update Guide
SAP HANA Administration Guide
SAP Note 1656099 - SAP Applications on AWS: Supported DB/OS and AWS EC2 products
SAP Note 1964437 - SAP HANA on AWS: Supported AWS EC2 products
SAP Note 2063657 - HANA System Replication takeover decision guideline