
What happens if the master index server fails in a scale-out system?

Former Member
0 Kudos

Hi experts,

I have a question: what happens if the master index server fails in a scale-out landscape without a standby?

Does the master role switch automatically from the failed master index server to one of the slave index servers?

Thanks in advance,

Regards,

Accepted Solutions (1)


lbreddemann
Active Contributor
0 Kudos

When no standby server is available and a node fails, the whole db instance fails.


In such an unsupported scenario, there won't be a node to take care of the data that the failed node handled before.

Former Member
0 Kudos

I know the whole db instance fails, but how can this be solved? Can I remove the failed node and work with the other nodes, or can I move the master index server to another node?

Regards,

lbreddemann
Active Contributor
0 Kudos

It can be solved by replacing the failed node with one that works.

As each node should be working on its own shard of the data, just assigning the failed node's piece to any of the remaining nodes would overload that node.

And if you had a node that would still have enough capacity to take over the work of a failed node, then this should be your standby-node anyhow.

The point here is not that the master name server node is affected. There are two other nodes that would take over the master name server role.

The point is that one of the nodes failed and if it cannot be brought up again, you're lacking a full node to run this distributed system.

You have to replace the node then with a working one.

That's how sharding works.

To mitigate the inherent single point of failure, we only support scale out with standby nodes.

Without that, you're out of luck.

- Lars

Former Member
0 Kudos

OK, I think this is the last one.

Suppose I have only two nodes in the scale-out system and no more. If one of them crashes, the only solution is to reconfigure SAP HANA to work with a single node (provided it can handle the resource requirements), is that correct?

Is the reconfiguration a backup and restore onto one node?

Thanks in advance,

Regards,

lbreddemann
Active Contributor
0 Kudos

No. Why would you back up and restore in this case?

The storage is shared in a SAP HANA cluster.

It doesn't "go bad" when a server node fails.

What you need to have is another server with the SAP HANA software on it that you attach to the storage. Then you "introduce" the new node to the system and can continue to work with no committed transaction lost.

- Lars

Former Member
0 Kudos

No, I meant the case where I have no other server and want to work with only one.

Regards,

lbreddemann
Active Contributor
0 Kudos

So you're asking if it is possible to reduce a scale-out system back to a single-node system?

This is a major system landscape design change and not the reaction to a server fault.

Also, it's only possible under the condition that the single-node system would be able to cope with the total load of the system.

In general yes, if you could squeeze all data to be processed by a single node, then you can "migrate" back to a single node system. And yes, that would be done via the restore of a backup.

As said above, this is not the scenario you asked about before - recovering from a failed node.

Remember, SAP HANA uses a shared nothing (SHARDING) approach. If resources fail and no redundant resources are available, then the system fails.

- Lars

henrique_pinto
Active Contributor
0 Kudos

If you have n active nodes with X TB of RAM each, your total sizing requires n*X TB of RAM for the system to operate properly. Also, an n-node scale-out system will already have distributed all tables and partitions across these n nodes. If one of those n nodes fails, you end up with (n-1)*X TB of available RAM. The disk area holding the tables/partitions of the failed node then has no memory to be loaded into, and that's why your whole instance fails.

You can of course redistribute the tables/partitions across your (n-1)-node scale-out system, provided (n-1)*X TB of RAM is enough for all of your tables. But this comes with considerable downtime (more than a few minutes).
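
To make the arithmetic above concrete, here is a minimal Python sketch. It is only a back-of-the-envelope check, not anything SAP HANA provides: the function names and the example figures (4 nodes, 2 TB each) are made up for illustration.

```python
def ram_after_failure(n_nodes: int, ram_per_node_tb: float, failed: int = 1) -> float:
    """RAM in TB left on the surviving active nodes: (n - failed) * X."""
    if not 0 <= failed <= n_nodes:
        raise ValueError("failed must be between 0 and n_nodes")
    return (n_nodes - failed) * ram_per_node_tb


def can_redistribute(n_nodes: int, ram_per_node_tb: float,
                     required_tb: float, failed: int = 1) -> bool:
    """True if the surviving nodes have enough RAM for all tables."""
    return ram_after_failure(n_nodes, ram_per_node_tb, failed) >= required_tb


# Hypothetical landscape: 4 active nodes with 2 TB each (8 TB in total).
print(ram_after_failure(4, 2.0))      # (4 - 1) * 2 TB left after one failure
print(can_redistribute(4, 2.0, 7.0))  # tables need 7 TB: does not fit
print(can_redistribute(4, 2.0, 5.0))  # tables need 5 TB: fits, after downtime
```

If the check comes out negative, redistribution is not an option and the failed node has to be replaced.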

The recommendation of having at least one standby node allows near-zero downtime, since the standby node takes over the requests of the failed node (and also mounts the disk area holding the failed node's tables/partitions). In this case, pending requests that were being processed by the failed node might fail, but new requests will be processed properly (since the master node will route them to the former standby node, which is now active).

In particular, a 2-node scale-out system with 1 active and 1 standby node is a possible configuration. However, you won't have 2*X TB of available RAM, only X TB (it's active-passive, not active-active). This configuration is called host auto failover in the context of SOH and allows true HA (zero downtime) for single node systems.
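
The takeover logic described above can be sketched as a toy model in Python. This is purely illustrative and assumes nothing about how SAP HANA implements it internally: the `Host` class, the `fail_host` function, and the host and volume names are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Host:
    name: str
    role: str                     # "worker" (active) or "standby"
    alive: bool = True
    volume: Optional[str] = None  # toy stand-in for the mounted disk area


def fail_host(hosts: List[Host], failed_name: str) -> bool:
    """Mark a host as failed; return True if the system can keep running."""
    failed = next(h for h in hosts if h.name == failed_name)
    failed.alive = False
    if failed.role != "worker":
        return True  # losing a standby takes no data offline
    standby = next((h for h in hosts if h.alive and h.role == "standby"), None)
    if standby is None:
        return False  # no standby: nobody can serve the failed node's data
    # The standby becomes active and mounts the failed node's disk area.
    standby.role = "worker"
    standby.volume, failed.volume = failed.volume, None
    return True


# 2-node system, 1 active + 1 standby (active-passive).
hosts = [Host("hana01", "worker", volume="vol1"), Host("hana02", "standby")]
print(fail_host(hosts, "hana01"), hosts[1].role, hosts[1].volume)

# 2-node system with both nodes active and no standby.
no_standby = [Host("a", "worker", volume="v1"), Host("b", "worker", volume="v2")]
print(fail_host(no_standby, "a"))
```

The second case is the one asked about in this thread: with no standby available, the surviving worker does not pick up the failed node's volume, so the system as a whole cannot continue.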

Best,

Henrique.

Answers (0)