Deactivating HANA System Replication doesn't work

0 Kudos

Hello

We are setting up HANA System Replication.

Our landscape consists of 2 HANA appliances running SUSE 11.3:

HANA Revision: 93.

All tests related to HSR (HANA System Replication) work except one: deactivating replication.

In detail:

- Activating replication works.

- Registering the secondary node and replicating data (from node1 to node2) works.

- Performing a takeover from node1 to node2 works.

- Registering node1 to node2 and replicating data (from node2 to node1) works.

- Performing a failback from node2 to node1 works.

But the issue appears when I want to deactivate replication:

- Unregistering the secondary node (using hdbnsutil -sr_unregister or HANA Studio) seems to work, and the database on node2 starts fine.

- But if I then deactivate replication permanently (using hdbnsutil -sr_disable or HANA Studio), the command seems to succeed, but the database on node2 can't be restarted anymore.

We always see this error in the trace file of HANA node2:

[104819]{-1}[-1/-1] 2015-07-16 12:09:25.203190 f NameServer       TREXNameServer.cpp(03338) : landscape ID mismatch between nameserver.ini/[landscape]/id (5570cb6a-22c4-8035-e100-00000a010154) and topology.ini (556eb908-bc3c-9cbf-e100-00000a010152)

-> stopping instance ...
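To compare the two IDs without reading them off by eye, they can be pulled out of the trace line with a regular expression. This is only a minimal sketch: the pattern is written against the single message quoted above, and the function name `parse_landscape_mismatch` is made up for illustration. Other HANA revisions may word the message differently.

```python
import re

# Pattern for the "landscape ID mismatch" message quoted above.
# It is based on that one line only; other revisions may word it differently.
MISMATCH_RE = re.compile(
    r"landscape ID mismatch between nameserver\.ini/\[landscape\]/id "
    r"\((?P<ini_id>[0-9a-f-]+)\) and topology\.ini \((?P<topology_id>[0-9a-f-]+)\)"
)


def parse_landscape_mismatch(trace_line):
    """Return (ini_id, topology_id) if the line reports a mismatch, else None."""
    match = MISMATCH_RE.search(trace_line)
    if match is None:
        return None
    return match.group("ini_id"), match.group("topology_id")


# The trace line from node2, copied from the post.
line = (
    "[104819]{-1}[-1/-1] 2015-07-16 12:09:25.203190 f NameServer       "
    "TREXNameServer.cpp(03338) : landscape ID mismatch between "
    "nameserver.ini/[landscape]/id (5570cb6a-22c4-8035-e100-00000a010154) "
    "and topology.ini (556eb908-bc3c-9cbf-e100-00000a010152)"
)

print(parse_landscape_mismatch(line))
```

The first value is the ID from nameserver.ini on node2, the second is the ID the nameserver found in its topology.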

It seems that deactivating replication does not work properly.

In this situation, all we can do is activate replication on node1 again and register node2 to node1, which re-establishes replication node1 --> node2. But we can't deactivate it permanently.

Any suggestions?

Thanks and regards

Accepted Solutions (0)

Answers (1)

Former Member
0 Kudos

Hi Iribarne,

We are currently on HANA SPS09, Revision 96 (MDC).

There is a 3-tier System Replication setup in our landscape, and we just completed replication failover and failback tests this month.

As part of the failover and failback we have to disable system replication on certain systems, and I have not seen any specific error on our systems while working on them.

Your sequence of steps is not entirely clear to me, so I am trying to put the steps together.

Consider 2 HANA systems used for System Replication; call them Tier1 and Tier2 (I am excluding Tier3 to match your scenario).

1. Configure System Replication from Tier1-->Tier2

2. sr_takeover on Tier2; this confirms failover is working

3. Now sr_disable on Tier1 to make sure it can be configured as a system replication secondary

4. Now if you restart Tier1, do you get the ID mismatch error?

Please attach the nameserver.ini file from both tiers, or copy and paste the entries under the [landscape] section of nameserver.ini, so I can better understand the cause of the error.

Sunil

0 Kudos

Hello Sunil

Here are the entries:

NODE1

[landscape]

id = 556eb908-bc3c-9cbf-e100-00000a010152

master = HOSTNAME1.FQDN:30001

worker = HOSTNAME1.FQDN

active_master = HOSTNAME1.FQDN:30001

idsr = 556eb908-bc3c-9cbf-e100-00000a010152

roles_HOSTNAME1.FQDN = worker

NODE2

[landscape]

id = 5570cb6a-22c4-8035-e100-00000a010154

master = HOSTNAME2.FQDN:30001

worker = HOSTNAME2.FQDN

active_master = HOSTNAME2.FQDN:30001

roles_HOSTNAME2.FQDN = worker
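The posted [landscape] sections can be checked against the topology ID reported in the node2 trace. The sketch below uses Python's standard `configparser` on the entries exactly as pasted above; `landscape_ids` and the constant names are illustrative, and the snippet only compares strings, it does not touch a real system.

```python
import configparser


def landscape_ids(ini_text):
    """Return the 'id' and 'idsr' values of the [landscape] section (None if absent)."""
    parser = configparser.ConfigParser()
    parser.read_string(ini_text)
    section = parser["landscape"]
    return section.get("id"), section.get("idsr")


# [landscape] entries as posted for each node.
NODE1_INI = """
[landscape]
id = 556eb908-bc3c-9cbf-e100-00000a010152
master = HOSTNAME1.FQDN:30001
worker = HOSTNAME1.FQDN
active_master = HOSTNAME1.FQDN:30001
idsr = 556eb908-bc3c-9cbf-e100-00000a010152
roles_HOSTNAME1.FQDN = worker
"""

NODE2_INI = """
[landscape]
id = 5570cb6a-22c4-8035-e100-00000a010154
master = HOSTNAME2.FQDN:30001
worker = HOSTNAME2.FQDN
active_master = HOSTNAME2.FQDN:30001
roles_HOSTNAME2.FQDN = worker
"""

# The topology.ini ID quoted in the node2 trace from the original question.
TOPOLOGY_ID_FROM_TRACE = "556eb908-bc3c-9cbf-e100-00000a010152"

node1_id, node1_idsr = landscape_ids(NODE1_INI)
node2_id, node2_idsr = landscape_ids(NODE2_INI)

print("node1 id equals topology ID in node2 trace:", node1_id == TOPOLOGY_ID_FROM_TRACE)
print("node2 id equals topology ID in node2 trace:", node2_id == TOPOLOGY_ID_FROM_TRACE)
print("node2 has an idsr entry:", node2_idsr is not None)
```

With the values as posted, the topology ID in the node2 trace is identical to node1's landscape ID rather than node2's own. That is only an observation from the pasted data, not a confirmed root cause.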

I think the steps I mentioned are clear: setting up replication, then takeover to node2, then configuring reverse replication and failback, and finally trying to deactivate replication.

Thanks and regards

Javier

Former Member
0 Kudos

Thank you for the nameserver.ini entries.

I see the idsr entry under Node1 and no idsr entry on Node2, so I conclude that system replication is running Node2-->Node1. Is that right?

I am breaking down your steps:

Set up replication -->Success

Takeover to Node2--> Success

Configure Inverse replication and failback--> Success

After the above step, Node2 is an independent system following sr_takeover on Node1.

Now if you execute sr_disable on Node2, do you see that error?

Sunil

0 Kudos

No. In the current (latest) situation, replication was node1 --> node2 (after failback). I then unregistered node2, which was fine (node2 restarted right after this), and then I tried the last step (disabling replication permanently on node1). After that, node1 can be restarted fine, but node2 CAN'T be restarted anymore.

No, after this:

*****

Set up replication -->Success

Takeover to Node2--> Success

Configure Inverse replication and failback--> Success

*****

you have to return to the normal situation, right? So you have to register node2 to node1 again and set up replication node1-->node2. Those would be the right steps.

So after THAT, you may want to deactivate replication (for maintenance, for instance, or whatever). To deactivate replication permanently you have to do 3 things:

1.- Stop secondary

2.- Unregister secondary

3.- Disable replication from primary
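The three steps above can be written down as an ordered plan of host/command pairs. This is a dry-run sketch that only prints the commands and runs nothing. `hdbnsutil -sr_unregister` and `hdbnsutil -sr_disable` are the commands used in this thread; `HDB stop` as the way to stop the secondary is an assumption, and `deactivation_plan` is a made-up helper name.

```python
def deactivation_plan(primary, secondary):
    """Return (host, command) pairs for the three steps, in order.

    Nothing is executed here; the plan is only meant to be printed or reviewed.
    """
    return [
        (secondary, "HDB stop"),                  # 1. stop secondary (assumed command)
        (secondary, "hdbnsutil -sr_unregister"),  # 2. unregister secondary
        (primary, "hdbnsutil -sr_disable"),       # 3. disable replication on primary
    ]


for step, (host, command) in enumerate(deactivation_plan("node1", "node2"), start=1):
    print(f"{step}. [{host}] {command}")
```

Keeping the host next to each command makes it explicit that the first two steps belong on the secondary and only the last one on the primary.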

Former Member
0 Kudos

Javier,

Some questions: you have two nodes, 1 and 2.

Node 1 --> Node 2

Then Node 2 --> Node 1

and then finally Node 1 --> Node 2, right?

Then the steps would be to first run sr_unregister on Node 2 and then sr_disable on Node 1.

That's how it should be done.

Please let me know if my understanding is correct.

Also, if you really want to break it, you would probably have to use sr_cleanup, but please be advised that I think you should use it on the primary only after sufficient instructions from SAP support.

0 Kudos

Right, p517710.

To disable, there are 3 steps that are very clear in the SAP documentation:

1.- Stop node2

2.- Unregister node2

3.- Disable node1

But something in the last step does not seem to work properly.

Former Member
0 Kudos

OK, I understand it better now.

So to deactivate the replica you have to do 3 things:

1.- Stop secondary

2.- Unregister secondary

To complete the sr_unregister, it needs to be followed by a restart of the HANA services.

So can you try starting Node2 after the sr_unregister, without going on to Step 3, and let me know the result?

You may have to redo the system replication setup first (Node2-->Node1 and failback to Node1), then perform the above 2 steps.

Sunil

0 Kudos

Hi Sunil

After sr_unregister I always start the database (it is necessary). This start is fine (without step 3).

And before this I redid the reverse replication I mentioned before (Node2-->Node1) and then the failback. I have performed all these steps, and they work fine.

Former Member
0 Kudos

Excellent...

So there is no error starting Node2 after Step 2, but when you start Node2 after Step 3 we see the mismatch error, right?

Ideally, sr_disable on the primary should not have any impact on Node2, which is no longer a partner in system replication.

Can you please execute the command below (with Node2 started) and share the output?

It would be interesting to see whether any system tables still reference system replication.

hdbnsutil -sr_state

Sunil

0 Kudos

Right

I was checking sr_state the whole time. After step 2, with the database started, it shows:

"hdbnsutil -sr_state"

checking for active or inactive nameserver...

System Replication State

---------------------------------

mode: none

done.
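If this check is repeated after every step, the mode can be read out of the command output programmatically. The sketch below parses the text exactly as pasted above; `replication_mode` is an illustrative helper, and the layout of real `hdbnsutil -sr_state` output may differ between revisions.

```python
def replication_mode(sr_state_output):
    """Return the value of the 'mode:' line, or None if no such line exists."""
    for line in sr_state_output.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "mode":
            return value.strip()
    return None


# Output as pasted in the post (node2, after sr_unregister and a restart).
SAMPLE_OUTPUT = """\
checking for active or inactive nameserver...

System Replication State
---------------------------------

mode: none

done.
"""

print(replication_mode(SAMPLE_OUTPUT))
```

For the pasted output this yields `none`, which matches a node that no longer takes part in replication.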

Former Member
0 Kudos

OK, that looks good.

Can you please try sr_disable on Node2 first and restart it, followed by sr_disable on Node1 and another restart of Node2, to see if the issue occurs?

Sunil

0 Kudos

Node2:

gcm-sht:/usr/sap/GCM/HDB00/exe/python_support> hdbnsutil -sr_disable

checking local nameserver:

checking for inactive nameserver ...

nameserver gcm-sht.domain.net:30001 not responding.

error: system must be running before system replication site can be disabled;

failed. trace file nameserver_gcm-sht.domain.net.00000.000.trc may contain more error details.

Restarting database node2 was unsuccessful. Same behaviour.

Then, on node1:

gcm-shp:/usr/sap/GCM/HDB00/exe/python_support> hdbnsutil -sr_disable

checking local nameserver:

checking for inactive nameserver ...

nameserver is running, proceeding ...

error: this site is not a source system;;

failed. trace file nameserver_gcm-shp.domain.net.00000.000.trc may contain more error details.

gcm-shp:/usr/sap/GCM/HDB00/exe/python_support>

Restarting database node2 was unsuccessful. Same behaviour.

Former Member
0 Kudos

That's a little strange. The nameserver trace file is complaining about the ID mismatch again, right? No change in the error?

It is not clear why sr_disable on Node2 would cause an error when the sr_unregister was successful and the system had already been started after sr_unregister.

We have done some good troubleshooting. I would suggest opening an OSS message with SAP for support and sharing our findings.

I would be happy to look at any other errors that you may see alongside the mismatch error in the nameserver trace file on Node2.

Sunil

0 Kudos

Yes, a little strange.

I opened a ticket.

Thanks anyway

Former Member
0 Kudos

Hi Javier,

Please keep us posted with the feedback from SAP support

Sunil

Former Member
0 Kudos

If you are confident and have checked thoroughly that the secondary has been successfully unregistered, then you can run sr_cleanup, but it's better to get SAP support's advice before doing that, since that step modifies the ini files.