on 07-16-2015 8:29 PM
Hello
We are setting up HANA System Replication.
Our landscape consists of 2 HANA appliances on SUSE 11.3:
HANA Revision: 93.
All tests related to HSR (HANA System Replication) work except one: deactivating replication.
In detail:
- Activating replication works
- Registering the secondary node and data replication (from node1 to node2) works
- Performing takeover from node1 to node2 is ok
- Registering node1 to node2 and data replication (from node2 to node1) is ok
- Performing failback from node2 to node1 is ok
But the issue appears when I want to deactivate replication:
- Unregistering the secondary node (using hdbnsutil -sr_unregister or HANA Studio) seems ok, and database node2 starts ok.
- But if I deactivate replication definitively (using hdbnsutil -sr_disable or HANA Studio), it seems ok, but database node2 can't be restarted anymore.
We always see an error in the trace file of HANA node2 saying:
[104819]{-1}[-1/-1] 2015-07-16 12:09:25.203190 f NameServer TREXNameServer.cpp(03338) : landscape ID mismatch between nameserver.ini/[landscape]/id (5570cb6a-22c4-8035-e100-00000a010154) and topology.ini (556eb908-bc3c-9cbf-e100-00000a010152)
-> stopping instance ...
It seems deactivating replication is not working properly.
In this situation the only thing we can do is activate replication from node1 and register node2 to node1, which re-establishes replication 1 --> 2. But we can't deactivate it definitively.
Any suggestion?
Thanks and regards
Hi Iribarne,
We are currently on HANA SPS09, HANA Revision 96 (MDC).
There is a 3-tier System Replication setup in our landscape, and we just completed replication failover and failback tests this month.
As part of the failover and failback we had to disable system replication on certain systems, and I have not seen any specific error on our systems while working on them.
Your sequence of steps is not entirely clear to me, so I am trying to put the steps together.
Consider that we have 2 HANA systems to be used for System Replication; call them Tier1 and Tier2 (I am excluding Tier3 to match your scenario).
1. Configure System Replication from Tier1-->Tier2
2. sr_takeover on Tier2, this confirms failover is working
3. Now sr_disable on Tier1 to make sure it can be configured as system replication secondary
4. Now if you restart Tier1, do you get the mismatch error for the landscape IDs?
Please attach the nameserver.ini file from both tiers, or copy and paste the entries under the [landscape] section of nameserver.ini, so I can better understand the cause of the error.
Sunil
Hello Sunil
Here are the entries:
NODE1
[landscape]
id = 556eb908-bc3c-9cbf-e100-00000a010152
master = HOSTNAME1.FQDN:30001
worker = HOSTNAME1.FQDN
active_master = HOSTNAME1.FQDN:30001
idsr = 556eb908-bc3c-9cbf-e100-00000a010152
roles_HOSTNAME1.FQDN = worker
NODE2
[landscape]
id = 5570cb6a-22c4-8035-e100-00000a010154
master = HOSTNAME2.FQDN:30001
worker = HOSTNAME2.FQDN
active_master = HOSTNAME2.FQDN:30001
roles_HOSTNAME2.FQDN = worker
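The [landscape] entries above can be checked mechanically against the two IDs named in the trace message. What follows is only a minimal sketch, not an SAP tool: the function names landscape_value and trace_ids and the temporary file names are hypothetical, and it assumes the [landscape] sections and the trace line have been copied into plain text files.

```shell
#!/bin/sh
# Sketch: compare [landscape] IDs with the IDs named in the trace message.
# Function and file names are hypothetical, chosen for illustration only.

# Print the value of KEY from the [landscape] section of an ini file.
landscape_value() {   # usage: landscape_value FILE KEY
    awk -F' *= *' -v key="$2" '
        /^\[/ { in_section = ($0 == "[landscape]") }
        in_section && $1 == key { print $2; exit }
    ' "$1"
}

# Print every UUID-like landscape ID found in a trace file, one per line.
trace_ids() {   # usage: trace_ids FILE
    grep -oE '[0-9a-f]{8}(-[0-9a-f]{4}){3}-[0-9a-f]{12}' "$1"
}

# Sample data: the values posted in this thread.
tmp=$(mktemp -d)
cat > "$tmp/node1.ini" <<'EOF'
[landscape]
id = 556eb908-bc3c-9cbf-e100-00000a010152
idsr = 556eb908-bc3c-9cbf-e100-00000a010152
EOF
cat > "$tmp/node2.ini" <<'EOF'
[landscape]
id = 5570cb6a-22c4-8035-e100-00000a010154
EOF
cat > "$tmp/trace.txt" <<'EOF'
landscape ID mismatch between nameserver.ini/[landscape]/id (5570cb6a-22c4-8035-e100-00000a010154) and topology.ini (556eb908-bc3c-9cbf-e100-00000a010152)
EOF

node1_idsr=$(landscape_value "$tmp/node1.ini" idsr)
node2_id=$(landscape_value "$tmp/node2.ini" id)
topology_id=$(trace_ids "$tmp/trace.txt" | sed -n 2p)   # second ID in the message

if [ "$node2_id" != "$topology_id" ]; then
    echo "node2: nameserver.ini id differs from the topology id"
fi
if [ "$node1_idsr" = "$topology_id" ]; then
    echo "node2: topology id equals node1 idsr"
fi
rm -rf "$tmp"
```

With the posted values, the topology ID reported in Node2's trace is the same value as Node1's id and idsr, which suggests that Node2's topology still carries the primary's landscape ID while its nameserver.ini holds its own.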
I think the steps I mentioned are clear: setting up replication, then takeover to node2, then configuring inverse replication and failback, and finally trying to deactivate the replica.
Thanks and regards
Javier
Thank you for the nameserver.ini entries.
I see the idsr entry under Node1 and no idsr entry on Node2, so I conclude that system replication runs Node2-->Node1. Is that right?
I am breaking down your steps:
Set up replication -->Success
Takeover to Node2--> Success
Configure Inverse replication and failback--> Success
After the above step, Node2 is an independent system following sr_takeover on Node1.
Now if you execute sr_disable on Node2, do you see that error?
Sunil
No. In the last and current situation, replication was node1 --> node2 (after failback). Then I unregistered node2, which was ok (node2 restarted right after this). Then I tried the last step (disabling replication definitively from node1). After that, node1 can be restarted fine, but node2 CAN'T be restarted anymore.
No, after this
*****
Set up replication -->Success
Takeover to Node2--> Success
Configure Inverse replication and failback--> Success
*****
you have to return to the normal situation, right? So you have to register node2 to node1 again and set up replication node1-->node2. These would be the right steps.
So after THAT, you may want to deactivate the replica (for maintenance, for instance, or whatever). To deactivate the replica definitively you have to do 3 things:
1.- Stop secondary
2.- Unregister secondary
3.- Disable replication from primary
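The three steps above can be sketched as a dry-run script. This is an illustration under stated assumptions, not a verified procedure: the helper names run and deactivate_step are hypothetical, `HDB stop` is assumed as the way to stop the instance as the <sid>adm user (it is not mentioned in this thread), and the two hdbnsutil calls are the ones named earlier. By default the script only prints the commands; setting DRY_RUN=0 would execute them on the local host.

```shell
#!/bin/sh
# Sketch: print (or, with DRY_RUN=0, execute) the deactivation steps for one host.
# Intended to be run as the <sid>adm user on the host named by the argument.
# Helper names are hypothetical; "HDB stop" is an assumption, not from the thread.
DRY_RUN=${DRY_RUN:-1}

run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

deactivate_step() {   # usage: deactivate_step secondary|primary
    case "$1" in
        secondary)
            run HDB stop                   # 1. stop the secondary
            run hdbnsutil -sr_unregister   # 2. unregister the secondary
            ;;
        primary)
            run hdbnsutil -sr_disable      # 3. disable replication on the primary
            ;;
        *)
            echo "usage: deactivate_step secondary|primary" >&2
            return 1
            ;;
    esac
}

# Dry run of the full sequence, in order.
deactivate_step secondary
deactivate_step primary
```

The steps are split by host because they run on different machines. The sketch deliberately defaults to printing rather than executing, since the order of the steps is exactly what is in question here.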
Javier,
Some questions: you have two nodes, 1 and 2.
Node 1 --> Node 2
Then Node 2 --> Node 1
and then finally Node 1 --> Node 2, right?
Then the steps would be to first run sr_unregister on Node 2 and then sr_disable on Node 1.
That's how it should be done.
Please let me know if my understanding is correct.
Also, if you really want to break it, then you would probably have to use sr_cleanup, but please be advised that I think you should use it on the primary only after sufficient instructions from SAP support.
OK, I understand it better now.
So to deactivate the replica you have to do 3 things:
1.- Stop secondary
2.- Unregister secondary
To complete the sr_unregister, it needs to be followed by a restart of the HANA services.
So can you try starting Node2 after the sr_unregister, without going on to Step 3, and let me know the result?
You may have to redo the system replication setup first (Node2-->Node1 and failback to Node1), then carry out the above 2 steps.
Sunil
Excellent...
So there is no error starting Node2 after Step 2, but when you start Node2 after Step 3 we see the mismatch error, right?
Ideally, sr_disable on the primary should not have any impact on Node2, which is no longer a partner in system replication.
Can you please execute the command below (with Node2 in started state) and share the output?
It would be interesting to see whether any system tables still reference system replication.
hdbnsutil -sr_state
Sunil
Node2:
gcm-sht:/usr/sap/GCM/HDB00/exe/python_support> hdbnsutil -sr_disable
checking local nameserver:
checking for inactive nameserver ...
nameserver gcm-sht.domain.net:30001 not responding.
error: system must be running before system replication site can be disabled;
failed. trace file nameserver_gcm-sht.domain.net.00000.000.trc may contain more error details.
Restarting database node2 was unsuccessful. Same behaviour.
Then, on node1:
gcm-shp:/usr/sap/GCM/HDB00/exe/python_support> hdbnsutil -sr_disable
checking local nameserver:
checking for inactive nameserver ...
nameserver is running, proceeding ...
error: this site is not a source system;;
failed. trace file nameserver_gcm-shp.domain.net.00000.000.trc may contain more error details.
gcm-shp:/usr/sap/GCM/HDB00/exe/python_support>
Restarting database node2 was unsuccessful. Same behaviour.
That's a little strange. The nameserver trace file is complaining about the ID mismatch again, right? No change in the error?
It is not clear why sr_disable on Node2 would cause an error when sr_unregister was successful and the system had already been started after sr_unregister.
We have done some good troubleshooting. I would suggest opening an OSS message with SAP for support and sharing our findings.
I would be happy to look at any other errors, along with the mismatch error, that you may see in the nameserver trace file on Node2.
Sunil