Posted on 04-25-2015 2:59 AM
Hello Experts,
After enabling system replication, we found that replication gets stuck at 2% and the indexserver trace shows the following errors:
TrexNet EndPoint.cpp(00260) : ERROR: failed to open channel <2ndry_HostIP>:33102! reason: (connection refused)
TrexNet EndPoint.cpp(00260) : details:
TNS TNSClient.cpp(00671) : sendRequest dr_secondaryactivestatus to <2ndry_Hostname>:33102 failed with NetException. data=(S)host=<2ndry_Hostname>|service=statisticsserver|(I)drsender=1|port=33005|
TNS TNSClient.cpp(00671) : sendRequest dr_secondaryactivestatus to <2ndry_Hostname>:33001 failed with NetException. data=(S)host=<2ndry_Hostname>|service=statisticsserver|(I)drsender=1|port=33005|
sr_nameserver TNSClient.cpp(06915) : error when sending request 'dr_secondaryactivestatus' to <2ndry_Hostname>:33102: connection refused,location=<2ndry_Hostname>:33001
The same errors appear for the nameserver, indexserver, etc.
The end of the trace shows the following:
Stream NetworkChannelCompletion.cpp(00524) : NetworkChannelCompletionThread #0 NetworkChannel FD 173 [0x00007ff18449e158] {refCnt=6, idx=0} 172.28.90.185/33103_tcp-><2ndry_HostIP>/56197_tcp Connected,[-w--]
: Error in asynchronous stream event: exception 1: no.2110001 (Basis/IO/Stream/impl/NetworkChannelCompletion.cpp:450)
Generic stream error: getsockopt, Event=EPOLLERR - , rc=110: Connection timed out
$NetworkChannelBase$=
NetworkChannel FD 173 [0x00007ff18449e158] {refCnt=6, idx=0} <1ry_HostIP>/33103_tcp-><2ndry_HostIP>/56197_tcp Connected,[-w--]
exception throw location:
1: 0x00007ff6fed5d6ca in Stream::NetworkChannelCompletionThread::run(void*&)+0x686 at NetworkChannelCompletion.cpp:450 (libhdbbasis.so)
To me it looks like a network issue, but I have not been able to reach a conclusion. The HANA revision is Rev91.
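The excerpts above mix two different symptoms: "connection refused" (the secondary actively rejected the connection, typically because nothing was listening on that port at that moment) and "Connection timed out" (no answer at all, which is more consistent with packets being dropped on the path). A minimal sketch for tallying failures by target and reason, assuming the trace lines have the same layout as the excerpts quoted in this thread; the patterns and the `summarize` function are hypothetical helpers, not an SAP tool:

```python
import re
from collections import Counter

# Hypothetical patterns, written only against the trace excerpts in this thread.
PATTERNS = [
    # TrexNet: failed to open channel <host>:<port>! reason: (<reason>)
    re.compile(r"failed to open channel (?P<target>\S+?:\d+)! reason: \((?P<reason>[^)]+)\)"),
    # sr_nameserver: error when sending request '<req>' to <host>:<port>: <reason>,location=...
    re.compile(r"error when sending request '\w+' to (?P<target>\S+?:\d+): (?P<reason>[^,]+)"),
    # TNS: sendRequest <req> to <host>:<port> failed with <ExceptionType>.
    re.compile(r"sendRequest \w+ to (?P<target>\S+?:\d+) failed with (?P<reason>\w+)"),
]


def summarize(lines):
    """Count (target, reason) pairs; lines that match no pattern are ignored."""
    counts = Counter()
    for line in lines:
        for pattern in PATTERNS:
            match = pattern.search(line)
            if match:
                counts[(match.group("target"), match.group("reason").strip())] += 1
                break  # the first matching pattern wins
    return counts
```

For example, `summarize(open(path))` over a nameserver trace file (`path` being whatever trace file you want to inspect) shows at a glance whether the failures against port 33102 are refusals or timeouts.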
Thanks
Dev
Hello all,
We had a problem with the network switch; our network team has corrected it, and replication seems to be fine now.
Thanks
Dev
Hi,
Did you update the HANA client on SLT and on the source system along with the HANA server?
It looks to me like a client compatibility issue. Although the HANA client is upward and downward compatible, it can cause issues in particular cases. Please check along these lines.
Best Regards
Sachin
Hi Devpriy,
According to the log, this is a network issue. Can you please try reactivating the replication?
The note below may not fit your case exactly, but the log in the note shows the same errors.
Regards,
Pavan Gunda
Hello Dev,
I will just assume that you have religiously followed the System Replication Setup guides and have also respected the network requirements for replication.
Please check your secondary.
As <sidadm> on the secondary:
1. HDB status (you should see running HDB processes)
2. hdbnsutil -sr_state (this will show you your replication setup and state)
3. Check on your secondary that something is listening on port 33102. I believe it's the hdbpreprocessor that listens on this port.
netstat -an | grep 33102 | grep LISTEN
lsof -i :33102 | grep LISTEN
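The `netstat`/`lsof` checks confirm the listener on the secondary itself; they do not show whether the primary can actually reach it. A minimal complementary sketch, meant to be run on the primary, that attempts a plain TCP connect and reports whether it was accepted, refused, or timed out. `probe` is a hypothetical helper, and the host name in the usage note is a placeholder:

```python
import socket


def probe(host: str, port: int, timeout: float = 5.0) -> str:
    """Attempt one TCP connection and classify the result."""
    try:
        # The connection is closed right away; only the handshake matters here.
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except ConnectionRefusedError:
        return "refused"  # RST received: nothing listening, or actively rejected
    except (socket.timeout, TimeoutError):
        return "timed out"  # no answer within `timeout`, or the OS gave up
    except OSError as exc:
        return f"error: {exc}"  # e.g. unresolvable host name, no route to host
```

For example, `probe("secondary-host", 33102)` (with your secondary's host name in place of the placeholder) should return `"open"` when the port is reachable from the primary. Getting `"timed out"` rather than `"refused"` suggests looking at the network path rather than at the HANA processes.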
Kind Regards,
Amerjit
Hello Amerjit,
Thanks.
1. Both the setup steps and the network settings were taken care of, as we had a working system which we cleaned up to set up the new one.
2. All the processes are running fine on the secondary.
3. hdbnsutil -sr_state showed that we needed to set up the replication again, but after 4% the same message appears again, this time because of a timeout, as shown in the log below:
sr_nameserver TNSClient.cpp(06915) : error when sending request 'dr_secondaryactivestatus' to <2ndryhostname>:33102: timeout occured,location=<2ndryhostname>:33102
4. hdbnameserver is listening on port 33102.
Thanks
Dev
Hi Dev,
You spotted my error (preprocessor as opposed to nameserver).
Check your nameserver (and other) trace files on the secondary.
Just as a side comment:
1. Make sure that column preload is false and that you have set a global allocation limit.
2. Make sure that your /etc/hosts is correct on both the PRI and SECO nodes.
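A minimal sketch for the `/etc/hosts` point, assuming you know which IPv4 address each replication host name should map to. It checks what the resolver actually returns (on a typical Linux system `/etc/hosts` is consulted first, per `/etc/nsswitch.conf`) rather than parsing the file itself, so run it on both PRI and SECO. `check_resolution` and the host names and addresses in the usage note are hypothetical:

```python
import socket
from typing import Dict


def check_resolution(expected: Dict[str, str]) -> Dict[str, str]:
    """Return {host: problem} for every host that does not resolve as expected."""
    problems = {}
    for host, want in expected.items():
        try:
            got = socket.gethostbyname(host)  # IPv4 lookup via the system resolver
        except socket.gaierror as exc:
            problems[host] = f"unresolvable ({exc})"
            continue
        if got != want:
            problems[host] = f"resolves to {got}, expected {want}"
    return problems
```

For example, `check_resolution({"hana-pri": "192.0.2.10", "hana-sec": "192.0.2.20"})` should return an empty dict on both nodes if name resolution is consistent.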
Amerjit