on 06-20-2016 7:39 PM
Hi Team,
We have a four-node topology in which one node is a standby.
Last week a failover occurred from the master to the standby node.
How can we tell whether it was a manual failover or an automatic one?
I checked the indexserver, nameserver and daemon traces, but I cannot tell from them whether it was manual or automatic.
Daemon trace file:
[2345]{-1}[-1/-1] 2016-05-25 23:15:09.909331 i Daemon | TrexDaemon.cpp(09084) : signo 3 SIGQUIT from user errno 0 code 0 |
[2345]{-1}[-1/-1] 2016-05-25 23:15:09.909375 i Daemon | TrexDaemon.cpp(09084) : sender pid 8899 real user id 50263 executable '/hana/data/shared/SID/exe/linuxx86_64/HDB<>/sapstart' |
[2345]{-1}[-1/-1] 2016-05-25 23:15:09.909414 i Daemon | TrexDaemon.cpp(14175) : got shutdown event (stop) |
[2345]{-1}[-1/-1] 2016-05-25 23:15:09.909436 i Daemon | TrexDaemon.cpp(14871) : comment file contains: |
[2345]{-1}[-1/-1] 2016-05-25 23:15:09.927401 i Daemon | TrexDaemon.cpp(14091) : all instances in runlevel 5 stopped |
[2345]{-1}[-1/-1] 2016-05-25 23:15:09.927408 i Daemon | TrexDaemon.cpp(11788) : stop process hdbwebdispatcher with pid 14707 |
[2345]{-1}[-1/-1] 2016-05-25 23:15:09.927587 i Daemon | TrexDaemon.cpp(08784) : stopped child with pid 14707 (14707) |
[2345]{-1}[-1/-1] 2016-05-25 23:15:12.617500 i Daemon | TrexDaemon.cpp(14397) : process hdbwebdispatcher with pid 14707 exited normally with status 0 |
[2345]{-1}[-1/-1] 2016-05-25 23:15:12.617529 i Daemon | TrexDaemon.cpp(14473) : all instances in runlevel 4 stopped |
[2345]{-1}[-1/-1] 2016-05-25 23:15:12.617533 i Daemon | TrexDaemon.cpp(11788) : stop process hdbindexserver with pid 60982 |
[2345]{-1}[-1/-1] 2016-05-25 23:15:12.617940 i Daemon | TrexDaemon.cpp(08784) : stopped child with pid 60982 (60982) |
[2345]{-1}[-1/-1] 2016-05-25 23:15:12.617951 i Daemon | TrexDaemon.cpp(11788) : stop process hdbxsengine with pid 60988 |
[2345]{-1}[-1/-1] 2016-05-25 23:15:12.618078 i Daemon | TrexDaemon.cpp(12345) : stopped child with pid 60988 (60988) |
[2345]{-1}[-1/-1] 2016-05-25 23:15:19.533758 i Daemon | TrexDaemon.cpp(14397) : process hdbxsengine with pid 60988 exited normally with status 0 |
[2345]{-1}[-1/-1] 2016-05-25 21:52:12.717629 i Daemon | TrexDaemon.cpp(11807) : kill process hdbindexserver with pid 60982 |
[2345]{-1}[-1/-1] 2016-05-25 21:52:12.718210 i Daemon | TrexDaemon.cpp(08803) : killed child with pid 60982 (60982) |
Hi,
The nameserver controls failover and takeover tasks, so you might want to check its trace as well as the OS restart command issued on the master node.
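The daemon trace quoted in the question already records which process sent the stop signal. As a minimal sketch (the function name `find_stop_signals` and the regular expressions are my own, modelled only on the two `signo` / `sender pid` lines shown above, and other HANA revisions may format them differently), something like this could pull that information out:

```python
import re

# Patterns modelled on the daemon trace lines quoted in the question.
# They are assumptions about the format, not an official specification.
SIGNO_RE = re.compile(r"signo (?P<signo>\d+) (?P<name>SIG\w+)")
SENDER_RE = re.compile(
    r"sender pid (?P<pid>\d+) real user id (?P<uid>\d+) "
    r"executable '(?P<exe>[^']*)'"
)


def find_stop_signals(lines):
    """Return one dict per signal line, enriched with the sender line that follows it."""
    events = []
    current = None
    for line in lines:
        sig = SIGNO_RE.search(line)
        if sig:
            current = {"signal": sig.group("name"), "signo": int(sig.group("signo"))}
            events.append(current)
            continue
        sender = SENDER_RE.search(line)
        # Attach the first sender line seen after a signal line.
        if sender and current is not None and "sender_pid" not in current:
            current["sender_pid"] = int(sender.group("pid"))
            current["uid"] = int(sender.group("uid"))
            current["executable"] = sender.group("exe")
    return events
```

For the excerpt in the question this reports SIGQUIT sent by pid 8899 (user id 50263) running `sapstart`. That tells you which process delivered the signal; it does not by itself tell you who or what invoked that process, so the nameserver trace is still worth checking.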
BRs,
Lucas de Oliveira
Hi,
Of course. If the system is shut down manually, the traces show every step it goes through until it detaches from the persistence.
These steps are logged under the Service_Shutdown component. Example from the indexserver trace:
[52612]{-1}[-1/-1] 2016-06-20 15:15:12.224672 i Service_Shutdown TrexService.cpp(00803) : Preparing for shutting service down
[3373]{-1}[-1/-1] 2016-06-20 15:15:12.334555 i Service_Shutdown TREXIndexServer.cpp(04465) : Triggering timezone checker shutdown
[3373]{-1}[-1/-1] 2016-06-20 15:15:12.334585 i assign TREXIndexServer.cpp(04475) : unassign from volume 4
[3373]{-1}[-1/-1] 2016-06-20 15:15:12.334593 i Service_Shutdown TREXIndexServer.cpp(04478) : Preparing to shutdown
[3373]{-1}[-1/-1] 2016-06-20 15:15:12.334595 i Service_Shutdown TREXIndexServer.cpp(04486) : stopping statistics server worker threads
[...........]
[3373]{427086}[26/57973580] 2016-06-20 15:15:13.375085 i SQLSessionCmd Connection.cc(07739) : cancel requested for 400129 (logical connection id is 400129
[3373]{-1}[-1/-1] 2016-06-20 15:15:13.376586 i Service_Shutdown TREXIndexServer.cpp(04522) : stopping federation statistics collection
[3373]{-1}[-1/-1] 2016-06-20 15:15:13.376613 i Service_Shutdown TREXIndexServer.cpp(04526) : stopping extended storage heartbeat thread
[3373]{-1}[-1/-1] 2016-06-20 15:15:13.379852 i Service_Shutdown TREXIndexServer.cpp(04560) : Abort logreplay
[3373]{-1}[-1/-1] 2016-06-20 15:15:13.379870 i Service_Shutdown TREXIndexServer.cpp(04565) : Stopping LOBGarbageCollectorThread
[3373]{-1}[-1/-1] 2016-06-20 15:15:13.380129 i Service_Shutdown TREXIndexServer.cpp(04589) : Stopping EPM
[3373]{-1}[-1/-1] 2016-06-20 15:15:13.471506 i Service_Shutdown TREXIndexServer.cpp(04593) : Stopping Embedded Catalyst Services
[3373]{-1}[-1/-1] 2016-06-20 15:15:14.396296 i Service_Shutdown TREXIndexServer.cpp(04598) : Stopping PlanningEngine
[3373]{-1}[-1/-1] 2016-06-20 15:15:14.503468 i Service_Shutdown TREXIndexServer.cpp(04613) : Stopping SQL session service
[3373]{-1}[-1/-1] 2016-06-20 15:15:14.503520 i Service_Shutdown tcp_listener.h(00122) : Shutdown SqlListener service
[53750]{-1}[-1/-1] 2016-06-20 15:15:14.569328 i Service_Shutdown tcp_listener.cc(00636) : stop the SQL listening port: 30415
[3373]{-1}[-1/-1] 2016-06-20 15:15:14.684304 i Service_Shutdown TREXIndexServer.cpp(04617) : Stopping SQL plan cache
[..........]
[52612]{-1}[-1/-1] 2016-06-20 15:16:22.757531 i Service_Shutdown TrexService.cpp (00975) : Deleting pools
[52612]{-1}[-1/-1] 2016-06-20 15:16:22.757543 i Service_Shutdown TrexService.cpp (00985) : Deleting configuration
[52612]{-1}[-1/-1] 2016-06-20 15:16:22.757546 i Service_Shutdown TrexService.cpp (00989) : Removing pidfile
[52612]{-1}[-1/-1] 2016-06-20 15:16:22.758133 i Service_Shutdown TrexService.cpp (01026) : System down
A crash, on the other hand, is logged very descriptively and may show no regular shutdown sequence at all. Example:
[40204]{-1}[10/54417462] 2016-05-20 00:36:06.342783 e Basis FaultProtectionImpl.cpp(01610) : SIGNAL 11 (SIGSEGV) caught, thread: 7887[thr=40204]: Assign, addr: 0x0000000000000000, time: 2016-05-20 00:36:06 000 Local
Instance LO2/08, OS Linux vanpghana05 3.0.101-68-default #1 SMP Tue Dec 1 16:21:37 UTC 2015 (ed01a9f) x86_64
----> Register Dump <----
rax: 0x0000000000000020 rbx: 0x00007fd87573f1b8
rcx: 0x0000000000000000 rdx: 0x0000000000000003
rsi: 0x0000000000000000 rdi: 0x00007fd9490a6620
rsp: 0x00007fd9490a5118 rbp: 0x00007fd8a518d910
r08: 0x0000000000000000 r09: 0x0000000000000007
[...]
[40204]{-1}[10/54417462] 2016-05-20 00:36:06.342963 i Basis Helper.cpp(00514) : Using 'x64_64 ABI unwind' for stack tracing
[40204]{-1}[10/54417462] 2016-05-20 00:36:06.342783 e Basis FaultProtectionImpl.cpp(01610) : NOTE: full crash dump will be written to /usr/sap/LO2/HDB08/vanpghana05/trace/indexserver_vanpghana05.30803.crashdump.20160520-003606.039630.trc
Here it was signal 11, but it could be any OS signal that terminates a process.
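To apply this to a whole trace file rather than reading it by eye, here is a rough sketch. It assumes the line layout seen in the excerpts above; the function name `classify_stop`, the regular expressions and the verdict labels are mine, not anything HANA provides.

```python
import re

# Layout assumed from the excerpts above:
# [thread]{conn}[tx] <timestamp> <level> <component> <source>(<line>) : <message>
LINE_RE = re.compile(
    r"^\[\d+\]\{-?\d+\}\[-?\d+/-?\d+\]\s+"
    r"(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+)\s+"
    r"(?P<level>\w)\s+(?P<component>\S+)\s+"
    r"(?P<source>.+?\(\d+\))\s*:\s*(?P<msg>.*)$"
)
SIGNAL_RE = re.compile(r"SIGNAL (\d+) \((SIG\w+)\) caught")


def classify_stop(lines):
    """Count Service_Shutdown steps and collect caught signals, then give a verdict."""
    result = {"shutdown_steps": 0, "system_down": False, "crash_signals": []}
    for raw in lines:
        m = LINE_RE.match(raw.strip())
        if m is None:
            continue  # register dumps, "[...]" elisions and other free-form lines
        msg = m.group("msg").strip()
        if m.group("component") == "Service_Shutdown":
            result["shutdown_steps"] += 1
            if msg == "System down":
                result["system_down"] = True
        sig = SIGNAL_RE.search(msg)
        if sig:
            result["crash_signals"].append((m.group("ts"), sig.group(2)))
    if result["crash_signals"]:
        result["verdict"] = "crash"
    elif result["system_down"]:
        result["verdict"] = "orderly shutdown"
    else:
        result["verdict"] = "undetermined"
    return result
```

You can pass it an open file object, since it only iterates over lines. An "orderly shutdown" verdict only means the service went through its regular stop sequence; whether a person or the failover mechanism requested that stop still has to come from the nameserver and daemon traces.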
Hope that helps.
Regards,
Lucas de Oliveira
What exactly do you mean by manual or automatic failover (FO)? Ideally, whenever a node goes down in HANA, the automatic failover mechanism fails it over to an available free standby node.
Which version are you on?
Did you face any errors or delays while this failover happened?
Go through the nameserver***.trc and Indexserver.trc traces; you will find all the related information there in detail.
-Anand.