Indexserver crashed on Master and Slave node

Former Member
0 Kudos

Experts

Several services (indexserver, nameserver) failed today in our production system, which is a multi-node system (4 nodes). There are a lot of dumps in the trace directory, and going through them I found the following error:

[CRASH_SHORTINFO]  exception short info: (2015-06-05 14:04:01 723 Local)

SIGNAL 6 (SIGABRT) caught, sender PID:  54853, PID: 54853, thread: 5808[thr=55061]: FileCompletionThread-DATA-0, value int: 1816, ptr: 0x0000000000000718, time: 2015-06-05 14:04:01 000 Local

Instance HDP/00, OS Linux hana1 3.0.34-0.7-default #1 SMP Tue Jun 19 09:56:30 UTC 2012 (fbfc70c) x86_64

[OK]

I see the same error on all the failed nodes. What does this Signal 6 mean, and what needs to be done to prevent it in the future?
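On the Signal 6 question: on Linux, signal 6 is SIGABRT, which is what a process receives when it (or one of its libraries) calls abort(). A minimal Python sketch that pulls the relevant fields out of the short-info line above is shown here. The regular expression and the field names (`self_raised`, `thread`) are my own assumptions for illustration, not an SAP-documented format. Reading "sender PID equals PID" as "the process aborted itself" is likewise an interpretation, not something the trace states.

```python
import re
import signal

# Crash short-info line copied from the trace above.
LINE = ("SIGNAL 6 (SIGABRT) caught, sender PID:  54853, PID: 54853, "
        "thread: 5808[thr=55061]: FileCompletionThread-DATA-0, value int: 1816, "
        "ptr: 0x0000000000000718, time: 2015-06-05 14:04:01 000 Local")

# Hypothetical pattern derived only from the single line quoted in this thread.
PATTERN = re.compile(
    r"SIGNAL (?P<num>\d+) \((?P<name>\w+)\) caught, "
    r"sender PID:\s+(?P<sender>\d+), PID: (?P<pid>\d+), "
    r"thread: \d+\[thr=\d+\]: (?P<thread>[^,]+),"
)


def parse_crash_shortinfo(line):
    """Return the signal number, names, and thread from a short-info line."""
    m = PATTERN.search(line)
    if m is None:
        return None
    num = int(m.group("num"))
    return {
        "signal": num,
        "name": m.group("name"),                  # name as printed in the trace
        "os_name": signal.Signals(num).name,      # name according to the local OS
        # Assumption: identical sender PID and PID means the process
        # raised the signal itself (for example through abort()).
        "self_raised": m.group("sender") == m.group("pid"),
        "thread": m.group("thread"),
    }


info = parse_crash_shortinfo(LINE)
print(info)
```

If that reading is right, the SIGABRT is a symptom rather than the cause: the thread name FileCompletionThread-DATA-0 suggests the abort happened while completing I/O on the data volume, so the I/O layer is the place to look.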

We are at 1.00.82.00.394270 (NewDB100_REL). I see some notes for Signal 10 but not many for Signal 6.

I am going to raise a message with SAP as well.

Mahesh Shetty

Accepted Solutions (0)

Answers (1)


Former Member
0 Kudos

I also see errors similar to the one below, which led me to SAP Note "2062631 - high availability limitation for SAN storage".

15: 0x00007f6df0c3921d in Execution::Thread::staticMain(void*)+0x39 at Thread.cpp:545 (libhdbbasis.so)

exception  1: no.2000008  (Basis/IO/FileAccess/impl/LocalFileCallback.cpp:264)

    Error during asynchronous file transfer, rc=5: Input/output error; $fileCallback$=[W] , buffer= 0x00007f6b72b83000, offset= 149422080, size= 0/262144, file= "<root>/datavolume_0000.dat" (mode= RW, access= rwrwr-, flags= DIRECT|MUST_EXIST), factory= (root= "/HANA/IMDB-data/HDP/mnt00001/hdb00004/" (access= rwrwr-, flags= AUTOCREATE_DIRECTORY, usage= DATA, fs= nfs, config= (AsyncWriteSubmitActive=auto,AsyncWriteSubmitBlocks=new,AsynReadSubmit=off,#SubmitQueues=1,#CompletionQueues=1)))) {shortRetries= 0, fullRetries= 10 (10/10)}; $res$=Page[ConvLeafPage]@0x00007f6b73086a80

exception throw location:

The situation in the note is exactly the same as ours except for one thing: not only the master and standby nodes were affected, the indexserver on the slave node crashed as well. So the note is close but not an exact match. Has anyone seen a similar issue?

Former Member
0 Kudos

The "Input/output error" is a problem on a layer below SAP HANA. SAP Note 2177064 provides more detailed advice. To understand it exactly, you need to take a careful look at the traces (to check how the events relate in time and whether the error is ping-pong / sporadic or permanent).
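To make the "layer below SAP HANA" point concrete: rc=5 in the exception quoted earlier is the Linux errno EIO, which the kernel returns when the underlying device or file system (here fs= nfs) fails the request. The sketch below extracts the key fields from that exception text. The regular expressions and the reading of "size= 0/262144" as "bytes transferred / bytes requested" are my assumptions, not a documented trace format.

```python
import errno
import os
import re

# Excerpt of the exception text quoted earlier in this thread.
MSG = ('Error during asynchronous file transfer, rc=5: Input/output error; '
       '$fileCallback$=[W] , buffer= 0x00007f6b72b83000, offset= 149422080, '
       'size= 0/262144, file= "<root>/datavolume_0000.dat" '
       '(mode= RW, access= rwrwr-, flags= DIRECT|MUST_EXIST), factory= '
       '(root= "/HANA/IMDB-data/HDP/mnt00001/hdb00004/" (access= rwrwr-, '
       'flags= AUTOCREATE_DIRECTORY, usage= DATA, fs= nfs')

# Hypothetical field patterns, derived only from the message above.
FIELDS = {
    "rc": r"rc=(\d+)",
    "offset": r"offset= (\d+)",
    "done": r"size= (\d+)/\d+",        # assumed: bytes actually transferred
    "requested": r"size= \d+/(\d+)",   # assumed: bytes requested
    "file": r'file= "([^"]+)"',
    "root": r'root= "([^"]+)"',
    "fs": r"fs= (\w+)",
}


def parse_io_error(msg):
    """Extract the fields named in FIELDS; numeric values become ints."""
    out = {}
    for key, pattern in FIELDS.items():
        m = re.search(pattern, msg)
        if m:
            value = m.group(1)
            out[key] = int(value) if value.isdigit() else value
    return out


info = parse_io_error(MSG)
rc_name = errno.errorcode.get(info["rc"], "unknown")
# Check whether the failing offset falls on a whole multiple of the request size.
page_index, remainder = divmod(info["offset"], info["requested"])

print(rc_name, os.strerror(info["rc"]))
print(info["fs"], info["root"], info["done"], info["requested"])
print(page_index, remainder)
```

The offset is an exact multiple of the request size (570 x 262144 = 149422080), so this looks like a regular, aligned page write that the storage layer rejected, not a malformed request from HANA.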

Just today I have come across a similar situation ("resource temporarily unavailable") and it turned out that the crash was caused by the fact that the failover nameserver wanted to lock its DATA file that was still locked by the original nameserver. The overall root cause was a high system CPU consumption related to non-SAP HANA processes on the master node.
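For readers who have not met "resource temporarily unavailable" in this context: on Linux it is the text for errno EAGAIN, which a non-blocking lock attempt returns when another holder still has the file locked. The sketch below reproduces that with flock() on a temporary file. It is only a stand-in. How the nameserver actually locks its DATA file is not shown in this thread, and the helper name `try_exclusive_lock` is hypothetical. It also assumes Linux and a local temporary directory.

```python
import errno
import fcntl
import os
import tempfile


def try_exclusive_lock(path):
    """Open path and try a non-blocking exclusive lock.

    Returns (fd, None) on success or (None, errno) if the lock is held elsewhere.
    """
    fd = os.open(path, os.O_RDWR)
    try:
        fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
        return fd, None
    except OSError as exc:
        os.close(fd)
        return None, exc.errno


with tempfile.NamedTemporaryFile() as tmp:
    # First opener plays the role of the original owner of the file.
    first_fd, first_err = try_exclusive_lock(tmp.name)
    # Second opener plays the role of the process taking over.
    second_fd, second_err = try_exclusive_lock(tmp.name)

    if second_err is not None:
        second_msg = os.strerror(second_err)
    else:
        second_msg = "lock acquired"
    print(first_err, second_err, second_msg)

    # Release whatever was acquired.
    for fd in (first_fd, second_fd):
        if fd is not None:
            os.close(fd)
```

The second attempt fails for as long as the first holder keeps its descriptor open, which matches the situation described above where the original process had not yet released the file.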


Former Member
0 Kudos

Thanks Martin

I have raised a message with SAP and attached all the crash dump files. Hope to hear back from you ASAP. Meanwhile, I see the following errors in our slave indexserver:

[81703]{602503}[-1/-1] 2015-06-08 10:01:26.961509 e StatementRouting Connection.cc(06374) : anchor is switched in runtime: last sync id=0, session anchor id=602502, session anchor host=hana2-data, session anchor port=30015, my conn id=2503, my volume id=6, global session id=602503, anchor global session id=602503

[81703]{602503}[-1/-1] 2015-06-08 10:01:26.961586 e StatementRouting Connection.cc(06384) : current session context: systemWatermark=65940518,slaveInitCount=-1,version=5,contextId=518148,sessionId=2503,volumeId=6,anchorsessionId=2503,anchorvolumeId=6,version=2,user=PRD2SLP,schema=PRD2SLP,locale=en_US,collate=BINARY,client=,ext_prio= ,dateformat=,reserveprefix=true,ddlautocommit=true,checkPasswordChangeNeeded=false,abapVarcharMode=true,largeNumberOfParametersSupport=true,totalRowCount=0,enableLobOperations=1,enableDeferredLobOperation=0,hasStatefulCtxBitmap=3,tmpTableCount=0

[81703]{602503}[-1/-1] 2015-06-08 10:01:26.961590 e StatementRouting Connection.cc(06387) : anchor is switched in runtime: previous anchor conn id=602503, new anchor conn id=602502

[81703]{602503}[-1/-1] 2015-06-08 10:01:26.961594 e StatementRouting Connection.cc(06391) : conn id=2503, logontime=2015-06-08 10:01:26.9440000

[81703]{602503}[-1/-1] 2015-06-08 10:01:26.961681 e StatementRouting Connection.cc(06565) : validation failure of session context: failed routed execution: anchor is switched in runtime

[81703]{602503}[-1/-1] 2015-06-08 10:01:26.961745 e StatementRouting sm_codec_newdb.cc(09131) : (newdb codec, tx flags) session will be closed due to the erorr: [600] failed routed execution: anchor is switched in runtime

[81703]{602503}[-1/-1] 2015-06-08 10:01:26.961751 e StatementRouting sm_codec_newdb.cc(09135) : (newdb codec, tx flags) action type: 7, codec string=, conn=0x00007f231aba0400, conn id=2503, stmt=0x0000000000000000, stmt id=18446744073709551615, stmt string=
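To check time relations across many such lines (and to see whether the errors are sporadic or permanent), trace lines of this shape can be parsed and grouped. The following is a minimal sketch. The regular expression is derived only from the lines pasted above, and the field names (`thread_id`, `conn_id`, `tx`) are my reading of the prefix, not something confirmed in this thread.

```python
import re
from collections import Counter
from datetime import datetime

# Two of the trace lines above, copied verbatim.
SAMPLE = [
    "[81703]{602503}[-1/-1] 2015-06-08 10:01:26.961594 e StatementRouting "
    "Connection.cc(06391) : conn id=2503, logontime=2015-06-08 10:01:26.9440000",
    "[81703]{602503}[-1/-1] 2015-06-08 10:01:26.961681 e StatementRouting "
    "Connection.cc(06565) : validation failure of session context: "
    "failed routed execution: anchor is switched in runtime",
]

# Hypothetical pattern; the leading "[" is optional so that a line
# truncated during copy and paste still parses.
TRACE_RE = re.compile(
    r"^\[?(?P<thread_id>\d+)\]\{(?P<conn_id>-?\d+)\}\[(?P<tx>[^\]]*)\] "
    r"(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{6}) "
    r"(?P<level>\w) (?P<component>\S+)\s+(?P<source>[^\s(]+)\((?P<line>\d+)\) : "
    r"(?P<message>.*)$"
)


def parse_trace_line(line):
    """Return a dict of fields for one trace line, or None if it does not match."""
    m = TRACE_RE.match(line.strip())
    if m is None:
        return None
    rec = m.groupdict()
    rec["ts"] = datetime.strptime(rec["ts"], "%Y-%m-%d %H:%M:%S.%f")
    return rec


records = [r for r in map(parse_trace_line, SAMPLE) if r is not None]
errors = [r for r in records if r["level"] == "e"]

# Count error lines per (component, source file) and measure their time span.
by_source = Counter((r["component"], r["source"]) for r in errors)
span = max(r["ts"] for r in errors) - min(r["ts"] for r in errors)

print(by_source)
print(span)
```

Run over a whole trace file, the same grouping shows whether these StatementRouting errors cluster around the failover time or keep recurring afterwards.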

I found a note that says to upgrade the HANA client. We have dumps happening in our production system when connecting to the HANA DB. This is a production system, and upgrading the client needs a lot of approvals and outage windows. Is there an alternative solution?

2090424 - Error -10108 (Session has been reconnected) after failover of master node

Mahesh Shetty