Solved: MaxDB 7.6.06.24 crashed and is unstartable even in...

urs_schuerer · ‎01-28-2014

Today our Database crashed and is now in a similar state like described here http://scn.sap.com/thread/1669772. It seems to keep crashing on log recovery during db_online. It for sure has at least one bad index, but since we are unable to bring it online, we seem to have no chance to fix this. I would be thankfull for any hints.

Here are some facts:

The first crash (somewhat shortened) looked something like:

VERSION 'X64/LIX86 7.6.06   Build 024-123-246-595'
[...]
ERR 53419 B*TREE   BD600: predsep.k > sep.k: 0
ERR 53000 B*TREE   Index Root 13113447
ERR 53000 B*TREE   bd600BuildSeparator: lef: 13553565
ERR 53000 B*TREE   bd600BuildSeparator: rig: 14665430
ERR          0 B*TREE   m_Node: 13553565
ERR          0 B*TREE   m_Node1: 14665430
ERR 11599 BTRACE   ----> Symbolic Stack Back Trace <----
ERR 11599 BTRACE      0: 0x0000000000f216ad eo670_UnixTraceStack +0x01ad
ERR 11599 BTRACE      1: 0x0000000000f22079 eo670_CTraceContextStackOCB +0x0009
ERR 11599 BTRACE      2: 0x0000000000f2216a vtracestack +0x003a
ERR 11599 BTRACE      3: 0x0000000000a64ecb _ZN19cbd502_ReorgContext13ErrorHandlingEPKci +0x002b
ERR 11599 BTRACE      4: 0x0000000000a72178 _ZN11cbd500_Tree14bd520_OverflowER19cbd502_ReorgContextPhiibbiRi +0x0908
ERR 11599 BTRACE      5: 0x0000000000a727ab _ZN11cbd500_Tree17bd520LeafOverflowEPhibiRi +0x03ab
ERR 11599 BTRACE      6: 0x0000000000a3de71 _Z17bd400AddToInvTreeR17cbd300_InvCurrentPhiS1_iRb +0x1181
ERR 11599 BTRACE      7: 0x00000000009a6a1f b03add_inv +0x01ff
ERR 11599 BTRACE      8: 0x0000000000b4538c _ZNK14Log_InvDescMap6AddInvER18tgg00_TransContextbbbRK12tgg00_FileIdPK9tgg00_Rec

+0x016c
ERR 11599 BTRACE      9: 0x0000000000b4345e _ZNK15Log_InvHandling6AddInvER18tgg00_TransContext +0x003e
ERR 11599 BTRACE     10: 0x000000000092f669 kb611inv_AddInv +0x0009
ERR 11599 BTRACE     11: 0x000000000092dbe1 kb61insert_rec +0x0201
ERR 11599 BTRACE     12: 0x000000000092e887 k61ins_del_upd +0x0267
ERR 11599 BTRACE     13: 0x00000000008dba3a k05functions +0x058a
ERR 11599 BTRACE     14: 0x000000000065fb97 a06lsend_mess_buf +0x0297
[...]
ERR 53250 INDEX    Bad Index 13113447 (Root)
ERR 53250 INDEX    Reason "System error: BD Invalid leave"
         12853 DBSTATE Caught signal 11(SIGSEGV)
ERR 11330 COREHAND ABORTING due to signal 11
ERR 11599 BTRACE   ----> Symbolic Stack Back Trace <----
ERR 11599 BTRACE      0: 0x0000000000f216ad eo670_UnixTraceStack +0x01ad
ERR 11599 BTRACE      1: 0x0000000000f22079 eo670_CTraceContextStackOCB +0x0009
ERR 11599 BTRACE      2: 0x0000000000f22097 eo670_CTraceContextStack +0x0017
ERR 11599 BTRACE      3: 0x0000000000f65718 en81_CrashSignalHandler +0x00e8
ERR 11599 BTRACE      4: 0x00007f40e100b6b0 __restore_rt +0x0000
ERR 11599 BTRACE      5: 0x0000000000a74f20 _ZNK11cbd600_Node14bd600LeafCountEii +0x00b0
ERR 11599 BTRACE      6: 0x0000000000a42728 _Z23bd401CalculatePageCountR17cbd300_InvCurrentPhiS1_ibRiS2_S2_ +0x0658
ERR 11599 BTRACE      7: 0x00000000009ab093 b03calculate_page_count +0x02b3
ERR 11599 BTRACE      8: 0x000000000065d9f1 a06eval_page_count +0x0171
ERR 11599 BTRACE      9: 0x0000000000878292 ak720indexeval +0x0142
ERR 11599 BTRACE     10: 0x0000000000878b5a ak720eval_one_index +0x016a
ERR 11599 BTRACE     11: 0x0000000000878e73 ak720index_decision +0x0183
+++++++++++++ Kernel Exit ++++++++++++++++++++++++++++

So whenever we try to now switch to db_online (LOCAL_REDO_LOG_BUFFER_SIZE is 0 as it was before) it immediately crashes with
ERR
-24994,ERR_RTE: Runtime environment error
4,connection broken server state 4

VERSION 'X64/LIX86 7.6.06 Build 024-123-246-595'
[...]

Log      0 queues, flushmode is 'MaximizeSafety', devstate is 'Okay'
Log      Oldest not saved is ioseq 296996736 @ off 775686
Log      First known on LogVolume is ioseq 295948220 @ off 775741
Log      Restart from ioseq 296996789 @ off 775739 to ioseq 296996790 @ off 775740
Log      Result after checking the log device: 'Ok'
Log      The number of active logging-queues has been increased to 4
OBJECT   Restarted Garbage coll: 1
Rst      968 redo transactions readable and 32 redo tasks available.
RESTART Previous restart was interrupted.
Restart recovering log from log_volume from IOSeq: '296996789'
Log      normal end of log found at off 775740 lastseq 296996790.
Log      last-redo-read empty errlist#13:TR907215147(11)[296996790]@775740.1608'Commit':20140128:135616
DBSTATE Caught signal 11(SIGSEGV)
ERR 11330 COREHAND ABORTING due to signal 11
[...]
ERR 11599 BTRACE   ----> Symbolic Stack Back Trace <----
ERR 11599 BTRACE      0: 0x0000000000f216ad eo670_UnixTraceStack +0x01ad
ERR 11599 BTRACE      1: 0x0000000000f22079 eo670_CTraceContextStackOCB +0x0009
ERR 11599 BTRACE      2: 0x0000000000f22097 eo670_CTraceContextStack +0x0017
ERR 11599 BTRACE      3: 0x0000000000f65718 en81_CrashSignalHandler +0x00e8
ERR 11599 BTRACE      4: 0x00007f1f4412f6b0 __restore_rt +0x0000
ERR 11599 BTRACE      5: 0x0000000000a3ac49 _Z20bd400_DeleteSubTreesR17cbd300_InvCurrentR11cbd600_Node+0x0109
ERR 11599 BTRACE      6: 0x0000000000a3b490 _Z16bd400DropInvTreeR17cbd300_InvCurrent +0x0370
ERR 11599 BTRACE      7: 0x00000000009a9a4e bd03ReleaseInvTree +0x013e
ERR 11599 BTRACE      8: 0x000000000096d7d3 bd01destroy_file +0x0233
ERR 11599 BTRACE      9: 0x000000000096e7b9 b01pdestroy_perm_file +0x00d9
ERR 11599 BTRACE     10: 0x0000000000b2c0a4 _ZNK14Log_ActionFile13RemoveGarbageER18tgg00_TransContextR20SAPDBErr_MessageList +0x02f4
ERR 11599 BTRACE     11: 0x0000000000b2c5bc _ZNK14Log_ActionFile7ExecuteER18tgg00_TransContextNS_13ExecutionTypeE +0x039c

ERR 11599 BTRACE     12: 0x0000000000b2d78e _ZNK14Log_ActionFile4RedoER18tgg00_TransContextR10Log_IImageR20SAPDBErr_MessageList +0x000
ERR 11599 BTRACE     13: 0x0000000000b39d2b _Z10RedoActionR15Log_TransactionR10Log_IImageR11Log_IActionR15Data_IBreakableR21Data_Split
ERR 11599 BTRACE     14: 0x0000000000b3bd11 _Z21Log_ActionExecuteRedoR15Log_TransactionR14Log_AfterImage16Log_IOSequenceNoR21Data_Spli
ERR 11599 BTRACE     15: 0x0000000000b544b5 _ZN15Log_Transaction4RedoER20SAPDBErr_MessageList +0x02e5
+++++++++++++ Kernel Exit ++++++++++++++++++++++++++++

If we try a consistency check in admin mode it stumbles over one bad index complaining on root node 13113447, but of course we cannot repair that in admin state or even find out which index is to blame (can we?):

>db_execute check data with update
ERR
-24988,ERR_SQL: SQL error
-9041,System error: BD Index not accessible
17,Servertask Info: because b01pverify_participant() failed
10,Job 0 (Check Data) [executing] WaitingT206 Result=OK
6,b01pverify_participant() failed, Error code 715 "index_not_accessible"
We even tried to recover our last backup (with AUTO_RECREATE_BAD_INDEXES set to YES) but as soon as we try to for instance drop the index we assume to be bad it immediately crashes and is unstartable again like above.

Is there any way to restore without indexes or to have all indexes recreated? Any hints how to still bring the original DB back online?

Cheers,

Urs

Former Member · ‎01-29-2014

Hi Urs,

you set the parameter AUTO_RECREATE_BAD_INDEXES set to YES

which means that if a corrupted index is detected this index is recreated during restart.

The stack back trace above tells us that the problem is on a subtree of an index which looks like a huge index.

I would try now to set the parameter AUTO_RECREATE_BAD_INDEXES to NO. Then the corrupted index stays in the system and won't be created implicitely during restart.

When the databas eis online you can check if there are any corrupted indexes in the system and you should get the information which index is corrupted. Don't try to drop this index - it crashes the database again.

If the database is in online mode the table which the index belongs to is read only - because the affected index is a uniqueue index.

The next step now is to upgrade the database to the newest 7.6.06. Build. to solve this error -> predsep.k > sep.k: 0

After this upgrade is done sucessfully you can try to drop the index.

Hope this will work as workaround.

Regards, Christiane

urs_schuerer · ‎06-07-2014

I am so sorry, to bring this up again:

After last time we did need to do a point in time recovery a few transactions before the fatal stuff, it seems to be we are stuck at the same point:

Database Version now is 7.6.06 Build 027-123-248-897

We again have a bad index:

ERR 53000 B*TREE   Index Root  13709821

There was a transaction using this index which made the DB crash with SIGSEGV
AUTO_RECREATE_BAD_INDEXES is set to NO

As soon as we now try to get into db_online the evil transaction seems to be read from the log

Rst      968 redo transactions readable and 32 redo tasks available.
Restart  recovering log from log_volume from IOSeq: '71017143'
Log      normal end of log found at off 764391 lastseq 71018647.
Log      last-redo-read empty errlist#3245:TR973675497(7)[71018647]@764391.864'Commit':20140607:164721
DBSTATE  Caught signal 11(SIGSEGV)
COREHAND ABORTING due to signal 11

So again now way to drop the bad Index or start the DB ...

Just to make sure: I know we need to move to 7.8 soon, but a DB that can not be started any more is the worst case scenario with all data being lost. Just to make this report contain some facts, here is (part off) the stack:

eo670_UnixTraceStack +0x01ad
eo670_CTraceContextStackOCB +0x0009
eo670_CTraceContextStack +0x0017
en81_CrashSignalHandler +0x00e8
__restore_rt +0x0000
_Z20bd400_DeleteSubTreesR17cbd300_InvCurrentR11cbd600_Node +0x0109
_Z16bd400DropInvTreeR17cbd300_InvCurrent +0x0370
bd03ReleaseInvTree +0x013e
bd01destroy_file +0x0233
b01pdestroy_perm_file +0x00d9
_ZNK14Log_ActionFile13RemoveGarbageER18tgg00_TransContextR20SAPDBErr_MessageList +0x02f4
_ZNK14Log_ActionFile7ExecuteER18tgg00_TransContextNS_13ExecutionTypeE +0x039c
_ZNK14Log_ActionFile4RedoER18tgg00_TransContextR10Log_IImageR20SAPDBErr_MessageList +0x000
_Z10RedoActionR15Log_TransactionR10Log_IImageR11Log_IActionR15Data_IBreakableR21Data_Split
SpaceReaderR20SAPDBErr_MessageList +0x008b
_Z21Log_ActionExecuteRedoR15Log_TransactionR14Log_AfterImage16Log_IOSequenceNoR21Data_Spli
tSpaceReaderRN31Data_ChainSplitSpaceForwardReadI12Rst_RedoPageE8IteratorER20SAPDBErr_MessageList +0x0ae1
_ZN15Log_Transaction4RedoER20SAPDBErr_MessageList +0x02e5
_ZN22Rst_RedoTrafficControl11ExecuteJobsEiR18tgg00_TransContextR20SAPDBErr_MessageList +0x
_ZN16SrvTasks_JobRedo13ExecuteInternER13Trans_Context +0x0100
_ZN12SrvTasks_Job15ExecuteDirectlyER13Trans_Context +0x006d

MaxDB 7.6.06.24 crashed and is unstartable even in restored Backup

Accepted Solutions (1)

Accepted Solutions (1)

Answers (1)

Answers (1)

Re: [BIG PROBLEM] SAP Host Agent cannot connect to...

Re: using Already availble XSUAA service to anothe...

Re: CDS Projection View parameters not allowed

using Already availble XSUAA service to another ap...

Re: How to show value help with values for Applica...