on 11-30-2015 9:39 PM
Hi,
This past weekend our main data center suffered a catastrophic power loss, and all of our ASE and Replication Servers shut down.
During recovery we were getting error 6025 in Replication Server (version 15.5):
E. 2015/11/28 18:52:21. ERROR #6025 RSI(GBL_RS_GRN2) - m/sqmext.c(14908)
Block consistency failed q 16777322:0, 7990567:29:0 23920 contents q 16777322:0, 7990564:29:0 23920. old=7990567.29
I. 2015/11/28 18:52:21. Replication Agent for GBL_TRD.statement_mgr connected in passthru mode.
I. 2015/11/28 18:52:21. RSI: connection to 'GBL_RS_LON103' is established and the route is active.
I. 2015/11/28 18:52:21. Trying to connect to server 'GBL_RS_GRN9' as user 'GBL_RS_GRN9_rsi' ......
E. 2015/11/28 18:52:21. ERROR #4044 RSI(GBL_RS_GRN2) - i/rsiint.c(341)
RSI for 'GBL_RS_GRN2': Shutting down due to an exception.
E. 2015/11/28 18:52:21. ERROR #6025 RSI(GBL_RS_LON103) - m/sqmext.c(14908)
Block consistency failed q 16777350:0, 5954863:36:0 26093 contents q 16777350:0, 5954861:36:0 26093. old=5954863.36
E. 2015/11/28 18:52:21. ERROR #4044 RSI(GBL_RS_LON103) - i/rsiint.c(341)
RSI for 'GBL_RS_LON103': Shutting down due to an exception.
E. 2015/11/28 18:52:24. ERROR #6025 SQT(2480:1 DIST GRN19DS.trim_prices) - m/sqmext.c(14908)
Block consistency failed q 2480:1, 4932057:39:0 27385 contents q 16777322:0, 7990565:39:0 27385. old=4932057.39
E. 2015/11/28 18:52:24. ERROR #30024 DIST(2480 GRN19DS.trim_prices) - xec/dist.c(6596)
The distributor for 'GRN19DS.trim_prices' failed while reading a transaction from it's stable queue.
W. 2015/11/28 18:52:24. WARNING #24049 SQT(2480:1 DIST GRN19DS.trim_prices) - t/sqtint.c(263)
sqt_wrap(2480:1 GRN19DS.trim_prices): Exiting because _sqt_service had an exception.
I. 2015/11/28 18:52:24. DIST for 'GBL_TRD.trim_prices' is waiting for SQM(s) to flush to outbound queue(s).
The Rep Server then crashed with the following in its stack trace:
T. 2015/11/28 18:57:52. (250): Thread DIST(2480 GRN19DS.trim_prices) infected with signal 11.
T. 2015/11/28 18:57:52. (250): Dumping memory trace.
M. <8> '18:57:52.785' - Block consistency failed q 2480:1, 4932057:39:0 27385 contents q 16777322:0, 799
M. <7> '18:57:52.785' - SQM (2480:1) Read -537295552 flags(0x306) hash(6af9)
M. <6> '18:52:24.142' - Block consistency failed q 2480:1, 4932057:39:0 27385 contents q 16777322:0, 799
M. <5> '18:52:24.142' - SQM (2480:1) Read -537295552 flags(0x306) hash(6af9)
M. <4> '18:52:21.221' - Block consistency failed q 16777350:0, 5954863:36:0 26093 contents q 16777350:0,
M. <3> '18:52:21.221' - SQM (16777350:0) Read -551811824 flags(0x306) hash(65ed)
M. <2> '18:52:21.090' - Block consistency failed q 16777322:0, 7990567:29:0 23920 contents q 16777322:0,
M. <1> '18:52:21.090' - SQM (16777322:0) Read -586117464 flags(0x306) hash(5d70)
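As an aside, the repeated 6025 messages in a log like this can be scanned mechanically to list which queues disagree between the expected and on-disk block addresses. A rough sketch in Python, assuming only the message format visible in this log (field meanings beyond queue:type and segment:block are inferred, not documented here):

```python
import re

# Pattern for the 6025 "Block consistency failed" message body.
# Assumed layout: q <queue>:<type>, <segment>:<block>:<row> <len>
#                 contents q <queue>:<type>, <segment>:<block>:<row> <len>
PAT = re.compile(
    r"Block consistency failed q (?P<q>\d+):(?P<qt>\d+), "
    r"(?P<eseg>\d+):(?P<eblk>\d+):(?P<erow>\d+) (?P<elen>\d+) "
    r"contents q (?P<cq>\d+):(?P<cqt>\d+), "
    r"(?P<cseg>\d+):(?P<cblk>\d+):(?P<crow>\d+) (?P<clen>\d+)"
)

def scan(lines):
    """Return (queue, expected seg:blk, found seg:blk) per 6025 hit."""
    hits = []
    for line in lines:
        m = PAT.search(line)
        if m:
            hits.append((f"{m['q']}:{m['qt']}",
                         f"{m['eseg']}:{m['eblk']}",
                         f"{m['cseg']}:{m['cblk']}"))
    return hits

# Two of the messages from the errorlog above
log = [
    "Block consistency failed q 16777322:0, 7990567:29:0 23920 "
    "contents q 16777322:0, 7990564:29:0 23920. old=7990567.29",
    "Block consistency failed q 2480:1, 4932057:39:0 27385 "
    "contents q 16777322:0, 7990565:39:0 27385. old=4932057.39",
]
for q, expected, found in scan(log):
    print(q, "expected", expected, "found", found)
```

Running this over the full errorlog quickly shows how many distinct queues were hit, which is useful when deciding which connections need a rebuild.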
We eventually recovered by bringing the Rep Server up in single-user mode and purging the stable queue, then re-syncing replication.
Could we have done anything else to recover what was in the queue?
Hi,
If the stable device becomes inaccessible, the recovery process is, in overview:
- Drop the corrupted partition.
- Add a new partition.
- Rebuild the stable queues (this signals all upstream connections to resend everything).
- Look in the errorlog for recovery messages. If a loss is detected, you have definitively lost transactions (use ignore loss to restart replication, then resync the replicate database).
- If necessary, recover the lost transactions.
This is all documented here: Recovery from Partition Loss or Failure.
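As a rough illustration only (the partition name, device path, size, and connection names below are placeholders, not taken from this thread), the steps above map to RCL commands run from isql against the Replication Server:

```sql
-- 1. Drop the corrupted partition (logical name is a placeholder)
drop partition logical_p1
go
-- 2. Add a replacement partition (path and size in MB are placeholders)
add partition logical_p1 on '/dev/rs_part1' with size 2048
go
-- 3. Rebuild the stable queues; upstream senders will resend
rebuild queues
go
-- 4. If the errorlog reports a detected loss, acknowledge it so
--    replication can restart (source/destination are placeholders),
--    then resynchronize the replicate database out of band
ignore loss from SRC_DS.src_db to DEST_DS.dest_db
go
```

Check the errorlog after rebuild queues before deciding whether ignore loss is needed at all.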
Regards,
Antoine BALMOKOUN
Support Engineer, SAP Product Support
SAP Labs France, 35, rue d’Alsace 92300 Levallois Perret
With a stack trace showing "infected with signal 11", I suggest opening a ticket with SAP support.
How big are your Replication Server queues? That would tell you whether you could have dumped the queue data to files first.
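On dumping queues to files: Replication Server's sysadmin dump_queue writes queue contents in readable form, and sysadmin dump_file redirects that output to a named file. A hedged sketch (the file path is a placeholder, the queue number/type would normally come from admin who, sqm, and the segment/block/count arguments here are purely illustrative; check the command reference for the exact values):

```sql
-- Redirect dump output to a file (path is a placeholder)
sysadmin dump_file, '/tmp/queue_16777322_0.dmp'
go
-- Dump blocks from queue 16777322, type 0 (as seen in the errorlog);
-- the segment, block, and count arguments are illustrative
sysadmin dump_queue, 16777322, 0, 1, 1, 10
go
```

Whether this is practical depends on queue size, hence the question above; dumping very large queues before purging may not be feasible in an outage.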