
Error 6025 Causes Rep Server to Crash

james_morrison
Explorer
0 Kudos

Hi

This past weekend our main data center suffered a catastrophic power loss. All our ASE and Rep Servers shut down.

During recovery we were getting Error 6025 in Replication Server (Rep Server 15.5):

E. 2015/11/28 18:52:21. ERROR #6025 RSI(GBL_RS_GRN2) - m/sqmext.c(14908)

        Block consistency failed q 16777322:0, 7990567:29:0 23920 contents q 16777322:0, 7990564:29:0 23920. old=7990567.29

I. 2015/11/28 18:52:21. Replication Agent for GBL_TRD.statement_mgr connected in passthru mode.

I. 2015/11/28 18:52:21. RSI: connection to 'GBL_RS_LON103' is established and the route is active.

I. 2015/11/28 18:52:21. Trying to connect to server 'GBL_RS_GRN9' as user 'GBL_RS_GRN9_rsi' ......

E. 2015/11/28 18:52:21. ERROR #4044 RSI(GBL_RS_GRN2) - i/rsiint.c(341)

        RSI for 'GBL_RS_GRN2': Shutting down due to an exception.

E. 2015/11/28 18:52:21. ERROR #6025 RSI(GBL_RS_LON103) - m/sqmext.c(14908)

        Block consistency failed q 16777350:0, 5954863:36:0 26093 contents q 16777350:0, 5954861:36:0 26093. old=5954863.36

E. 2015/11/28 18:52:21. ERROR #4044 RSI(GBL_RS_LON103) - i/rsiint.c(341)

        RSI for 'GBL_RS_LON103': Shutting down due to an exception.

E. 2015/11/28 18:52:24. ERROR #6025 SQT(2480:1  DIST GRN19DS.trim_prices) - m/sqmext.c(14908)

        Block consistency failed q 2480:1, 4932057:39:0 27385 contents q 16777322:0, 7990565:39:0 27385. old=4932057.39

E. 2015/11/28 18:52:24. ERROR #30024 DIST(2480 GRN19DS.trim_prices) - xec/dist.c(6596)

        The distributor for 'GRN19DS.trim_prices' failed while reading a transaction from it's stable queue.

W. 2015/11/28 18:52:24. WARNING #24049 SQT(2480:1  DIST GRN19DS.trim_prices) - t/sqtint.c(263)

        sqt_wrap(2480:1 GRN19DS.trim_prices): Exiting because _sqt_service had an exception.

I. 2015/11/28 18:52:24. DIST for 'GBL_TRD.trim_prices' is waiting for SQM(s) to flush to outbound queue(s).

Rep Server then crashed, with the following in the stack trace:

T. 2015/11/28 18:57:52. (250): Thread DIST(2480 GRN19DS.trim_prices) infected with signal 11.

T. 2015/11/28 18:57:52. (250): Dumping memory trace.

M. <8> '18:57:52.785' - Block consistency failed q 2480:1, 4932057:39:0 27385 contents q 16777322:0, 799

M. <7> '18:57:52.785' - SQM (2480:1) Read -537295552 flags(0x306) hash(6af9)

M. <6> '18:52:24.142' - Block consistency failed q 2480:1, 4932057:39:0 27385 contents q 16777322:0, 799

M. <5> '18:52:24.142' - SQM (2480:1) Read -537295552 flags(0x306) hash(6af9)

M. <4> '18:52:21.221' - Block consistency failed q 16777350:0, 5954863:36:0 26093 contents q 16777350:0,

M. <3> '18:52:21.221' - SQM (16777350:0) Read -551811824 flags(0x306) hash(65ed)

M. <2> '18:52:21.090' - Block consistency failed q 16777322:0, 7990567:29:0 23920 contents q 16777322:0,

M. <1> '18:52:21.090' - SQM (16777322:0) Read -586117464 flags(0x306) hash(5d70)

We eventually recovered by bringing Rep Server up in single-user mode and purging the stable queue, then re-syncing replication.

Could we have done anything else to recover what was in the queue?
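One option before purging is to capture the queue contents for inspection with `sysadmin dump_queue` while the server is in single-user mode. A minimal sketch, assuming queue 2480:1 from the log above (the segment/block arguments and file path are illustrative and would need checking against your RSSD):

```sql
-- Connect to the Rep Server (started in single-user mode) via isql.
-- Direct dump output to an alternate file instead of the errorlog:
sysadmin dump_file, '/tmp/q2480_dump.txt'
go
-- Dump queue 2480, type 1 (inbound): start at the first segment (-1),
-- block 1, through the end of the queue (-1):
sysadmin dump_queue, 2480, 1, -1, 1, -1
go
```

Whether the dump is usable depends on how badly the blocks are corrupted; with block consistency failures like yours, some segments may not be readable at all.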

Accepted Solutions (0)

Answers (2)

Former Member
0 Kudos

Hi,

Moreover: this error typically means the stable device has become inaccessible.

Recovery process overview:

- Drop the corrupted partition
- Add a new partition
- Rebuild stable queues (this signals all upstream connections to resend everything)
- Check the errorlog for recovery messages. If loss is detected, you have definitely lost transactions (use ignore loss to restart replication, then resync the replicate database)
- If necessary, recover lost transactions

All documented here: Recovery from Partition Loss or Failure

http://infocenter.sybase.com/help/index.jsp?topic=/com.sybase.infocenter.dc32518.1571100/doc/html/sa...
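The steps above map onto a short RCL sequence. A hedged sketch only: the partition names, device path, size, and connection names below are illustrative placeholders, not taken from your environment:

```sql
-- Drop the corrupted partition (logical name is illustrative):
drop partition part_old
go
-- Add a replacement partition (size in MB; path is a placeholder).
-- RS 15.x accepts "create partition"; older releases use "add partition":
create partition part_new on '/sybase/data/p1.dat' with size 2048
go
-- Rebuild the stable queues; upstream connections will resend:
rebuild queues
go
-- Only after the errorlog reports a detected loss for a connection,
-- acknowledge it so replication can restart (then resync the replicate):
ignore loss from GRN19DS.trim_prices to GBL_TRD.trim_prices
go
```

After `rebuild queues`, watch the errorlog for loss-detection messages before issuing any `ignore loss` command.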

Rgs

Antoine BALMOKOUN

Support Engineer, SAP Product Support

SAP Labs France, 35, rue d’Alsace 92300 Levallois Perret

antoine.balmokoun@sap.com

Former Member
0 Kudos

With a stack trace like this showing "infected with signal 11", I suggest opening a ticket with SAP support.


How big are your Replication Server queues? That determines whether you could have dumped the queue data to files.
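Queue sizes can be checked with standard admin commands before deciding whether a dump to file is practical:

```sql
-- Total and used segments per stable partition:
admin disk_space
go
-- Per-thread SQM status; the Info column shows each queue as number:type
-- (e.g. 2480:1 for the inbound queue seen in the errorlog):
admin who, sqm
go
```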