-24994 connection broken server state 4

Former Member

Hello!

I'm running MaxDB 7.7.06 BUILD 009-121-202-944 on CentOS 5.3.

After a power loss today, I cannot start the database; it fails with the error "-24994 connection broken server state 4".

I have a software RAID-1 (mdadm), and during the disk resync (sdb => sda) I got multiple errors on sda.

=======================================================================

Here is part of my converted KnlMsg file:

=======================================================================

 
Thread  0xA91 Task      -  2009-10-15 11:27:15 ERR Messages       7:  Begin of dump of registered messages,_FILE=Msg_List.cpp,_LINE=3539
Thread  0xA91 Task      -  2009-10-15 11:27:11 ERR Converter  20034:  Bad converter page - read page is no index page,_FILE=Converter_IndexPage.cpp,_LINE=78
                           2009-10-15 11:27:09 ERR Converter  20034:  Bad converter page - read page is no index page,_FILE=Converter_IndexPage.cpp,_LINE=78
Thread  0xA91 Task      -  2009-10-15 11:27:13 ERR Converter  20034:  Bad converter page - read page is no index page,_FILE=Converter_IndexPage.cpp,_LINE=78
                           2009-10-15 11:27:11 ERR Converter  20034:  Bad converter page - read page is no index page,_FILE=Converter_IndexPage.cpp,_LINE=78
                           2009-10-15 11:27:09 ERR Converter  20034:  Bad converter page - read page is no index page,_FILE=Converter_IndexPage.cpp,_LINE=78
Thread  0xA91 Task      -  2009-10-15 11:27:15 ERR IOMan      20032:  Bad converter page,_FILE=IOMan_DataFileSystem.cpp,_LINE=2429
                           2009-10-15 11:27:15 ERR IOMan          5:  Failed consistency check on read page,BLOCK_NO=14274,VOLUME_ID=1,VOLUME_TYPE=data,_FILE=IOMan_Volume.cpp,_LINE=412
                           2009-10-15 11:27:13 ERR Converter  20034:  Bad converter page - read page is no index page,_FILE=Converter_IndexPage.cpp,_LINE=78
                           2009-10-15 11:27:11 ERR Converter  20034:  Bad converter page - read page is no index page,_FILE=Converter_IndexPage.cpp,_LINE=78
                           2009-10-15 11:27:09 ERR Converter  20034:  Bad converter page - read page is no index page,_FILE=Converter_IndexPage.cpp,_LINE=78
Thread  0xA91 Task      -  2009-10-15 11:27:00     RTEHSS     13951:  No hot standby node configured -> No HotStandby configuration
Thread  0xA91 Task      -  2009-10-15 11:27:09 ERR Converter  20034:  Bad converter page - read page is no index page,_FILE=Converter_IndexPage.cpp,_LINE=78
Thread  0xA91 Task      -  2009-10-15 11:27:15 ERR Messages       8:  End of the message list registry dump,_FILE=Msg_List.cpp,_LINE=3567
Thread  0xA91 Task      -  2009-10-15 11:27:15     RTEKernel    117:  Calling AK dump for T72
Thread  0xA91 Task      -  2009-10-15 11:27:15 ERR AK CACHE   51105:  DUMPING SESSION CATCACHE: 1
Thread  0xA8A Task      -  2009-10-15 11:27:15     TASKING    12825:  state 30 before shutkill(1)
Thread  0xA8A Task      3  2009-10-15 11:27:15     Trace      20000:  Start flush kernel trace
Thread  0xA91 Task      -  2009-10-15 11:27:15     RTEKernel    111:  Tracewriter resumed
Thread  0xA91 Task      -  2009-10-15 11:27:15     RTEKernel     94:  Waiting for tracewriter to finish work
Thread  0xA91 Task      -  2009-10-15 11:27:15     RTEKernel    116:  Tracewriter termination timeout: 60 seconds
Thread  0xA8A Task      3  2009-10-15 11:27:15     Trace      20001:  Stop flush kernel trace
Thread  0xA8A Task      3  2009-10-15 11:27:15     Trace      20002:  Start flush kernel dump
Thread  0xA8A Task      3  2009-10-15 11:27:15     Trace      20003:  Stop flush kernel dump
Thread  0xA8A Task      3  2009-10-15 11:27:15     RTEKernel    110:  Releasing tracewriter
Thread  0xA91 Task      -  2009-10-15 11:27:15     TASKING    12822:  Thread 2687 joining
Thread  0xA91 Task      -  2009-10-15 11:27:15     TENANT     13008:  Requestor for tenant database MYDATABASE has stopped
Thread  0xA91 Task      -  2009-10-15 11:27:15     RTEThread     13:  The thread LegacyRequestor is finished
Thread  0xA7D Task      -  2009-10-15 11:27:15     RTE        20214:  CONSOLE thread stopped
Thread  0xA91 Task      -  2009-10-15 11:27:15     TASKING    12822:  Thread 2686 joining
Thread  0xA91 Task      -  2009-10-15 11:27:15     TASKING    12822:  Thread 2688 joining
Thread  0xA91 Task      -  2009-10-15 11:27:15     TASKING    12822:  Thread 2685 joining
Thread  0xA91 Task      -  2009-10-15 11:27:15 WNG CONNECT    12464:  Releasing  T72 kernel abort
Thread  0xA91 Task      -  2009-10-15 11:27:15     TASKING    12822:  Thread 2681 joining
Thread  0xA91 Task      -  2009-10-15 11:27:15     TASKING    12822:  Thread 2682 joining
Thread  0xA91 Task      -  2009-10-15 11:27:15     RTEKernel     58:  Backup of diagnostic files will be forced at next restart
Thread  0xA91 Task      -  2009-10-15 11:27:15     RTEKernel    118:  SERVERDB MYDATABASE has stopped
                           2009-10-15 11:27:15     RTEKernel     14:  Kernel version: Kernel    7.7.06   Build 009-121-202-944
Thread  0xA91 Task      -  2009-10-15 11:27:15     RunTime        3:  State changed from ABORT to STOPPED
Thread  0xA91 Task      -  2009-10-15 11:27:15     TENANT     13005:  Tenant database MYDATABASE has stopped
Thread  0xA90 Task      -  2009-10-15 11:27:15                12768:  UKT7 stopped
Thread  0xA8F Task      -  2009-10-15 11:27:15                12768:  UKT6 stopped
Thread  0xA8B Task      -  2009-10-15 11:27:15                12768:  UKT2 stopped
Thread  0xA8D Task      -  2009-10-15 11:27:15                12768:  UKT4 stopped
Thread  0xA8C Task      -  2009-10-15 11:27:15                12768:  UKT3 stopped
Thread  0xA92 Task      -  2009-10-15 11:27:15                12768:  UKT9 stopped
Thread  0xA8E Task      -  2009-10-15 11:27:15                12768:  UKT5 stopped
Thread  0xA6E Task      -  2009-10-15 11:27:15     RTEKernel     64:  rtedump written by running kernel already (start time kernel: 2009-10-15 11:27:00, last access time rtedump: 2009-10-15 11:27:15)
Thread  0xA6E Task      -  2009-10-15 11:27:15 ERR RTEKernel    102:  Kernel exited without core and exit status 0x6,_FILE=RTEKernel_Termination.cpp,_LINE=634
Thread  0xA6E Task      -  2009-10-15 11:27:15 ERR RTEKernel     98:  Kernel exited due to signal 6 (SIGABRT),_FILE=RTEKernel_Termination.cpp,_LINE=680
Thread  0xA6E Task      -  2009-10-15 11:27:15     RunTime        3:  State changed from STOPPED to OFFLINE
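For anyone who needs to sift through a long converted KnlMsg file like the one above, here is a minimal Python sketch that pulls out the ERR entries and their `KEY=value` fields (such as `BLOCK_NO` and `VOLUME_ID`). The regular expressions are assumptions derived only from the line layout visible in this excerpt, not from any documented MaxDB format, and the names `LINE_RE`, `FIELD_RE` and `parse_line` are made up for the illustration.

```python
import re
from collections import Counter

# Assumed layout (inferred from the excerpt above, not from MaxDB documentation):
# optional "Thread <id> Task <id>" prefix, timestamp, optional level (ERR/WNG),
# optional component name, numeric message id, then the message text.
LINE_RE = re.compile(
    r"^(?:Thread\s+\S+\s+Task\s+\S+)?\s*"
    r"(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})\s+"
    r"(?:(?P<level>ERR|WNG)\s+)?"
    r"(?P<component>[A-Za-z][A-Za-z ]*?)?\s*"
    r"(?P<msg_id>\d+):\s+(?P<text>.*)$"
)

# Trailing ",KEY=value" pairs such as ",BLOCK_NO=14274" or ",_FILE=Msg_List.cpp".
FIELD_RE = re.compile(r",([A-Z_]+)=([^,]*)")


def parse_line(line):
    """Return a dict for one log line, or None if the line does not match."""
    m = LINE_RE.match(line)
    if m is None:
        return None
    rec = m.groupdict()
    text = rec.pop("text")
    first = FIELD_RE.search(text)
    rec["message"] = text[: first.start()] if first else text
    rec["fields"] = dict(FIELD_RE.findall(text))
    return rec


# Three lines copied verbatim from the log above.
SAMPLE = [
    "Thread  0xA91 Task      -  2009-10-15 11:27:11 ERR Converter  20034:  Bad converter page - read page is no index page,_FILE=Converter_IndexPage.cpp,_LINE=78",
    "                           2009-10-15 11:27:15 ERR IOMan          5:  Failed consistency check on read page,BLOCK_NO=14274,VOLUME_ID=1,VOLUME_TYPE=data,_FILE=IOMan_Volume.cpp,_LINE=412",
    "Thread  0xA91 Task      -  2009-10-15 11:27:15     RTEKernel    118:  SERVERDB MYDATABASE has stopped",
]

records = [parse_line(line) for line in SAMPLE]
errors = [r for r in records if r is not None and r["level"] == "ERR"]

# Count errors per (component, message id) and show any block/volume details.
counts = Counter((r["component"], r["msg_id"]) for r in errors)
for (component, msg_id), n in counts.items():
    print(f"{component} {msg_id}: {n} occurrence(s)")
for r in errors:
    if "BLOCK_NO" in r["fields"]:
        print("failed page:", r["fields"])
```

On a full file you would feed every line through `parse_line` instead of `SAMPLE`; lines that do not fit the assumed layout simply come back as `None`.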


=======================================================================

smartctl --all /dev/sda

=======================================================================



=== START OF INFORMATION SECTION ===
Model Family:     Seagate Barracuda 7200.10 family
Device Model:     ST3250310AS
Serial Number:    9RY01C4W
Firmware Version: 3.AAA
User Capacity:    250 059 350 016 bytes
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   7
ATA Standard is:  Exact ATA specification draft version not indicated
Local Time is:    Thu Oct 15 11:31:03 2009 NOVST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   106   100   006    Pre-fail  Always       -       11304615
  3 Spin_Up_Time            0x0003   097   097   000    Pre-fail  Always       -       0
  4 Start_Stop_Count        0x0032   100   100   020    Old_age   Always       -       87
  5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000f   067   060   030    Pre-fail  Always       -       5805208
  9 Power_On_Hours          0x0032   099   099   000    Old_age   Always       -       1223
 10 Spin_Retry_Count        0x0013   100   100   097    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   020    Old_age   Always       -       87
187 Reported_Uncorrect      0x0032   082   082   000    Old_age   Always       -       18
189 High_Fly_Writes         0x003a   100   100   000    Old_age   Always       -       0
190 Airflow_Temperature_Cel 0x0022   064   057   045    Old_age   Always       -       36 (Lifetime Min/Max 33/36)
194 Temperature_Celsius     0x0022   036   043   000    Old_age   Always       -       36 (0 22 0 0)
195 Hardware_ECC_Recovered  0x001a   068   064   000    Old_age   Always       -       2613598
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x0000   100   253   000    Old_age   Offline      -       0
202 TA_Increase_Count       0x0032   100   253   000    Old_age   Always       -       0

SMART Error Log Version: 1
ATA Error Count: 18 (device log contains only the most recent five errors)
	CR = Command Register [HEX]
	FR = Features Register [HEX]
	SC = Sector Count Register [HEX]
	SN = Sector Number Register [HEX]
	CL = Cylinder Low Register [HEX]
	CH = Cylinder High Register [HEX]
	DH = Device/Head Register [HEX]
	DC = Device Command Register [HEX]
	ER = Error register [HEX]
	ST = Status register [HEX]
Powered_Up_Time is measured from power on, and printed as
DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes,
SS=sec, and sss=millisec. It "wraps" after 49.710 days.

Error 18 occurred at disk power-on lifetime: 1222 hours (50 days + 22 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 00 49 4c 40 e4  Error: UNC at LBA = 0x04404c49 = 71322697

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  c8 00 08 47 4c 40 e4 00      00:08:40.753  READ DMA
  27 00 00 00 00 00 e0 00      00:08:40.750  READ NATIVE MAX ADDRESS EXT
  ec 00 00 00 00 00 a0 00      00:08:40.750  IDENTIFY DEVICE
  ef 03 46 00 00 00 a0 00      00:08:40.747  SET FEATURES [Set transfer mode]
  27 00 00 00 00 00 e0 00      00:08:37.186  READ NATIVE MAX ADDRESS EXT

Error 17 occurred at disk power-on lifetime: 1222 hours (50 days + 22 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 00 49 4c 40 e4  Error: UNC at LBA = 0x04404c49 = 71322697

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  c8 00 08 47 4c 40 e4 00      00:08:40.753  READ DMA
  27 00 00 00 00 00 e0 00      00:08:40.750  READ NATIVE MAX ADDRESS EXT
  ec 00 00 00 00 00 a0 00      00:08:40.750  IDENTIFY DEVICE
  ef 03 46 00 00 00 a0 00      00:08:40.747  SET FEATURES [Set transfer mode]
  27 00 00 00 00 00 e0 00      00:08:37.186  READ NATIVE MAX ADDRESS EXT

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
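As a side note on reading the error log above: the `LBA = 0x04404c49` that smartctl prints for Error 18 can be recomputed from the register dump. The sketch below assumes the usual 28-bit LBA layout of the ATA task file (low nibble of DH holds bits 27..24, then CH, CL, SN); the function name `lba28_from_registers` is just a label for this illustration.

```python
def lba28_from_registers(sn, cl, ch, dh):
    """Assemble a 28-bit LBA from ATA task-file register values.

    Assumed layout: DH low nibble = bits 27..24, CH = bits 23..16,
    CL = bits 15..8, SN = bits 7..0.
    """
    return ((dh & 0x0F) << 24) | (ch << 16) | (cl << 8) | sn


# Registers after the failed command in "Error 18": SN=49 CL=4c CH=40 DH=e4 (hex).
lba = lba28_from_registers(0x49, 0x4C, 0x40, 0xE4)
print(f"failing LBA: 0x{lba:08x} = {lba}")

# Registers of the READ DMA that triggered it: SC=08 SN=47 CL=4c CH=40 DH=e4.
start = lba28_from_registers(0x47, 0x4C, 0x40, 0xE4)
count = 0x08
print(f"READ DMA covered LBA {start}..{start + count - 1}; "
      f"the error hit sector offset {lba - start} of that request")
```

This only reproduces the number smartctl already reports; translating the LBA into a partition, md device or file offset needs the partition table and filesystem layout, which are not shown in this thread.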

Edited by: Alex Petrov on Oct 15, 2009 7:47 AM


Accepted Solutions (1)


lbreddemann
Active Contributor

> I'm running MaxDB 7.7.06 BUILD 009-121-202-944 on CentOS 5.3.

> After power loss today I cannot start database with error "-24994 connection broken server state 4".

> I have a software RAID-1 (mdadm), and during disk resync (sdb=>sda) i've got some multiple errors on sda.

>

> Thread 0xA91 Task - 2009-10-15 11:27:15 ERR Messages 7: Begin of dump of registered messages,_FILE=Msg_List.cpp,_LINE=3539

> Thread 0xA91 Task - 2009-10-15 11:27:11 ERR Converter 20034: Bad converter page - read page is no index

Hello Alex,

time to get your backup!

It seems the "software RAID" did not work as reliably as you may have wished...

Anyhow, the converter - the single most important data structure for db restart - has been corrupted.

There's no way to repair this.

All you can do is to restore the last data backup and recover from there.

Sorry, but that's how it is.

regards,

Lars

Former Member

Thanks, Lars!

I've restored the data from the last backup, and everything works fine now.

Edited by: Alex Petrov on Oct 19, 2009 7:32 AM
