on 03-06-2010 7:08 PM
Hi all,
my installation of MaxDB 7.6.06.03 (32-bit) has been running happily on openSUSE 11.1 for more than a year.
Kernel: Linux 2.6.27.45-0.1-pae #1 SMP 2010-02-22 16:49:47 +0100 i686 i686 i386 GNU/Linux.
However, four days ago, the database failed to start with the following message:
2010-03-01 07:55:01 0x000013a6 ERR -24700 DBMSrv ERR_DBMSRV_NOSTART: Could not start DBM server.
0x000013a6 ERR -24832 DBMSrv ERR_SHMNOTAVAILABLE: Shared memory not available
0x000013a6 ERR -24686 DBMSrv ERR_SHMNOCLEANUP: Could not cleanup the DBM server Shared Memory
0x000013a6 ERR -24740 DBMSrv ERR_SHMSHIFTERROR: Could not change size of Shared Memory (b77422ea, -16777216)
0x000013a6 ERR -24827 DBMSrv ERR_SHMALLOCFAILED: ID /var/lib/sdb/dbm/ipc/MAXDB1.dbm.shm, requested size 4278190898
0x000013a6 ERR 9 RTEIPC Mapping of the shared memory file /var/lib/sdb/dbm/ipc/MAXDB1.dbm.shm of length 4278194176 into the address space of the calling process failed, rc = Cannot allocate memory.
2010-03-01 07:55:01 0x000013a6 INF 226 DBMSrv DBM Server client disconnected: PID 5022 on computer columbus.s.netic.de
2010-03-01 18:52:47 0x00003767 ERR -24700 DBMSrv ERR_DBMSRV_NOSTART: Could not start DBM server.
0x00003767 ERR -24832 DBMSrv ERR_SHMNOTAVAILABLE: Shared memory not available
0x00003767 ERR -24686 DBMSrv ERR_SHMNOCLEANUP: Could not cleanup the DBM server Shared Memory
0x00003767 ERR -24827 DBMSrv ERR_SHMALLOCFAILED: ID /var/lib/sdb/dbm/ipc/MAXDB1.dbm.shm, requested size 4278190898
0x00003767 ERR 9 RTEIPC Mapping of the shared memory file /var/lib/sdb/dbm/ipc/MAXDB1.dbm.shm of length 4278194176 into the address space of the calling process failed, rc = Cannot allocate memory.
I would appreciate any pointers that might help me find out what is happening.
-walt
> my installation of Maxdb 7.6.06.03 32 bit has been running happily on OpenSuse 11.1. for more than one year.
> Kernel: Linux 2.6.27.45-0.1-pae #1 SMP 2010-02-22 16:49:47 +0100 i686 i686 i386 GNU/Linux.
> However, four days ago, the database failed to start with the following message:
What happened prior to this error? Did the system crash? Or did you change any parameters?
The system apparently tries to allocate 4 GB of memory, which you can't do on a 32-bit OS.
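The "4 GB" in the log can be reconstructed from the numbers it prints. The sketch below rests on an assumption that is not documented MaxDB behaviour: that the DBM server computes the new segment size as the current size plus the shift reported by ERR_SHMSHIFTERROR, using unsigned 32-bit arithmetic. Only the page rounding at the end is plain arithmetic on the logged values.

```shell
#!/usr/bin/env bash
# Hypothesis check (assumption): new_size = old_size + delta, evaluated in
# unsigned 32-bit arithmetic. All constants are copied from the log above.
delta=-16777216          # second value of ERR_SHMSHIFTERROR (-16 MiB)
requested=4278190898     # "requested size" in ERR_SHMALLOCFAILED
mapped=4278194176        # "of length" in the RTEIPC message
page=4096                # x86 page size

# Solve for the old size that would wrap around to the requested size.
old_size=$(( (requested - delta) % (1 << 32) ))
# Re-apply the shift, keeping only the low 32 bits (unsigned wraparound).
wrapped=$(( (old_size + delta) & 0xFFFFFFFF ))
# The mapped length in the log is the request rounded up to whole pages.
rounded=$(( (requested + page - 1) / page * page ))

echo "implied old size:          $old_size bytes"
echo "old size + delta mod 2^32: $wrapped bytes"
echo "requested, page-rounded:   $rounded bytes (log: $mapped)"
```

The page-rounded request matches the 4278194176-byte mapping length exactly. If the hypothesis holds, a 16 MiB shrink was applied to a segment of only 818 bytes and wrapped around to just under 4 GiB, which no 32-bit process can map.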
Markus
I did some more research after posting this query.
From what I can see, on Feb 26 and Feb 27 I applied a routine kernel patch as part of a normal update from the openSUSE repository.
There were no crashes or similar events that I am aware of. I enclose the corresponding lines from my
knldiag.err file.
The fact that shared memory suddenly can no longer be allocated, after it was allocated without a hitch
for almost a year, makes me wonder whether this could be a Linux kernel issue.
KNLDIAG.ERR **************
2010-02-27 17:07:52 ___ Stopping GMT 2010-02-27 16:07:52 7.6.06 Build 003-121-202-135
2010-02-27 19:29:27 --- Starting GMT 2010-02-27 18:29:27 7.6.06 Build 003-121-202-135
2010-02-27 21:54:59 ___ Stopping GMT 2010-02-27 20:54:59 7.6.06 Build 003-121-202-135
2010-02-28 11:57:13 --- Starting GMT 2010-02-28 10:57:13 7.6.06 Build 003-121-202-135
2010-02-28 12:00:28 ___ Stopping GMT 2010-02-28 11:00:28 7.6.06 Build 003-121-202-135
2010-02-28 18:34:39 --- Starting GMT 2010-02-28 17:34:39 7.6.06 Build 003-121-202-135
2010-02-28 18:49:15 ___ Stopping GMT 2010-02-28 17:49:15 7.6.06 Build 003-121-202-135
2010-03-01 07:54:55 --- Starting GMT 2010-03-01 06:54:55 7.6.06 Build 003-121-202-135
2010-03-01 07:55:01 5006 ERR 20125 RTE Database automatic restart failed
2010-03-01 18:52:57 ___ Stopping GMT 2010-03-01 17:52:57 7.6.06 Build 003-121-202-135
2010-03-02 08:17:57 --- Starting GMT 2010-03-02 07:17:57 7.6.06 Build 003-121-202-135
2010-03-02 08:18:04 5016 ERR 20125 RTE Database automatic restart failed
2010-03-02 18:29:48 ___ Stopping GMT 2010-03-02 17:29:48 7.6.06 Build 003-121-202-135
KNLDIAG.ERR **************
-walt.
> cat /proc/sys/kernel/shmmax
> 4294967295
That is only a (theoretical) upper limit.
By default, a 32-bit process can't allocate more than about 2 GB of RAM; its entire address space is only 2^32 bytes (4 GB).
Maybe the new patch introduced a different internal buffer calculation and you're now exceeding those limits.
How big is your CACHE_SIZE? If it's more than 200,000 pages, I would decrease it to 150,000 and try to restart the database with that value.
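To put the 200,000-page threshold into perspective, here is a rough sketch of the 32-bit budget. It assumes 8 KB database pages (the page size used elsewhere in this thread) and the 2 GB default limit quoted above; the real usable limit depends on the OS and kernel configuration, and the database needs memory beyond the data cache.

```shell
#!/usr/bin/env bash
# Rough 32-bit memory budget for the data cache.
# Assumptions: 8 KB pages and a 2 GiB usable limit (the figure quoted above).
page_kb=8
addr_space=$(( 1 << 32 ))                 # 2^32 bytes = 4 GiB of addresses
budget=$(( 2 * 1024 * 1024 * 1024 ))      # assumed 2 GiB usable

# Bytes occupied by a data cache of $1 pages.
cache_bytes() { echo $(( $1 * page_kb * 1024 )); }

for pages in 150000 200000; do
  bytes=$(cache_bytes "$pages")
  echo "CACHE_SIZE=$pages: $(( bytes / 1024 / 1024 )) MiB, $(( bytes * 100 / budget ))% of the budget"
done
echo "entire 32-bit address space: $addr_space bytes"
```

Under these assumptions, 200,000 pages already take about three quarters of the 2 GiB budget, which is why reducing the cache is a sensible first test.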
Markus
From my knldiag file: I guess I am well below the 200,000 pages you mention.
2010-03-06 18:58:43 4494 20235 RTE CACHE_SIZE=2500
2010-03-06 18:58:43 4494 20235 RTE CALLSTACKLEVEL=0
2010-03-06 18:58:43 4494 20235 RTE CATCACHE_MINSIZE=262144
2010-03-06 18:58:43 4494 20235 RTE CAT_CACHE_SUPPLY=3264
How (and where) is the amount of requested shared memory computed?
-w.
True: you have a VERY small cache (2500 * 8 KB = 20,000 KB, i.e. about 20 MB).
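The cache arithmetic, compared with the size of the failed request from the error log, as a short sketch (8 KB per page as stated above):

```shell
#!/usr/bin/env bash
cache_pages=2500      # CACHE_SIZE from knldiag
page_kb=8             # 8 KB per database page
cache_bytes=$(( cache_pages * page_kb * 1024 ))
requested=4278190898  # "requested size" from ERR_SHMALLOCFAILED

echo "cache: $(( cache_pages * page_kb )) KB = $cache_bytes bytes"
echo "failed request / cache size: $(( requested / cache_bytes ))"
```

The failed request is roughly 200 times the whole data cache, so the cache size alone cannot explain it.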
What's the output of the following commands?
ipcs -m
free
             total       used       free     shared    buffers     cached
Mem:       4018436    2117200    1901236          0     249556    1403724
-/+ buffers/cache:     463920    3554516
Swap:      4194296          0    4194296
ipcs -m
------ Shared Memory Segments --------
key        shmid      owner      perms      bytes      nattch     status
0x00005d8b 32768      root       777        404        1
0x00000000 98305      root       600        67108864   7          dest
0x44000000 131074     sdb        660        1507328    2
0x4400118e 163843     sdb        600        6684672    2
0x44000001 196612     sdb        666        131208     1
0x44000002 229381     sdb        666        262360     1
0x44000003 262150     sdb        666        1311576    1
0x00000000 12648455   walter     600        393216     2          dest
0x00000000 12976136   walter     600        4          2          dest
0x00000000 13008905   walter     600        4          2          dest
0x00000000 13041674   walter     600        4          2          dest
0x00000000 13631499   walter     600        99600      2          dest
0xcbc384f8 3145740    walter     600        64528      1
-walt.
So, as you can see, the shared memory segments for the database are still allocated.
The easiest way to solve this problem would be to reboot your machine and, once it is back up, rename the file
/var/lib/sdb/dbm/ipc/MAXDB1.dbm.shm
and then try to start your database again.
You can also remove all the keys belonging to user sdb using ipcrm -M <key>
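The cleanup suggested above can be sketched as a script that only prints the commands, so they can be reviewed before anything is removed. Assumptions: the MaxDB OS user is sdb (as in the ipcs listing), the file is the one named in the error log, and the .old suffix is a hypothetical choice. Run the printed commands by hand only after the database and every dbmsrv process are stopped.

```shell
#!/usr/bin/env bash
# Print (never execute) ipcrm commands for segments owned by a given user.

owner_keys() {
  # $1 = owner; stdin = `ipcs -m` output (columns: key shmid owner ...).
  # Key 0x00000000 (IPC_PRIVATE) cannot be removed by key, so skip it;
  # such segments would need ipcrm -m <shmid> instead.
  awk -v owner="$1" '$3 == owner && $1 ~ /^0x/ && $1 != "0x00000000" { print $1 }'
}

shm_file=/var/lib/sdb/dbm/ipc/MAXDB1.dbm.shm   # path from the error log

if command -v ipcs >/dev/null 2>&1; then
  # LC_ALL=C keeps the output in English regardless of the system locale.
  LC_ALL=C ipcs -m | owner_keys sdb | while read -r key; do
    echo "ipcrm -M $key"
  done
fi
echo "mv $shm_file $shm_file.old"
```

Applied to the listing above, this would print one ipcrm -M line for each of the five sdb keys (0x44000000 through 0x44000003 and 0x4400118e).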
Markus
Markus, I have tried exactly that: I renamed MAXDB1.dbm.shm to
MAXDB1.dbm.shm-RENAMED.
This is getting mysterious:
When the database was restarted after rebooting, it crashed again, this time leaving a directory
named rtedump_dir containing a file named rtedump:
ERR 11112 CONSOLE Incompatible version of running kernel and console!
ERR 11112 CONSOLE Running kernel-version is:
ERR 11112 CONSOLE Actual console-version is: X32/LINUX 7.6.06 Build 003-121-202-135
ERR 11108 CONSOLE console: kernel shared segment attach error 2 key 0x44000000
After the restart attempt, there is NO new MAXDB1.dbm.shm! So this time it must have died before
it attempted to allocate shared memory.
knldiag.err:
2010-03-07 17:54:39 5188 ERR 11000 di_main semop error: Identifier removed
2010-03-07 17:54:39 --- Starting GMT 2010-03-07 16:54:39 7.6.06 Build 003-121-202-135
2010-03-07 17:54:40 0 ERR 12006 DBCRASH Kernel exited without core and exit status 0x6
2010-03-07 17:54:40 0 ERR 12009 DBCRASH Kernel exited due to signal 6(SIGABRT)
2010-03-07 17:54:40 ___ Stopping GMT 2010-03-07 16:54:40 7.6.06 Build 003-121-202-135
2010-03-07 17:54:54 21191 ERR 20125 RTE Database automatic restart failed
2010-03-07 17:56:39 ___ Stopping GMT 2010-03-07 16:56:39 7.6.06 Build 003-121-202-135
2010-03-07 17:58:03 --- Starting GMT 2010-03-07 16:58:03 7.6.06 Build 003-121-202-135
2010-03-07 17:58:22 5312 ERR 20125 RTE Database automatic restart failed
thanks,
-w.
It seems your database was not completely updated/upgraded to the new version; some of the tools still seem to be older versions.
Since the supported way of upgrading databases is SDBUPD (and not RPM packages), the only options left are to
- force a reinstall of the RPM
- contact the RPM package maintainer to solve this problem.
Markus
Markus,
This was a completely NEW install; I never upgraded either the software or the instance.
One of the first things I did yesterday was to run sdbverify, which told me:
VERIFICATION SUMMARY:
*********************
INVALID PACKAGES: 0
VALID PACKAGES: 12
INCONSISTENT PACKAGES: 0
TOTAL FILES: 380
MISSED FILES: 0
MODIFIED FILES: 0
FILES WITH MODIFIED PERMISSIONS: 0
I'm stumped.
-w.
Hello ,
1. Please post the output of the following commands:
ps -efe | grep dbmsrv
ls -l /var/lib/sdb/dbm/ipc
ipcs -m | wc -l
sysctl -a | grep kernel.shmmni
dbmcli inst_enum
dbmcli db_enum -s
2. Please post the dbmsrv*.err files located in /sapdb/data/wrk.
3. First(!) check that there are no active dbmrfc/dbmsrv processes, and stop the x_server.
- Kill these processes manually if they were not released after closing DBMGUI and the dbmcli sessions and stopping the application server.
- Try to check/remove the shared memory:
/sapdb/MAXDB1/db/pgm/dbmshm CHECK /var/lib/sdb/dbm/ipc MAXDB1
/sapdb/MAXDB1/db/pgm/dbmshm DELETE /var/lib/sdb/dbm/ipc MAXDB1
- Check whether /var/lib/sdb/dbm/ipc contains the files MAXDB1.dbm.shi and MAXDB1.dbm.shm; if so,
rename both files to MAXDB1.dbm.shi.old and MAXDB1.dbm.shm.old.
- Try to connect to the database using the dbmcli tool and post the results.
4. Are you an SAP customer?
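Step 1 asks for the output of six commands. A minimal sketch that runs them and collects everything in one file for posting; the file name maxdb-diag.txt is a hypothetical choice, and LC_ALL=C (not part of the original request) keeps the output in English. It only reads system state and does not perform any of the cleanup in step 3.

```shell
#!/usr/bin/env bash
# Collect the step-1 diagnostics into one file (name is an assumption).
out=${1:-maxdb-diag.txt}

run() {
  # Print the command as a header, then its output including errors,
  # so missing tools (e.g. dbmcli not in PATH) are visible in the file.
  printf '\n$ %s\n' "$*"
  LC_ALL=C bash -c "$*" 2>&1
}

{
  run "ps -efe | grep dbmsrv"
  run "ls -l /var/lib/sdb/dbm/ipc"
  run "ipcs -m | wc -l"
  run "sysctl -a | grep kernel.shmmni"
  run "dbmcli inst_enum"
  run "dbmcli db_enum -s"
} > "$out"

echo "diagnostics written to $out"
```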
Thank you and best regards, Natalia Khlopina
Edited by: Natalia Khlopina on Mar 8, 2010 9:26 PM