on 07-20-2009 8:44 AM
Hi,
We are having repeated issues with "-9400 AK Cachedirectory full". Our DB version is 7.7.06.09 on x86_64. The logs look like this:
Thread 0xFBC Task 120 2009-07-17 13:45:15 ERR AK CACHE 51105: DUMPING SESSION CATCACHE: 1
Thread 0xFBC Task 120 2009-07-17 13:45:15 ERR SYSERROR 51080: -9400 AK Cachedirectory full
Thread 0xFBC Task 120 2009-07-17 13:45:15 ERR AK CACHE 51105: DUMPING SESSION CATCACHE: 2
Thread 0xFBC Task 120 2009-07-17 13:45:15 ERR SYSERROR 51080: -9400 AK Cachedirectory full
Thread 0xFBC Task 120 2009-07-17 13:46:06 ERR AK CACHE 51105: DUMPING SESSION CATCACHE: 3
Thread 0xFBC Task 120 2009-07-17 13:46:06 ERR SYSERROR 51080: -9400 AK Cachedirectory full
This seems to happen at least once a week and I don't know how to fix it. Restarting the instance has always "fixed" it. If we didn't restart, different things happened in the past, like the instance becoming unresponsive.
The logs are here
[KnlMsgArchive.txt|http://www.invoca.ch/pub/KnlMsgArchive.txt]
[KnlMsg_20090717_15-47-04.txt|http://www.invoca.ch/pub/KnlMsg_20090717_15-47-04.txt]
Please note that we still don't know how to trigger the situation, it just happens.
What can we do to get rid of this? Any help is much appreciated.
BTW: We get a lot of files (which may be created by DUMPING SESSION CATCACHE) with names like AK00001.dmp, AK00001.stm, AK00002.dmp, AK00002.stm in the wrk dir. I don't know what to do with them.
Regards,
Simon
Hi,
oops, sorry, you are using 7.7.06. I mixed it up with 7.6.06. Therefore no newer version is available.
-9205 comes from the same component. Nevertheless, I do not believe that these two problems really belong together.
In the cases we had before (and corrected), we saw that -9205 occurred because some statements (a 'little' bit more complex than others) reached a state of confusion (perhaps because of a lock problem, i.e. locking not possible at the moment, or other things like this) where they believed they had access to info in the cache that they did not have, or vice versa.
But I do not remember a problem where -9205 showed the root cause which afterwards caused -9400.
Thank you for being so patient.
Elke
Hello,
I know that you will not like my text. But nevertheless....
Up to now it is not clear which problem may cause your -9400. Therefore we cannot recommend using version xyz with a fix included. If you think about upgrading to a newer version anyway, please do it now and let us check afterwards whether the problem remains.
Unfortunately (for us, but good for you), the error only occurs once a week or so. In case it occurs, this one single session has to be closed. Otherwise the error will recur as long as this session lives.
Up to now we do not have an idea which sql-statement causes the session to reach that strange state.
What we can do, if you and your database-user allow us to do:
switch on (set to YES) the installation parameter CheckTaskSpecificCatalogCache. This will cause a check of the catalog cache after each command. This will, of course, slow down the system. And it will cause a crash in case of a problem!
But in this case, we see the problem as soon as possible, right after the command causing the trouble. Or we know that the last command executed is THE ONE with the problem, because the one before checked the cache and found everything fine.
switch on (set to YES) the installation parameter TraceDefault. This will show the first roughly 120 bytes of your statement and several intermediate steps of the database work. If there are many statements that look the same in the first 120 bytes, TraceOrder should be switched on, too.
As the trace is internally stored in a buffer, this buffer has to have a sufficient size. Otherwise we will not see the interesting part.
TraceBufferSizeUSTask should be set to a value such that TraceBufferSizeUSTask * MaxUserTasks (did I understand correctly that this is set to 50?) is 10000 to 20000, i.e. TraceBufferSizeUSTask = 200 .. 400.
This trace will be written into knltrace, can be converted to human-readable form using DBStudio and should help together with the dump prepared at that state.
I do not know if this is acceptable for you and your db-users. If not, perhaps the newest version would be another option for you.
I would not like such an answer either, but maybe this disadvantage will help (I cannot promise anything) to find the root cause.
Elke
Hello Elke,
> I know that you will not like my text. But nevertheless....
I consider everything progress.
> Up to now it is not clear which problem may cause your -9400. Therefore we cannot recommend using version xyz with a fix included. If you think about upgrading to a newer version anyway, please do it now and let us check afterwards whether the problem remains.
I'm afraid we are already using the latest version available to us. We are running the 7.7.06.09 community release; is there a newer version available somewhere?
>
> Unfortunately (for us, but good for you), the error only occurs once a week or so. In case it occurs, this one single session has to be closed. Otherwise the error will recur as long as this session lives.
OK, that means we should be able to recover from the error by disconnecting the client connections instead of restarting the instance, right? Unfortunately we cannot do this at the moment without stopping the application server, because of how the db access is implemented.
> switch on (set to YES) the installation parameter CheckTaskSpecificCatalogCache. This will cause a check of the catalog cache after each command. This will, of course, slow down the system. And it will cause a crash in case of a problem!
We have to discuss this internally. Of course we are also interested in finding the root cause of the problem.
> I would not like such an answer either, but maybe this disadvantage will help (I cannot promise anything) to find the root cause.
BTW, I've just found this one in the logs, could it be related?
Thread 0x494 Task 120 2009-08-02 18:44:44 ERR AK CACHE 51105: 0000000000003415006100330000000000000000
Thread 0x494 Task 120 2009-08-02 18:44:44 ERR SYSERROR 51080: -9205 AK Catalog information not found:0
Thread 0x494 Task 120 2009-08-02 18:45:08 ERR AK CACHE 51105: 0000000000003415006100330000000000000000
Thread 0x494 Task 120 2009-08-02 18:45:08 ERR SYSERROR 51080: -9205 AK Catalog information not found:0
Thread 0x494 Task 120 2009-08-02 18:45:11 ERR AK CACHE 51105: 0000000000003415006100330000000000000000
Thread 0x494 Task 120 2009-08-02 18:45:11 ERR SYSERROR 51080: -9205 AK Catalog information not found:0
Apart from that everything seems to be fine at the moment.
Regards,
Simon
Edited by: Simon Matter on Aug 4, 2009 2:59 PM
Refer to note 1297446.
Thanks
Anish
Currently, the only solution is to increase the parameter CAT_CACHE_SUPPLY to ensure that the memory available for the relevant user task is sufficient.
Increase the parameter CAT_CACHE_SUPPLY to a much higher value (for example, 900000).
The memory is allocated dynamically only if required. Therefore, it does not actually have to exist.
To activate the parameter, you must restart the database.
You must rename the user manually. To do this, use the Database Studio, the SQL Studio, sqlcli, or dbmcli.
Log on as SYSDBA (superdba).
The command is as follows: "rename user SAP<SrcSID> to SAP<TRGT_SID>"
If you executed the command successfully, reset the parameter to its previous value and restart the database once again.
I have to ask again because it's not clear to me what I should do.
I understand to increase CAT_CACHE_SUPPLY.
I understand I have to restart the instance for the change to get active.
But now this:
You must rename the user manually. To do this, use the Database Studio, the SQL Studio, sqlcli, or dbmcli.
Log on as SYSDBA (superdba).
The command is as follows: "rename user SAP<SrcSID> to SAP<TRGT_SID>"
If you executed the command successfully, reset the parameter to its previous value and restart the database once again.
Why do I have to rename a user? And why do I have to revert the "parameter to its previous value" (CAT_CACHE_SUPPLY??)
We are using only one user with our database. Do we have to do something with it?
We had the same -9400 errors yesterday. It starts suddenly and then recurs repeatedly until we restart the instance. I think if we don't restart, it results in another failure sooner or later, but we usually don't want to let it come to that point.
Regards,
Simon
Yes, we did a db copy (with our own Java-based copy tool) recently when upgrading our instance from 7.6 to 7.7. Unfortunately, doing an in-place upgrade failed because db_migratecatalog failed to run, so we decided to start with a new instance and migrate the data at the application level.
Regards,
Simon
Hi Anish,
As stated before, we are using MaxDB in a non-SAP environment with our own application server. So I guess the user renaming does not affect us, right?
Apart from that, do we still have to increase the CAT_CACHE_SUPPLY parameter? Is there a bug in MaxDB we can work around by increasing CAT_CACHE_SUPPLY?
Regards,
Simon
>
> This problem is caused by an error in the database kernel. It does not always occur .
Really? Which one? You have a bug number (PTS) for that, do you?
> By the way, where did you pick up these logs on the application server - the knldiag file?
>
> Does your app server sit on a separate physical server?
What has this to do with the error message?
Do you actually know what the error means?
regards,
Lars
>
> Increase the value of CAT_CACHE_SUPPLY
And then ... what?
To what value do you think the catalog cache supply should be increased?
What do you think will happen when you change the value?
Are you sure you know how the catalog cache works?
What if the error reoccurs?
Don't get me wrong on this, but just guessing is not fixing the problem.
regards,
Lars
Hi Simon,
please tell us a bit more about your application.
How do you connect to the database?
How are the sessions handled?
How are cursors/resultsets handled?
Here are two statements that you may use to monitor the catalog cache usage:
select timestamp, sessionid, own, taskid, username, catalogcacheusedsize
, usedtemporarysize, pagingfileusedsize
from sessions
and
select timestamp, sessionid, root, type, filestate, treeleavessize+treeindexsize SIZE_IN_KB
from files
where type not in ('TABLE', 'INDEX', 'SHORT COLUMN FILE')
You may run them regularly, e.g. every five minutes, and save the result.
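If you want to automate that, a small Java/JDBC collector could look like the following sketch. The jdbc:sapdb:// URL format, the driver class name, and the command-line credentials are assumptions for illustration, not details from this thread:

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.ResultSetMetaData;
import java.sql.Statement;

/** Sketch: periodically run the two monitoring queries and print the rows. */
public class CatCacheMonitor {

    // The two monitoring statements from above.
    static final String SESSIONS_QUERY =
        "select timestamp, sessionid, own, taskid, username, catalogcacheusedsize, "
      + "usedtemporarysize, pagingfileusedsize from sessions";

    static final String FILES_QUERY =
        "select timestamp, sessionid, root, type, filestate, "
      + "treeleavessize+treeindexsize SIZE_IN_KB from files "
      + "where type not in ('TABLE', 'INDEX', 'SHORT COLUMN FILE')";

    /** Run one query and print each row as tab-separated values. */
    static void dump(Connection c, String sql) throws Exception {
        Statement st = c.createStatement();
        try {
            ResultSet rs = st.executeQuery(sql);
            ResultSetMetaData md = rs.getMetaData();
            while (rs.next()) {
                StringBuilder row = new StringBuilder();
                for (int i = 1; i <= md.getColumnCount(); i++) {
                    if (i > 1) row.append('\t');
                    row.append(rs.getString(i));
                }
                System.out.println(row);
            }
            rs.close();
        } finally {
            st.close();  // release the server-side resources immediately
        }
    }

    public static void main(String[] args) throws Exception {
        if (args.length < 3) {
            // No connection details given: just show the two statements.
            System.out.println(SESSIONS_QUERY);
            System.out.println(FILES_QUERY);
            return;
        }
        // Assumed MaxDB JDBC driver class and URL, e.g. jdbc:sapdb://localhost/MYDB
        Class.forName("com.sap.dbtech.jdbc.DriverSapDB");
        Connection c = DriverManager.getConnection(args[0], args[1], args[2]);
        try {
            while (true) {
                dump(c, SESSIONS_QUERY);
                dump(c, FILES_QUERY);
                Thread.sleep(5 * 60 * 1000L);  // every five minutes
            }
        } finally {
            c.close();
        }
    }
}
```

Run without arguments it just prints the two statements; with `url user password` it dumps both result sets every five minutes, which you can redirect into a file. A cron job calling sqlcli would of course work just as well.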
BTW: do you have the DBAnalyzer running?
Does it print out any warnings?
regards,
Lars
Hi Lars,
I have now enabled DBAnalyzer.
Here are the answers to you other questions:
=> How do you connect to the database?
We connect via JDBC to the database.
=> How are the sessions handled?
We usually use only two connections at the same time to handle statements. Most statements are called by one of these two connections.
=> How are cursors/resultsets handled?
We typically use PreparedStatements created this way:
Connection c = ...;
c.prepareStatement("select ...", ResultSet.TYPE_SCROLL_INSENSITIVE, ResultSet.CONCUR_READ_ONLY);
(For JDBC, the default ResultSet Types are TYPE_FORWARD_ONLY and CONCUR_READ_ONLY, as this part of the documentation states for Connection.prepareStatement(String sql): "Result sets created using the returned PreparedStatement object will by default be type TYPE_FORWARD_ONLY and have a concurrency level of CONCUR_READ_ONLY." But we use TYPE_SCROLL_INSENSITIVE/CONCUR_READ_ONLY ResultSets.)
Regards,
Simon
> I have now enabled DBAnalyzer.
Very good. Are there any warnings in the DBAN.PRT? (You may just check this with the DB Studio)
> Here are the answers to you other questions:
> => How do you connect to the database?
>
> We connect via JDBC to the database.
>
> => How are the sessions handled?
>
> We usually use only two connections at the same time to handle statements. Most statements are called by one of these two connections.
Hmm... just two database connections...
Although you configured up to 80 user tasks (if I remember correctly).
What's the rationale behind this?
> => How are cursors/resultsets handled?
> We typically use PreparedStatements created this way:
>
> Connection c = ...;
> c.prepareStatement("select ...", ResultSet.TYPE_SCROLL_INSENSITIVE, ResultSet.CONCUR_READ_ONLY);
Do you really need the TYPE_SCROLL_INSENSITIVE?
Or could you just scroll through the result set up to its end?
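To illustrate the question: a plain forward-only read, which is the JDBC default when no ResultSet type arguments are given, could look like this sketch (the table and column names are made up):

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;

// Sketch: a plain forward-only read. The table and column names
// ("mytable", "value") are placeholders for illustration.
public class ForwardOnlyExample {

    static final String QUERY = "select value from mytable"; // placeholder

    static long sumValues(Connection c) throws Exception {
        // No ResultSet type arguments: JDBC defaults to TYPE_FORWARD_ONLY /
        // CONCUR_READ_ONLY, so the driver never needs to scroll backwards.
        PreparedStatement ps = c.prepareStatement(QUERY);
        try {
            ResultSet rs = ps.executeQuery();
            long sum = 0;
            while (rs.next()) {   // one forward pass, then done
                sum += rs.getLong(1);
            }
            rs.close();
            return sum;
        } finally {
            ps.close();
        }
    }
}
```

With TYPE_FORWARD_ONLY the rows can be discarded as soon as they have been fetched, while a TYPE_SCROLL_INSENSITIVE result typically has to be kept around in full so the client can scroll back.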
What are the results of the check statements I sent you?
regards,
Lars
Hi Lars,
> Very good. Are there any warnings in the DBAN.PRT? (You may just check this with the DB Studio)
There are some warnings in the file(s) but I don't know how serious they are. I have posted a tarball of the current analyzer directory here [analyzer.tar.gz|http://www.invoca.ch/pub/analyzer.tar.gz]
BTW, viewing analyzer logs somehow doesn't work for me with dbstudio on Linux. I get the following error:
ERROR Runtime environment error
1,OS error: 'No such file or
Component UIDBM
Plug-in com.sap.sdb.tools.ui.dbm
Location com.sap.sdb.tools.ui.dbm.Logger.log
Detail Message
:com.sap.sdb.tools.ui.dbm.Logger.log(Logger.java:36][N:1248681070385:296][DB:null][U:null]
com.sap.sdb.tools.databasemanagement.DM_Exception: Runtime environment error
1,OS error: 'No such file or directory'
at com.sap.sdb.tools.databasemanagement.internal.DM_Session.execute(DM_Session.java:207)
at com.sap.sdb.tools.databasemanagement.internal.commands.DM_BaseCommand.doExecute(DM_BaseCommand.java:139)
at com.sap.sdb.tools.databasemanagement.internal.commands.DM_DatabaseCommand.doExecute(DM_DatabaseCommand.java:54)
at com.sap.sdb.tools.databasemanagement.internal.commands.DM_BaseCommand.execute(DM_BaseCommand.java:74)
at com.sap.sdb.tools.databasemanagement.internal.DM_AnalyzerLog.open(DM_AnalyzerLog.java:66)
at com.sap.sdb.tools.databasemanagement.internal.DM_AnalyzerLogItems.readItems(DM_AnalyzerLogItems.java:99)
at com.sap.sdb.tools.databasemanagement.internal.DM_AnalyzerLogItems.refresh(DM_AnalyzerLogItems.java:89)
at com.sap.sdb.tools.ui.dbm.editors.pages.RefreshablePage$1.run(RefreshablePage.java:170)
at com.sap.sdb.tools.dbm.jobs.DBMJob$1.run(DBMJob.java:81)
> Hmm... just two database connections...
> Although you configured up to 80 user tasks (if I remember correctly).
> What's the rationale behind this?
Our current setting is 50 user tasks, which simply is the default when installing MaxDB. While we mostly have only 2 connections, this number can jump to 10 or 20, so I guess it's safe not to lower it too much.
> Do you really need the TYPE_SCROLL_INSENSITIVE?
> Or could you just scroll to through the result set up to its end?
I have forwarded this question as I cannot answer it myself.
> What are the results of the check statements I sent you?
I have posted a tarball with those results here [queries.tar.gz|http://www.invoca.ch/pub/queries.tar.gz]
One thing I saw in the DBAN.prt is that some of our tables need a statistics update. My question is: do you suggest we enable 'automatic statistics update'?
I saw in the docs that this is not recommended for systems running SAP, but is it recommended for us? I guess SAP has its own maintenance jobs, which is why automatic statistics update is not enabled, is that right?
What could be the drawback of running autoupdate? I understand it could run at the wrong moment and slow down other tasks?
BTW, as I stated before, we only saw the -9400 errors roughly once a week. If it goes on like before, maybe we will see errors by the end of the week. I let DBAnalyzer and the cronjobs run, so maybe we will see something then.
Regards,
Simon
Edited by: Simon Matter on Jul 27, 2009 11:04 AM
>
> One thing I saw in the DBAN.prt is that some of our tables need a statistics update. My question is: do you suggest we enable 'automatic statistics update'?
If you don't have anything else set up for this - yes. It's not the most flexible way to gather stats, but it works and it's much better than not updating statistics at all.
One of the major drawbacks is that it only updates statistics for those tables that have been entered into the SYSUPDATESTATWANTED table. And this happens when the optimizer, while running a query, recognizes that it got bad data before - so it's a reactive measure for a problem that should be handled proactively.
> I saw in the docs that this is not recommended for systems running SAP, but is it recommended for us? I guess SAP has its own maintenance jobs, which is why automatic statistics update is not enabled, is that right?
> What could be the drawback of running autoupdate? I understand it could run at the wrong moment and slow down other tasks?
For SAP systems there is functionality implemented that should find tables needing new stats before any query runs slowly on them.
Since typical SAP instances have >35000 tables, >95000 indexes and some tables hundreds of GB in size, it's very important not to have to update stats for all tables regularly.
If your application is much smaller and a full update stats runs through quickly, then I would go for a regular (say, once weekly) update stats job scheduled via cron or something like that.
Anyhow, this has nothing to do with your -9400 problem.
> BTW, as I stated before, we only saw the -9400 errors roughly once a week. If it goes on like before, maybe we will see errors by the end of the week. I let DBAnalyzer and the cronjobs run, so maybe we will see something then.
Ok. I already had a look into some of the files, but it would be interesting to see all logfiles after the error occurred.
best regards,
Lars
> BTW, viewing analyzer logs somehow doesn't work for me with dbstudio on Linux. I get the following error:
> ERROR Runtime environment error
> 1,OS error: 'No such file or
>
Sounds like the bug from thread
No idea when this is going to be fixed...
Which DB Studio version do you use?
As long as this is not fixed, please read the DBAN.PRT via a text editor or a spreadsheet application.
regards,
Lars
> For SAP systems there is functionality implemented that should find tables needing new stats before any query runs slowly on them.
> Since typical SAP instances have >35000 tables, >95000 indexes and some tables hundreds of GB in size, it's very important not to have to update stats for all tables regularly.
>
> If your application is much smaller and a full update stats runs through quickly, then I would go for a regular (say, once weekly) update stats job scheduled via cron or something like that.
Understood, I'll have a look into it.
>
> Anyhow, this has nothing to do with your -9400 problem.
>
> > BTW, as I stated before, we only saw the -9400 errors roughly once a week. If it goes on like before, maybe we will see errors by the end of the week. I let DBAnalyzer and the cronjobs run, so maybe we will see something then.
>
> Ok. I already had a look into some of the files, but it would be interesting to see all logfiles after the error occurred.
OK, I'll post new logs once we get the same error condition again.
Regards,
Simon
> Ok. I already had a look into some of the files, but it would be interesting to see all logfiles after the error occurred.
>
> best regards,
> Lars
Hello Lars,
We have seen another lockup last night, but this time with a segfault and not with a -9400. Because we have seen such different behaviour in the past, I believe it could be caused by the same bug.
I have posted the collection of logs here [logs.tar.gz|http://www.invoca.ch/pub/logs.tar.gz]
If anything is missing just let me know.
It also appears that a knldump file has been written. Do you want me to preserve the file?
Thanks,
Simon
Hi Simon,
thanks for the output files.
The most recent crash is "new" - not only did your instance not crash with this error stack before, but generally we don't know this one. I informed development about this.
Concerning the -9400 errors, it really looks like the way your application keeps cursors open leads to the issue.
There are sessions that keep multi-megabyte resultsets open for very long times, e.g.:
Timestamp SESSIONID TYPE SIZE_IN_KB
00:00:01 3589055 USERRESULT 45104
00:00:01 3589055 USERRESULT 7976
[...]
01:10:01 3589055 USERRESULT 45104
01:15:01 3589055 USERRESULT 45104
That's about 44 MB of a resultset that is kept open for over an hour!
And there are more like this.
For SAP systems we don't have situations like this and we also don't see the AK -9400 very often.
Since I currently cannot say what causes the AK error here (it looks like a kind of pointer corruption, if you ask me, and this could explain the crash as well...), I can only propose that you try to change your cursor handling so that cursors are closed as soon as possible.
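As a sketch of that advice (the query and column names are placeholders, not from the application discussed here): fetch the rows into an application-side list and close the ResultSet and statement immediately, instead of holding the server-side cursor open.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.util.ArrayList;
import java.util.List;

// Sketch: copy the rows into the application and close the server-side
// cursor right away, instead of keeping the ResultSet open for an hour.
// Query and column name are placeholders.
public class CloseEarlyExample {

    static final String QUERY = "select name from mytable"; // placeholder

    static List<String> fetchNames(Connection c) throws Exception {
        PreparedStatement ps = c.prepareStatement(QUERY);
        try {
            ResultSet rs = ps.executeQuery();
            try {
                List<String> names = new ArrayList<String>();
                while (rs.next()) {
                    names.add(rs.getString(1));
                }
                return names;   // work on the copy after the cursor is gone
            } finally {
                rs.close();     // closes the server-side cursor promptly
            }
        } finally {
            ps.close();
        }
    }
}
```

Once the cursor is closed, the kernel should be able to drop the corresponding USERRESULT instead of keeping 44 MB around for an hour.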
Do you think that this is possible for you?
Since this is a non-SAP installation the options to analyze this are pretty limited.
So this might be as good as it gets for a while...
regards,
Lars
>
> Hi Simon,
>
> thanks for the output files.
> The most recent crash is "new" - not only did your instance not crash with this error stack before, but generally we don't know this one. I informed development about this.
Hm, yes, that may be true. We have seen many segfaults with our old 7.6 instance. But you may be right: the segfaults we've seen with 7.7 occurred while doing our first steps with 7.7. That issue was a known one, and we were able to configure the workarounds as documented in the corresponding PTS.
>
> Concerning the -9400 errors, it really looks like the way your application keeps cursors open leads to the issue.
>
> There are sessions that keep multi-megabyte resultsets open for very long times, e.g.:
>
> Timestamp SESSIONID TYPE SIZE_IN_KB
> 00:00:01 3589055 USERRESULT 45104
> 00:00:01 3589055 USERRESULT 7976
> [...]
> 01:10:01 3589055 USERRESULT 45104
> 01:15:01 3589055 USERRESULT 45104
>
> That's about 44 MB of a resultset that is kept open for over an hour!
> And there are more like this.
>
> For SAP systems we don't have situations like this and we also don't see the AK -9400 very often.
>
> Since I currently cannot say what causes the AK error here (it looks like a kind of pointer corruption, if you ask me, and this could explain the crash as well...), I can only propose that you try to change your cursor handling so that cursors are closed as soon as possible.
Thanks for this information. I understand that we must be doing something that doesn't happen with SAP systems and therefore is not detected by most users of MaxDB.
>
> Do you think that this is possible for you?
Now that we know where the problem may be, I'm sure we'll find a solution.
>
> Since this is a non-SAP installation the options to analyze this are pretty limited.
> So this might be as good as it gets for a while...
>
> regards,
> Lars
Thanks for now, I'll report back later.
Simon