
What can we do about the error "-9400 AK Cachedirectory full"?

simon_matter
Participant
0 Kudos

Hi,

We are having repeated issues with "-9400 AK Cachedirectory full". Our DB version is 7.7.06.09 on x86_64. The logs look like this:

Thread  0xFBC Task    120  2009-07-17 13:45:15 ERR AK CACHE   51105:  DUMPING SESSION CATCACHE: 1
Thread  0xFBC Task    120  2009-07-17 13:45:15 ERR SYSERROR   51080:  -9400 AK Cachedirectory full
Thread  0xFBC Task    120  2009-07-17 13:45:15 ERR AK CACHE   51105:  DUMPING SESSION CATCACHE: 2
Thread  0xFBC Task    120  2009-07-17 13:45:15 ERR SYSERROR   51080:  -9400 AK Cachedirectory full
Thread  0xFBC Task    120  2009-07-17 13:46:06 ERR AK CACHE   51105:  DUMPING SESSION CATCACHE: 3
Thread  0xFBC Task    120  2009-07-17 13:46:06 ERR SYSERROR   51080:  -9400 AK Cachedirectory full

This seems to happen at least once a week and I don't know how to fix it. Restarting the instance has always "fixed" it. In the past, when we didn't restart, other things happened, like the instance becoming unresponsive.

The logs are here

[KnlMsgArchive.txt|http://www.invoca.ch/pub/KnlMsgArchive.txt]

[KnlMsg_20090717_15-47-04.txt|http://www.invoca.ch/pub/KnlMsg_20090717_15-47-04.txt]

Please note that we still don't know how to trigger the situation; it just happens.

What can we do to get rid of this? Any help is much appreciated.

BTW: we get a lot of files (probably created by DUMPING SESSION CATCACHE) with names like AK00001.dmp, AK00001.stm, AK00002.dmp, AK00002.stm in the wrk directory. I don't know what to do with them.

Regards,

Simon

Accepted Solutions (0)

Answers (3)


Former Member
0 Kudos

Hi,

Oops, sorry, you are using 7.7.06. I mixed it up with 7.6.06. Therefore no newer version is available.

-9205 comes from the same component. Nevertheless, I do not believe that these two problems really belong together.

In the cases we saw before (and corrected), -9205 occurred because some statements (a 'little' bit more complex than others) reached a confused state (perhaps due to a lock problem - locking not possible at that moment - or something similar) where they believed they had access to information in the cache that they did not have, or vice versa.

But I do not remember a problem where -9205 was the root cause that afterwards led to -9400.

Thank you for being so patient.

Elke

Former Member
0 Kudos

Hello,

I know that you will not like my text. But nevertheless....

Up to now it is not clear which problem may cause your -9400. Therefore we cannot recommend using version xyz with a fix in it. If you are thinking about upgrading to a newer version anyway, please do it now and let us check afterwards whether the problem remains.

Unfortunately (for us, but good for you), the error only occurs about once a week. When it occurs, this one single session has to be closed; otherwise the error will recur as long as this session lives.

Up to now we have no idea which SQL statement causes the session to reach that strange state.

What we can do, if you and your database users allow it:

Switch on (set to YES) the installation parameter CheckTaskSpecificCatalogCache. This causes a check of the catalog cache after each command. It will, of course, slow down the system. And it will cause a crash in case of a problem!

But in that case we see the problem as soon as possible, right after the command that causes the trouble. Or we know that the last command executed is THE ONE with the problem, because the check after the previous command found everything fine.

Switch on (set to YES) the installation parameter TraceDefault. This will show roughly the first 120 bytes of your statement plus several intermediate steps of the database work. If many statements look the same in their first 120 bytes, TraceOrder should be switched on, too.

As the trace is stored internally in a buffer, that buffer has to have a reasonable size. Otherwise we will not see the interesting part.

TraceBufferSizeUSTask should be set so that TraceBufferSizeUSTask * MaxUserTasks (did I understand correctly that this is set to 50?) is 10000 to 20000, i.e. TraceBufferSizeUSTask = 200 .. 400.

This trace will be written into knltrace, can be converted to human-readable form using DB Studio, and together with the dump prepared at that point it should help.

I do not know if this is acceptable for you and your DB users. If not, perhaps the newest version would be another option for you.

I would not like such an answer either, but maybe this disadvantage will help (I cannot promise anything) to find the root cause.

Elke

simon_matter
Participant
0 Kudos

Hello Elke,

> I know that you will not like my text. But nevertheless....

I consider everything progress.

> Up to now it is not clear which problem may cause your -9400. Therefore we cannot recommend using version xyz with a fix in it. If you are thinking about upgrading to a newer version anyway, please do it now and let us check afterwards whether the problem remains.

I'm afraid we are already using the latest version available to us. We are running the 7.7.06.09 community release; is there a newer version available somewhere?

>

> Unfortunately (for us, but good for you), the error only occurs about once a week. When it occurs, this one single session has to be closed; otherwise the error will recur as long as this session lives.

OK, that means we should be able to recover from the error by disconnecting the client connections instead of restarting the instance, right? Unfortunately we cannot do this at the moment without stopping the application server, because of how the DB access is implemented.

> Switch on (set to YES) the installation parameter CheckTaskSpecificCatalogCache. This causes a check of the catalog cache after each command. It will, of course, slow down the system. And it will cause a crash in case of a problem!

We have to discuss this internally. Of course we are also interested in finding the root cause of the problem.

> I would not like such an answer either, but maybe this disadvantage will help (I cannot promise anything) to find the root cause.

BTW, I've just found this one in the logs, could it be related?

Thread 0x494 Task 120 2009-08-02 18:44:44 ERR AK CACHE 51105: 0000000000003415006100330000000000000000
Thread 0x494 Task 120 2009-08-02 18:44:44 ERR SYSERROR 51080: -9205 AK Catalog information not found:0
Thread 0x494 Task 120 2009-08-02 18:45:08 ERR AK CACHE 51105: 0000000000003415006100330000000000000000
Thread 0x494 Task 120 2009-08-02 18:45:08 ERR SYSERROR 51080: -9205 AK Catalog information not found:0
Thread 0x494 Task 120 2009-08-02 18:45:11 ERR AK CACHE 51105: 0000000000003415006100330000000000000000
Thread 0x494 Task 120 2009-08-02 18:45:11 ERR SYSERROR 51080: -9205 AK Catalog information not found:0

Apart from that everything seems to be fine at the moment.

Regards,

Simon

Edited by: Simon Matter on Aug 4, 2009 2:59 PM

Former Member
0 Kudos

Refer to SAP Note 1297446

Thanks

Anish

simon_matter
Participant
0 Kudos

Thanks, but I have no access to the support notes; we are using MaxDB in a non-SAP environment, and SAP support notes require a login. Any chance you could tell me what can be found there?

Regards,

Simon

Former Member
0 Kudos

Currently, the only solution is to increase the parameter CAT_CACHE_SUPPLY to ensure that the memory available for the relevant user task is sufficient.

Increase the parameter CAT_CACHE_SUPPLY to a much higher value (for example, 900000).

The memory is allocated dynamically only if required. Therefore, it does not actually have to exist.

To activate the parameter, you must restart the database.

You must rename the user manually. To do this, use the Database Studio, the SQL Studio, sqlcli, or dbmcli.

Log on as SYSDBA (superdba).

The command is as follows: "rename user SAP<SrcSID> to SAP<TRGT_SID>"

If you executed the command successfully, reset the parameter to its previous value and restart the database once again.

simon_matter
Participant
0 Kudos

Thanks, I'll try that and see how it goes.

simon_matter
Participant
0 Kudos

I have to ask again because it's not clear to me what I should do.

I understand that I should increase CAT_CACHE_SUPPLY.

I understand that I have to restart the instance for the change to take effect.

But now this:

You must rename the user manually. To do this, use the Database Studio, the SQL Studio, sqlcli, or dbmcli.

Log on as SYSDBA (superdba).

The command is as follows: "rename user SAP<SrcSID> to SAP<TRGT_SID>"

If you executed the command successfully, reset the parameter to its previous value and restart the database once again.

Why do I have to rename a user? And why do I have to revert the "parameter to its previous value" (CAT_CACHE_SUPPLY??)

We are using only one user with our database. Do we have to do something with it?

We had the same -9400 errors yesterday. It starts suddenly and then recurs repeatedly until we restart the instance. I think if we don't restart, it results in another failure sooner or later, but we usually don't want to let it get to that point.

Regards,

Simon

Former Member
0 Kudos

Hi Simon

Did you do a system copy recently?

Anish

simon_matter
Participant
0 Kudos

Yes, we recently did a DB copy (with our own Java-based copy tool) when upgrading our instance from 7.6 to 7.7. Unfortunately, an in-place upgrade failed because db_migratecatalog failed to run, so we decided to start with a new instance and migrate the data at the application level.

Regards,

Simon

Former Member
0 Kudos

Hi Simon

As part of a system copy, the system renamed the ABAP SQL schema in the MaxDB from SAP<source_SID> to SAP<TARGET_SID>.

You will have to rename this user manually ... as described

Thanks

Anish

simon_matter
Participant
0 Kudos

Hi Anish,

As stated before, we are using MaxDB in a non-SAP environment with our own application server. So I guess the renamed user does not affect us, right?

Apart from that, do we still have to increase the CAT_CACHE_SUPPLY parameter? Is there a bug in MaxDB that we can work around by increasing CAT_CACHE_SUPPLY?

Regards,

Simon

Former Member
0 Kudos

This problem is caused by an error in the database kernel. It does not always occur.

By the way, where did you pick up these logs - from the knldiag file on the application server?

Does your app server sit on a separate physical server?

Thanks

Anish

Edited by: Anish John on Jul 24, 2009 12:52 PM

simon_matter
Participant
0 Kudos

Those files get deleted from time to time.

I've uploaded a fresh one from yesterday.

[KnlMsg.txt|http://www.invoca.ch/pub/KnlMsg.txt]

Regards,

Simon

Former Member
0 Kudos

Increase the value of CAT_CACHE_SUPPLY

Thanks

Anish

lbreddemann
Active Contributor
0 Kudos

>

> This problem is caused by an error in the database kernel. It does not always occur .

Really? Which one? You have a bug number (PTS) for that, do you?

> By the way, where did you pick up these logs - from the knldiag file on the application server?

>

> Does your app server sit on a separate physical server?

What has this to do with the error message?

Do you actually know what the error means?

regards,

Lars

lbreddemann
Active Contributor
0 Kudos

>

> Increase the value of CAT_CACHE_SUPPLY

And then ... what?

To what value do you think the catalog cache supply should be increased?

What do you think will happen when you change the value?

Are you sure to know how the catalog cache works?

What if the error reoccurs?

Don't get me wrong on this, but just guessing is not fixing the problem.

regards,

Lars

lbreddemann
Active Contributor
0 Kudos

Hi Simon,

please tell us a bit more about your application.

How do you connect to the database?

How are the sessions handled?

How are cursors/resultsets handled?

Here are two statements that you may use to monitor the catalog cache usage:

select timestamp, sessionid, own, taskid, username, catalogcacheusedsize
, usedtemporarysize, pagingfileusedsize 
from sessions

and

select timestamp, sessionid, root, type, filestate, treeleavessize+treeindexsize SIZE_IN_KB 
from files 
where type not in ('TABLE', 'INDEX', 'SHORT COLUMN FILE')

You could run them regularly, e.g. every five minutes, and save the results.
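If you want to automate this, a small JDBC program started from cron could collect the output. Below is only a rough sketch under a few assumptions (the MaxDB JDBC driver sapdbc.jar with driver class com.sap.dbtech.jdbc.DriverSapDB, and made-up host/user data) - a cron job calling sqlcli would of course do the job just as well:

import java.sql.*;

// Sketch: run the two monitoring statements once and print the rows,
// so that cron can append the output to a log file every five minutes.
public class CatCacheMonitor {
    public static void main(String[] args) throws Exception {
        Class.forName("com.sap.dbtech.jdbc.DriverSapDB");          // MaxDB JDBC driver
        Connection con = DriverManager.getConnection(
                "jdbc:sapdb://dbhost/MYDB", "MONUSER", "secret");  // made-up connection data
        try {
            dump(con, "select timestamp, sessionid, own, taskid, username,"
                    + " catalogcacheusedsize, usedtemporarysize, pagingfileusedsize from sessions");
            dump(con, "select timestamp, sessionid, root, type, filestate,"
                    + " treeleavessize+treeindexsize size_in_kb from files"
                    + " where type not in ('TABLE', 'INDEX', 'SHORT COLUMN FILE')");
        } finally {
            con.close();
        }
    }

    private static void dump(Connection con, String sql) throws SQLException {
        Statement st = con.createStatement();
        try {
            ResultSet rs = st.executeQuery(sql);
            int cols = rs.getMetaData().getColumnCount();
            while (rs.next()) {
                StringBuilder line = new StringBuilder();
                for (int i = 1; i <= cols; i++) {
                    if (i > 1) line.append(" | ");
                    line.append(rs.getString(i));                  // print each column, pipe-separated
                }
                System.out.println(line);
            }
            rs.close();
        } finally {
            st.close();
        }
    }
}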

BTW: do you have the DBAnalyzer running?

Does it print out any warnings?

regards,

Lars

simon_matter
Participant
0 Kudos

Hi Lars,

Thanks for your answer. I have created a cron job which runs the suggested queries every 5 minutes.

DBAnalyzer is not running.

As for the other questions about our application, I will have to forward them to the person who knows it.

Regards,

Simon

lbreddemann
Active Contributor
0 Kudos

> DBAnalyzer is not running.

Please change that. DBAnalyzer logs a whole lot of general information that is usually quite useful.

regards,

Lars

simon_matter
Participant
0 Kudos

Hi Lars,

I have now enabled DBAnalyzer.

Here are the answers to your other questions:

=> How do you connect to the database?

We connect via JDBC to the database.

=> How are the sessions handled?

We usually use only two connections at the same time to handle statements. Most statements are issued through one of these two connections.

=> How are cursors/resultsets handled?

We typically use PreparedStatements created this way:

Connection c = ...;

c.prepareStatement("select ...", ResultSet.TYPE_SCROLL_INSENSITIVE, ResultSet.CONCUR_READ_ONLY);

(The JDBC default for Connection.prepareStatement(String sql) would be TYPE_FORWARD_ONLY with CONCUR_READ_ONLY - the documentation states: "Result sets created using the returned PreparedStatement object will by default be type TYPE_FORWARD_ONLY and have a concurrency level of CONCUR_READ_ONLY." - but we explicitly request TYPE_SCROLL_INSENSITIVE/CONCUR_READ_ONLY result sets.)

Regards,

Simon

lbreddemann
Active Contributor
0 Kudos

> I have now enabled DBAnalyzer.

Very good. Are there any warnings in the DBAN.PRT? (You may just check this with the DB Studio)

> Here are the answers to you other questions:

> => How do you connect to the database?

>

> We connect via JDBC to the database.

>

> => How are the sessions handled?

>

> We usually use only two connections at the same time to handle statements. Most statements are issued through one of these two connections.

Hmm... just two database connections...

Although you configured up to 80 user tasks (if I remember correctly).

What's the rationale behind this?

> => How are cursors/resultsets handled?

> We typically use PreparedStatements created this way:

>

> Connection c = ...;

> c.prepareStatement("select ...", ResultSet.TYPE_SCROLL_INSENSITIVE, ResultSet.CONCUR_READ_ONLY);

Do you really need the TYPE_SCROLL_INSENSITIVE?

Or could you just scroll through the result set up to its end?

What are the results of the check statements I sent you?

regards,

Lars

simon_matter
Participant
0 Kudos

Hi Lars,

> Very good. Are there any warnings in the DBAN.PRT? (You may just check this with the DB Studio)

There are some warnings in the file(s) but I don't know how serious they are. I have posted a tarball of the current analyzer directory here [analyzer.tar.gz|http://www.invoca.ch/pub/analyzer.tar.gz]

BTW, viewing analyzer logs somehow doesn't work for me with dbstudio on Linux. I get the following error:

ERROR Runtime environment error
1,OS error: 'No such file or

Component UIDBM
Plug-in com.sap.sdb.tools.ui.dbm
Location com.sap.sdb.tools.ui.dbm.Logger.log

Detail Message
:com.sap.sdb.tools.ui.dbm.Logger.log(Logger.java:36][N:1248681070385:296][DB:null][U:null]
com.sap.sdb.tools.databasemanagement.DM_Exception: Runtime environment error
1,OS error: 'No such file or directory'
    at com.sap.sdb.tools.databasemanagement.internal.DM_Session.execute(DM_Session.java:207)
    at com.sap.sdb.tools.databasemanagement.internal.commands.DM_BaseCommand.doExecute(DM_BaseCommand.java:139)
    at com.sap.sdb.tools.databasemanagement.internal.commands.DM_DatabaseCommand.doExecute(DM_DatabaseCommand.java:54)
    at com.sap.sdb.tools.databasemanagement.internal.commands.DM_BaseCommand.execute(DM_BaseCommand.java:74)
    at com.sap.sdb.tools.databasemanagement.internal.DM_AnalyzerLog.open(DM_AnalyzerLog.java:66)
    at com.sap.sdb.tools.databasemanagement.internal.DM_AnalyzerLogItems.readItems(DM_AnalyzerLogItems.java:99)
    at com.sap.sdb.tools.databasemanagement.internal.DM_AnalyzerLogItems.refresh(DM_AnalyzerLogItems.java:89)
    at com.sap.sdb.tools.ui.dbm.editors.pages.RefreshablePage$1.run(RefreshablePage.java:170)
    at com.sap.sdb.tools.dbm.jobs.DBMJob$1.run(DBMJob.java:81)

> Hmm... just two database connections...

> Although you configured up to 80 user tasks (if I remember correctly).

> What's the rationale behind this?

Our current setting is 50 user tasks, which is simply the default when installing MaxDB. While we mostly have only 2 connections, this number can jump to 10 or 20, so I guess it's safer not to lower it too much.

> Do you really need the TYPE_SCROLL_INSENSITIVE?

> Or could you just scroll through the result set up to its end?

I have forwarded this question as I cannot answer it myself.

> What are the results of the check statements I sent you?

I have posted a tarball with those results here [queries.tar.gz|http://www.invoca.ch/pub/queries.tar.gz]

simon_matter
Participant
0 Kudos

One thing I saw in the DBAN.prt is that some of our tables need a statistics update. My question is: do you suggest we enable 'automatic statistics update'?

I saw in the docs that this is not recommended for systems running SAP, but is it recommended for us? I guess SAP has its own maintenance jobs, which is why automatic statistics update is not enabled there, is that right?

What could be the drawback of running auto update? I understand it could run at the wrong moment and slow down other tasks?

BTW, as I stated before, we only see the -9400 errors about once a week. If it goes on like before, we may see errors by the end of the week. I'll keep DBAnalyzer and the cron jobs running, so maybe we will see something then.

Regards,

Simon

Edited by: Simon Matter on Jul 27, 2009 11:04 AM

lbreddemann
Active Contributor
0 Kudos

>

> One thing I saw in the DBAN.prt is that some of our tables need a statistics update. My question is: do you suggest we enable 'automatic statistics update'?

If you don't have anything else set up for this - yes. It's not the most flexible way to gather stats, but it works, and it's much better than not updating statistics at all.

One of the major drawbacks is that it only updates statistics for those tables that have been entered into the SYSUPDATESTATWANTED table. And that happens when the optimizer, while running a query, recognizes that it got bad data before - so it's a reactive measure for a problem that should be handled proactively.

> I saw in the docs that this is not recommended for systems running SAP, but is it recommended for us? I guess SAP has its own maintenance jobs, which is why automatic statistics update is not enabled there, is that right?

> What could be the drawback of running auto update? I understand it could run at the wrong moment and slow down other tasks?

For SAP systems there is functionality in place that should find tables needing new stats before any query runs slowly on them.

Since typical SAP instances have >35000 tables, >95000 indexes and some tables of several hundred GB in size, it's very important not to have to update stats for all tables regularly.

If your application is much smaller and a full update stats runs through quickly, then I would go for a regular (say, once weekly) update stats job scheduled via cron or something like that.
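Such a weekly job could look roughly like the sketch below (started from cron). The table names are made up, and please double-check the exact UPDATE STATISTICS syntax and options against the MaxDB SQL reference before using anything like this:

import java.sql.*;

// Sketch of a weekly "update stats" run over a fixed list of application tables.
// Assumes the MaxDB JDBC driver (sapdbc.jar) and that a plain
// "UPDATE STATISTICS <table>" is sufficient for these tables.
public class WeeklyUpdateStats {
    public static void main(String[] args) throws Exception {
        Class.forName("com.sap.dbtech.jdbc.DriverSapDB");
        Connection con = DriverManager.getConnection(
                "jdbc:sapdb://dbhost/MYDB", "APPUSER", "secret");  // made-up connection data
        String[] tables = { "ORDERS", "CUSTOMERS", "ITEMS" };      // made-up table list
        try {
            Statement st = con.createStatement();
            try {
                for (String table : tables) {
                    st.execute("UPDATE STATISTICS " + table);      // refresh optimizer statistics
                    System.out.println("statistics updated for " + table);
                }
            } finally {
                st.close();
            }
        } finally {
            con.close();
        }
    }
}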

Anyhow, this has nothing to do with your -9400 problem.

> BTW, as I stated before, we only see the -9400 errors about once a week. If it goes on like before, we may see errors by the end of the week. I'll keep DBAnalyzer and the cron jobs running, so maybe we will see something then.

OK, I already had a look into some of the files, but it would be interesting to see all logfiles after the error has occurred.

best regards,

Lars

lbreddemann
Active Contributor
0 Kudos

> BTW, viewing analyzer logs somehow doesn't work for me with dbstudio on Linux. I get the following error:

> ERROR Runtime environment error

> 1,OS error: 'No such file or

>

Sounds like the bug from thread

No idea when this is going to be fixed...

What DB Studio do you use?

As long as this is not fixed, please read the DBAN.PRT via a text editor or a spreadsheet application.

regards,

Lars

simon_matter
Participant
0 Kudos

> For SAP systems there is functionality in place that should find tables needing new stats before any query runs slowly on them.

> Since typical SAP instances have >35000 tables, >95000 indexes and some tables of several hundred GB in size, it's very important not to have to update stats for all tables regularly.

>

> If your application is much smaller and a full update stats runs through quickly, then I would go for a regular (say, once weekly) update stats job scheduled via cron or something like that.

Understood, I'll have a look into it.

>

> Anyhow, this has nothing to do with your -9400 problem.

>

> > BTW, as I stated before, we only see the -9400 errors about once a week. If it goes on like before, we may see errors by the end of the week. I'll keep DBAnalyzer and the cron jobs running, so maybe we will see something then.

>

> OK, I already had a look into some of the files, but it would be interesting to see all logfiles after the error has occurred.

OK, I'll post new logs once we get the same error condition again.

Regards,

Simon

simon_matter
Participant
0 Kudos

OK, yes that's exactly our issue.

Regards,

Simon

simon_matter
Participant
0 Kudos

> OK, I already had a look into some of the files, but it would be interesting to see all logfiles after the error has occurred.

>

> best regards,

> Lars

Hello Lars,

We saw another lockup last night, but this time with a segfault and not with a -9400. Because we have seen such varied behaviour in the past, I believe it could be caused by the same bug.

I have posted the collection of logs here [logs.tar.gz|http://www.invoca.ch/pub/logs.tar.gz]

If anything is missing just let me know.

It also appears that a knldump file has been written. Do you want me to preserve the file?

Thanks,

Simon

lbreddemann
Active Contributor
0 Kudos

Hi Simon,

thanks for the output files.

The most recent crash is "new" - not only did your instance not crash with this error stack before, but generally we don't know this one. I informed development about this.

Concerning the -9400 errors, it really looks like the way your application keeps cursors open leads to the issue.

There are sessions that keep multi-megabyte resultsets open for very long times, e.g.:

Timestamp    SESSIONID    TYPE           SIZE_IN_KB                      
00:00:01     3589055       USERRESULT     45104
00:00:01     3589055       USERRESULT     7976
[...]
01:10:01     3589055       USERRESULT     45104
01:15:01     3589055       USERRESULT     45104

That's about 44 MB of a resultset that is kept open for over an hour!

And there are more like this.

For SAP systems we don't have situations like this and we also don't see the AK -9400 very often.

Since I currently cannot say what causes the AK error here (it looks like a kind of pointer corruption, if you ask me, and that could explain the crash as well...), I can only propose that you try to change your cursor handling so that cursors are closed as soon as possible.
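Just to illustrate what "closing cursors as soon as possible" could look like on the JDBC side, here is a minimal sketch (the query, table and column names are made up; this is not meant to be your actual code): the result set is consumed and closed immediately instead of being kept open between requests.

import java.sql.*;

// Sketch: read what is needed from the cursor, then close it right away so the
// kernel can release the session's result/catalog resources.
public class PromptCursorClose {
    static long sumOpenAmounts(Connection con) throws SQLException {
        PreparedStatement ps = con.prepareStatement(
                "select amount from orders where status = ?");     // made-up statement
        try {
            ps.setString(1, "OPEN");
            ResultSet rs = ps.executeQuery();                      // forward-only by default
            try {
                long sum = 0;
                while (rs.next()) {
                    sum += rs.getLong(1);
                }
                return sum;                                        // all rows read...
            } finally {
                rs.close();                                        // ...so close the cursor now
            }
        } finally {
            ps.close();
        }
    }
}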

Do you think that this is possible for you?

Since this is a non-SAP installation the options to analyze this are pretty limited.

So this might be as good as it gets for a while...

regards,

Lars

simon_matter
Participant
0 Kudos

>

> Hi Simon,

>

> thanks for the output files.

> The most recent crash is "new" - not only did your instance not crash with this error stack before, but generally we don't know this one. I informed development about this.

Hm, yes, that may be true. We saw many segfaults with our old 7.6 instance. But you may be right - the segfaults we saw with 7.7 happened while doing our first steps with 7.7. That issue was a known one, and we were able to configure the workarounds as documented in the corresponding PTS.

>

> Concerning the -9400 errors, it really looks like the way your application keeps cursors open leads to the issue.

>

> There are sessions that keep multi-megabyte resultsets open for very long times, e.g.:

>

Timestamp    SESSIONID    TYPE           SIZE_IN_KB                      
> 00:00:01     3589055       USERRESULT     45104
> 00:00:01     3589055       USERRESULT     7976
> [...]
> 01:10:01     3589055       USERRESULT     45104
> 01:15:01     3589055       USERRESULT     45104

>

> That's about 44 MB of a resultset that is kept open for over an hour!

> And there are more like this.

>

> For SAP systems we don't have situations like this and we also don't see the AK -9400 very often.

>

> Since I currently cannot say what causes the AK error here (it looks like a kind of pointer corruption, if you ask me, and that could explain the crash as well...), I can only propose that you try to change your cursor handling so that cursors are closed as soon as possible.

Thanks for this information. I understand that we must be doing something that doesn't happen with SAP systems and therefore is not detected by most users of MaxDB.

>

> Do you think that this is possible for you?

Now that we know where the problem may be, I'm sure we'll find a solution.

>

> Since this is a non-SAP installation the options to analyze this are pretty limited.

> So this might be as good as it gets for a while...

>

> regards,

> Lars

Thanks for now, I'll report back later.

Simon