Solved: Checkpoint not complete

former_member84399 · ‎08-10-2007

Dear all,

After four Oracle 10 upgrades, the only remaining warning I have on db13 in all databases I upgraded is about "checkpoint not complete" . This only happens when there is a lot of activity in the database . It never happened before though, even when there was a lot of activity in the database.

I found note #79341 and #1068186 and I have set DISABLESELFTUNE_CHECKPOINTING

to FALSE, but this didn't particularly help, still get the warnings

My questions are:

The log buffer size is increased by the upgrade to 14M (instead of 1M that is used to be). I have 2 groups of redo log files, with 2 files each. Each file is 20M on size.

I was thinking of increasing them to 30M each. Do you think that this is a good thing to do?

Under /oracle/SID/saptrace/background, I see in the log writer trace the messages below. Do you know why I get these messages and are they relevant to the issue I have with the checkpoint?

SERVICE NAME:() 2007-08-09 09:56:38.059
SESSION ID:(316.1) 2007-08-09 09:56:38.059

Maximum redo generation record size = 197120 bytes

Maximum redo generation change vector size = 190204 bytes

tkcrrsarc: (WARN) Failed to find ARCH for message (message:0x10)

tkcrrpa: (WARN) Failed initial attempt to send ARCH message (message:0x10)

2007-08-09 12:55:29.062

LGWR: Archivelog for thread 1 sequence 3662 will NOT be compressed

2007-08-09 14:56:22.902

LGWR: Archivelog for thread 1 sequence 3663 will NOT be compressed

2007-08-09 15:09:29.136

LGWR: Archivelog for thread 1 sequence 3664 will NOT be compressed

2007-08-09 18:25:38.287

Many thanks

Andreas

Former Member · ‎09-06-2007

Support team,

There are many different kind of checkpoints. The checkpoint that is issued with a logswitch is named "log switch checkpoint" in the alert log. If you issue manually an "alter system checkpoint" command, then you will see in the alert log that a "global checkpoint" is issued, which is executed immediately and turns the status of all ACTIVE redo logs to INACTIVE. These INACTIVE redo logs are no longer needed for recovery and can be dropped.

Now Oracle 10 does not trigger a full (global) checkpoint on logswitch (I think Oracle 9 used to do that after 2 consecutive log switches). Oracle 10 issues a global checkpoint when it tries to switch to an already ACTIVE redo log. The log switch checkpoint that is triggered by a log switch appears to have low priority and the database writer doesn't pick it up immediately like it does for the global checkpoint. I have seen checkpoints like this taking 3-4 minutes to complete while the data to be saved is only 50M so I can't believe that it is because of slow db writer.

Andreas

Former Member · ‎09-07-2007

Hi Andreas,

I getting abit confuse. What I understand is that a checkpoint is issue whenever there is a log switch. My understanding of log switch is from log 11 to log 12 (assuming i had 4 log group 11 - 14 ). But base on your explanation, a checkpoint is issue when a complete switch log of group 11 to 12 to 13 to 14 then back to 11. Do highlight me if my understanding of the log switch theory is all the while wrong.

Regards

Lauran

Former Member · ‎09-07-2007

Hi Andreas,

Thanks for the infomation. 1 more question which will clear my doubt on this checkpoint issue. I understand that increasing the redo log size will help because it will reduce the dbwR writing frequency. Increasing the number of dbwR also aid in the process. How about adding a new redo log grp ? How does this contribute to the "checkpoint not complete" issue ?

Regards

Lauran

Former Member · ‎09-06-2007

Hi All,

Just to side track abit. Can i said that when "checkpoint not complete" occured, it acutally mean that oracle is not able to read the data due to checkpoint is logging the process. "Checkpoint not complete" is actually not an error targeting the checkpoint issue. Or i can rephase the error as "read error due to checkpoint is in progress" Kindly correct me.

Regard

Lauran

former_member84399 · ‎09-03-2007

Thank you for your answers, I awarded some more points

I think I have my problem solved, 4 groups of 100M each has done the trick. I noticed a strange behaviour while resizing the redo logs, which might be the cause of the problem.

I understand that a log switch is always triggering a checkpoint (and the opposite is not true). When I was dropping redo log groups, I got two different kind of errors. One was the the redo log group to be dropped can not be dropped because it is current. This makes sense and I easily issued an "alter system switch logfile" statement.

SQL> ALTER DATABASE DROP LOGFILE GROUP 13;

ALTER DATABASE DROP LOGFILE GROUP 13

*

ERROR at line 1:

ORA-01623: log 13 is current log for instance BWT (thread 1) - cannot drop

ORA-00312: online log 13 thread 1: '/oracle/BWT/origlogA/log_g13m1.dbf'

ORA-00312: online log 13 thread 1: '/oracle/BWT/mirrlogA/log_g13m2.dbf'

SQL> alter system switch logfile;

System altered.

Then, when I tried to drop the same redo log group (that was not current any more, as I had just switched to the next one) I got the message

SQL> ALTER DATABASE DROP LOGFILE GROUP 13;

ALTER DATABASE DROP LOGFILE GROUP 13

*

ERROR at line 1:

ORA-01624: log 13 needed for crash recovery of instance BWT (thread 1)

ORA-00312: online log 13 thread 1: '/oracle/BWT/origlogA/log_g13m1.dbf'

ORA-00312: online log 13 thread 1: '/oracle/BWT/mirrlogA/log_g13m2.dbf'

So despite the switch, a checkpoint has not been issued yet. When I executed,

alter system checkpoint;

I could then drop log froup 13

I had similar issues with other log groups . What I do not understand is why I had to issue a manual checkpoint, I believed that the log switch should have issued a checkpoint itself.

Many thanks

Andreas

former_member84399 · ‎08-29-2007

Hello all, thank you for your replies so far, I awarded some points

I have had 9 systems upgraded so far. I am playing with different combinations of redo size, redo groups numbers and DISABLESELFTUNE_CHECKPOINTING. Here are my findings

Changing the DISABLESELFTUNE_CHECKPOINTING parameter hasn't made much difference anywhere. Whether set to true, false or reset (not sure what the default value is then as I can not see it in db03 or using show parameter from sqlplus), I have loads of checkpoint not completed messages in my busy systems

I had 4 redo log groups initially consisting of 1 redo log each of 20 M

Initially, I increased each redo log files size to 50M, the warnings were reduced (from 8-10 per day to 2-3). I then added 2 more groups of 1 redo log each (50M again). I now have none or 1 warning per day there

Vinod, if you can copy and paste me the extract from the database upgrade guide that is talking about the size of the redologs I will be grateful. I tried to look for it both in the SAP Oracle 10 upgrade guide and the Oracle guide from Metalink but I couldn't find it

Eric, or anybody else that understands the term "the database is fluctuating" from note 1068186, could you please explain it here?

Many thanks

Andreas

former_member84399 · ‎08-22-2007

Dear all,

Thank you for your replies so far. In the meanwhile, I increased the size of my redologs from 20M to 50M. The checkpoint warnings still exist but they are significally reduced. I plan to put two more redo log groups (I only have two at the moment) and see what happens

Anyone knows why this behaviour in Oracle 10? The systems I am working on haven't had any increase in the user load and never had checkpoint warnings. I found notes 79341 and 1068186 but it is not clear to me what exactly happens for Oracle not to issue checkpoints as before. As I said, I have this issue directly after the upgrade to Oracle 10 and without any additional load to the system.

Many thanks

Andreas

Former Member · ‎08-14-2007

Dear Team,

A Checkpoint is a database event which synchronizes the modified data blocks in memory with the datafiles on disk.

Checkpoint not complete indicates that Oracle wants to reuse a redo log file, but

the current checkpoint position is still in that log. In this case, Oracle must

wait until the checkpoint position passes that log

this situation may be encountered if DBWR writes too slowly, or if a log switch happens before the log is completely full, or if log file sizes are too small.

I would even recommend sizing them a few hundred Mbytes, you can also consider adding an extra group because the checkpoint of the log about to be overwritten is not yet complete.

Vinod

Former Member · ‎08-13-2007

Hi,

you should set DISABLESELFTUNE_CHECKPOINTING

to TRUE not to FALSE as stated in OSS note 1068186 to reduce the number of checkpoints.

You should also consider increasing the number of DB writers (parameter db_writer_processes)

Thanks

Former Member · ‎08-10-2007

SAP Note 830576 has this for buffer size: LOG_BUFFER 1048576

I had the same problem with checkpoints and reduced the log_buffer size to 1048576 and the issue went away.

Former Member · ‎08-10-2007

Hi Andreas

You can ignore the LGWR not compressed messages. If needed i can post the corresponding oracle metalink doc here, it is 284618.1. I don't hope you are on Oracle 10.1.0.X

Checkpoint not complete mostly happens, when the DBWR cannot keep up writing dirty blocks from the buffer cache to the database files. Every logswitch (redo log full, switch to next redo group) triggers a log checkpoint. Here is a general recommondation, please apply careful as this is depending on your system:

- if possible go to four redo log groups

- 20mb per redo log is tiny. if possible extend to 50mb.

I have seen systems with more than 20mb redo generated per second, they needed redo log sizes around 1gb. Redo log switches should occur once per minute under load. If log_checkpoints_to_alert is set to true you can see log switches in the alert log.

>> Beginning log switch checkpoint up to RBA [0x13d2.2.10], SCN: 44944154

With 4 groups and 50mb, there should be less pressure on the dbwr. Please do not hesitate to come back if you need further clarification.

Regards

Michael

Former Member · ‎08-10-2007

Hi

OSS Note 79341 suggestions are the best.

Regards

Ganesh

Checkpoint not complete

Accepted Solutions (1)

Accepted Solutions (1)

Answers (11)

Answers (11)

Goods receipt error

Re: I can't login to my ABAP system as an administ...

Has the SAP Master Data Integration service been d...

Re: systemVars in SAP BUILD apps

Retreiving AppGyver project