on 06-01-2009 12:52 PM
I'm dealing with a huge table (71GB) that exhibits ORA-00600 data corruption during archiving.
ORA-00600 [kdimod0: key order], [16], [16], [], [], [], [], []
We tried running
analyze table sapr3.s111 validate structure online;
however it takes over a week and the customer has to kill it in order to take an offline backup.
Next we tried running the scripts at the end of note 765613 however they fail with ORA-01555 because the table is so large and the select takes so long.
Then we tried doing an online reorg of the table.
That gave us ORA-01452 during the rebuild of the primary index. I think that's a fair indication of identical rows on the table.
Next we tried the same think after setting event 1452, but unfortunately the field values didn't show up in the trace file. We think this is because the parallel slave process was running before setting the event, and the rebuild ran in parallel.
So next we are in the process of trying a create online for a unique index. We are expecting this to reproduce the 1452 error and hopefully with all the details we need in the trace file (we are setting event 1452).
Does anybody out there have a smarter way of approaching this problem?
Kind regards,
Peter Strauss
Just as additional info:
Table S111 is a statistics table built by UP2 processes during update.
The content of that table can be rebuilt out of the data existing in the "real" tables:
Note 159800 - Recompile document index for purchasing documents
Markus
You must be a registered user to add a comment. If you've already registered, sign in. Otherwise, register and sign in.
Okay, the system was a little too busy there for a while but we've finally got to the bottom of the problem.
Here's what we did:
1) set the event for errorstack from SQL*Plus
SQL> alter system set events '1452 trace name errorstack level 3';
2) connect to BR*Tools and proceed to Reorganize tables then specify
the table and continue to BRSPACE.
3) Specify the option 4 Table/index parallel degree (degree) into 1
like below;
(example)
BR0280I BRSPACE time stamp: 2009-06-06 04.17.38
BR0657I Input menu 353 - please check/enter input values
Options for reorganization of tables: SAPSR3.TEST1 (degree 1)
1 ~ New destination tablespace (newts) ........ []
2 ~ Separate index tablespace (indts) ......... []
3 - Parallel threads (parallel) ............... [1]
4 ~ Table/index parallel degree (degree) ...... [1] <==this line
5 - Create DDL statements (ddl) ............... [yes]
6 ~ Category of initial extent size (initial) . []
7 ~ Sort by fields of index (sortind) ......... []
8 - Table reorganization mode (mode) .......... [online]
Standard keys: c - cont, b - back, s - stop, r - refr, h - help
4) try reorg to reproduce ORA-1452.
5) set the event for errostack off from SQL*Plus
SQL> alter system set events '1452 trace name errorstack off';
During the above error ORA-01452 occurred, and this time the data from the duplicate records was written to the trace file.
.010. .000.01.E.M.006 Y0010001343401100000001000297706
.20070927.4515030100.001410.200709.00000000.000000.000000
.010. .000.01.E.M.006 Y0010001343401100000001000297706
.20070927.4515030100.001410.200709.00000000.000000.000000
I created a select using the above primary key info and identified the two ROWIDs with duplicate primary keys. Data from all fields was identical so all we have to do is delete one based on the ROWID as per note 23237
delete from sapr3.T_TAB where rowid = 'xxxxxxxx.xxxx.xxxx'
Kind regards and thanks for your help.
Peter
> I'm dealing with a huge table (71GB) that exhibits ORA-00600 data corruption during archiving.
> ORA-00600 [kdimod0: key order], [16], [16], [], [], [], [], []
>
> We tried running
>
> analyze table sapr3.s111 validate structure online;
>
> however it takes over a week and the customer has to kill it in order to take an offline backup.
Have you tried to export the table (exp/expdp) ?
> Next we tried running the scripts at the end of note 765613 however they fail with ORA-01555 because the table is so large and the select takes so long.
As you do have a offline copy of the database you've to opportunity to perform the analysis of the error on a copy instance.
There you won't face runtime restrictions nor ORA-1555.
Even better would be to have a backup that is known to be OK and recover this to the current state and check this one.
> Then we tried doing an online reorg of the table.
> That gave us ORA-01452 during the rebuild of the primary index. I think that's a fair indication of identical rows on the table.
Likely, yes - but to be precise: it's an indication of duplicates in the data source for the index creation.
This can well be another already existing index.
Try
EXPLAIN PLAN FOR
CREATE INDEX ...
to find out what was the source for your CREATE INDEX statement.
If it was another index you may be lucky and the curruption is just in the index.
> So next we are in the process of trying a create online for a unique index. We are expecting this to reproduce the 1452 error and hopefully with all the details we need in the trace file (we are setting event 1452).
>
> Does anybody out there have a smarter way of approaching this problem?
Hmm... as I don't have any experiences with the output of this event I can't tell whether it's clever or not to use it.
Anyhow, as usual the most important thing right now is to figure out where the corruption really is.
If the creation of a unique index fails, try to create a non-unique one and figure out the duplicates with this one.
A
select rowid, <indexed columns>
from sapr3.s111
where (<indexed columns>) in (select <indexed columns>
from sapr3.s111
group by <indexed column>
having count(*) > 2);
should do exactly that.
As soon as the duplicates are identified, we can decide on how to proceed.
regards,
Lars
You must be a registered user to add a comment. If you've already registered, sign in. Otherwise, register and sign in.
Hello
Thank you for your excellent advice.
Have you tried to export the table (exp/expdp) ?
So the plan would be to export the table and import it to a test database. Then we could test at our leisure?
Likely, yes - but to be precise: it's an indication of duplicates in the data source for the index creation.
This can well be another already existing index.
EXPLAIN PLAN FOR... gave
TABLE ACCESS FULL S111
So it looks like it is the table that holds duplicate records.
select rowid, <indexed columns>
from sapr3.s111
where (<indexed columns>) in (select <indexed columns>
from sapr3.s111
group by <indexed column>
having count(*) > 2);
Might we suffer from ORA-01555 for this select?
Kind regards,
Peter Strauss
Hi peter
Check with dbverify identify block..If it is index you can rebuild to anohter block .
http://download-uk.oracle.com/docs/cd/B10501_01/server.920/a96652/ch13.htm
Its oracle internal error.You can fix it .I faced same problem and it resolved.
see below link u can repair or skip
http://download.oracle.com/docs/cd/B10501_01/server.920/a96521/repair.htm#8282
Also check tablespace
Check thread some one raised and got solution.
SurendraJain
Edited by: Surendrajain2003 on Jun 2, 2009 2:07 PM
>the plan would be to export the table and import it to a test database. Then we could test at our leisure?
Not exactly - the export would just be one additional check for the corruption.
Neither ANALYZE VALIDATE STRUCTURE nor DBV/RMAN do check for all errors.
If you're able to read (=>export) the whole table that would proof technical readability of the table.
> EXPLAIN PLAN FOR... gave
>
>
TABLE ACCESS FULL S111
>
> So it looks like it is the table that holds duplicate records.
Ok, at least now we know...
>
select rowid, <indexed columns>
> from sapr3.s111
> where (<indexed columns>) in (select <indexed columns>
> from sapr3.s111
> group by <indexed column>
> having count(*) > 2);
> Might we suffer from ORA-01555 for this select?
Possible, yes. It's less likely since this should only use index blocks but it still can fail due to ORA-1555.
So here you either have to decide to go OFFLINE (which really is a sensible thing when facing corruptions for which the root cause is unknown yet) or you copy the system and perform the analysis on the copy.
Please don't use the DBMS_REPAIR package advertised here in this thread.
It's not supported to be used with SAP-databases - mainly because it just tells Oracle to ignore corrupt blocks (yes, blocks, not just rows!) and thereby produces data loss.
Also the missing data needs "clean-up" in the rest of the application data to get it consistent again from the application point of view.
If you don't get any further, you should open a support message with SAP.
They do have tools and experience to help you with this situation (although this may not be included in your standard maintenance fee...).
regards,
Lars
So here you either have to decide to go OFFLINE (which really is a sensible thing when facing corruptions for which the root cause is unknown yet) or you copy the system and perform the analysis on the copy.
I see.
The table corruption came up when they tried to archive data, so I guess the customer feels it isn't needed.
The customer has more than one copy of the database. These copies are nearly always offline, however right now one is up for testing. I might wait till the test is over and then run get them to hold on for a while so we can run:
spool analyze<table>.log
analyze table sapr3."<table>" validate structure cascade;
spool off
We won't need to use the ONLINE option because there won't be any updates. I'm expecting this to be a lot faster.
I assume this will work. The two database are synchronized at disk level, so any logical corruptions should be present in the backup.
Please don't use the DBMS_REPAIR package advertised here in this thread.
I was kind of thinking along the same lines.
If you don't get any further, you should open a support message with SAP.
I already have. I've been getting good support; however it's sometimes a little easier to breathe out here in the forums.
Kind regards,
Peter
Edited by: Peter Strauss on Jun 2, 2009 9:29 AM
> I see.
> The table corruption came up when they tried to archive data, so I guess the customer feels it isn't needed.
> The customer has more than one copy of the database. These copies are nearly always offline, however right now one is up for testing. I might wait till the test is over and then run get them to hold on for a while so we can run:
>
>
> spool analyze<table>.log
> > analyze table sapr3."<table>" validate structure cascade;
> > spool off
>
Hmm... actually I wouldn't do that.
You already know, that there is a kind of data inconsistency.
ANALYZE VALIDATE STRUCTURE will only give you exact this information.
It does not tell you whether there are further inconsistencies nor does it tell you (with the CASCADE option) where the inconsistency was found.
It's merely a "traffic light test" for corruptions...
If it were my system I'd focus now to figure out what rows are affected -> use the count(*) statement I posted.
> We won't need to use the ONLINE option because there won't be any updates. I'm expecting this to be a lot faster.
Sorry - the ONLINE/OFFLINE option is no speed trigger. It's just about the locking behaviour of the ANALYZE command...
> I assume this will work. The two database are synchronized at disk level, so any logical corruptions should be present in the backup.
Good to know!
> I already have. I've been getting good support; however it's sometimes a little easier to breathe out here in the forums.
User | Count |
---|---|
101 | |
13 | |
13 | |
11 | |
11 | |
7 | |
6 | |
5 | |
4 | |
4 |
You must be a registered user to add a comment. If you've already registered, sign in. Otherwise, register and sign in.