
Mass delete and performance

Former Member

Hi,

we use a few tables to log access to our various e-commerce sites.

Some of these tables are very large (20-50 million rows), so we decided to remove old data after consolidating it.

However, deleting a large number of rows takes a long time (it took about 18 minutes to remove 2 million rows).

We're removing all rows with primary key less than x (key is an integer).

Is there a method to mass delete with better performance (maybe disabling logging or ...)?

If not, can I send a delete statement that will run in the background (i.e. after disconnecting the client), possibly at a lower priority?

Rows are always inserted with a primary key greater than x (it comes from a DB sequence), so deletes and inserts never conflict.

Thank you

Accepted Solutions (1)

markus_doehr2
Active Contributor

> However, deleting a large number of rows takes a long time (it took about 18 minutes to remove 2 million rows).

> We're removing all rows with primary key less than x (key is an integer).

> Is there a method to mass delete with better performance (maybe disabling logging or ...)?

Well - without an example of how you delete the data, it's very difficult to suggest a "better" method.

> If not, can I send a delete statement that will run in the background (i.e. after disconnecting the client), possibly at a lower priority?

You can put the statement in the background using sqlcli and an input (and output) file.
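A minimal sketch of that approach - the database name, user, file names, table name and key value below are placeholders, and the exact sqlcli options can differ between versions (check sqlcli -h):

-- contents of delete.sql (table name and key value are examples only)
DELETE FROM ACCESSLOG WHERE ID < 123456789

-- started detached from the client session, e.g. on a Unix-like host:
-- nohup sqlcli -d MYDB -u someuser,somepassword -i delete.sql -o delete.out &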

Markus

Former Member

We're doing

SELECT MAX(ID) FROM TABLE WHERE DATE < the_first_date_we_preserve 

then

DELETE FROM TABLE WHERE ID < x

where x is the result of the select above.

markus_doehr2
Active Contributor

Does that table fit completely in the data cache? (CACHE_SIZE)

Did you try to delete a smaller amount of data (e.g. one month at a time) and run these deletes in parallel?

Markus

Answers (2)

Former Member

The data cache is 512 MB. This DB is only used for storing website accesses, so this table is mostly "insert only", 24 hours a day. A few queries run late at night, after midnight, to consolidate the previous day into another table, so all query results are in the cache: they touch only the most recently inserted rows.

The data size is now 17 GB (we have 6 identical databases, one for each site; this is the biggest).

The "delete" operation is done every 2-3 years so the db isn't optimized to do that.

I'll try to delete in batches.

I'm thinking that we should change our "delete" policy and, every day after the consolidation, remove the rows with date < (today - 3 years). This way only a small amount of data is removed each day, and it should be faster.
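For illustration, a sketch of that daily step reusing the MAX(ID) pattern from above (ACCESSLOG, LOGDATE and ID are placeholder names, and the literal date stands for "today - 3 years" as computed by whatever job runs the consolidation):

-- find the highest key older than the retention window
SELECT MAX(ID) FROM ACCESSLOG WHERE LOGDATE < '2006-12-31'
-- then, with x = the value returned above
DELETE FROM ACCESSLOG WHERE ID < x
COMMIT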

What I was really asking is whether there's a way to execute a mass delete that is faster than a traditional delete, accepting some compromises (for example, that it can't be rolled back).

Thank you

lbreddemann
Active Contributor

Hi Andrea,

I know that this might come a bit too late for your scenario, but how about this?

For your logging table you could implement something like poor-man's-partitioning yourself.

Instead of one big table, you create a separate table for each period, say each quarter.

You also create a synonym for the currently used "write-into-this" partition, so that your logging trigger doesn't have to be changed when a new partition should be used.

On top of that you create a UNION view over all partitions.

With that, your requirement of logging everything should be fulfilled, and (since you don't have to care about updates) you can keep the active partition small and get rid of old data very quickly (DROP TABLE / TRUNCATE TABLE).

If you go for the "truncate table" (which is pretty much the same as a "drop table" for MaxDB) then you can also think about having a fixed set of tables (Q1, Q2,... Q7, Q8) that you write to in a round-robin fashion. This would then be a kind of moving-window into your logging history.

The only thing to take care of here is that you'll need to have a small time-slice in which you change the synonym every quarter.
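A rough sketch of that layout - all table, synonym and view names are made-up examples, and the exact DDL should be checked against the MaxDB documentation:

-- one physical table per quarter (example columns)
CREATE TABLE ACCESSLOG_Q1 (ID FIXED(18) PRIMARY KEY, LOGDATE DATE, URL VARCHAR(500))
CREATE TABLE ACCESSLOG_Q2 (ID FIXED(18) PRIMARY KEY, LOGDATE DATE, URL VARCHAR(500))

-- the logging trigger always writes to the synonym, never to a quarter table directly
CREATE SYNONYM ACCESSLOG_CURRENT FOR ACCESSLOG_Q1

-- reads go through a UNION view over all partitions
CREATE VIEW ACCESSLOG_ALL AS
SELECT ID, LOGDATE, URL FROM ACCESSLOG_Q1
UNION ALL
SELECT ID, LOGDATE, URL FROM ACCESSLOG_Q2

-- at the quarter change: empty the next slot, then switch the synonym
TRUNCATE TABLE ACCESSLOG_Q2
DROP SYNONYM ACCESSLOG_CURRENT
CREATE SYNONYM ACCESSLOG_CURRENT FOR ACCESSLOG_Q2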

Just my two pence on this...

regards,

Lars

Former Member

Thanks for all the replies. We're going to delete a number of rows each day and then implement the delete-every-day procedure described in my previous post.

Thank you

Former Member

Hi,

another idea is that the lock list may be causing the problem, since so many locked rows are handled in one transaction.

OK, WE know (you told us, therefore we know) that there will be no conflicts, as the old rows will not be used and new rows will receive different keys, but the system does not know that, so the normal locking is done.

Perhaps it would therefore help to use several transactions, meaning:

delete where ... and try to specify a smaller amount per statement, perhaps 1000 or 10000 rows, to name some number.

And after this amount, do a commit, then the next delete, then commit and so on.
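A sketch of that batching, stepping the key range down in fixed slices (ACCESSLOG and ID are the placeholder names used earlier, 10000 is just an example step, and x is the MAX(ID) cutoff determined before):

DELETE FROM ACCESSLOG WHERE ID >= x - 10000 AND ID < x
COMMIT
DELETE FROM ACCESSLOG WHERE ID >= x - 20000 AND ID < x - 10000
COMMIT
-- ...repeat, moving the window down by one slice each time, until the lowest existing ID is passed

Each transaction then only holds locks for one slice, which keeps the lock list small.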

Good luck,

Elke