cancel
Showing results for 
Search instead for 
Did you mean: 

Incorrect order in inserted rows when using insert select

Former Member
0 Kudos

When executing the following sql statement, the inserted rows are not in the correct order.

I'm using IQ 16.0 SP8

insert into kfrans.tmpIPCH

(encWID,

ichWID,

ichChangeType,

ichEffectiveDtm,

ichNewProWID,

ichStartDtm,

ichEndDtm)

select distinct DBA.SencToPRocessIpCensus.ichEncWID,

DBA.SencToPRocessIpCensus.ichWID,

DBA.SencToPRocessIpCensus.ichChangeType,

DBA.SencToPRocessIpCensus.ichEffectiveDtm,

case when DBA.SencToPRocessIpCensus.ichNewAttendingProWID is null then DBA.SencToPRocessIpCensus.ichOldAttendingProWID

else DBA.SencToPRocessIpCensus.ichNewAttendingProWID end,

DBA.SencToPRocessIpCensus.ichEffectiveDtm,

DBA.Nencounter.encEndDtm

from DBA.SencToPRocessIpCensus,

kfrans.tmpEncIncluded,

DBA.Nencounter

where DBA.SencToProcessIpCensus.ichEncWID = kfrans.tmpEncIncluded.psfEncWID

and DBA.SencToProcessIpCensus.ichChangeType in ('A', 'T', 'D')

and DBA.Nencounter.encWID = DBA.SencToProcessIpCensus.ichEncWID

order by DBA.SencToProcessIpCensus.ichEncWID, DBA.SencToProcessIpCensus.ichEffectiveDtm, DBA.SencToProcessIpCensus.ichChangeType

If I only execute the select part, the rows are always returned in the right order.

I'd already create all the indexes recommended by Index Advisor and rebuild all the existing indexes with no luck.

Do I missing some option configurations?

Thanks a lot.

Uvernes

Accepted Solutions (1)

Accepted Solutions (1)

markmumy
Advisor
Advisor
0 Kudos

Uvernes,

IQ does not guarantee data ordering when loading (do any databases actually guarantee data ordering?).  IQ can't due to the parallel load engine and the multi-threaded processing of the data.  It may work in some cases and not in others.  But it is never guaranteed.

The same is true of queries.  The only way to guarantee data order on output to the client is by using the ORDER BY clause.  Hence the reason that just executing the SELECT statement returns properly ordered data.

Both loading and queries can be parallel so that processing time is decreased and performance is improved.  When parallel processing data on loading, data gets put into the table in the order that each thread can process the data and put it into the FP structures.  On queries, parallel processing without an order by will allow the optimizer and query engine to return results in any order at any time from any thread.  Completely random.

Mark

Former Member
0 Kudos

Thanks a lot Mark.

I also notice that the rowid generated during the insert have gaps. In this case, for the first 8276 rows the rowid goes from 1 to 8276, and then jump to really high number for the rest of the rows.

Is this due to the same reason?

Thanks a lot,

Uvernes

markmumy
Advisor
Advisor
0 Kudos

Yes, this was due to a chance in IQ 16 for loading.  It used to be that row ids were always sequential (+1) when data was loaded.  But the parallel load engine of IQ 16 no longer guarantees that.  Each row will have a unique number, but there is no guarantee on the ordering (+1) nor on gaps, or lack thereof.

Mark

Former Member
0 Kudos

We are using some processes in IQ 15.2 that do the following:

1. Create a temporary table and insert the rows in a specific order.

2. for each row after the first one, identify the previous row;  using (rowid and rowid -1) or (identiyColumn and identityColumn - 1; where identityColumn is an IQ identity column) 

3.- Use a value for the immediate previous row to update the a value on the current row.

With this change in behavior look like this algorithm is not going to work anymore.  Any suggestion about how to implement this behavior with the minimum impact?

Thanks a lot!

markmumy
Advisor
Advisor
0 Kudos

If you are using an INSERT SELECT or a SELECT INTO, you can simply add a new column and use the NUMBER() function.  This will generate a sequential number always incrementing by 1.  The NUMBER() should be applied after the ORDER BY so even if our load scrambles the ordering, the number column will be preserved.

Mark

Answers (2)

Answers (2)

Former Member
0 Kudos

Thanks a lot everyone!!

c_baker
Employee
Employee
0 Kudos

Apparently, ANSI rules only require a database to return rows in sorted order when the ORDER BY is added.  Internal storage should not dictate the order of rows returned.

A similar change was made in the ASE days when 11.9.2 came out (row-level locking) and even when 11.5 came out (parallel worker processes).  At that time, we told customers who expected results in clustered index order to ensure they had 'order by' added to the query as results returned in clustered index order (storage-based) were no longer guaranteed.

Your issue is similar - results in insert order are not guaranteed.

Chris

markmumy
Advisor
Advisor
0 Kudos

That's correct Chris.  The only way to guarantee order on results is with the ORDER BY.  Data loading is the same way.  The ANSI standards don't dictate how to store the data or the order in which the data is stored.

It was quite easy to rely on this "feature" in ASE 11 and even is some versions of IQ.  Ordering was not specified a lot because data came back in the expected order.  Until we implemented parallelism which then threw all native ordering out of whack and forced people to put in ANSI compliant code with ORDER BY statements.

Mark

simon_ogden
Participant
0 Kudos

FWIW I had this discussion with the ASE guys some time ago. IMO, insert with order by should be guaranteed up to the point where the data is physically inserted by the access layer, if the storage layer changes the order then fine it changes the order (clustered index may determine this etc). Primarily you may want order on insert a number of reasons, 4 that spring to mind are:

A) insert with order by under rowcount - you expect (demand?) the first row as defined by the order by to be inserted

B) insert with order by whilst generating identity values - you expect the identity valuyes to be incremented as per the order by columns.

C) insert with order by and NOT selecting the order by columns - OK, for this one you are going to make an assumption that rows are inserted into a given table in the order defined by the order by column and of course you would then ensure you don't have anything in the table or index structures that could change the order.

D) Concurrency, so using an index that provides the order.

The purpose of the order by with insert is to present the rows to the access/storage layer in the order defined by the order by clause. If the access layer for any number of reasons determines the data needs to be physically stored in an alternate order then so be it, but A) and B) on the list above have to work as required.

The access->storage layer then effectively becomes the client of the select query, so the select with order by has to be preserved from the query processing perspective.

I think the ASE CR was 620806 if my memory serves me correctly.

In essence it is not a debate about storage, it is about query processing.

markmumy
Advisor
Advisor
0 Kudos

Simon,

I think we are saying the same thing: "order by should be guaranteed up to the point where the data is physically inserted by the access layer, if the storage layer changes the order then fine it changes the order"

That's exactly what IQ is doing and that is exactly what the ANSI standard dictates.  The order by is preserved all the way up to the point where the data is handed over to the storage layer.  That way you can use things like NUMBER() to generate ORDER BY specific identities.  But once the order by was run the data is handed over to the storage layer for insertion into the IQ indexes.  This is a massively parallel process that will munge through the ORDER BY result set in parallel to push data into the FP and other indexes.  It is only during this phase where the ordering set by the ORDER BY is not preserved.

On query access, the data will come out in any order that the RDBMS engine chooses if one does not specify an ordering of the result set.  Haven't found and engine yet that doesn't do this.  With most engines going parallel to increase speed, data ordering is thrown aside so that parallel processing can churn through data that much faster.  Unless you specify and order.  If you don't specify the order, you are telling the engine to send you data in any order that it sees fit.  It could be random, it could be based on the storage order.  One thing is for sure, though, it is guaranteed to be random without an order by.

Mark

simon_ogden
Participant
0 Kudos

Yep, I think we're agreed, I wasn't saying IQ was doing necessarily doing anything wrong, just pointing out that sometimes the stock answer of 'well the order of insert is not guaranteed' doesn't always cut it until you dissect the problem.

That was at least the ASE mindset that I was trying to shift when that CR was logged 🙂

markmumy
Advisor
Advisor
0 Kudos

It is a mindset change, isn't it?    And dissecting the issue does help to understand the why behind things.  Sometimes we have to learn new tricks.  Though, I tend to kick and scream when asked to do so.  LOL.

Mark