Temporary table creation during insert is way higher than source table size?

Former Member
0 Kudos

Hi Experts,

I saw the following behavior while trying to debug an out-of-memory issue, and hence wanted some inputs from the experts.

1. Create the table: create table ISSUEDEBUG like "SCHEMAXYZ"."COBK"

2. Insert the data: insert into "EPANTRA"."ISSUEDEBUG" (select * from "LND_F08CLNT400_R"."COBK")

Size of table COBK: 17 GB

3. Size of table ISSUEDEBUG during the insert operation: 44 GB+ (see the sketch below for one way to check such sizes)

So it seems some temporary table is getting generated during the insert operation, and its size is considerably larger than the source table size.

4. A select count(*) query on ISSUEDEBUG returns a count of 0
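For reference, one way to check these sizes is to query the standard M_CS_TABLES monitoring view (a sketch only; sizes are reported in bytes, and the column list assumes a reasonably recent revision):

-- Compare the compressed source table with the growing copy,
-- including the uncompressed delta store portion.
SELECT schema_name,
       table_name,
       record_count,
       raw_record_count_in_delta,
       memory_size_in_main,
       memory_size_in_delta,
       memory_size_in_total
  FROM m_cs_tables
 WHERE table_name IN ('COBK', 'ISSUEDEBUG');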

Questions:

1. Why does the size of the temporary table grow way beyond the source table size (even much beyond the table size on the source system)?

(Is delta memory/decompression the reason, or is there something more to it? The size on the source system is also only around 20 GB!)

2. This issue becomes critical for the implementation if a Data Services job has to do this activity (moving data from a base schema to a higher schema) with some transformation. How should that be handled?

Regards,

Rahul

Accepted Solutions (1)

former_member184768
Active Contributor
0 Kudos

Hi Rahul,

Couple of points:

1) Can you please merge the delta for the COBK table before the insert?

2) Can you please try the CREATE TABLE ... LIKE ... WITH DATA option?

You have 358.5 million records. INSERT INTO ... SELECT ... may not be a good choice here. Please try the CREATE TABLE ... LIKE ... WITH DATA option, but have the delta merge done beforehand.
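A minimal sketch of both suggestions, using the schema names from the original post (standard HANA SQL; adjust names to your landscape):

-- 1) Merge the delta of the source table first.
MERGE DELTA OF "LND_F08CLNT400_R"."COBK";

-- 2) Create and fill the copy in one statement instead of
--    CREATE TABLE LIKE + INSERT ... SELECT.
CREATE COLUMN TABLE "EPANTRA"."ISSUEDEBUG"
  LIKE "LND_F08CLNT400_R"."COBK" WITH DATA;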

Alternatively, you can split the data volume into manageable chunks and try the insert.

Regards,

Ravi

Former Member
0 Kudos

Hi Ravi,

Thanks for the inputs. However, the reason I was testing this way is that the Data Services job is actually doing this work of reading the COBK table (in HANA's SLT schema) and then writing into another target table (within HANA) after applying some transformations.

Regarding the delta merge, the delta size is only 2.2 MB, so do you think it should be a problem?

Also, as per the last status, the temporary table size went up to 60 GB and then the insert gave 'Memory Allocation Failed'.

I can think of doing the load via DS in chunks, but the question also is: is the temporary table size expected to grow to such levels?

Regards,

Rahul

lbreddemann
Active Contributor
0 Kudos

Hey Rahul,

Whenever you try to modify data in a column store table, this will lead to the entries being stored in the delta store.

Whether you use CREATE TABLE LIKE WITH DATA, CREATE TABLE LIKE + INSERT or CREATE TABLE AS SELECT doesn't really make any difference here.

Where it actually *does* make a difference is that CREATE TABLE LIKE will also re-create all concat attributes that are present on the source table.

Therefore it might be a good idea to ensure that this is not happening.

But this really shouldn't be the big problem here.

You are trying to insert 350 million records into a new/empty table. These records take up 17 GB when completely compressed and 60+ GB when just stored uncompressed in the delta store.

Actually this compression factor doesn't seem to be too good...
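As a rough back-of-the-envelope check using the figures quoted in this thread: 17 GB for roughly 350 million records is about 50 bytes per record in compressed main storage, while 60+ GB for the same records is roughly 170-180 bytes per record in the uncompressed delta store, i.e. a compression factor of only about 3.5.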

Anyhow, as the delta merge can't happen while a transaction is still open, you will have to perform the data loading chunk-wise in this case.

Load some data, trigger a merge (maybe just an in-memory merge), load some more ... etc.

Finally, you trigger a persistent delta merge and the table copy is done.

Easy. Tons of threads on that topic here on SCN.
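A rough sketch of that chunk-wise approach, staying with the tables from this thread (the BELNR ranges are purely illustrative - pick whatever key splits your data evenly, and the MEMORY_MERGE parameter may depend on your revision):

-- Load one chunk and commit it.
INSERT INTO "EPANTRA"."ISSUEDEBUG"
  SELECT * FROM "LND_F08CLNT400_R"."COBK"
   WHERE "BELNR" BETWEEN '0000000001' AND '0100000000';
COMMIT;

-- In-memory merge only, to keep the delta store small between chunks.
MERGE DELTA OF "EPANTRA"."ISSUEDEBUG" WITH PARAMETERS ('MEMORY_MERGE' = 'ON');

-- ... repeat INSERT / COMMIT / MERGE for the remaining ranges ...

-- Final, persistent delta merge once all chunks are loaded.
MERGE DELTA OF "EPANTRA"."ISSUEDEBUG";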

What remains is the question: what kind of wicked transformation requires that you actually do this sort of data copying? Maybe you could avoid this altogether by performing the transformation on the fly.

- Lars

former_member184768
Active Contributor
0 Kudos

With 358.5 million records and transformations on them, the size of the uncompressed data is likely to grow large. Hence, splitting the load in the DS process should help.

Maybe you can also check on the BODS forum about such a high-volume data load.

Regards,

Ravi

Former Member
0 Kudos

Hi Lars,

Thanks for the detailed explanation. To clear my understanding here:

Data will always be totally uncompressed and then compressed again when we do the operation of copying table data from one schema to another.

Now, to give a few more details about the reasons why a 'copy'/'transformed copy' is needed: we have a HANA landing layer (source-system-specific schemas) and then an atomic layer (schema), where the transformed data will be created.

Loading into the atomic layer schema from the landing layer is done using a Data Services job, and here the 'initial load' (like the job above) is creating the described challenges.

There are plenty of business reasons/justifications for why we have multiple layers, so I do not want to go there. But at a technical level, will I be right in assuming that such scenarios need to be optimized at the Data Services level (one approach being, as you describe, loading in chunks and then manually merging, and so on)?

Or is there anything else we can do for HANA optimization?

Another question: can we really see the data that is getting created in the delta store while the operation is going on?

Regards,

Rahul

lbreddemann
Active Contributor
0 Kudos

Yes, the data has to be decompressed and then compressed again.

See, the data compression is based on how the internal storage works, not on SQL data.

But you work on SQL level here, so the data has to be made available for SQL (materialized) and then needs to pass through the SQL layer again to be stored in the target data structure.

There the data can be compressed again.

That's the same whatever you do with compressed data. If you want to copy a ZIP archive but modify what's inside the archive, you need to uncompress it first.

Anyhow, nothing to say against a layered data loading approach (although you might consider a delta loading option when you deal with this kind of data size...). However, when this really is just a movement from a staging to a target table, then you have all options to split up the data transfer into multiple chunks. That way you could even parallelize the loading by a higher factor.

And sure enough, that is something you could model with Data Services, correct.

And nope - the counters for the memory sizes of delta and main storage are only updated after a transaction has ended. There's no point in "watching the filling of the delta store" or anything like that - it's a shared resource anyhow.

So, rather chunk up your data load in manageable sizes and don't worry about the delta store anymore.
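If you do want to confirm that the intermediate merges actually ran after each committed chunk, a sketch against the standard M_DELTA_MERGE_STATISTICS monitoring view (exact columns may vary by revision):

-- Recent delta merges on the copy target.
SELECT start_time,
       type,
       motivation,
       success
  FROM m_delta_merge_statistics
 WHERE schema_name = 'EPANTRA'
   AND table_name  = 'ISSUEDEBUG'
 ORDER BY start_time DESC;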

Alternatively: buy a larger machine + license

Former Member
0 Kudos

Hi Lars,

This helps. The best approach, hence (at least for the initial load phase), is to break the data into chunks, then load and merge - probably a DS job which at the end of each transaction also triggers an in-memory merge.

On your suggestion of larger machine/license:

Frankly, it's really not a concern today, as the current system has almost 3+ TB of free space (but each node is only 512 GB). My analysis was that although the above table was partitioned, it was reaching the cut-off limit on one of the nodes and hence erroring out.

So I think that if the nodes were 1 TB each, we would not have faced this issue, as all the required memory would have been available on one node or the other. Please correct my analysis here...
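One way to verify that analysis would be to look at where the partitions of the table actually sit and how big each one is, e.g. with a sketch like this against M_CS_TABLES (assuming HOST and PART_ID are exposed for the partitioned table):

-- Per-node, per-partition memory footprint of the source table,
-- to spot a single node approaching its limit.
SELECT host,
       part_id,
       record_count,
       memory_size_in_total
  FROM m_cs_tables
 WHERE schema_name = 'LND_F08CLNT400_R'
   AND table_name  = 'COBK'
 ORDER BY host, part_id;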

Regards,

Rahul

former_member184768
Active Contributor
0 Kudos

Rahul Pant wrote:

So I think that if the nodes were 1 TB each, we would not have faced this issue, as all the required memory would have been available on one node or the other. Please correct my analysis here...

Regards,

Rahul

They have 2 TB boxes also. So tomorrow, if some other table with 600-700 million records comes up for replication, you can think of that too.

I think breaking it into small chunks will be a good strategy.

Ravi

Former Member
0 Kudos

Yes Ravi.

Breaking into small chunks is the way to go. I was just clarifying whether my understanding is right.

Btw, among the many other tasks, I have a scenario to load tables of 2 billion records using Data Services (like the above scenario). Once I succeed, I will try to write up my findings as a blog, if it can help the vibrant community.

Thanks a lot.

Rahul 

Answers (0)