Application Development Discussions

Inserting millions of records

Former Member
0 Kudos

Hello,

I have a scenario wherein I will be getting a file with more than 7 million records every day, and I will have to compare this file to my custom table to check for modifications and then modify my custom table accordingly. I can't loop through the records, as it will create performance issues. Can I use parallel processing here? If yes, how exactly, and if not, please let me know if there is any other efficient way.

Thanks in advance,

David.

1 ACCEPTED SOLUTION

former_member181966
Active Contributor
0 Kudos

If you split up your file into 4 files, you can read them in one shot into different internal tables. Then compare the internal tables one by one and update the custom table.

OR

Load the file into an internal table in one shot, split it up into other internal tables after loading, and then compare those with the custom table.

You can also download your custom table through SE16 and load both files into a tool such as WinMerge, which will tell you the differences in one shot (if you only need to validate the data).

We use this software heavily to validate data. Most of our end users and functional consultants use it to verify data against SE16.

Hope this gives you an idea!

P.S. Award the points.

Good luck

Thanks

Saquib Khan

"Some are wise and some are otherwise"

24 REPLIES

Former Member
0 Kudos

Hi,

You should be able to read the data from the file in smaller amounts. However, this takes longer to process.

You can have multiple programs running that read the file data at different locations in the file (records 1 - 100, 1000 - 2000, ...).

You can use field-groups instead of internal tables, which can handle large volumes of data efficiently.

Regards,

Ravi

Note: Please mark the helpful answers.

FredericGirod
Active Contributor
0 Kudos

Hi David,

Parallel processing is not a solution; you will use up all the memory of your SAP server.

The best approach may be to work with a package size in the SELECT statement.
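
For illustration, a package-size SELECT might look like the sketch below (ZTAB stands in for your custom table and the package size of 10,000 is arbitrary):

data: lt_ztab type standard table of ztab.

* Fetch the custom table in packages of 10,000 rows instead of all at once.
select * from ztab
         into table lt_ztab
         package size 10000.

* lt_ztab holds only the current package here;
* compare it with the corresponding file records before the next package arrives.

endselect.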

Or you could simply download the table to a flat file and just compare the two flat files.

Comparing flat files is not too hard, because they are not stored in SAP memory; you can do the job line by line.

That will take a long time, but you will not kill your server.

Rgd

Frédéric

former_member927251
Active Contributor
0 Kudos

Hi David,

Here is a rough idea for your problem.

You can make batches of 1000 records from the internal table and then process each batch in parallel. It is possible.

Write a function module and call it in a DO loop with the addition:

CALL FUNCTION 'ZCRM_TRIGGER_PH_STREAM' STARTING NEW TASK gv_task_name

It will start a new task and go to the next iteration.
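
As a rough sketch of that pattern (ZCRM_TRIGGER_PH_STREAM is the function module named above and must be RFC-enabled; the table parameter IT_BATCH and the batch table's line type ZTAB are assumptions for illustration):

data: lt_batch         type standard table of ztab, "next batch of 1000 records
      gv_task_name(20) type c,
      gv_task_no(4)    type n.

do.
* ... fill lt_batch with the next 1000 records of the main internal table ...
  if lt_batch[] is initial.
    exit.                              "nothing left to process
  endif.

  gv_task_no = gv_task_no + 1.
  concatenate 'TASK_' gv_task_no into gv_task_name.

* Each call runs asynchronously in its own work process.
  call function 'ZCRM_TRIGGER_PH_STREAM'
    starting new task gv_task_name
    tables
      it_batch = lt_batch
    exceptions
      communication_failure = 1
      system_failure        = 2.

  clear lt_batch[].
enddo.

Note that this sketch does not wait for the tasks to finish; in practice you would also receive the results (PERFORMING ... ON END OF TASK) and throttle the number of parallel tasks.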

Please reward points if it helps.

Regards,

Amit Mishra

Former Member
0 Kudos

Hello Saquib,

How can I use external software within an SAP program? I am asking this because I have to compare the data and then modify my custom table in SAP as well. So even if I compare using the tool, how can I update my custom table? And if there are new records in the file, I will have to insert those records. I would really appreciate it if you could give me more details.

Thanks in advance,

Henry.

0 Kudos

Before comparing your whole file with SAP, can't you make a diff between the file and the previous day's file?

That way you will have only the differences to compare.

diff is a utility available on every real operating system (I mean Unix).

Rgd

Frédéric

0 Kudos

In this case, you have to download the differences to another file and then upload that file using your SAP program. You have to do this process manually. Yes, there is one more option, but it's up to you whether to use it or not, as it may require you to spend time on a piece of software called QM (Quick Macro).

I used it in one project to create 2000 printers in SAP from an Excel sheet.

What it does, basically, is record the code in QM like a BDC does, and you can call Excel sheet cells and an SAP program or transaction in the same QM code.

You can record a macro in it so that it opens the file, saves it as a new file, then calls SAP, and your program uploads the file and executes.

You cannot run it in the background; you can only run this in the foreground.

Now the choice is yours. I know it's difficult and sounds strange, but I'm just suggesting an option.

Hope this gives you an idea!

P.S. Award the points.

Good luck

Thanks

Saquib Khan

"Some are wise and some are otherwise"

former_member181962
Active Contributor
0 Kudos

Hi David,

Try this way.

At the end of the day, your Z table should hold the latest file data anyhow.

So, refresh the contents of the table before the upload.

Reload the data into the z table from the file.

1) delete from ztab.

2) call function GUI_UPLOAD (or OPEN DATASET / READ DATASET / CLOSE DATASET) to load the file data into an internal table.

3) modify ztab from itab.
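
A minimal sketch of those three steps for a server-side file (ZTAB stands in for the custom table; the file layout is assumed to be character-like and to match the table structure):

parameters: p_file(128) type c lower case.

data: lt_ztab type standard table of ztab,
      ls_ztab type ztab.

* 1) Clear the custom table.
delete from ztab.

* 2) Load the file into an internal table.
open dataset p_file for input in text mode encoding default.
do.
  read dataset p_file into ls_ztab.
  if sy-subrc <> 0.
    exit.                              "end of file
  endif.
  append ls_ztab to lt_ztab.
enddo.
close dataset p_file.

* 3) Write everything back in one shot.
modify ztab from table lt_ztab.
commit work.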

Regards,

Ravi

Former Member
0 Kudos

Hello Frederick,

What do you mean by using a package size in the SELECT statement? Can you tell me exactly how it will help in my case?

Thanks,

David.

0 Kudos

David,

This thread is a duplicate of an earlier thread by you.

Whatever became of this solution?

* Selection-screen parameter for the file path (added so the example is complete).
parameters: p_filename(128) type c lower case.

data: TheLine(100),
      op(100),
      ctr type i.

data: begin of data_tbl occurs 10000,
        TheCustomer like CUST-kunnr,
        SalesForWeek(10) type n,
      end of data_tbl.

open dataset p_filename for input in text mode encoding default.
if sy-subrc <> 0.
  concatenate 'File: ' p_filename ' was not found. Exiting.'
         into op separated by space.
  write:/ op.
  exit.
endif.

ctr = 0.
do.
* Read one record at a time; no full copy of the file is held in memory.
  read dataset p_filename into TheLine.
  if sy-subrc <> 0.
    exit.                              "end of file reached
  endif.
  ctr = ctr + 1.

* Do your DB work here.
  move TheLine+0(10)  to data_tbl-TheCustomer.
  move TheLine+10(10) to data_tbl-SalesForWeek.
  append data_tbl.

* Push each block of 10,000 records to the database, then start a new block.
  if ctr > 10000.
    update CUST from table data_tbl.
    clear data_tbl. refresh data_tbl.
    ctr = 0.
  endif.
enddo.

* Flush the last partial block that never reached the 10,000 threshold.
if not data_tbl[] is initial.
  update CUST from table data_tbl.
endif.

close dataset p_filename.

Former Member
0 Kudos

Hello John,

By breaking up the file I won't be able to improve performance, as I will still be comparing all of the millions of records, and I guess that will be a performance issue. You suggested breaking up the file, but we will still be looping through all of them in the end.

Any other advice, please, as I need this done urgently. I would really appreciate any good option here.

Thanks in advance,

David

0 Kudos

David,

No... my advice was not to "table up" the data into an internal table first, but rather to process each record as it is physically read from the file (in the DO... ENDDO loop).

In short though, you WILL have some performance issues IF you need to process each record. It is simply a fact!!

Your Basis team and Operations staff can probably assist from an execution side to identify a server that has available processor time, etc.

Your job as a developer is to maximize the efficiency of the code. And if you need to read EACH record (and it sounds like you do), you MUST loop through it!

The only real gains can be made by processing each record AS IT IS READ from the file, and then using an UPDATE from an internal table (as cited in my example).

Former Member
0 Kudos

Hello John,

My question to you is: by reading record by record within the DO...ENDDO, how are we doing anything different? In effect we will still be reading all the records and comparing them with the custom table. So how is it different from taking them into an internal table and then looping through it? I just want to know how we are improving performance and efficiency in the code.

Thanks,

David.

Former Member
0 Kudos

David,

You are improving efficiency because you are only looping through the records ONCE instead of twice.

If you read the file with a DO loop and add to an internal table, you have made one full pass of the data.

Now... you must loop through the internal table a SECOND TIME and process the same data. Hence, you have wasted execution time.

In addition, appending to an internal table (especially a large one) takes a great deal of work on the processor's side. Again, you are removing extra work from the system.

0 Kudos

David,

I am hopeful that my explanation above has clarified the processing of 14M records vs 7M records (when an internal table is not used).

Please let me know if it has not clarified the concept.

Former Member
0 Kudos

Hello John,

I have one more question. If I read directly within the DO...ENDDO, then to compare to the database I have to do another read. Also, there will be some records missing from the file that have been disabled; I have to identify those records and disable the flag for them in the custom table. How can I satisfy all 3 conditions: modify changed data, insert new records, and disable records not in the file?

Thanks in advance,

David.

0 Kudos

"how can i satisfy all 3 conditions modify changed data, insert new records and disable records not in file."

Same as you would if you used an internal table for the main processing loop.

In the DO loop, determine which operation (change, insert, or delete) needs to be performed for each record, and then process it accordingly. IF statements should get you through that.

In terms of "another read"... if you need to validate against the custom table, then YES... you must perform a read against the DB server (not the file server that holds the flat file that you are reading).

You have two types of "reads" here: a flat file read and a DB read using a SELECT SINGLE (hopefully). If the flat file contains the custom table's complete primary key, a SELECT SINGLE will process with VERY reasonable speed.
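
To make that concrete, the decision logic inside the DO loop could be sketched roughly like this (ZTAB, its key field KUNNR, and the flag field DISABLED are hypothetical names; ls_file is the record just read from the file):

data: ls_file type ztab,               "record just read from the file
      ls_db   type ztab.               "matching database record, if any

select single * from ztab into ls_db
       where kunnr = ls_file-kunnr.

if sy-subrc <> 0.
* Not in the custom table yet: insert it as enabled.
  clear ls_file-disabled.
  insert ztab from ls_file.
else.
* The record is on the file, so it must be (re-)enabled and,
* if any field differs, updated.
  clear ls_file-disabled.
  if ls_file <> ls_db.
    update ztab from ls_file.
  endif.
endif.

* Rows of ZTAB that never appear in the file are handled afterwards,
* e.g. by disabling everything that was not touched during this run.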

Former Member
0 Kudos

Hello John,

All the records in the custom table may not be present in the file (the disabled ones), so I guess we can't use SELECT SINGLE. The approach with the loop was: process each record in the input file, then read the database; if a match is found, check the other fields and modify accordingly; if no match is found, it's not in the table, so simply insert the record; and then delete the record from the internal table that holds the custom table's contents. At the end, that internal table will contain only those records that are in the custom table but not in the file. I can just loop through them and disable them. This works fine, but performance is the issue. I don't know how your read will satisfy the 3 conditions.

Thanks for all your help so far,

David.

0 Kudos

David,

So in short:

IF the record does not exist in the custom DB table, then insert/add it.

IF the record does exist in the custom DB table, then modify it.

IF a record in the custom DB table was NOT read during this processing run, mark it as DISABLED.

Is that all correct?

Former Member
0 Kudos

Yes, that's true. There is only one primary key field, 8 non-primary fields, and 2 flags in the custom table. The flags are not in the input file.

0 Kudos

Wow Dave, that's a lot of records to process every day. Unfortunately, I don't see any way to get around the performance issue. You will need to run in the background, and it will run for a while.

Possible solutions.

1) Disable all records in your custom table. Write your program to check for the existence of each record; if it is there, just update it and enable it. If the record is not there, then just create it as enabled. This will save you from having to check each field; instead, just update the record. I'm not sure if your system will be able to handle it, but you could retrieve all records from the custom table and put them into an internal table, then use the READ TABLE statement with the BINARY SEARCH addition to find your records; this will speed up the program if the system can handle it. Doing so will be one hit to the DB instead of millions.
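
A minimal sketch of the one-hit read plus binary search (ZTAB and its key field KUNNR are placeholders; ls_file stands for the current file record):

data: lt_ztab type standard table of ztab,
      ls_ztab type ztab,
      ls_file type ztab.               "current record from the file

* One single hit to the database for the whole custom table.
select * from ztab into table lt_ztab.

* The table must be sorted by the key used in the binary search.
sort lt_ztab by kunnr.

* Inside the file-processing loop:
read table lt_ztab into ls_ztab
     with key kunnr = ls_file-kunnr
     binary search.
if sy-subrc = 0.
* Record exists: update it and set it to enabled.
else.
* Record is new: create it as enabled.
endif.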

Regards,

Rich Heilman

Former Member
0 Kudos

Hello Rich,

Thanks for your reply and possible solution. But if I disable all records in the custom table, how would I be able to find records that are already in the custom table but not in the file? I have to disable such records, which I guess can't be done then. Do you have any idea (approximately) how long it should take if we loop through it and compare by reading with a binary search?

Thanks in advance,

David.

0 Kudos

Hi David,

Rich has suggested disabling all records in the custom table to start with, then enabling the records that are added or modified as a result of being in the file.

Therefore at the end of the process the records in the custom table but not on the file WILL BE THE ONLY ONES NOT ENABLED. Rich's suggestion satisfies your requirement!

Going one step beyond this, why not use a date/timestamp field to act as the enabled/disabled switch? That way you will not have to update 7 million records at the start of your program; they will already be considered disabled, as they will start off with an old date/timestamp value.

Programs which access your custom table will need to refer to the date/timestamp field to determine whether the records are current (or enabled) by checking that the date/timestamp matches today's date (or yesterday's, depending on when the refresh job gets run). You could also create an index on this date/time field, which should speed up access to enabled records.
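
As a rough sketch of that idea (ZTAB, the field LAST_LOAD, and the variables are hypothetical):

data: ls_ztab type ztab,
      lt_ztab type standard table of ztab.

* The refresh job stamps every row it inserts or updates with the load date.
ls_ztab-last_load = sy-datum.
modify ztab from ls_ztab.

* Reading programs then treat only rows with the current load date as enabled.
select * from ztab into table lt_ztab
       where last_load = sy-datum.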

Former Member
0 Kudos

Just a thought: if the table should just represent what's in the latest file, why not delete all records at the start of processing and do a series of array inserts? This might be faster than reading each record and updating only when needed. I think you can use native SQL to drop and recreate the table to save time on the delete.

Then again, this might take longer, but with so many records, explore all options.

Needless to say, do this off-cycle.
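
A minimal sketch of that variant (ZTAB stands in for the custom table; lt_file is assumed to already hold the uploaded file records):

data: lt_file type standard table of ztab.   "uploaded file records

* Empty the table, then push the whole file back in with one array insert.
delete from ztab.

insert ztab from table lt_file
       accepting duplicate keys.
if sy-subrc <> 0.
* Some rows were skipped because of duplicate keys.
endif.

commit work.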

Rob