Application Development Discussions
Join the discussions or start your own on all things application development, including tools and APIs, programming models, and keeping your skills sharp.

How to find the optimum PACKAGE SIZE for a SELECT query?

Former Member
0 Kudos

Hello Folks,

Does anybody have an idea how to determine the appropriate PACKAGE SIZE to specify in a SELECT query, so as to enhance the performance of the program?

I have 1.5 million records to be read from table MARA, and I do a FOR ALL ENTRIES on MBEW, which has more records than MARA in it. I have tried giving 200,000 and then 100,000 as the package size; the difference in performance is not significant.

Now, should I try reducing the package size or increasing it?

Your reply will be much appreciated, <removed by moderator>.

Thanks,

Kewal.

Edited by: Thomas Zloch on Apr 23, 2010 5:30 PM - please do not offer points, just assign them as you seem fit. Moved to ABAP performance

1 ACCEPTED SOLUTION

former_member194613
Active Contributor
0 Kudos

The recommended package size is much smaller; use 10,000 or up to 50,000. The package size is meant to keep the required memory small, and it is assumed that you do something with the data, i.e. change something else, and there the log should not become too large.

The impact of the package size on the SELECT alone, especially with FAE (no longer visible in your posting), is close to zero.

The FAE has a completely different packaging mechanism, called blocking, where the blocksize is between 5 and about 100. A larger blocksize has of course an effect on performance, but it can confuse the optimizer.

So here you should forget about testing.

Siegfried

21 REPLIES

ThomasZloch
Active Contributor
0 Kudos

Package size does not directly affect the runtime of your program; it is used to split large data volumes into chunks that can be handled without memory overflows. There can be an indirect effect, e.g. handling smaller internal tables might run faster than handling larger ones.

Often there is an optimum somewhere between package size 1 and package size "very large", which you can narrow down by experimenting with different values.
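As a minimal sketch of the chunked processing described above (table, parameter and form names are purely illustrative, not taken from the actual program):

```abap
SELECT matnr mtart
  FROM mara
  INTO TABLE lt_chunk
  PACKAGE SIZE p_size
  WHERE matnr IN s_matnr.

* lt_chunk holds at most p_size rows per pass, so memory
* consumption stays bounded regardless of the total result size.
  PERFORM process_chunk.

ENDSELECT.
```

Each pass through the SELECT loop replaces the contents of lt_chunk with the next package, which is why anything you want to keep must be processed or written out before ENDSELECT.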

For "real" performance tuning techniques check the sticky threads of this forum.

Thomas

0 Kudos

Thank you Thomas, your reply confirms what I was thinking. This helps!

- Kewal.

Former Member
0 Kudos

Hi Kewal,

I have 1.5 million records to be read from table MARA and I do a for all entries on MBEW which has more records than MARA in it.

Hard to tell from that short sentence, but you might benefit from using a [join|http://help.sap.com/abapdocu_70/en/ABAPSELECT_JOIN.htm] on MARA and MBEW.

I have tried giving 200,000 and then 100,000 as the package size; the difference in performance is not significant. Now, do I try reducing the package size or increasing it?

Well, it probably depends on what you mean by performance, but I suspect you're looking at program runtime. However, as mentioned by Thomas, there are other aspects, like memory requirements, or considering for example trying to shift load from the database server to an application server.

By processing data in packages we reduce communication overhead and also give the database the opportunity to select the data more efficiently in bulk (since database IO is done in pages, you'd see for example a good speed-up if you need several rows that are stored within a single page).

From that perspective (time and database IO) it's kind of clear that from some point onwards increasing the package size doesn't really make any difference. Since bigger package sizes most likely will increase memory requirements of the program I'd say it's good practice to try keeping the package size small (less memory consumption), though big enough so that you still get good throughput (which ideally doesn't change much when increasing the package size further).

As mentioned by Thomas, higher package sizes can also possibly cause a longer runtime due to inefficient table lookups (i.e. if you have to look up data in a huge unsorted internal table of type STANDARD TABLE). If you spot something like that, you should fix it by using appropriate table definitions (e.g. a [sorted or hashed|http://help.sap.com/abapdocu_70/en/ABAPTYPES_TABKIND.htm] table).

It's best to experiment with package sizes of different orders of magnitude, like 1,000 (which might be as good as 100,000). I often use a program parameter for the package size with a reasonable default value that can then be changed if necessary (e.g. for different requirements, or simply due to lack of reasonable test data in the development system). Whenever you assess the impact, make sure though to get a complete picture (not only runtime, otherwise you might get some nasty feedback later).

Cheers, harald

former_member194613
Active Contributor
0 Kudos

The recommended package size is much smaller; use 10,000 or up to 50,000. The package size is meant to keep the required memory small, and it is assumed that you do something with the data, i.e. change something else, and there the log should not become too large.

The impact of the package size on the SELECT alone, especially with FAE (no longer visible in your posting), is close to zero.

The FAE has a completely different packaging mechanism, called blocking, where the blocksize is between 5 and about 100. A larger blocksize has of course an effect on performance, but it can confuse the optimizer.

So here you should forget about testing.

Siegfried

0 Kudos

Hello Siegfried Boes,

What is FAE? I am not aware of this in SAP. The program I am currently working on is written on SAP 4.7. Is it available in this version? From my search on the forum about FAE, I understand that it is used for selecting large amounts of data and it automatically breaks the large internal table into smaller chunks on the DB server. Could you please elaborate on this?

Thanks,

Kewal.

0 Kudos

Apologies for my ignorance; FAE is nothing but FOR ALL ENTRIES, I suppose.

To answer your question: yes, I am using FAE on the MBEW table for all the materials fetched from MARA.

Here is what I do:

SELECT matnr attyp
  FROM mara
  INTO TABLE it_itab
  PACKAGE SIZE p_blksiz
  WHERE matnr IN in_matnr.

  SORT it_itab BY matnr attyp.

  IF it_itab[] IS NOT INITIAL.
    SELECT matnr bwkey verpr lvorm
      FROM mbew
      INTO TABLE it_mbew
      FOR ALL ENTRIES IN it_itab
      WHERE matnr EQ it_itab-matnr AND
            bwkey IN r_bwkey       AND
            bwtar EQ space.

    SORT it_mbew BY matnr bwkey.

    LOOP AT it_itab ASSIGNING <fs_itab_temp1>.
      READ TABLE it_mbew ASSIGNING <fs_mbew>
           WITH KEY matnr = <fs_itab_temp1>-matnr
                    bwkey = c_avrt
           BINARY SEARCH.
      IF sy-subrc EQ 0.
        <fs_itab_temp1>-verpr1 = <fs_mbew>-verpr.
      ENDIF.
      READ TABLE it_mbew ASSIGNING <fs_mbew>
           WITH KEY matnr = <fs_itab_temp1>-matnr
                    bwkey = c_lbfa
           BINARY SEARCH.
      IF sy-subrc EQ 0.
        <fs_itab_temp1>-verpr2 = <fs_mbew>-verpr.
      ENDIF.
    ENDLOOP.
  ENDIF.

  PERFORM f_write_to_file.

ENDSELECT.

p_blksiz comes from the selection screen and determines the package size, and PERFORM f_write_to_file just loops over it_itab and writes it into a file on the application server. Now, MARA has close to 2 million records and MBEW has more than that.

Edited by: Thomas Zloch on Apr 26, 2010 6:43 PM - code formatting fixed

0 Kudos

You might be able to do a join (needs modified version of ITAB or other temp table):


select MARA~MATNR MARA~ATTYP MBEW~BWKEY MBEW~VERPR into table IT_ITAB
       package size P_BLKSIZ
       from MARA join MBEW on
            MBEW~MATNR = MARA~MATNR
       where MBEW~BWKEY in (C_AVRT, C_LBFA) and
             MBEW~BWTAR =  SPACE.

* ... process the package ...

endselect.

However, this approach would only work if you only need entries that are maintained in mara and mbew (which in your current coding is not the case). One could try using a LEFT OUTER JOIN instead, but then Open SQL does not allow us to specify the required restriction on mbew-bwkey and mbew-bwtar.

So if from a functional perspective it's OK to omit materials without a moving average price, then I'd use the join and consolidate the results afterwards (to get to IT_ITAB as you have it now). If you only have those two valuation areas and valuation type space, you could still use the outer join (a kind of slippery path, though, if in the future anybody decides to add another valuation area that's not relevant for the program).

If you cannot use a join, I'd make some minor code changes: I prefer SORTED tables over standard tables with SORT. Also, since you probably have more entries in IT_MBEW than in IT_ITAB, it makes sense to iterate over IT_MBEW (a negligible speed-up, but the coding is as easy to read as before, so why not choose the version that uses fewer system resources). The IF statement is superfluous, since the code inside the SELECT loop is not entered unless some data was found.

data:
  IT_ITAB type sorted table of T_ITAB with unique key MATNR,
  IT_MBEW type standard table of T_MBEW with non-unique key MATNR BWKEY.

select MATNR ATTYP from MARA into table IT_ITAB package size P_BLKSIZ
       where MATNR in IN_MATNR.

  select MATNR BWKEY VERPR LVORM from MBEW into table IT_MBEW
         for all entries in IT_ITAB
         where MATNR =  IT_ITAB-MATNR    and
               BWKEY in (C_AVRT, C_LBFA) and
               BWTAR =  SPACE.

  loop at IT_MBEW assigning <FS_MBEW>.

    read table IT_ITAB assigning <FS_ITAB> with key MATNR = <FS_MBEW>-MATNR.
    check SY-SUBRC = 0.

    if <FS_MBEW>-BWKEY = C_AVRT.
      <FS_ITAB>-VERPR1 = <FS_MBEW>-VERPR.
    elseif <FS_MBEW>-BWKEY =  C_LBFA.
      <FS_ITAB>-VERPR2 = <FS_MBEW>-VERPR.
    endif.

  endloop.

  perform F_WRITE_TO_FILE.

endselect.

Cheers, harald

0 Kudos

Hello Harald,

I really appreciate your review and feedback on my post. I have a question: if I try to use a join (or left outer join) with the PACKAGE SIZE option, I am afraid that some records might just be missed from the MBEW table. For example:

I have 10 records to be fetched from MARA and I specify a package size of 5. Now, these first 5 records in MARA may have, let's say, 8 records in MBEW. So due to the package size restriction, I fear that 3 records may be left out during the join. Please correct me if my understanding is wrong.

I really liked your other suggestion for trimming the code and making it cleaner. I am happy if it improves performance, even if only marginally.

Thanks,

Kewal.

0 Kudos

>

> I am happy if it improves performance, even if only marginally.

At some point, you run into the law of diminishing returns. It's OK as an exercise, but in a production environment, take care of the big problems and leave the small ones.

Rob

0 Kudos

Kewal wrote:

I have a question: if I try to use a join (or left outer join) with the PACKAGE SIZE option, I am afraid that some records might just be missed from the MBEW table.

You will still get all records with a join (inner or outer). However, you're right, you have to be a bit careful, because you don't know when the package size limit will hit. Let's take a silly example with a package size of 3 and 2 materials A and B with 2 MBEW records each. A's records can be read completely, but after reading B's first MBEW record the package size limit kicks in, so we can only retrieve one of B's MBEW records. On the following SELECT the remaining MBEW record will be read, but with your current processing, which consolidates two MBEW lines into one and immediately writes the data, you'd have a problem. Thus you'd need to adjust your program to handle those border cases as well.

I suspect though that from a functional perspective the join might not be the answer, because of the problems I described in my previous post.

Rob wrote:

At some point, you run into the law of diminishing returns. It's OK as an exercise, but in a production environment, take care of the big problems and leave the small ones.

I absolutely agree when it comes to changes to existing programs. However, whenever we design a new program and have the choice between two coding variants that are both simple, easy to understand and need a similar number of code lines, I don't see any reason not to pick the variant that saves some system resources - apart from the obvious interests of the hardware industry in tough economic times, though...

Having said that, I'm completely aware that the changes I've suggested are almost certainly not measurable (the database selects should dominate the timings here), but that is not the point. To me it's about good design, and this includes performance considerations and clean coding from the start (without falling into the trap of premature optimization, of course).

0 Kudos

Thanks Rob,

Point noted. Do you see any further scope to polish my code ?

Thanks,

Kewal.

0 Kudos

Thank you Once again Harald !

Yes, I tried using the join and it doesn't give me the record set which I am looking for and which fulfills the functional requirements. So I suppose I will stick to the FAE option.

Just came to know that my MBEW table has - 518,639,803 records in it. This is in Production.

It's a pleasure to get comments from all of you fine experts, and thank you for sharing your thoughts! I am leaving the thread open for now, should anyone like to add further comments. I am still testing my code in quality and trying to cut down the processing time as much as possible.

Thanks,

Kewal.

0 Kudos

Hi Kewal,

if you stick with the FAE option, consider the documentation:

If the addition PACKAGE SIZE is specified together with FOR ALL ENTRIES, all rows selected are first imported to an internal system table. The packages are only formed during the transfer from the system table to the actual target table. In this way, the FOR ALL ENTRIES addition overrides the effect of PACKAGE SIZE to prevent memory overflows.
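To illustrate what the documentation means (this is not the case in Kewal's program, where PACKAGE SIZE and FAE sit in separate SELECTs; the names here are illustrative): if both additions were combined in one statement, the chunking would only happen during the transfer into the target table, not during the database read:

```abap
* Sketch: PACKAGE SIZE combined with FOR ALL ENTRIES.
* Per the documentation above, the complete FAE result set is
* first buffered in an internal system table; only the transfer
* into lt_mbew happens in packages of p_size rows.
SELECT matnr bwkey verpr
  FROM mbew
  INTO TABLE lt_mbew
  PACKAGE SIZE p_size
  FOR ALL ENTRIES IN lt_itab
  WHERE matnr = lt_itab-matnr.

* lt_mbew arrives here in chunks, but the memory for the
* full result set has already been consumed internally.
ENDSELECT.
```

So combining the two would defeat the memory-saving purpose of PACKAGE SIZE.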

Kind regards,

Hermann

0 Kudos

> Just came to know that my MBEW table has - 518,639,803 records in it. This is in Production.

I apologise for derailing your thread - but that number gives me the creeps. Have you considered archiving? You are talking about MBEW, so you are pretty safe archiving with MM_MATNR; since this object does not have too many dependencies to consider, and even the legal implications should not hit too heavily, it can be achieved in a very short time (no need to make a major project out of it). A couple of other transactions (e.g. MIRO) might benefit from this, too (performance-wise).

Edited by: Mylène Dorias on Apr 28, 2010 11:01 AM

typo: MM_MATNR, not MATBEL

0 Kudos

Hello Hermann,

"If the addition PACKAGE SIZE is specified together with FOR ALL ENTRIES "

You mean using PACKAGE SIZE and FAE in the same SELECT query? I am using them in 2 different queries: I use PACKAGE SIZE while fetching from MARA, and FAE on MBEW for the MATNRs.

Thank you for the info.

- Kewal.

0 Kudos

Hi,

> You mean using Package size and FAE in the same select query ?

Yes, I meant that, but I realized too late that your FAE has no package size, sorry.

Kind regards,

Hermann

0 Kudos

Folks,

Just to let you know, I made some more changes in the program and got rid of the standard tables. Instead I introduced it_itab as a sorted table and it_mbew as a hashed table, and I am currently testing the program with different package sizes. I can see some difference in the performance already. I will share my results with all of you, with some brief statistics, once I move the code to production.

I thank you all for sharing vital knowledge on this thread ! I will close it after I observe the performance in production.

Thanks,

Kewal.

0 Kudos

Thank you all for pitching in with your expert comments !

After lots of testing we finally came to the following possible causes of the slow performance:

1) The database I/O might have been the problem - the same program finishes on the quality server with a 9,000 second runtime, while on the production server it runs for more than 35,000 seconds. One reason could be that the MBEW table lies in an uncached area of the DB (Oracle), so fetching records from this table is a painful experience.

2) Using database hints greatly improves the performance - we tested it at the Oracle DB level by running the same query: it took some 14 hours to fetch the result set (3 million records), whereas with a DB hint it finished in just 2.5 hours.

Nevertheless, our DBA agreed to move the MBEW table to a high-cache zone within the Oracle DB (not sure about the technicalities of this), which finally saved our day.

Thank you once again for all your inputs. I am closing this thread.

Thanks,

Kewal.

Edited by: kewal shah on May 27, 2010 4:26 PM Oops- forgot to close it.

0 Kudos

>

Oops- forgot to close it.

Still did!

Rob

0 Kudos

now it is...

former_member194613
Active Contributor
0 Kudos

> I apologise for derailing your thread

That is not derailing, you hit the nail on the head!