
Import Server (MDIS) parallel processing

Former Member

Hi,

We are running Import Server, and each night it processes MATMAS XML files from R/3 into our main MDM repository. I have been tuning this process with some good results, but I have some questions about processing file chunks in parallel. I have lowered the chunk size (in MDIS.ini) to 12000 and set the number of chunks that can be processed in parallel to 7. Here is the snippet from mdis.ini:

[ISKU_PRODUSTCAS69MSQL_9_8_4_3]
Chunk Size=12000
No. Of Chunks Proccessed In Parallel=7

I know "Proccessed" is misspelled, but MDM wrote it that way, and MDIS is clearly reading it, because the import server log begins with the following line (I also tried correcting the spelling, with no change):

[ISKU_PROD] Import Task Started. Chunk size [12000], No. parallel chunks[7]

Most of our files span multiple chunks, since these MATMAS files can be quite large. Here is a section of the import log:

1756 2008/09/15 05:44:32.494 Timer: name: Import Records - Stage 1 - Prepare Import Records total ms: 18892.865415 6
1756 2008/09/15 05:44:37.962 Timer: name: Import Records - Stage 2 - Filter/Merge total ms: 5406.211170 6
1756 2008/09/15 05:44:38.697 Timer: name: Import Records - Stage 3 [Time spent on MDS ] total ms: 729.913273 6
1756 2008/09/15 05:44:38.697 Timer: name: Import Records - Stage 4 - exception generation total ms: 0.004424 6
1756 2008/09/15 05:44:38.697 Import action: Skip: 0 Create: 0 Updated (NULL fields only): 0 Updated (all mapped fields): 12000 Replace: 0 Delete [destination]: 0
1756 2008/09/15 05:44:38.728 Timer: name: Import Chunk total ms: 81269.620471

The '6' at the end of each stage line means it refers to chunk number 6, and you can see that this chunk contained 12000 records. Now for the question: despite being told to process 7 chunks in parallel, MDIS processes the chunks one after the other, i.e. sequentially and on the same thread. The import log shows quite clearly one chunk being processed after another.

I've been digging around in the documentation and the OSS notes but can't find anything to explain why. I read the notes on streaming, but I don't think they apply, since I'm only importing one file at a time, and MDIS clearly knows the file has multiple chunks.

For reference, Import Server is running on a 4-CPU machine with 4 GB of memory, running Windows Server 2003 (32-bit).

All suggestions gratefully received!

Mark


Answers (1)


Former Member

It is queued parallel processing, so you will not see true parallel processing here; it works like a pipelined CPU architecture.

Say chunk 1 is in stage 1, preparing import records.

Nothing else happens in parallel at that point.

Now chunk 1 moves to stage 2, Filter/Merge.

Then chunk 2 can move into stage 1, preparing import records.

Now chunk 1 moves to stage 3, MDS.

Then chunk 2 moves to stage 2, Filter/Merge,

and chunk 3 moves to stage 1, preparing import records.

And so on.

So if you draw a Gantt chart with all these tasks in swim lanes, over time you will see that 'n' tasks are actually running on different chunks at once, where 'n' is the value you set in the No. of Chunks parameter.
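To make the pipeline idea concrete, here is a minimal Python sketch that prints the schedule described above as rows of a Gantt chart. It is a toy model, not MDIS internals: it assumes three stages (names taken from the log excerpt), that each stage works on one chunk at a time, and that every stage takes one time step. `STAGES` and `pipeline_schedule` are hypothetical names.

```python
# Toy model of "queued parallel" (pipelined) chunk processing.
# Assumptions (hypothetical, not MDIS internals): every chunk passes through the
# same fixed stage sequence, each stage holds one chunk at a time, and each
# stage takes exactly one time step.
STAGES = ["Stage 1 - Prepare", "Stage 2 - Filter/Merge", "Stage 3 - MDS"]


def pipeline_schedule(num_chunks: int, stages: list[str]) -> list[list[str]]:
    """Return one row per time step, listing which chunk occupies which stage."""
    steps = []
    total_steps = num_chunks + len(stages) - 1
    for t in range(total_steps):
        row = []
        for s, stage in enumerate(stages):
            chunk = t - s  # 0-based chunk index sitting in stage s at time t
            if 0 <= chunk < num_chunks:
                row.append(f"chunk {chunk + 1}: {stage}")
        steps.append(row)
    return steps


if __name__ == "__main__":
    for t, row in enumerate(pipeline_schedule(4, STAGES)):
        print(f"t={t}: " + " | ".join(row))
```

In this toy model the overlap shows up as chunk N+1 sitting in stage 1 while chunk N is in stage 2, and no more chunks are in flight than there are stages.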

Try checking when the stage 1 task started on chunk 2, and see whether it overlaps stage 2 of chunk 1. Please post the full logs; this is a very interesting observation!
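One way to run that overlap check without reading the log by hand is a Python sketch like the one below. It assumes that the timestamp on each Timer line marks the end of the stage, so the stage start is the timestamp minus "total ms"; that is an inference from the excerpt in the question, not documented MDIS behaviour. The regex and function names are hypothetical.

```python
import re
from datetime import datetime, timedelta

# Assumed line shape, copied from the log excerpt in the question:
# "<thread> <YYYY/MM/DD HH:MM:SS.mmm> Timer: name: Import Records - Stage N ... total ms: <ms> <chunk>"
TIMER_RE = re.compile(
    r"^(?P<thread>\d+)\s+(?P<ts>\d{4}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}\.\d+)\s+"
    r"Timer: name: Import Records - (?P<stage>Stage \d+).*?"
    r"total ms: (?P<ms>[\d.]+)\s+(?P<chunk>\d+)\s*$"
)


def stage_intervals(lines):
    """Yield (chunk, stage, start, end), ASSUMING the timestamp is the stage's end time."""
    for line in lines:
        m = TIMER_RE.match(line.strip())
        if not m:
            continue  # not a per-chunk stage timer line
        end = datetime.strptime(m["ts"], "%Y/%m/%d %H:%M:%S.%f")
        start = end - timedelta(milliseconds=float(m["ms"]))
        yield int(m["chunk"]), m["stage"], start, end


def overlaps(intervals):
    """Return pairs of (chunk, stage) from DIFFERENT chunks whose time ranges overlap."""
    items = sorted(intervals, key=lambda iv: iv[2])  # sort by start time
    found = []
    for i, a in enumerate(items):
        for b in items[i + 1:]:
            if b[2] >= a[3]:
                break  # later items start even later, so none can overlap a
            if a[0] != b[0]:
                found.append(((a[0], a[1]), (b[0], b[1])))
    return found
```

Run over the full log, an empty result from `overlaps` would mean no two chunks ever had stages running at the same time, which is what strictly sequential processing looks like.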

Chunk size is like packet size in a communication protocol: the more records you have per chunk, the lower the overhead per record but the longer the wait; the fewer records per chunk, the higher the overhead per record but the shorter the wait. So this can be tuned to fit your available memory.
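The packet analogy can be put into numbers with a simple cost model: each chunk pays a fixed overhead F, and each record costs c, so N records in chunks of size s take about ceil(N/s) * F + N * c in total, and the first chunk finishes after roughly F + s * c. The Python sketch below evaluates that model. F, c, and the record count are made-up placeholders, not measured MDIS figures, and the model ignores memory use entirely.

```python
import math


def chunk_costs(total_records: int, chunk_size: int,
                per_chunk_overhead_ms: float, per_record_ms: float):
    """Hypothetical cost model: return (total_ms, overhead_per_record_ms, first_chunk_ms)."""
    num_chunks = math.ceil(total_records / chunk_size)
    # Every chunk pays the fixed overhead once; every record pays its own cost once.
    total_ms = num_chunks * per_chunk_overhead_ms + total_records * per_record_ms
    overhead_per_record_ms = num_chunks * per_chunk_overhead_ms / total_records
    # Wait until the first chunk is done: one overhead plus one chunk's worth of records.
    first_chunk_ms = per_chunk_overhead_ms + min(chunk_size, total_records) * per_record_ms
    return total_ms, overhead_per_record_ms, first_chunk_ms


if __name__ == "__main__":
    # Placeholder numbers for illustration only.
    for size in (3000, 12000, 48000):
        total, per_rec, first = chunk_costs(96000, size, 2000.0, 2.0)
        print(f"chunk size {size:>6}: total {total:>9.0f} ms, "
              f"overhead/record {per_rec:.3f} ms, first chunk after {first:.0f} ms")
```

Larger chunks lower the per-record overhead but delay the first finished chunk, which is the trade-off described above.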

-Sudhir.

Former Member

I would post the log, but it has 3146 entries, and it's so large that I couldn't work out a nice way to filter it for just the chunk and stage timings. If you are very bored, I can email it to you.
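For cutting a log that size down to just the chunk and stage timings, a short Python filter along these lines may be enough. It assumes every relevant line uses the same wording as the excerpt in the original question; the marker strings and the log file name in the usage comment are assumptions.

```python
import sys

# Substrings that mark the chunk/stage timing lines seen in the excerpt.
# Assumption: the full MDIS log uses the same wording throughout.
KEEP = (
    "Timer: name: Import Records - Stage",
    "Timer: name: Import Chunk",
    "Import action:",
)


def timing_lines(lines):
    """Yield only stage/chunk timing and import-action lines, without trailing newlines."""
    for line in lines:
        if any(marker in line for marker in KEEP):
            yield line.rstrip("\n")


if __name__ == "__main__":
    # Usage (file name is hypothetical): python filter_mdis_log.py < ImportServer.log
    for line in timing_lines(sys.stdin):
        print(line)
```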

There is definitely nothing parallel happening. Chunk 1 goes through stage 1, stage 2 and stage 3, and then chunk 2 goes through the stages. Each chunk passes through all the stages before the next chunk starts.

Maybe it's just due to the file type. I'm not an expert, but I think the MATMAS message qualifies as "Complex XML" and there are a number of restrictions on what can and cannot be done with that.

Thanks

Mark

Former Member

Also, in case it's relevant, the MDM server is running version 5.5.62.33. I know this is not the most recent version, but I'd rather not go through the pain of an upgrade, with all its associated testing, just to see whether that fixes this!

Mark

Former Member

I managed to answer the question myself:

Parallel import processing with import server is done by a process called 'streaming'.

Complex XML files cannot be streamed.

The XML files coming from R/3 are Complex XML.

Therefore -> XML files from R/3 cannot be processed in a parallel fashion.

Can I award myself points?

Mark