
Duplicate file check

Former Member
0 Kudos

Hello

I have been asked to investigate methods of preventing a file being submitted to PI multiple times. The files are picked up by an NFS file adapter.

We are already making checks on filenames. However, this still allows a file to be submitted multiple times if the file name is changed.

The solution to this might be to take a hash of the file contents and compare against previous files. However, this would be a significant load on the PI server.
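To illustrate, the sort of content hash I have in mind would be something like the following (a minimal sketch using the standard java.security.MessageDigest API; the class and method names are just placeholders):

import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class FileContentHash {

    // Compute an MD5 digest of the full file contents, returned as a hex
    // string. The result could then be compared against the hashes of
    // previously processed files.
    public static String md5Of(String path)
            throws IOException, NoSuchAlgorithmException {
        MessageDigest md = MessageDigest.getInstance("MD5");
        InputStream in = new FileInputStream(path);
        try {
            byte[] buf = new byte[8192];
            int n;
            while ((n = in.read(buf)) > -1) {
                md.update(buf, 0, n);
            }
        } finally {
            in.close();
        }
        StringBuilder hex = new StringBuilder();
        for (byte b : md.digest()) {
            hex.append(String.format("%02x", b & 0xff));
        }
        return hex.toString();
    }
}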

Has anyone got any suggestions on an efficient way to prevent duplicate files being submitted?

Kind regards

Steve

Accepted Solutions (0)

Answers (5)

Former Member
0 Kudos

The file adapter has an option, "Msecs to Wait Before Modification Check", which waits the specified number of milliseconds and then re-checks the file size to make sure the file is complete before it is processed. Try this; it should work.

Former Member
0 Kudos

Hi,

Regarding the duplicate file check, see whether the code below is useful:

package com.sap.pi.deploy;

import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.File;
import java.io.FileReader;
import java.io.FileWriter;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.Writer;
import java.util.Vector;

import com.sap.aii.mapping.api.AbstractTransformation;
import com.sap.aii.mapping.api.StreamTransformationException;
import com.sap.aii.mapping.api.TransformationInput;
import com.sap.aii.mapping.api.TransformationOutput;

public class checkDuplicateFileData extends AbstractTransformation {

    public void transform(TransformationInput arg0, TransformationOutput arg1)
            throws StreamTransformationException {

        String inputPayload = convertInputStreamToString(arg0.getInputPayload()
                .getInputStream());
        String outputPayload;

        try {
            // Flat file on the PI host that stores one hash code per processed payload
            String hashCodeDb = "//sapmnt//AX1/global//POC//dFileNameAB.txt";
            File fileDB = new File(hashCodeDb);
            String sourceFileData = Integer.toString(inputPayload.hashCode());

            if (!(fileDB.exists() && fileDB.canWrite() && fileDB.canRead())) {
                fileDB.createNewFile();
            }

            // Read all previously stored hash codes into a list
            Vector<String> hashList = new Vector<String>();
            BufferedReader br = new BufferedReader(new FileReader(hashCodeDb));
            String line;
            while ((line = br.readLine()) != null) {
                hashList.add(line);
            }
            br.close();

            boolean dataAlreadyProcessed = hashList.contains(sourceFileData);

            if (!dataAlreadyProcessed) {
                // New payload: append its hash to the store and pass it through unchanged
                Writer output = new BufferedWriter(
                        new FileWriter(new File(hashCodeDb), true));
                output.write(sourceFileData + "\r\n");
                output.flush();
                output.close();
                outputPayload = inputPayload;
            } else {
                // Duplicate payload: emit an error document instead of the payload
                outputPayload = "<?xml version=\"1.0\"?>"
                        + "<Error>Mapping failed in module due to duplicate file</Error>";
            }

            // The output payload is returned via the TransformationOutput stream
            arg1.getOutputPayload().getOutputStream()
                    .write(outputPayload.getBytes("UTF-8"));
        } catch (IOException e) {
            throw new StreamTransformationException(e.getMessage(), e);
        }
    }

    public String convertInputStreamToString(InputStream in) {
        StringBuffer sb = new StringBuffer();
        try {
            Reader reader = new BufferedReader(new InputStreamReader(in));
            int ch;
            while ((ch = reader.read()) > -1) {
                sb.append((char) ch);
            }
            reader.close();
        } catch (IOException e) {
            // Return whatever could be read before the failure
        }
        return sb.toString();
    }
}
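For what it's worth, a class like this would run as a Java mapping: compile it against the PI mapping libraries, add it to an imported archive, and reference it in the operation mapping for the interface.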

Former Member
0 Kudos

Thank you all for your suggestions. There are some very good ideas there.

I had been concerned about handling large volumes of data, so I like the idea of checking the first 'X' characters, combination of fields, or file size. Using specific fields would tie it to a specific message type, so I'm inclined to use the first chunk of a file. Saving a hash of this text would save space at the expense of more processor utilisation.
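For example, hashing only the first chunk might look roughly like this (a sketch; the 64 kB limit and the names are only illustrative):

import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class FirstChunkHash {

    // Hash only the first maxBytes of the payload, so very large files
    // cost a bounded amount of CPU regardless of their total size.
    public static String hashFirstChunk(byte[] payload, int maxBytes)
            throws NoSuchAlgorithmException {
        int len = Math.min(payload.length, maxBytes);
        MessageDigest md = MessageDigest.getInstance("MD5");
        md.update(payload, 0, len);
        StringBuilder hex = new StringBuilder();
        for (byte b : md.digest()) {
            hex.append(String.format("%02x", b & 0xff));
        }
        return hex.toString();
    }
}

Called as hashFirstChunk(payloadBytes, 64 * 1024), this stores a short hex string per file rather than the chunk itself.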

I especially like the suggestion that I ask the providers not to send duplicates. I'd be interested to know if anyone has ever had any success with that one.

I had seen the blog post about preventing duplicates. That was where I saw the warning about large amounts of IO and the effect this might have on PI.

I liked Anupam's idea about linking the filename and a unique field in the file. Unfortunately I have no control over the filenames for some of our interfaces.

Mickael has obviously been giving this some thought already, and he sums up the options very well. I agree that it is important to make the process reusable, and I think option 3 is the most reusable.

Kind regards

Steve

Former Member
0 Kudos

Thank you Vishal. I don't think that would solve this problem because the file size would be the same if it was submitted more than once.

However, this would be very useful to solve another problem I have where files are picked up before the source system has finished writing to them.

Edited by: PI Stream Lead on Nov 29, 2011 11:40 AM

Former Member
0 Kudos

Thanks, Dsravan. That is a useful piece of code.

I hadn't realised that hashCode could be used in that way. It would save me using CRC or MD5.

I can't seem to allocate you any points. Perhaps I've been too generous already.

Kind regards

Steve

anupam_ghosh2
Active Contributor
0 Kudos

Hi Steve,

Say your file "GTM.txt" has content like this in its first or last line:

Header,20110808,"xyz",PN00223,10000

You need to choose a value in a row, preferably the first or the last row, which is unique for each file: say the value "PN00223". The sender should also send this value as part of the filename, so that the file name becomes "GTM.PN00223.txt". Whenever a file is received, your validation code in a message mapping UDF or Java mapping needs to check that the first/last line of the file carries the value that is also part of the filename. If there is a match you can process the file further; otherwise reject it (you can send rejection emails to the business).
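For illustration, the check inside the mapping could look roughly like this (a sketch only; it assumes the GTM.<key>.txt naming pattern and the comma-separated first line from the example above):

public class FilenameFieldCheck {

    // Return true if the unique value embedded in the filename
    // (e.g. "GTM.PN00223.txt" -> "PN00223") also appears as a field
    // in the first line, e.g. Header,20110808,"xyz",PN00223,10000
    public static boolean isConsistent(String fileName, String firstLine) {
        String[] parts = fileName.split("\\.");
        if (parts.length < 3) {
            return false; // name does not follow the GTM.<key>.txt pattern
        }
        String keyFromName = parts[1]; // e.g. "PN00223"
        for (String field : firstLine.split(",")) {
            if (field.trim().replace("\"", "").equals(keyFromName)) {
                return true;
            }
        }
        return false;
    }
}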

If you follow this process, then simply renaming the file won't get it processed by the PI server. This method on its own won't prevent posting of exact duplicate files, though, so you need to follow the link provided by Baskar in addition to this method.

If no unique field value is present in the file, you can add a sequence number field to the first line and increment it with each file. The sequence number must be alphanumeric so that its length does not grow too much over time.

regards

Anupam

Former Member
0 Kudos

Hi,

A few days ago I did such a search, and it seems that on SDN most of the controls are based only on filenames, not on file content.

Do not forget that with ASMA you can also store/retrieve the source file size, in addition to the source filename.

For a control on the file content, there are mainly these techniques:

1. Store the object keys (like payment number, bank account) in a Z-table, but be sure to capture all the relevant keys which distinguish two sendings (see the sketch after this list).

2. Store the whole file content itself inside a Z-table.

3. Store a part of the file content inside a Z-table. Indeed, depending on your data volumes and file sizes, a solution may be to store only the first 100 lines, or the first 64 kB (for example).

4. Another solution, which I suggested (with some reserves...), is to limit access to the FTP or NFS server to the admin team only, and not to business employees. That is, change the business process!
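For technique 1, the check could look roughly like this (a sketch only; it assumes a JDBC connection to the database and a hypothetical Z-table ZDUP_KEYS with a single KEY_VALUE column):

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

public class ObjectKeyCheck {

    // Returns true if the key was newly recorded, false if it was
    // already present, i.e. the file is a duplicate.
    public static boolean recordIfNew(Connection con, String objectKey)
            throws SQLException {
        PreparedStatement sel = con.prepareStatement(
                "SELECT 1 FROM ZDUP_KEYS WHERE KEY_VALUE = ?");
        sel.setString(1, objectKey);
        ResultSet rs = sel.executeQuery();
        boolean exists = rs.next();
        rs.close();
        sel.close();
        if (exists) {
            return false;
        }
        PreparedStatement ins = con.prepareStatement(
                "INSERT INTO ZDUP_KEYS (KEY_VALUE) VALUES (?)");
        ins.setString(1, objectKey);
        ins.executeUpdate();
        ins.close();
        return true;
    }
}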

Personally, for the moment, we have not yet taken a decision on which is the best way, for our current need (mainly one) but also perhaps for the future (a reusable method/process).

Mickael

baskar_gopalakrishnan2
Active Contributor
0 Kudos

Try this wiki page for a solution:

http://wiki.sdn.sap.com/wiki/display/XI/Different+ways+to+keep+your+Interface+from+processing+duplicate+files

Former Member
0 Kudos

Hi Steve,

if you think of a file as a transaction, each file should have a transaction ID. It can be any combination of fields in the file or a dedicated ID which is unique for the content in the file. Then you can log the ID, implement a check and reject the file if the transaction has been processed already.
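A minimal sketch of such an ID, assuming the relevant fields have already been extracted from the file (the field names are only examples):

public class TransactionId {

    // Build a transaction ID from a combination of fields that together
    // uniquely identify the file's content.
    public static String of(String... fields) {
        StringBuilder sb = new StringBuilder();
        for (String f : fields) {
            sb.append(f).append('|');
        }
        return Integer.toHexString(sb.toString().hashCode());
    }
}

For example, TransactionId.of(paymentNumber, bankAccount, valueDate). Whether a plain hashCode is collision-safe enough depends on your volumes; a stronger digest such as MD5 could be substituted.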

Regards, Martin