BODS script to delete duplicate input files

Hi all,

I have a set of CSV files that are sent via FTP to a folder every day.

I need to check whether duplicate files are present, delete the extra copies, and then read the remaining files.

I have not been able to find a script that identifies the duplicate files.

Can anyone please suggest one?

Former Member
replied

Hi Swetha,

On Linux, you can delete a specific file as long as you can establish which file you want to delete. To do this, follow the steps below.

As an example, say there are six files: a.csv, A.csv, ab.csv, Ab.csv, aB.csv and AB.csv. You only need a.csv and ab.csv.

1. Read in all the file names into BODS.

2. Keep one column with the actual file name (call this Col1) and another column where the file name is converted to either all lower case or all upper case (call this Col2).

3. Sort the file names by Col2 and generate a row number within each Col2 group using the gen_row_num_by_group function, then load this data into a table. The table should hold the original file name (OriginalFileName), the case-normalized name, and the row number within its group (NumberByGroup).

Once you have this data in a table, you can use a script that calls the EXEC function to run a command-line statement deleting each file (OriginalFileName) where NumberByGroup is greater than 1.

You will then end up with just a.csv and ab.csv out of the six files that were originally in the file share. It could just as well be A.csv and AB.csv. Which case combination survives doesn't really matter, as long as every variant of a given file name holds the same data.
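The steps above can be sketched outside BODS as well. The following is only an illustration in Python, not BODS script: the function names `plan_duplicate_deletes` and `delete_files` are made up for this example, and the secondary sort (original name descending) is an assumption chosen so that the all-lower-case variant is the one kept in each group.

```python
import os
from itertools import groupby


def plan_duplicate_deletes(file_names):
    """Split names into (keep, delete), treating names that differ only by case as duplicates."""
    # Secondary order (assumption): original name descending, so an
    # all-lower-case variant, if present, comes first within its group.
    rows = sorted(file_names, reverse=True)
    # Primary order: the case-normalized name (Col2). sorted() is stable,
    # so the secondary order is preserved inside each group.
    rows = sorted(rows, key=str.lower)

    keep, delete = [], []
    for _, group in groupby(rows, key=str.lower):
        # number_by_group plays the role of gen_row_num_by_group.
        for number_by_group, original in enumerate(group, start=1):
            if number_by_group == 1:
                keep.append(original)
            else:
                delete.append(original)
    return keep, delete


def delete_files(directory, names):
    """Hypothetical helper: remove the listed files from one directory."""
    for name in names:
        os.remove(os.path.join(directory, name))


files = ["a.csv", "A.csv", "ab.csv", "Ab.csv", "aB.csv", "AB.csv"]
keep, delete = plan_duplicate_deletes(files)
print("keep:  ", keep)
print("delete:", delete)
```

Only the planning step runs here; `delete_files` is left uncalled so that the sketch can be tried without touching any real files.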

The assumption is that AB.csv, ab.csv, Ab.csv and aB.csv all contain the same data stored under different file names, and likewise for a.csv and A.csv. If, however, the files hold different datasets, I would suggest importing all of them into the database with the DI_FILENAME column enabled, so that you can isolate the data by file name at a later stage of the job.
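To show what carrying the file name alongside each row looks like, here is a small Python sketch, again only an illustration and not BODS itself. The function name `read_with_filename` is made up, and the files are assumed to be comma-separated with a header row; each record simply gets an extra DI_FILENAME field holding the name of the file it came from.

```python
import csv
from pathlib import Path


def read_with_filename(paths):
    """Read several CSV files and tag every record with its source file name."""
    rows = []
    for path in paths:
        # newline="" is what the csv module expects for files it reads.
        with open(path, newline="") as handle:
            for record in csv.DictReader(handle):
                # Extra column, mimicking the DI_FILENAME idea.
                record["DI_FILENAME"] = Path(path).name
                rows.append(record)
    return rows
```

With the file name stored on every row, records from each file can be filtered or compared later without losing track of where they came from.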

kind regards

Raghu
