How to remove header and footer rows from a flat file?
Case 1: process a flat file with header information
If you know how many header lines there are, removing them can easily be done by specifying the correct value for “Skipped rows” in the File Format:
The output will look like the Data Preview in the File Format Editor. Just make sure to add a query transform so that the correct column headers are generated:
Case 2: process a flat file with header and footer information
Unfortunately, the “Skipped rows” only deal with rows at the beginning of the file and there’s no equivalent setting for at the end. If you don’t do anything special, your data flow may run into errors when processing the extraneous rows that are in a different format.
Removing header and footer lines can be done in a single data flow:
To that extent an alternate single-field fixed-width File Format is defined, that is used for both source and target in the data flow:
The ReadFile Query transform reads all rows from the file and adds a sequence number:
The Sort Query transform orders the lines in descending order:
The GenRowNum Query transform adds a 2nd sequence number. Because the rows were sorted in descending order, this sequence will start counting from the bottom of the file:
Finally, the Filter Query transform strips off all header and footer rows:
The global variables $RowsInHDR and $RowsInFTR contain the number of header and footer lines respectively. The end result will look like this, a file apt for further processing: