Load PDF pages as Strings into a HANA Table

Former Member

Hi SCN community,

I created a table in HANA that should be filled with the content of a locally stored PDF file.

The table has two columns, PAGE NUMBER and CONTENT.

Each row should represent one page of the PDF.

I tried the extraction step with an external Python script, but the content could not be extracted for many of the PDF files.

The reason might be the variety of PDF types and versions.

My questions are:

How can I extract this information from a PDF file and load it into my HANA table without using external tools such as Python?

Is there already a file upload / extraction tool integrated in HANA?

Thanks in advance!

Sebastian

Accepted Solutions (1)

Bojan-lv-85
Advisor

Hi Sebastian,

What about the File Adapter feature of EIM? See:

http://help.sap.com/download/multimedia/hana_options_eim/SAP_HANA_EIM_Administration_Guide_en.pdf

Chapter "6.5 File"

I'm not sure whether it can be configured to take page numbers into account.


BR, Bojan

Former Member

Thanks Bojan,

Unfortunately, that idea didn't work out. Are there any alternatives?

Best Regards

Sebastian

lbreddemann
Active Contributor

The text analysis and text mining features of SAP HANA allow you to load and process PDF files.

However, I don't know of any way to go from there to actually storing the data page by page.

What is the purpose of this anyway? Why would you want to store the files page by page?

Former Member

Hi Lars,

Thanks for the feedback. Could you please list the HANA features you mentioned for loading and processing PDF files?

Storing the pages separately is just an idea. Even a full PDF stored in a single table row would be a success for me.

Thanks in advance

Sebastian

lbreddemann
Active Contributor

Loading files, as well as the text analysis and text mining features, is documented. There's even a tool called File Loader available.

Technically, it boils down to inserting data into a LOB column. Once the data is in SAP HANA, the processing can happen via full-text indexes.
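
To illustrate the "insert into a LOB column" step, here is a minimal sketch, not a tested HANA setup. It uses Python's built-in sqlite3 as a stand-in database so it runs anywhere. The table name PDF_DOCS, the FILE_NAME column and the helper store_pdf are made up for this example. Against SAP HANA, I would expect the same parameterized INSERT to work through a DB-API connection (for example one from the hdbcli driver), but that is an assumption. Creating the full-text index afterwards is done in HANA itself and is not shown.

```python
import sqlite3


def store_pdf(conn, file_name, pdf_bytes):
    """Insert one PDF as a single binary value into a LOB column.

    conn is any DB-API connection that accepts '?' placeholders.
    PDF_DOCS, FILE_NAME and CONTENT are hypothetical names.
    """
    cur = conn.cursor()
    # Bind the bytes as a parameter; never paste binary data into the SQL text.
    cur.execute(
        "INSERT INTO PDF_DOCS (FILE_NAME, CONTENT) VALUES (?, ?)",
        (file_name, pdf_bytes),
    )
    conn.commit()


# Stand-in database for the sketch (assumption: HANA would use a BLOB column too).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE PDF_DOCS (FILE_NAME TEXT, CONTENT BLOB)")

# Placeholder bytes, not a real document.
sample = b"%PDF-1.4\n% placeholder bytes only\n"
store_pdf(conn, "sample.pdf", sample)

# Read the row back to confirm the bytes were stored unchanged.
stored = conn.execute(
    "SELECT CONTENT FROM PDF_DOCS WHERE FILE_NAME = ?", ("sample.pdf",)
).fetchone()[0]
row_count = conn.execute("SELECT COUNT(*) FROM PDF_DOCS").fetchone()[0]
```

For a real file, the bytes could be read with pathlib.Path("some.pdf").read_bytes() and passed to store_pdf. Storing the raw bytes avoids the extraction problems mentioned in the question, because the text is only pulled out later by the database.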

Answers (0)