How do you load UNSTRUCTURED data into HANA?

Former Member · ‎06-11-2012

SAP claims that HANA supports both structured and unstructured data. Can any of you, hopefully someone from SAP, explain how HANA supports unstructured data? I would like to know how to

1) load unstructured data into HANA, for example scanned invoices stored as JPEG or PDF files

2) search for, say, a customer name within these files

3) display a file when it is clicked in the search results

Cheers

Tansu

Former Member · ‎05-13-2014

I have the same question. We are trying to load wave files from call centers into HANA (thousands of them) and need a voice-to-text translator as well as a load mechanism for HANA.

Ideas?

Dr. Berg

Former Member · ‎09-10-2012

Hi, below you will find a sample script that lets you upload any type of file to a BLOB column in a column table in the HANA database. I was able to build this script with help from Juergen Schmerder. You can use any programming language that can establish a connection through ODBC or JDBC, such as .NET or Java.

from hdbcli import dbapi  # Assumption: SAP's hdbcli package; older HANA clients ship a standalone dbapi module

con = dbapi.connect('hanahost', 30015, 'SYSTEM', '********')  # Open a connection to SAP HANA
cur = con.cursor()  # Open a cursor

with open('doc.pdf', 'rb') as f:  # Open the file read-only, in binary mode
    content = f.read()  # Read the whole file into a variable

cur.execute("INSERT INTO BLOBTEST VALUES(?,?)", (2, content))  # Save the content to a table

cur.close()  # Close the cursor
con.close()  # Close the connection
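For the "thousands of files" case mentioned earlier in the thread, the same INSERT can be wrapped in a loop. The following is only a minimal sketch: the helper name `load_files` and its `table` and `first_id` parameters are hypothetical, and it assumes a two-column table like BLOBTEST above and a DB-API cursor that accepts `?` placeholders. It does not cover the voice-to-text step.

```python
from pathlib import Path


def load_files(cur, table, paths, first_id=1):
    """Insert each file in `paths` as one (id, content) row; return the row count.

    `cur` is any DB-API cursor using `?` placeholders (assumption).
    `table` is interpolated into the SQL as-is, so pass only trusted names.
    """
    count = 0
    for file_id, path in enumerate(paths, start=first_id):
        # read_bytes() opens the file in binary mode and closes it afterwards
        content = Path(path).read_bytes()
        # One file at a time, so only a single file is held in memory
        cur.execute(f"INSERT INTO {table} VALUES (?, ?)", (file_id, content))
        count += 1
    return count
```

A call might look like `load_files(cur, "BLOBTEST", sorted(Path("invoices").glob("*.pdf")))`, where the `invoices` directory is made up for illustration. Because the helper only relies on the generic cursor interface, it can be tried against an in-memory sqlite3 database before pointing it at HANA.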

Now, to search within the content of the files you will need to use Fuzzy Search. Here's an example of a query that looks for the word "march" in the content of the files. The score you get back is a TF/IDF score (Term Frequency/Inverse Document Frequency): it rises with the number of times the word "march" is found in a file, weighted by how rare the word is across all files, so the file with the most matches gets the highest score.

SELECT TO_DECIMAL(SCORE(),3,2) AS score, *
FROM BLOBTEST
WHERE CONTAINS("File_Content", 'march',
      FUZZY(0.5, 'textSearch=fulltext'))
ORDER BY "Year", "Month";
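To make the TF/IDF idea behind the score more concrete, here is a small illustrative sketch in plain Python. It uses a textbook formula (raw term count multiplied by the natural log of total documents over documents containing the term); this is an assumption for illustration and not necessarily the exact formula HANA applies. The function name `tf_idf_scores` and the sample texts are made up.

```python
import math
import re


def tf_idf_scores(term, documents):
    """Return one textbook TF-IDF score per document for a single term."""
    term = term.lower()
    # Split each document into lowercase word tokens
    tokenized = [re.findall(r"\w+", text.lower()) for text in documents]
    # Document frequency: how many documents contain the term at least once
    doc_freq = sum(1 for tokens in tokenized if term in tokens)
    if doc_freq == 0:
        return [0.0] * len(documents)
    # Inverse document frequency: rarer terms get a larger weight
    idf = math.log(len(documents) / doc_freq)
    # Term frequency here is the raw number of occurrences in the document
    return [tokens.count(term) * idf for tokens in tokenized]


docs = [
    "Invoice dated March, payment due end of March",
    "Invoice dated April",
    "March summary report",
]
scores = tf_idf_scores("march", docs)
# The first text mentions "march" twice, so it ranks above the third;
# the second text never mentions it and scores zero.
```

Note that with this textbook variant a term that appears in every document gets a weight of zero, which is one reason real search engines usually apply smoothed versions of the formula.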

Hope it helps, Lucas.

Practice your SAP HANA™ development skills:

www.GetYourHandsOn.it

Info in Spanish about SAP HANA™:

www.HablemosHANA.com

