
Text Analysis - custom dictionaries


If you're working with the SAP HANA Text Analysis feature, you'll most likely need to write custom dictionaries and CGUL rules.

While CGUL rules tend to be specific to a project, custom dictionaries are more likely to be reusable across projects. Here are two; please send remarks or additions.

The platform doesn't allow attachments with the .hdbtextdict extension, so the files are attached as .txt and you'll have to change the extension from .txt to .hdbtextdict.

Big list of cars

The list has about 1400 entries and is fairly up to date as of June 2015. It extends the base entity VEHICLE@LAND.

All entries are written as the manufacturer followed by the model: "Renault KADJAR", "Citroen DS5", "BMW serie ?"

Note that the manufacturers themselves are also included, so Alfa Romeo, Aston Martin and the others are tagged as VEHICLE@LAND, not ORGANIZATION@COMMERCIAL.
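
To make the naming convention concrete, here is a minimal Python sketch that builds such an entry list from a manufacturer-to-models mapping. The function name `build_entries` and the sample data are hypothetical and only illustrate the convention; this is not how the attached dictionary was actually produced.

```python
def build_entries(models_by_maker):
    """Return sorted entries: each manufacturer alone, plus 'Manufacturer Model'."""
    entries = set()
    for maker, models in models_by_maker.items():
        # The manufacturer itself is an entry, so it is tagged as a vehicle too.
        entries.add(maker)
        for model in models:
            entries.add(f"{maker} {model}")
    return sorted(entries)


# Hypothetical sample data, using names mentioned in this post.
sample = {"Renault": ["KADJAR"], "Citroen": ["DS5"], "Aston Martin": []}
for entry in build_entries(sample):
    print(entry)
```

Using a set keeps the list free of duplicates if the same model is listed twice for a manufacturer.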

Law enforcement organizations

They got in the way when I was doing text analysis, so I put some in a dictionary. It has only 40 entries, mostly European police:

Europol, Interpol, Bundespolizei, Politievakbond, Gendarmerie Nationale, etc.

A dictionary is also useful for mapping abbreviations to their standard form: Kripo => Kriminalpolizei.
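
The abbreviation mapping can be sketched as follows: a short Python script that generates dictionary XML in which each entry carries a standard form and optional variants. The namespace and the element names (`dictionary`, `entity_category`, `entity_name`, `variant`) are written from my recollection of the SAP text analysis dictionary format, so please check them against the official documentation. The category name `LAW_ENFORCEMENT` and the function `build_dictionary` are hypothetical.

```python
import xml.etree.ElementTree as ET

# Assumption: namespace of the SAP text analysis dictionary format.
TA_NS = "http://www.sap.com/ta/4.0"


def build_dictionary(category, forms):
    """Build dictionary XML; `forms` maps each standard form to its variants."""
    ET.register_namespace("", TA_NS)  # serialize TA_NS as the default namespace
    root = ET.Element(f"{{{TA_NS}}}dictionary")
    cat = ET.SubElement(root, f"{{{TA_NS}}}entity_category", {"name": category})
    for standard_form, variants in forms.items():
        entity = ET.SubElement(
            cat, f"{{{TA_NS}}}entity_name", {"standard_form": standard_form}
        )
        for variant in variants:
            # Each variant found in a text is reported under the standard form.
            ET.SubElement(entity, f"{{{TA_NS}}}variant", {"name": variant})
    return ET.tostring(root, encoding="unicode")


# Hypothetical category name; entries taken from the examples in this post.
xml_text = build_dictionary(
    "LAW_ENFORCEMENT",
    {"Kriminalpolizei": ["Kripo"], "Europol": []},
)
print(xml_text)
```

The printed XML would then be saved with the .hdbtextdict extension; an XML declaration with UTF-8 encoding at the top of the file is probably a good idea for names with accents or umlauts.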


