cancel
Showing results for 
Search instead for 
Did you mean: 

HANA db case sensitivity with SQL text searches

patrickbachmann
Active Contributor
0 Kudos

Hi folks,

We have a HANA view that users are consuming via Microstrategy.  All fields such as PLANT (WERKS) are case sensitive.  For example if the plant is named 'XYZ1' then the following SQL is NOT returning any results;

select * from TABLENAME where WERKS = 'xyz1'

Now my microstrategy developer is saying that other databases he consumes are not case sensitive and this setting is configured typically at the database level.  So my question is twofold;

1) Is there a database level setting in HANA for case sensitivity?

2) If this option is available does it really make sense to turn this sensitivity off?

Thanks,

-Patrick

Accepted Solutions (1)

Accepted Solutions (1)

henrique_pinto
Active Contributor
0 Kudos

Interesting topic.

After some quick googling, apparently the default behavior varies from DB to DB.

SQL Server apparently is case insensitive by default, while Oracle apparently isn't.

One thing the BI layer could do is to use something like

WHERE UPPER("COLUMN") = UPPER(:criteria)

In a classic RDBMS, this would have a terrible performance since it wouldn't consider the existing  indexes as in the straightforward WHERE clause. In an in-memory DB, though, I'm not really sure what would be the performance degradation (if any). Please test and let us know.

stefan_koehler
Active Contributor
0 Kudos

Hi Henrique,

basically it is the way how characters are compared. You can make use of binary comparisons of characters (= default setting) or linguistic comparisons on Oracle (i don't know how it is handled internally by other RDBMS). Characters are compared according to their binary value in case of binary comparison, otherwise the numeric code of each character does not have to be identical in order to match.

You would even have a bad performance (index unique lookups or range scans not possible) by linguistic usage, if you don't create an index with the nlssort function (which is used implicitly). There are other limitations as well (like bitmap stuff and so on), but that would be too far right now.

Regards

Stefan

lbreddemann
Active Contributor
0 Kudos

In addition to Stefans comment, the classic workaround for this requirement would be to create what Oracle calls a Function Based Index.

It's pretty similar to generated columns in SAP HANA and would allow the optimizer to leverage an existing index structure.

In SAP HANA the SQL optimizer is clever enough to figure out that there already is a pre-comuputed upper case version of the data in this case:

create column table bbb (name varchar(20)

                       , upper_name varchar(20) generated always as UPPER(name));

insert into bbb values ('Lars');

insert into bbb values ('Mara');

insert into bbb values ('Josh');

insert into bbb values ('Caroline');

select * from bbb;

NAME    UPPER_NAME
Lars    LARS
Mara    MARA
Josh    JOSH
CarolineCAROLINE

EXPLAIN PLAN FOR select * from bbb where upper(name) = upper('LaRs');

OPERATOR_NAME   OPERATOR_DETAILS

COLUMN SEARCH   BBB.NAME, BBB.UPPER_NAME (LATE MATERIALIZATION)

  COLUMN TABLE  FILTER CONDITION: BBB.UPPER_NAME = 'LARS'

As we see, the generated column is used here without any re-coding of the SQL statement.

Now, for the use of LIKE instead of equal, the feature works as well:

select * from bbb where upper(name) like '%AR%';

NAME    UPPER_NAME
Lars    LARS     
Mara    MARA     
CarolineCAROLINE 

EXPLAIN PLAN FOR select * from bbb where upper(name) like '%AR%';


OPERATOR_NAME   OPERATOR_DETAILS

COLUMN SEARCH   BBB.NAME, BBB.UPPER_NAME (LATE MATERIALIZATION)

  COLUMN TABLE  FILTER CONDITION: BBB.UPPER_NAME LIKE ''%AR%''

So, the old tricks still work with SAP HANA .

If this provides a much better performance is something to really test with actual data.

The huge difference in SAP HANA compared to row oriented DBMS is that the evaluation of the WHERE condition only works on the dictionary (the unique values of each column) and not on every single row.

Cheers, Lars

henrique_pinto
Active Contributor
0 Kudos

Hi Lars,

thank you for the information.

This is very handy for someone without a heavy DB modeling experience as myself to learn the tips&tricks the official manuals usually don't include.

So, back to the original problem: in a real life case, where by default there isn't such an upper_name column in our fact table, how should that "function-based index" be created in HANA? Could you create such a customer index with CREATE INDEX command or some other command? Or would it be necessary to alter the original table to include it?

lbreddemann
Active Contributor
0 Kudos

Well, the example that I posted actually is SAP HANA code.

One could use it just like that...

In any case, the Microstrategy Developer still will have to adopt his code, to include the UPPER() function, if this is the approach (s)he wants to take.

Investing a bit more effort to use the CONTAINS() function would allow for fuzzy search, similar to what we have with the enhanced F4 value help in Suite on HANA.

With that not only different letter cases would be handled but also common misspellings.

- Lars

henrique_pinto
Active Contributor
0 Kudos

Hi Lars,

I meant, for example, we have an existing KNA1 table, including the customer master data, and we want to select all customers in company code 'BR01', while allowing the user in the BI frontend to be able to filter for 'br01'. So, the question is: would we need to alter the original table to include a new column with UPPER(BUKRS)? Or is there a way which doesn't involve altering the original table (e.g. creating an index)?

lbreddemann
Active Contributor
0 Kudos

Hi again,

except for using the CONTAINS() function, there's no way to have this behavior without adding the generated column.

On the other hand, this is not too bad.

For pure NetWeaver scenarios using CONTAINS() is a viable and efficient option.

For scenarios where the table is replicated into a SAP HANA data mart, you can add the column without breaking any existing code.

And the change of table structure really is only an issue, when the table is accessed via SELECT *.

In all other cases, the generated column will simply be ignored.

- Lars

henrique_pinto
Active Contributor
0 Kudos

Thank you very much for the reply.

Best regards,

Henrique.

patrickbachmann
Active Contributor
0 Kudos

Lars, I'm just re-reading this forum and specifically stuck on your comment about the generated columns.  It seems your example is creating a new column with the new entries inserted as lower case (ie: insert into bbb values ('Lars');) and then they are actually stored upper case in the new column.  The problem I'm having though is actually the opposite.  I ALREADY have all caps in my column and my PLANT (werks) is stored in the column as 'MY01' but the user is searching using just lower case (ie: they are searching for 'my01' and no result).  Perhaps I am misunderstanding your comment?  My workaround for the user is to use UCASE in their select statement but they are asking if we can turn case sensitivity off at the HANA database level due to restrictions in the functionality of the Microstrategy software they are using.  Supposedly when they have to write custom SQL in their prompts for example (like to select only UCASE using my workaround) they then lose other functionality with their prompts. 

patrickbachmann
Active Contributor
0 Kudos

PS:  Lars, if I was indeed to create a new index column wouldn't I actually need to make it all lower case and store the new column like 'my01'.  Of course that still would not be good because the user could type My01 and still have no match.

stefan_koehler
Active Contributor
0 Kudos

Hi Patrick,

as far as i get you (and Lars) right, Lars already mentioned the CONTAINS() function for that.

Documentation: http://help.sap.com/hana/html/_dsql_predicates.html#sql_predicates_contains_predicate

It seems like HANA does not have such global settings (like nls_sort and nls_comp on Oracle) for case insensitive search.

Regards

Stefan

P.S.: Damn, that HANA documentation is so complex like finding the Higgs particle and it seems like missing some information as well

patrickbachmann
Active Contributor
0 Kudos

Thanks Stefan.  Yes that's more what I was looking for; a global setting as you touched upon.  My view is being consumed via microstrategy and when they choose a value from a PROMPT it is a standard SQL statement and if the user typed lower case then that's what is passed.  So all of these great workarounds that you guys suggested (and I had suggested they use UCASE before I created this thread) indeed work in Microstrategy but apparently when they tweak the prompt and instead of the default SQL that is generated, they place something like the CONTAINS or UCASE workaround, then they lose a bunch of standard functionality that comes with the microstrategy prompt functionality out of the box.  ie: The prompt is like a dialog box and you can see a list of values, or search the list, etc.  So this is they only reason I was asking for a global setting.... because they asked the question.  But I knew it's not something we wanted to do even if there was such an option.  I just had to ask.  Thanks for your great feedback.

-Patrick

henrique_pinto
Active Contributor
0 Kudos

Sounds like a great chance to showcase BO instead.

lbreddemann
Active Contributor
0 Kudos

Hi Patrick,

you're right - in the use case as I know understand it, the generated column would be of limited use.

Question here is of course: for which kind of data does it make sense to switch off case-sensitivity *globally*?

I tend to believe that the search feature should be case-agnostic but the data itself is actually meant to contain different case letters (otherwise this should be handled during data insert/update).

Since not all DBMS provide such a "ignore-case" switch, this might be a call for Microstrategy to extend the feature set of their search dialogue... 😉

- Lars

lbreddemann
Active Contributor
0 Kudos

Hi Stefan,

maybe is useful for you in this case

But I of course agree: the documentation could be better and I can reassure you that it is being worked on that.

- Lars

Answers (3)

Answers (3)

former_member182114
Active Contributor

Hi Patrick,

For text search and non sensitivy you should use:

CONTAINS("WERKS",'xyz1')         ( to catch a word in a phrase )

or

CONTAINS("WERKS",'%xyz1%')   ( % to works like like and catch part of a word )

You don't need a text index to try this out, but for special options like fuzzy or performance it's required.

Regards, Fernando Da Rós

Former Member
0 Kudos

If I need to do group by on any text field and avoid case sensitive Is there any Collate setting in HANA database level?

patrickbachmann
Active Contributor
0 Kudos

Vinod, according to this link below collation is not supported at the db level although I haven't found any other official sap documentation saying this.  However I had a message open with SAP on this issue and they just said at the database level turning off case sensitivity was not possible.

http://scn.sap.com/docs/DOC-39120

-Patrick

Former Member
0 Kudos

Thanks Patrick, I resolved this by replacing with UPPER() for the grouping filed. Thanks for the reference. It would be really great if SAP come up with a config parameter, it will definitely get used in DATA MART scenarios...

-Thanks,

Vinod Nair.

vinod.nair@cognilytics.com

rindia
Active Contributor
0 Kudos

Hi Patrick,

In order to avoid case sensitive issues, no need of changing an settings.

Just proper sql is enough to solve this problem.

Here is my work around for your situation.

First i modified the table (EMPLOYEE) to include names in a column "ENAME" with name as JOHN.

SO i have four employees with the same name but with different cases like all lower case, all upper case, mixed case as shown below

            

Then the SQL required to get the solution is

select "SRK"."EMPLOYEE"."EID","SRK"."EMPLOYEE"."ENAME" from "SRK"."EMPLOYEE"

WHERE  contains ("SRK"."EMPLOYEE"."ENAME", 'john')

ORDER BY "SRK"."EMPLOYEE"."EID";

        

Irrespective of the search string case you give (JOHN, john, JOhn, etc), you will always get it.

Regards

Raj Kumar

patrickbachmann
Active Contributor
0 Kudos

Guys, I appreciate all of your great feedback on this.  I am digesting all of this information and will discuss with the rest of my HANA developer team before deciding our approach and closing this thread.

Thanks!

-Patrick