cancel
Showing results for 
Search instead for 
Did you mean: 

Trailing whitespace ignored

Former Member
0 Kudos

Doing the following query:

  select 'wat ' as x from dummy

  union

  select 'wat' as x from dummy

In other words, I'm doing a union on two rows, where one value has a trailing whitespace.

This is what I get back:

xcount(*)
wat 1

Since the strings are not the same, I expect to get 2 different results. Is there something I'm missing?

Accepted Solutions (1)

Accepted Solutions (1)

lbreddemann
Active Contributor
0 Kudos

Just out of fun, could you try and check the result for

select to_varchar( 'wat ') as x from dummy

  union

  select to_varchar('wat') as x from dummy

?

I suspect an implicit type conversion to CHAR is happening here.

kr_pavankumar
Contributor
0 Kudos

Hi Lars,

Here is the output - only one record is returned

--

KRPK

Former Member
0 Kudos

Nicolas, I believe you can avoid your issue by using "union all":


The simple "union" operator will check the results for a real difference and whitespace isn't sufficient to constitute a difference. While "union all" will not perform a comparison and therefore (as Lars suggested) refrain from an implicit conversion.

Former Member
0 Kudos

I was just using union as a way to build a simple test case. This is a problem that affects other operations.

Join

with

  test1 as (

   select 'wat ' as trail from dummy

  ),

  test2 as (

   select 'wat' as normal from dummy

  )

select

  trail

, normal

, length(trail) as trail_length

, length(normal) as normal_length

from test1

join test2 on test1.trail = test2.normal

which results in this:

trailingnormallength(trailing)length(normal)
watwat43

Group By

select x, count(*) from (

  select 'wat ' as x from dummy

  union all

  select 'wat' as x from dummy

)

group by x


...results in one "wat, 2" row.

Distinct

select distinct x from (

  select 'wat ' as x from dummy

  union all

  select 'wat' as x from dummy

)


... results in one "wat" row.

The big issue with this bug is that it hides underlying data quality problems and results in incorrect reporting. I wonder if this is actually a feature, intended to clean people's bad data. If that's the case, then there should be a way to turn it off through a config or something.

Former Member
0 Kudos

Just to add more tests:

select LENGTH(TO_VARCHAR('wat ')) from dummy

Results in 4, as expected.

Also this might be interesting:

select HASH_MD5(TO_BINARY('wat ')) from dummy

union

select HASH_MD5(TO_BINARY('wat')) from dummy

This results in 2 rows (i.e. two different hashes), as expected.

lbreddemann
Active Contributor
0 Kudos

Ok, this is not a bug, but ANSI SQL standard behavior.

For comparisons (except for LIKE) of two strings, the shorter string gets right-padded with space before comparison.

Operations like UNION and JOIN perform string-wise comparisons and therefore implement this semantic.

HASH_MD5, LENGTH and e.g. DISTINCT on the other hand focus don't follow this approach.

If you look around for 'distinct char trailing' you'll find that this specific bit of ANSI conform behavior has spread quite a bit of confusion around nearly all DBMS out there.

Former Member
0 Kudos

Thanks for the explanation! Is there a way to turn off this behavior via some kind of flag or config?

lbreddemann
Active Contributor
0 Kudos

I'm not aware of such an option.

Having written that: what's the actual use case for this? Why do the trailing spaces represent significant information that should be considered in a character context?

If byte-semantics are required, you could always just pick a binary data type.

Answers (2)

Answers (2)

Former Member
0 Kudos

Hi Nicolas,

I consider this behaviour as a bug. You could open a SAP call for this. And it is inconsistent, see the length on strings which is handled correctly:


Regards,
Mark

Former Member
0 Kudos

Hey Mark,

I wish I could open some kind of bug report with SAP, but I don't think I have a channel to do this since I'm using SAP Hana One on AWS.

kr_pavankumar
Contributor
0 Kudos

When the given select statement is executed, system performs the union on two sets and picks the distinct set of values. It seems the 'wat' and 'wat  '(with trailing white-space) are treated as same value by the row engine and an output of 1 records. "trailing white-sapces are not considered"

whereas the values with leading white-space are treated as different values and its returning two records for the below query

select ' wat ' as x from dummy

  union

  select '   wat' as x from dummy

--

KRPK