
DS XI 3.0: Output is the format mask of the to_char function rather than the data

Former Member

Data Services XI 3.0 (12.0.0.0)

Just started using DS and thought I was doing something wrong, but it now appears to me to be a bug.

Background: The intention is to standardise the data coming from different sources, so we want to generate a single Unique Reference Number (URN) from each source. Some sources already have a single-column unique reference and others have composite columns.

I create a data flow (let's call it dataflow_a) which essentially has a Flat File source, a query to read that flat file and the output goes to a template table (stored on Oracle 10g database) (let's call it output_a) which has "Make Port" checked as I require the data flow to be an embedded source in another data flow. The query does a little manipulation of the incoming flat data so that it can be read from the file (i.e. it needs rtrimming and there are some dates without times etc.) and the output columns are a mixture of "int", "varchar" and "datetime". The data in this has a composite key to identify unique records and that is made up of 4 columns... ref_no (int), ref_year (int), region (varchar), seq_no (int)

If this dataflow is executed (batch job) then the data is read into the template table ok. So far so good.

dataflow_a:


flatfile - - - - query - - - - output_a (port)

Now, to generate the single URN, dataflow_a is used as a source in another dataflow (let's call it dataflow_b). A query is attached to the output of dataflow_a and the columns mapped directly from the left side to the right side.

In addition, within the query, we add an Output column called "urn" for which we include a mapping of:

(I've split this onto separate lines at the concatenation points to make it easier to read on the forum)


ltrim_blanks(to_char(dataflow_a.ref_no,'099999'))||
ltrim_blanks(to_char(dataflow_a.ref_year,'09'))||
dataflow_a.region||
ltrim_blanks(to_char(dataflow_a.seq_no,'09'))
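For clarity, the intended result of that mapping can be sketched in Python (illustrative only; the actual implementation here is DS mapping syntax, and the padding widths are read off the format masks above):

```python
# Illustrative sketch of the intended URN, assuming '099999' pads to
# 6 digits and '09' pads to 2 digits (as the DS format masks suggest).
def build_urn(ref_no: int, ref_year: int, region: str, seq_no: int) -> str:
    return f"{ref_no:06d}{ref_year:02d}{region}{seq_no:02d}"

print(build_urn(1234, 8, "SA", 3))  # -> '00123408SA03'
```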

The output is captured in another template table (output_b) on the same Oracle database.

dataflow_b:


dataflow_a - - - - query - - - - output_b

However, when I execute this job now, all the data is populated correctly apart from the URN column which shows...

09999909SA09

for each row, with only the region changing with the data. The ref_no, ref_year and seq_no only show the format mask rather than the actual data.

OK, so at first I thought I wasn't using to_char correctly, so I checked the manual: my usage was identical to the examples given, and it's documented to work with "int" types. I also tested with some hardcoded numbers rather than the data columns, just to check.

Now here's where it gets odd.

Firstly...

Within dataflow_b, if I put a second query pulling data from dataflow_a ...

dataflow_b:


dataflow_a - - - - query - - - - output_b
         \
          \- - - - query - - - - output_c

... and I execute this job, then the output_b data suddenly works correctly and I get ref_no, ref_year, region and seq_no concatenated correctly within it. If I then remove that second query and output_c and execute the job again, output_b reverts to showing the format masks rather than the data.

Secondly...

Within the query for dataflow_b, if I change the mapping for the URN to read...


ltrim_blanks(to_char(dataflow_a.ref_no*1,'099999'))||
ltrim_blanks(to_char(dataflow_a.ref_year*1,'09'))||
dataflow_a.region||
ltrim_blanks(to_char(dataflow_a.seq_no*1,'09'))

or


ltrim_blanks(to_char(to_decimal(dataflow_a.ref_no,'.',',',1),'099999'))||
ltrim_blanks(to_char(to_decimal(dataflow_a.ref_year,'.',',',1),'09'))||
dataflow_a.region||
ltrim_blanks(to_char(to_decimal(dataflow_a.seq_no,'.',',',1),'09'))

then this also works ok.

So it would seem there is an issue with the to_char function recognising the datatype of the source columns (which are int), even though the documentation says that to_char accepts int. However, that still doesn't explain why adding a second query makes it work.
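The effect of the "*1" workaround can be illustrated with a toy Python analogue (an assumption about the mechanism, not actual DS internals): the arithmetic forces the value back to a numeric type before any string-typed overload can match, so the formatter takes the numeric branch.

```python
# Toy analogue (assumption, not DS internals): a formatter that, like
# to_char, picks its behaviour from the argument's runtime type.
def to_char_toy(value, mask):
    if isinstance(value, int):
        return str(value).zfill(len(mask))  # numeric branch: zero-pad
    return mask                             # non-numeric branch: the mask leaks through

trimmed = "1234"                      # what rtrim_blanks() would hand over
print(to_char_toy(trimmed, "099999"))           # '099999' - the bug symptom
print(to_char_toy(int(trimmed) * 1, "099999"))  # '001234' - the workaround
```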

Is it something I'm doing? Is it something anyone else can reproduce? Is it a known bug?

Your input is appreciated.

Giles

Edited by: Giles Cartmel on Sep 8, 2008 3:01 PM

Changed title to something a little more appropriate

Accepted Solutions (0)

Answers (2)


Former Member

Greetings Thread Originator,

This post is older than 60 days and there are no entries in the past 30 days. Based on the content discussed, it appears that your question has been answered. This message is being marked as answered and points are being assigned if available where possible.

Thank you for being an active participant in the SAP Forums,

Rob Siegele

Forum Moderator

SAP Americas

Former Member

OK, after a little more testing (with simple reproducible test data) and investigation on my part I've found further evidence...

My source data (flat file) is like...

1¬¬¬¬¬<tab>2¬<tab>XX<tab>1¬<tab>Text Data¬¬¬¬¬¬¬¬¬¬¬¬¬¬¬¬¬<tab>Text Data¬¬¬¬¬¬¬¬¬¬¬¬¬<newline>
2¬¬¬¬¬<tab>2¬<tab>XX<tab>1¬<tab>Text Data¬¬¬¬¬¬¬¬¬¬¬¬¬¬¬¬¬<tab>Text Data¬¬¬¬¬¬¬¬¬¬¬¬¬<newline>
2¬¬¬¬¬<tab>2¬<tab>XX<tab>2¬<tab>Text Data¬¬¬¬¬¬¬¬¬¬¬¬¬¬¬¬¬<tab>Text Data¬¬¬¬¬¬¬¬¬¬¬¬¬<newline>
1¬¬¬¬¬<tab>3¬<tab>XX<tab>1¬<tab>Text Data¬¬¬¬¬¬¬¬¬¬¬¬¬¬¬¬¬<tab>Text Data¬¬¬¬¬¬¬¬¬¬¬¬¬<newline>

Where ¬ is a space, i.e. I have fixed-width columns with tab field delimiters.

When I read the flat file using the query, I do an RTRIM_BLANKS on all the data, whether numeric or text before it gets put into output_a where the columns are the corresponding int or varchar datatypes.

What appears to be happening is that because I've used RTRIM_BLANKS on the numeric data which is going into "int" columns in the output template table, it is somehow remembering that the data is varchar (the output data type of the rtrim_blanks function).

If I wrap the rtrim_blanks with a to_decimal conversion or just remove the rtrim_blanks (the source flat file has the column defined as "int" too), then the problem is solved.

However, this still doesn't explain why, in my first post, adding a second query taking data from the dataflow_a as a source, made it work too.

It seems to me that there is still a bug around implicit conversions of datatypes which are not passing through the flows correctly. Certainly, if my output_a has a column defined as "int" I would expect it to implicitly convert the data to int and treat it as such from that point onwards, but it's not.

Can anyone else reproduce this, or tell me that I'm doing something blatantly stupid that I haven't seen, or is it really a bug as I think it is?

Thanks

Former Member

I guess from the number of views of this thread and the lack of responses, that nobody has got a clue.

?

werner_daehn
Active Contributor

Well, first I can tell you I had to read your post a couple of times to understand all tests and possible implications. And I am still not sure.

What you seem to overlook is the optimizer. If you have a varchar assigned to an int in one query, that query might not exist physically as it could get merged with others or split by the optimizer. Second, if the optimizer identifies you read a decimal, convert it to varchar and write it as decimal again, it might not even do the conversion - what for.

But logically, both cannot be the root cause to your finding. Converting a varchar to an int should happen somewhere inside the dataflow and certainly when you use a to_char(number) function. A pushdown of that into the source database cannot be the cause either, you are reading from a flat file.

The case where you split the data and suddenly even the unchanged data-stream works fine is an indicator that something along those lines happens. But what???

The to_char() was a complete surprise at first but then I made a quick test:

print(to_char(0, '09090909')) returned> '00000000'

print(to_char(sysdate(), '09090909')) returned> '09090909'

So it depends on what to_char() believes it gets as an input datatype, number or date. As you pass a varchar into to_char(), it is not obvious. Actually, could that be the root cause???

Former Member

Whilst I know what optimizers are (from developing on Databases) I haven't a clue how the optimizer works on DS.

That wasn't something that was taught on our Business Objects DI and DQ courses.

The most in depth they went were the pushing down to the database stuff.

Sorry if the above was a little confusing, but I was still investigating to try and find the causes myself.

Also, being a regular contributor on another forum, I know how annoying it can be when someone posts a question and doesn't provide sufficient details.

Anyway I can reproduce the issue with this following simple example if this helps...

Flat file: (¬ = space, <tab> = tab)

1¬¬¬¬¬<tab>2¬<tab>XX<tab>Test1
2¬¬¬¬¬<tab>2¬<tab>XX<tab>Test2
3¬¬¬¬¬<tab>2¬<tab>XX<tab>Test3
1¬¬¬¬¬<tab>3¬<tab>YY<tab>Test4
2¬¬¬¬¬<tab>3¬<tab>YY<tab>Test5
1¬¬¬¬¬<tab>4¬<tab>ZZ<tab>Test6

Create a flat file template ff_src with columns

logno int

year int

region varchar (2)

text varchar (5)

Properties:

Location: Job Server

Root Directory and File name as appropriate

Type: Delimited

Delimiter Column: Tab

Delimiter Row: {new line}

Delimiter Text:

Create a Project

Create a Batch Job

Add a new Data Flow

Put ff_src as the source

Add a query and link ff_src into the query

Add a template table called test_src and link the output of the query to this

In the query...

Drag across logno, year, region and text from the left hand side to the right hand side

Add a new output column called "urn" of type varchar (20)

For the mappings for logno, year, region and text include an rtrim_blanks around each one...

rtrim_blanks(ff_src.logno)
rtrim_blanks(ff_src.year)
rtrim_blanks(ff_src.region)
rtrim_blanks(ff_src.text)

For the mapping for urn set this to...

ltrim_blanks(to_char(rtrim_blanks(ff_src.logno),'099999'))||
ltrim_blanks(to_char(rtrim_blanks(ff_src.year),'09'))||
ff_src.region

Run this and the output shows the format mask rather than the values.

Now I'm assuming that, based on what you say, even though logno and year are "int", the DS optimizer is looking at the destination column, seeing that it's varchar, perhaps even noticing that region (which is varchar also) is included in the mapping, and therefore treating everything as varchar rather than doing the implicit conversions that we would expect. As the to_char function expects date or numeric data types as its inputs, I would assume it should implicitly ensure that the input values are numbers even if they were varchar (which they're not in this case).

Does that give any more of a clue?

werner_daehn
Active Contributor
to_char(rtrim_blanks(ff_src.logno),'099999')

This is exactly the problem.

rtrim_blanks() returns a varchar.

to_char(number, string) and to_char(date, string) are defined, but not a version where the first parameter is a string. So DI assumes you would like to use the to_char(date, string) version of the to_char() function and hence converts your number into a date. The date is then converted into a char using the format string '099999', and for a date the format string replaces YYYY, MM, DD and such characters; all others remain as is - it returns '099999'.
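That date-overload behaviour can be mimicked with a small Python model (a sketch under the assumption that only date codes such as YYYY, MM and DD are substituted and every other character passes through untouched):

```python
import datetime

def to_char_date(value, fmt):
    # Toy model of the to_char(date, string) overload: substitute the
    # date codes and leave every other character as-is.
    out = fmt.replace("YYYY", f"{value.year:04d}")
    out = out.replace("MM", f"{value.month:02d}")
    out = out.replace("DD", f"{value.day:02d}")
    return out

d = datetime.datetime(2008, 9, 8)
print(to_char_date(d, "YYYYMMDD"))  # '20080908'
print(to_char_date(d, "099999"))    # '099999' - no date codes to replace
```

With no date codes present in '099999', the mask comes back verbatim, which is exactly the symptom in the original post.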

Former Member

> to_char(rtrim_blanks(ff_src.logno),'099999')
>
> This is exactly the problem.
>
> rtrim_blanks() returns a varchar.
>
> to_char(number, string) and to_char(date, string) are defined, but not a version where the first parameter is a string. So DI assumes you would like to use the to_char(date, string) version of the to_char() function and hence converts your number into a date. The date is then converted into a char using the format string '099999', and for a date the format string replaces YYYY, MM, DD and such characters; all others remain as is - it returns '099999'.

It's unusual functionality as it's not what I would expect and certainly not what you would get from any normal database type language, for example...

SQL> select to_char(rtrim('1234 '),'099999') from dual;

TO_CHAR
-------
001234

Here, rtrim returns a varchar just like DI does, but the to_char function, which is also defined to accept numbers and dates, assumes it's a number and does the implicit conversion to number before converting to varchar using the format mask, raising an "invalid number" error if the string isn't a number; i.e. for dates, the input must be of DATE type. Having written low-level parsers before myself, I know that the function overloading wouldn't look at the format mask to try and determine the intention behind what is being passed. However, it seems odd that DI assumes the date version of the to_char function is required over the number version, which would often be more common and is the way it is implemented in all other languages I've used (and I've used many in my 25 years of software development).
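Oracle's resolution strategy, as observed in the SQL above, can be sketched like this (a hypothetical analogue for illustration, not Oracle's actual implementation):

```python
def oracle_to_char(value, mask):
    # Hypothetical analogue of Oracle's TO_CHAR resolution for a string
    # argument: try the implicit string-to-number conversion first, and
    # fail loudly if it isn't a number rather than silently treating
    # the value as a date.
    try:
        n = int(str(value).strip())
    except ValueError:
        raise ValueError("invalid number")  # analogue of ORA-01722
    return str(n).zfill(len(mask))

print(oracle_to_char("1234 ", "099999"))  # -> '001234'
```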

Definitely one I'm going to have to watch out for.