cancel
Showing results for 
Search instead for 
Did you mean: 

Problems loading data into IQ

Former Member
0 Kudos

Hey,

Doing some testing with IQ and struggling to load some simple data.

CREATE TABLE "TRANSACTION" (

           "CUSTOMER_ID" INTEGER,

           "MERCHANT_ID" INTEGER,

           "TXTIMESTAMP" TIMESTAMP,

           "TXAMOUNT" DECIMAL(17,2),

           "TXCURR" CHAR(3));

LOAD TABLE transaction

( customer_id, merchant_id, txtimestamp datetime('yyyy-mm-dd hh:nn:ss,'), txamount, txcurr)

FROM '/iq/demo/1.csv'

ESCAPES OFF

DELIMITED BY ','

ROW DELIMITED BY '\n';

host:/iq/demo # cat 1.csv

1945178,2058615,2001-01-08 16:52:53, 8.75,GBP

1180243,3118407,2001-01-02 01:25:25,14.22,GBP

Now I read you have to have a row delimiter to enable load parallelism which is why I included it. But, it returns an error.

host:/iq/iq/IQ-16_0/demo # dbisql -c "uid=DBA;pwd=sql" load3.sql

Could not execute statement.

  Missing row delimiter detected during a row delimited insert.

I found that if I changed the format to this (note the trailing commas):

host:/iq/demo # cat 1.csv

1945178,2058615,2001-01-08 16:52:53, 8.75,GBP,

1180243,3118407,2001-01-02 01:25:25,14.22,GBP,

And then I run a simpler load statement:

LOAD TABLE transaction

( customer_id, merchant_id, txtimestamp datetime('yyyy-mm-dd hh:nn:ss,'), txamount, txcurr)

FROM '/iq/demo/1.csv'

ESCAPES OFF;

Then it works just fine. Not sure why though...

Ideas much appreciated!

John

Accepted Solutions (1)

Accepted Solutions (1)

markmumy
Advisor
Advisor
0 Kudos

John,

Likely your issue is because you don't have FORMAT BCP specified.  Your file does not appear to have a delimiter after the last column.  When you use DELIMITED BY and ROW DELIMITED BY the default is to expect a column delimiter after every column, even the last.  This is known as FORMAT ASCII (the default).  Changing to FORMAT BCP then tells IQ that the column delimiter will exist on all columns EXCEPT the last one.

Keep in mind, too, that using the DATETIME() feature means that that column must ALWAYS be that wide.  You cannot have an empty string that represents a NULL.  In reality, though, you don't need to specify the DATETIME() function as your format is one of the default formats for IQ.

Here is a sample:

set temporary option escape_character='on';

set temporary option quoted_identifier='on';

set temporary option date_format='ymd'; -- forces the date order to be YMD for loading

LOAD TABLE transaction

( customer_id, merchant_id, txtimestamp, txamount, txcurr)

FROM '/iq/demo/1.csv'

FORMAT BCP

ESCAPES OFF

DELIMITED BY ','

ROW DELIMITED BY '\n';

I ran this on my Linux machine and loaded the data without issue.

Mark

Former Member
0 Kudos

Hi Mark,

Thanks for the reply. I did see the BCP format in the LOAD documentation but I couldn't figure out what it means. This explains it!

However what I run the code that you give, I get the following error instead. I guessed that it's a delimiter problem... I checked in a hex editor and I see:

Delimiter is 0x2c which is what's in the file (comma)

Row Delimiter is 0x0d 0x0a which is CRLF

So... I changed the delimiter and it worked. Yay!

On a side-note I've used these same files to load into 10 different databases and only IQ was this hard - normally I'm up and running within 5-10 minutes. Think it's a combination of weird terminology, weird syntax and lack of action on forums with people discussing this stuff. It's a shame because IQ seems like a pretty interesting database.

Thanks again!

set temporary option escape_character='on';

set temporary option quoted_identifier='on';

set temporary option date_format='ymd'; -- forces the date order to be YMD for loading

LOAD TABLE transaction

( customer_id, merchant_id, txtimestamp, txamount, txcurr)

FROM '/iq/demo/1.csv'

FORMAT BCP

ESCAPES OFF

DELIMITED BY ','

ROW DELIMITED BY '\x0d\x0a';

John

server:/iq/iq/IQ-16_0/demo # dbisql -c "uid=DBA;pwd=sql" loadnew.sql

Could not execute statement.

  Right truncation of string data

  SQLCODE=-638, ODBC 3 State="22001"

  File: "loadnew.sql" on line 1, column 1

  LOAD TABLE transaction

  ( customer_id, merchant_id, txtimestamp, txamount, txcurr)

  FROM

  '/iq/transaction_2001.csv',

  '/iq/transaction_2002.csv',

  ...

  You can continue executing or stop.

1. Stop

2. Continue

John

markmumy
Advisor
Advisor
0 Kudos

The 'right truncation of string data' generally means that you are loading data into IQ that is wider than the column type (for any reason).  For instance, if you have a char(10) field and load 11 or more characters into it, you will see the error.  In previous versions we silently truncated the data.  There is an option called String_Rtruncation that can be set from the default of ON (throw an error) to OFF.

Personally, I prefer to use the hex values for carriage return and line feed just as you have.  You could also drop in \r\n if needed.  But like I said, the hex values leave no room for error as they are quite specific.

Glad you got it working!

Former Member
0 Kudos

Well this is curious... when I added the , at the end, it did load my data. And when I loaded 12 files concurrently, it loaded remarkably quickly - 2m30s for 1.4bn rows. Very nice.

But now with this new syntax, I don't see the 1200% on process iqsrv16 that I saw before (which is I'm guessing 12 concurrent load threads). Instead, I see one java process (for the loader?) with 14% CPU utilization.

10 minutes later... nothing much has happened. Any ideas what I have done wrong?


John

markmumy
Advisor
Advisor
0 Kudos

Can you send the exact load table?  And the size of the file to be loaded?

In the previous load with a trailing comma you mentioned 12 files.  IQ 16 will load each file in the FROM clause in separate threads.  If this is just a single file, then the CPU drop makes sense.  However, a properly formatted load table should run fully parallel with just 1 file.

Likely, the cause is that you don't have QUOTES OFF specified.  The default is ON, kinda backwards in my opinion.  Anyway, QUOTES ON means that we cannot load fully parallel as we have to scan each column single threaded looking for the beginning quote and end quote.  This is time consuming and is not currently parallel.

Before sending anything, try dropping in QUOTES OFF near the ESCAPES OFF line and see if that works.

Former Member
0 Kudos

Thanks one again! Looks like this was unrelated - the IQ process had hung and no SQL was being executed 🙂

I tried to stop the I server, and got the following. So I had to kill the process in the end to get it to stop. After that, I was able to restart IQ and restart the load process.

One thing that I find odd though is that it's no faster - 2m30s and I only see 1200% CPU when I have 40 cores and 80 threads available.

Feels like I've done something that has dropped the IQ Bulk Loader out of parallel mode.

John

server:/iq/iq/IQ-16_0/demo # more loadnew.sql

set temporary option escape_character='on';

set temporary option quoted_identifier='on';

set temporary option date_format='ymd';

DROP TABLE transaction;

CREATE TABLE "TRANSACTION" (

  "CUSTOMER_ID" INTEGER,

  "MERCHANT_ID" INTEGER,

  "TXTIMESTAMP" TIMESTAMP,

  "TXAMOUNT" DECIMAL(17,2),

  "TXCURR" CHAR(3));

LOAD TABLE transaction

( customer_id, merchant_id, txtimestamp, txamount, txcurr)

FROM

'/iq/transaction_2001.csv',

'/iq/transaction_2002.csv',

'/iq/transaction_2003.csv',

'/iq/transaction_2004.csv',

'/iq/transaction_2005.csv',

'/iq/transaction_2006.csv',

'/iq/transaction_2007.csv',

'/iq/transaction_2008.csv',

'/iq/transaction_2009.csv',

'/iq/transaction_2010.csv',

'/iq/transaction_2011.csv',

'/iq/transaction_2012.csv'

FORMAT BCP

QUOTES OFF

ESCAPES OFF

DELIMITED BY ','

ROW DELIMITED BY '\x0d\x0a'

;

commit work;

server:/iq/iq/IQ-16_0/demo # stop_iq

Checking system ...

The following 1 server(s) are owned by 'root'

## Owner          PID   Started  CPU Time  Additional Information

-- ---------  -------  --------  --------  ------------------------------------

1: root         73757     Nov22  00:51:15  SVR:server_iqdemo DB:iqdemo PORT:2638

              /iq/iq/IQ-16_0/bin64/iqsrv16 @iqdemo.cfg iqdemo.db -ti 4400 -gn 25 -o /iq/iq/IQ-16_0/logfiles/server_iqdemo.0001.srvlog -hn 5

--

  Please note that 'stop_iq' will shut down a server completely

  without regard for users, connections, or load process status.

  For more control, use the 'dbstop' utility, which has options

  that control stopping servers based on active connections.

Do you want to stop the server displayed above <Y/N>? Y

Shutting down server (73757) ...

Checkpointing server (73757) .........................................................

Notice:    Time alotted for server shutdown has expired. The server may

    be waiting on a system or network lock or it may requires

    more time depending on memory allocation. If server does

    not respond further, a hard shutdown may be required.

markmumy
Advisor
Advisor
0 Kudos

How large is each file?

Former Member
0 Kudos

120m records, 5.2GB per file. There's a white paper here with some suggestions. I'd better check the number of CPUs I have allocated.

http://www.sybase.com/files/White_Papers/IQ-15-Loading-WP.pdf

John

Former Member
0 Kudos

Looks like I have plenty of CPUs but I'm running out of buffer cache pages. I didn't tune any parameters yet so that makes sense. I'm loading 62GB of flat files, so 2GB of RAM is joing to hurt sooner or later.

Not sure if that's causing the lack of parallel processing and I don't get the message "portions of this load may be single threaded".

I. 11/24 16:35:20. 0000000000 main Bufman: All buffer cache pages are in use, ask your DBA to increase the size of the buffer cache. slib/s_bufman.cxx:3861

John

=============================================================

IQ server starting with:

     10 connections         (       -gm )

    170 cmd resources       ( -iqgovern )

   3989 threads             (     -iqmt )

    512 Kb thread stack size   (   -iqtss  )

  2042368 Kb thread memory size ( -iqmt * -iqtss )

     80 IQ number of cpus  ( -iqnumbercpus )

      0 MB maximum size of IQMSG file ( -iqmsgsz )

      0 copies of IQMSG file archives ( -iqmsgnum )

=============================================================

markmumy
Advisor
Advisor
0 Kudos

That paper is really for IQ 15.  Can you send your CFG file?

The system has 40 cores, but IQ sees 80.  That's because hyperthreading is turned on.  We recommend that HT be turned off or you add this option to your cfg file:

     -iqnumbercpus 40

Where 40 is the number of physical cores without hyperthreading.

Seeing your CFG file will help with other options.


Mark

Former Member
0 Kudos

Hi,

Here's the config file. I've clearly not correctly understood IQ's configuration... because now it doesn't load at all 🙂

Could not execute statement.

  main Bufman: All buffer cache pages are in use, ask your DBA to increase

  the size of the buffer cache.

 

John

-c  96g

-gc 20

-gd all

-gl all

-gm 10

-gp 4096

-iqmc 96g

-iqtc 256g

-iqnumbercpus 40

Former Member
0 Kudos

There is a performance tuning guide here, that offers a bunch of suggestions. http://infocenter.sybase.com/help/topic/com.sybase.infocenter.dc00169.1600/doc/pdf/iqperf.pdf

Based on those I changed my config file to the following... and now I get a different error 🙂

-c 48m

-gc 20

-gd all

-gl all

-gm 10

-gp 4096

-iqlm 128g

-iqmc 128g

-iqtc 128g

-iqnumbercpus 40

Could not execute statement.

  Ran out of IQ large memory (-iqlm) .

  -- (dflib/dfo_ColumnVector.cxx

John

Former Member
0 Kudos

OK... so IQ doesn't recognize the 128g for -iqlm, -iqmc and -iqtc, they have to be in MB. That's pretty stupid, it recognizes these for the -c option. Grr.

The following works and I see it is allocating 256GB RAM to IQ. Curiously, that wasn't improving performance, and I noticed the loader was I/O bound at 140 seconds.

So I moved the IQ files onto SSD. That made it slower, at 160 seconds. Now I'm really confused 🙂

Edit: I should point out at this stage that 140 seconds is much faster than any other database I've ever used - though that's not unexpected, IQ is incredibly fast for bulk loads, I think it was you that did the world record test, right? I just hate having an unexplained bottleneck!

John

-c 48m

-gc 20

-gd all

-gl all

-gm 10

-gp 4096

-iqlm 131072

-iqmc 131072

-iqtc 131072

-iqnumbercpus 40

markmumy
Advisor
Advisor
0 Kudos

So 140-160 seconds isn't too bad.  A bit below normal, but not bad.

Our rough calculation for IQ 16 is 10-20 MB per second per core.  A 40 core system should be able to process 400-800 MB per second in total.  62gb would take roughly 155 seconds at 400 MB/sec (10MB/sec/core).

Mark

PS - I've been trying to post for some hours and it keeps kicking out my tech response to one of the earlier posts.  GRRRRRRRR

markmumy
Advisor
Advisor
0 Kudos

I forgot to post this before I left and have been having trouble trying to post it.  Looks like you already figured that piece out:

Since this is IQ 16, there are some new things to consider.     Check out my new IQ hardware sizing guide: http://scn.sap.com/docs/DOC-46166.

Your -c is a tad too big at 96gb.  I would set it to 1g at most.  There are some systems were we do set it in the 4-8g range, but let's not go that high.

The 3 main parameters that you need to focus on are -iqtc, -iqmc, and -iqlm.  Take your total memory on the machine and take 90-95% of that number.  I see you have 256g for temp cache and 96g for main cache, plus 96g for catalog cache (-c).  I would assume (?) that you have 512gb of total RAM??

Assuming 512g total ram, take 90-95% of that.  Let's say 460gb.  Divide that by 3 to get 154gb.  Use that number for each of the three caches above:

-c 1g
-iqmc 154000
-iqtc 154000
-iqlm 154000

This should give you enough memory to do the loads and have other stuff running.  If your total RAM on the host is less than 512gb, adjust the above number accordingly.

Mark

Answers (0)