
INNER JOIN PERFORMANCE ISSUE

Former Member
0 Kudos

MAXDB 7.6.06.3

Hello,

I have the following issue with a JOIN in MaxDB.

I have 2 tables, a MasterTable and a SlaveTable. For each line in the master table there is one line in the SlaveTable; MSN is the unique key. I have 200,000 records in the table. I had to create a slave table because a line is longer than 8Ko.

I run the following query:

SELECT m.msn from MasterTable m, SlaveTable s where m.msn=s.msn

This statement takes more than 500 seconds to return, because it creates a result table with 200,000 lines.

Is there a way to write a statement with a JOIN without copying the result, as when you do something like

SELECT * from MasterTable.

If I add a TOP 100 clause to my statement the query is very quick, but if I add an ORDER BY clause it becomes very slow, even though the column in the WHERE clause is indexed.

Thanks

Yann.

Accepted Solutions (0)

Answers (3)


Former Member
0 Kudos

Hello,

> there are several things that I don't get here:

> What do you mean by 8Ko or 20Ko? Kilobytes?

Yes, each line has almost 8 KB of database columns plus a zipped XML BLOB; 1 million lines take 19 GB.

> E.g. the whole "SNDADDRESS", "SNDNAME" etc. stuff cries out loud "Normalize me!"

In fact each line describes a fax (or, for other tables, a mail). SNDADDRESS and SNDNAME are sender information, entered as free text by the customer.

> Then there are such fields like "OWNERID" Varchar (120) UNICODE and "IDX_OWNERID" Varchar (120) UNICODE - what's the difference here?

We want to perform case-insensitive searches using an index, so we store the uppercase value of OWNERID in IDX_OWNERID.
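The shadow-column trick described here can be sketched like this in SQLite (table and column names are illustrative, not the real schema):

```python
import sqlite3

# Sketch of a case-insensitive search via an uppercase shadow column.
con = sqlite3.connect(":memory:")
con.execute(
    "CREATE TABLE fax (msn INTEGER PRIMARY KEY, ownerid TEXT, idx_ownerid TEXT)"
)
con.execute("CREATE INDEX fax_idx_ownerid ON fax (idx_ownerid)")

# On insert, the application stores the uppercase copy alongside the original.
rows = [(1, "Alice", "ALICE"), (2, "aLiCe", "ALICE"), (3, "Bob", "BOB")]
con.executemany("INSERT INTO fax VALUES (?, ?, ?)", rows)

# The case-insensitive lookup becomes an indexable equality predicate:
hits = sorted(con.execute(
    "SELECT msn FROM fax WHERE idx_ownerid = ?", ("alice".upper(),)
).fetchall())
print(hits)  # → [(1,), (2,)]
```

The cost is the duplicated storage and the obligation to keep both columns in sync on every write, which is part of what Lars objects to below.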

> Concerning the indexes: you did not supply any filter predicate. You wanted all rows where the join condition is met.

In production the customer's filter will supply predicates, but I cannot be sure that a customer will not request a lot of lines. Today customer requests are fast thanks to database optimization, even when there are many results.

> Even upgrading to MaxDB 7.7, 8.0 or 23.4 won't fix that.

I don't think we can wait for version 23.5. But with version 7.7, a MaxDB record can now reach 32 KB, and several concurrency features look interesting.

Thanks for your help.

Yann.

lbreddemann
Active Contributor
0 Kudos

> Yes, each line has almost 8 KB of database columns plus a zipped XML BLOB; 1 million lines take 19 GB.

Ok, the LONG fields are not stored within the primary table - so this doesn't matter.

As most other fields are defined with variable length, it's unlikely that these rows really reach 8 KB in practice.

> E.g. the whole "SNDADDRESS", "SNDNAME" etc. stuff cries out loud "Normalize me!"

>

> In fact each line describes a fax (or, for other tables, a mail). SNDADDRESS and SNDNAME are sender information, entered as free text by the customer.

Ok, it's sender information - it shouldn't be in the FAXes table then.

This is a design fault.

> Then there are such fields like "OWNERID" Varchar (120) UNICODE and "IDX_OWNERID" Varchar (120) UNICODE - what's the difference here?

>

> We want to perform case-insensitive searches using an index, so we store the uppercase value of OWNERID in IDX_OWNERID.

Same thing - build an owner table and do your indexing there.

That way you'll only read the information when you really need it.

E.g. when the customer really wants to perform a case-insensitive search.
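The split Lars suggests might look like this (a hypothetical sketch in SQLite, with invented table names - not the actual application schema):

```python
import sqlite3

# Owner data moves out of the wide fax table into its own small table,
# so the uppercase search column and its index live there instead.
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE owner (
    owner_id   INTEGER PRIMARY KEY,
    name       TEXT,
    name_upper TEXT   -- uppercase shadow column, indexed once per owner
);
CREATE INDEX owner_name_upper ON owner (name_upper);

CREATE TABLE fax (
    msn      INTEGER PRIMARY KEY,
    owner_id INTEGER REFERENCES owner (owner_id)
    -- ... fax-specific columns only ...
);
""")
con.execute("INSERT INTO owner VALUES (1, 'Alice', 'ALICE')")
con.executemany("INSERT INTO fax VALUES (?, ?)", [(10, 1), (11, 1)])

# The case-insensitive search scans the small owner table, then joins:
msns = sorted(con.execute("""
    SELECT f.msn FROM fax f
    JOIN owner o ON o.owner_id = f.owner_id
    WHERE o.name_upper = ?
""", ("alice".upper(),)).fetchall())
print(msns)  # → [(10,), (11,)]
```

The uppercase value is now stored once per owner rather than once per fax, and it is only touched when an owner search actually runs.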

> Concerning the indexes: you did not supply any filter predicate. You wanted all rows where the join condition is met.

>

> In production the customer's filter will supply predicates, but I cannot be sure that a customer will not request a lot of lines. Today customer requests are fast thanks to database optimization, even when there are many results.

That's comparing apples with oranges.

A non-restricted query will nearly always perform differently than a restricted query.

If you cannot restrict your customer from accessing the database directly - well, then you're really in dire straits here...

> Even upgrading to MaxDB 7.7, 8.0 or 23.4 won't fix that.

> I don't think we can wait for version 23.5. But with version 7.7, a MaxDB record can now reach 32 KB, and several concurrency features look interesting.

Sure - but with that design you'll end up using the 32 KB completely again.

The point is that the entity design here is plain wrong.

Relational databases are designed to support relational data models.

That's what they are optimized for.

They are not big excel sheets with a server function.

regards,

Lars

Former Member
0 Kudos

Hello,

We have an application that uses almost 8Ko of data for internal computing; it also uses a BLOB in zipped XML format that contains 20Ko of data. We would like to improve our customer experience by allowing customers to create their own fields and allowing us to query those data.

We want to manage several million rows. Our database already has 20 million records, and we want to keep those records online for 10 years. Today query performance is good.

So I ran tests to see how I could implement this new architecture. The only way I found is to use JOIN queries. The customer will be able to create and index data. Another approach would be to use the new MaxDB 7.7.

What I would like to know is: is there a way to get the same performance with and without a JOIN clause?

If you have any other suggestion to implement this feature, please let me know.

For example, with this simple query on 200,000 records:

Index statistics have been updated.

SELECT m.msn from FGFAXOUT m, FGFAXOUT2 s where m.msn=s.msn

The EXPLAIN in SQLStudio returns:

B FGFAXOUT_LANFAXOWNERA INDEX SCAN 43019

ONLY INDEX ACCESSED

A MSN JOIN VIA KEY COLUMN 43148

NO TEMPORARY RESULTS CREATED

RESULT IS COPIED , COSTVALUE IS 43269

QUERYREWRITE - APPLIED RULES:

DistinctPullUp 1

The EXPLAIN JOIN returns:

B 172076 1 1 827.288461538462 172076 249

JOIN VIA SINGLE KEY A 172592 1 1 4001.76744186046 172076 43268

Our software uses ODBC to query the database; I made tests with several cursor types. This query always copies the result, so of course it is very slow.

In fact, I have not been able to create a query where the result is not copied, even though all the fields I use are indexed.

For my test, the slave table and the master table are identical; here is the definition:

CREATE TABLE "ESKDBADM"."DBM350_FGFAXOUT"

(

"ARCHIVEEXPIRATION" Date,

"CANCELREQUESTDATETIME" Timestamp,

"SNDADDRESS" Varchar (120) UNICODE,

"SNDTYPE" Varchar (20) UNICODE,

"RCPCOMPANY" Varchar (50) UNICODE,

"RCPNAME" Varchar (50) UNICODE,

"IDX_RCPNAME" Varchar (50) UNICODE,

"SNDACCOUNT" Varchar (40) UNICODE,

"SNDCOMPANY" Varchar (50) UNICODE,

"SNDNAME" Varchar (50) UNICODE,

"SUB_DATE_TIME" Timestamp,

"ACC_DATE_TIME" Timestamp,

"SENDDATETIME" Timestamp,

"CONV_DATE_TIME" Timestamp,

"COMP_DATE_TIME" Timestamp,

"CALL_COST" Integer,

"TOTALCALLCOST" Integer,

"VALID_DATE_TIME" Timestamp,

"LSTRY_DATE_TIME" Timestamp,

"NTRY_DATE_TIME" Timestamp,

"DEFERRED" Smallint,

"DIST_LASTWRITE" Timestamp,

"DIST_NUM" Integer,

"DIST_VERSION" Integer,

"DIST_OWNER" Char (64) UNICODE,

"DIST_OWNER_DATE_TIME" Timestamp,

"DIST_ABORTED" Integer,

"DIST_ABORTEDCOUNT" Integer,

"CONTFLAG" Integer,

"STATUSCODE" Integer,

"PREVIEWMESSAGE" Integer,

"PREVIEWVISIBLE" Integer,

"STATUS_STR" Varchar (100) UNICODE,

"MAX_RTRY" Integer,

"N_TRIES" Integer,

"MSN" Integer NOT NULL,

UNIQUE(MSN),

"ARCHIVEDURATION" Integer,

"APPNAME" Varchar (50) UNICODE,

"IDX_APPNAME" Varchar (50) UNICODE,

"ESTIMPRICE" Float (38),

"ESTIMPRICEUNIT" Varchar (3) UNICODE,

"REALPRICE" Float (38),

"REALPRICEUNIT" Varchar (3) UNICODE,

"CONTRACTID" Varchar (40) UNICODE,

"MAINACCOUNTID" Varchar (20) UNICODE,

"PROCESSINGLABEL" Varchar (6) UNICODE,

"VIEWED" Smallint,

"NOTIF" Smallint,

"NOTIFINDEX" Integer,

"PURGED" Integer,

"ARCMSN" Integer,

"ARCSTATE" Integer,

"ARCSAVFILE" Varchar (30) UNICODE,

"PRGF_DATE_TIME" Timestamp,

"PRGR_DATE_TIME" Timestamp,

"REMOTEMSN" Integer,

"DELEGATEDSEND" Integer,

"DELEGATED" Integer,

"DELEGATEDCANCELLED" Integer,

"DELEGATIONSYNCID" Varchar (28) UNICODE,

"REMOTENETWORKLOCATION" Varchar (128) UNICODE,

"REMOTENETWORKPORT" Integer,

"REMOTENETWORKPROTOCOL" Varchar (5) UNICODE,

"NEEDVALID" Smallint,

"VLD_DATE_TIME" Timestamp,

"VALID_OWNERID" Varchar (128) UNICODE,

"VALIDATIONSTATE" Integer,

"PREFERREDVALID_OWNERID" Varchar (128) UNICODE,

"OWNERID" Varchar (120) UNICODE,

"IDX_OWNERID" Varchar (120) UNICODE,

"OWNERPB" Varchar (50) UNICODE,

"SRCRUID" Varchar (24) UNICODE,

"ORIGINALJOBID" Varchar (12) UNICODE,

"WORKFLOWID" Varchar (32) UNICODE,

"USERCOMMENT" Varchar (50) UNICODE,

"N_PAGES" Integer,

"PAGES_PRODUCED" Integer,

"PAGES_SENT" Integer,

"GROUPKEY" Varchar (32) UNICODE,

"ISBILLABLE" Varchar (1) UNICODE,

"RETRIEVEDBYCLIENT" Varchar (1) UNICODE,

"IDX_RETRIEVEDBYCLIENT" Varchar (1) UNICODE,

"NOCOVER" Smallint,

"SEND_CSID" Varchar (20) UNICODE,

"DEF_DATE_TIME" Timestamp,

"IDENT" Varchar (120) UNICODE,

"IDX_IDENT" Varchar (120) UNICODE,

"FAXNUMBER" Varchar (40) UNICODE,

"ALTFAXNUMB" Varchar (40) UNICODE,

"STATE" Integer,

"ACTFAXNUMB" Varchar (40) UNICODE,

"RECV_CSID" Varchar (20) UNICODE,

"DURATION" Integer,

"TOTALDURATION" Integer,

"SPEED" Integer,

"SIG_STRGTH" Integer,

"SIG_QUAL" Integer,

"SIG_NOISE" Integer,

"LINE" Integer,

"SERVER" Varchar (50) UNICODE,

"SUBSTATE" Integer,

"RESOLUTION" Smallint,

"CODING" Varchar (4) UNICODE,

"ERRORCORRECTION" Smallint,

"MODF" Varchar (2) UNICODE,

"RMTFROMID" Varchar (120) UNICODE,

"RMTFROMTP" Varchar (20) UNICODE,

"RESULTFLOWID" Varchar (12) UNICODE,

"PRIORITY" Integer DEFAULT 0,

"INTNUMB" Varchar (40) UNICODE,

"INTALTNUMB" Varchar (40) UNICODE,

"CNCLNUM" Varchar (40) UNICODE,

"CNCLNUMALT" Varchar (40) UNICODE,

"SUBJECT" Varchar (120) UNICODE,

"IDX_SUBJECT" Varchar (120) UNICODE,

"TRY1_DATE_TIME" Timestamp,

"FAILOVERTIME" Timestamp,

"NUMINDEX" Integer,

"LANFAXOWNER" Varchar (64) UNICODE,

"IDX_LANFAXOWNER" Varchar (64) UNICODE,

"PROVIDERID" Varchar (30) UNICODE,

"DOCID" Varchar (12) UNICODE,

"DELETED" Smallint,

"DEPARTMENT" Varchar (120) UNICODE,

"FLOWID" Varchar (12) UNICODE,

"SATID" Varchar (40) UNICODE,

"LBSERVER" Varchar (40) UNICODE,

"DIST_FILE" Long BYTE,

PRIMARY KEY("MSN")

)

There are several indexes:

INDEXNAME --- COLUMNNAME -- SORT -- COLUMNNO -- TYPE

FGFAXOUT_ARCSTATEAMSNA ARCSTATE ASC 1

FGFAXOUT_ARCSTATEAMSNA MSN ASC 2

FGFAXOUT_COMP_DATE_TIMEA COMP_DATE_TIME ASC 1

FGFAXOUT_CONTASTATADELEAMSNA CONTFLAG ASC 1

FGFAXOUT_CONTASTATADELEAMSNA STATE ASC 2

FGFAXOUT_CONTASTATADELEAMSNA DELEGATEDSEND ASC 3

FGFAXOUT_CONTASTATADELEAMSNA MSN ASC 4

FGFAXOUT_CONTFLAPRIORIAMSNA CONTFLAG ASC 1

FGFAXOUT_CONTFLAPRIORIAMSNA PRIORITY ASC 2

FGFAXOUT_CONTFLAPRIORIAMSNA MSN ASC 3

FGFAXOUT_DELEGATIONAMSNA DELEGATIONSYNCID ASC 1

FGFAXOUT_DELEGATIONAMSNA MSN ASC 2

FGFAXOUT_DIST_OWNERA DIST_OWNER ASC 1

FGFAXOUT_FAXNUMBERA FAXNUMBER ASC 1

FGFAXOUT_FLOWIDAMSNA FLOWID ASC 1

FGFAXOUT_FLOWIDAMSNA MSN ASC 2

FGFAXOUT_IDENTAMSNA IDX_IDENT ASC 1

FGFAXOUT_IDENTAMSNA MSN ASC 2

FGFAXOUT_LANFAXOWNERA IDX_LANFAXOWNER ASC 1

FGFAXOUT_MAINACAAPPNAMAMSNA MAINACCOUNTID ASC 1

FGFAXOUT_MAINACAAPPNAMAMSNA IDX_APPNAME ASC 2

FGFAXOUT_MAINACAAPPNAMAMSNA MSN ASC 3

FGFAXOUT_MAINACAPREFERAMSNA MAINACCOUNTID ASC 1

FGFAXOUT_MAINACAPREFERAMSNA PREFERREDVALID_OWNERID ASC 2

FGFAXOUT_MAINACAPREFERAMSNA MSN ASC 3

FGFAXOUT_MAINACARETRIEDMSNA MAINACCOUNTID ASC 1

FGFAXOUT_MAINACARETRIEDMSNA IDX_RETRIEVEDBYCLIENT DESC 2

FGFAXOUT_MAINACARETRIEDMSNA MSN ASC 3

FGFAXOUT_MAINACCOUNASUB_DATE_TD MAINACCOUNTID ASC 1

FGFAXOUT_MAINACCOUNASUB_DATE_TD SUB_DATE_TIME DESC 2

FGFAXOUT_MSND MSN DESC 1 UNIQUE

FGFAXOUT_NTRY_DATE_DMSNA NTRY_DATE_TIME DESC 1

FGFAXOUT_NTRY_DATE_DMSNA MSN ASC 2

FGFAXOUT_ORIGINALJOAMSNA ORIGINALJOBID ASC 1

FGFAXOUT_ORIGINALJOAMSNA MSN ASC 2

FGFAXOUT_OWNERIDAMSNA IDX_OWNERID ASC 1

FGFAXOUT_OWNERIDAMSNA MSN ASC 2

FGFAXOUT_PRGF_DATE_TIMEA PRGF_DATE_TIME ASC 1

FGFAXOUT_PRGR_DATE_TIMEA PRGR_DATE_TIME ASC 1

FGFAXOUT_RCPNAMEAMSNA IDX_RCPNAME ASC 1

FGFAXOUT_RCPNAMEAMSNA MSN ASC 2

FGFAXOUT_RESULTFLOWIDA RESULTFLOWID ASC 1

FGFAXOUT_SRCRUIDAMSNA SRCRUID ASC 1

FGFAXOUT_SRCRUIDAMSNA MSN ASC 2

FGFAXOUT_STATEAPREVIEAMSNA STATE ASC 1

FGFAXOUT_STATEAPREVIEAMSNA PREVIEWMESSAGE ASC 2

FGFAXOUT_STATEAPREVIEAMSNA MSN ASC 3

FGFAXOUT_STATEASRCRUIAMSNA STATE ASC 1

FGFAXOUT_STATEASRCRUIAMSNA SRCRUID ASC 2

FGFAXOUT_STATEASRCRUIAMSNA MSN ASC 3

FGFAXOUT_SUBJECTAMSNA IDX_SUBJECT ASC 1

FGFAXOUT_SUBJECTAMSNA MSN ASC 2

FGFAXOUT_SUB_DATE_TIMED SUB_DATE_TIME DESC 1

regards.

Yann.

lbreddemann
Active Contributor
0 Kudos

Hi Yann,

there are several things that I don't get here:

What do you mean by 8Ko or 20Ko? Kilobytes?

> "We would like to improve our customer experience by allowing customers to create their own fields and allowing us to query those data."

This really sounds like the "generic" data model, where you're going to have a ton of "field/value" entries.

Such a design is inherently slow because of the joins/lookups it forces on every query.
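The "generic" model Lars refers to is commonly called entity-attribute-value (EAV). A minimal sketch (illustrative names, SQLite) shows why reassembling one logical record costs one lookup per custom field:

```python
import sqlite3

# Entity-attribute-value sketch: customer-defined fields become rows.
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE fax (msn INTEGER PRIMARY KEY);
CREATE TABLE fax_field (
    msn   INTEGER REFERENCES fax (msn),
    name  TEXT,
    value TEXT,
    PRIMARY KEY (msn, name)
);
""")
con.execute("INSERT INTO fax VALUES (1)")
con.executemany("INSERT INTO fax_field VALUES (?, ?, ?)",
                [(1, "color", "red"), (1, "dept", "sales")])

# Reading two custom fields of one fax already needs two self-joins:
row = con.execute("""
    SELECT c.value, d.value
    FROM fax f
    JOIN fax_field c ON c.msn = f.msn AND c.name = 'color'
    JOIN fax_field d ON d.msn = f.msn AND d.name = 'dept'
    WHERE f.msn = 1
""").fetchone()
print(row)  # → ('red', 'sales')
```

Each additional queried field adds another join, which is why such schemas scale poorly compared with ordinary columns.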

Looking at the table definition you posted, I cannot believe that there is no way to refine the modelling here and split it into multiple entities (tables).

E.g. the whole "SNDADDRESS", "SNDNAME" etc. stuff cries out loud "Normalize me!"

Then there are such fields like "OWNERID" Varchar (120) UNICODE and "IDX_OWNERID" Varchar (120) UNICODE - what's the difference here?

> The customer will be able to create and index data

No problem with that usually - except that your customers have to know a whole lot about databases to be able to make good decisions on indexing data in a sensible manner.

That's not exactly a no-brainer.

> Is there a way to get the same performance with and without a JOIN clause?

In general: no. A join does a lot more work than a simple table query.

It reads a record and tries to find the fitting ones in the other table. Having found all fitting rows, it combines the data and stores it until you fetch it (so the copying of data has to happen here).

Tell me, how should that much more work be done in the same time?
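The extra work Lars describes can be illustrated with a toy comparison in plain Python (no database involved; a dict lookup stands in for the key access):

```python
# Toy comparison: streaming a single-table scan vs. materializing a join.
master = [{"msn": i} for i in range(5)]
slave = [{"msn": i, "blob": f"payload-{i}"} for i in range(5)]

# Plain scan: each row can be handed to the client as soon as it is read.
scan = [m["msn"] for m in master]

# Join: for every master row, find the matching slave row, combine the
# data, and hold the combined rows in a result buffer until fetched.
slave_by_msn = {s["msn"]: s for s in slave}
result = []
for m in master:
    s = slave_by_msn.get(m["msn"])
    if s is not None:
        result.append((m["msn"], s["blob"]))  # copy into the result set

print(scan)    # → [0, 1, 2, 3, 4]
print(result)
```

Per master row the join performs a lookup, a combine, and a copy into the result buffer, which the plain scan never has to do.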

Concerning the indexes: you did not supply any filter predicate. You wanted all rows where the join condition is met.

You got it - 200,000 joined rows.

What should MaxDB use any index for in this case? The join condition is defined on the primary key which is the quickest access path there is for any row in a MaxDB database.

Therefore, the indexes are (for this query) useless.

I highly recommend reviewing this "all in one table" design and making a proper relational design out of it.

Also - although I don't really know your application - I really don't think that you always want to fetch all rows there are, do you?

So, why not provide some filter predicates to bring down the resultset to the rows you're interested in?

All in all - this is a classic misdesign which will run slow on any database software there is.

Even upgrading to MaxDB 7.7, 8.0 or 23.4 won't fix that.

regards,

Lars

lbreddemann
Active Contributor
0 Kudos

> I have 2 tables, a MasterTable and a SlaveTable. For each line in the master table there is one line in the SlaveTable; MSN is the unique key. I have 200,000 records in the table. I had to create a slave table because a line is longer than 8Ko.

I don't see why you need to have a "slave" table here.

How exactly do the tables look like? What kind of data do you store in 8K without using LONG/LOB values?

> I make the following request

>

> SELECT m.msn from MasterTable m, SlaveTable s where m.msn=s.msn

>

> This statement takes more than 500 seconds to return, because it creates a result table with 200,000 lines.

Sure it does - you asked exactly for this.

> Is there a way to write a statement with a JOIN without copying the result, as when you do something like

>

> SELECT * from MasterTable.

Depends on your query and the DB parameter settings.

> If I add a TOP 100 clause to my statement the query is very quick, but if I add an ORDER BY clause it becomes very slow, even though the column in the WHERE clause is indexed.

Ok, the TOP 100 query will finish the join processing once 100 rows have been retrieved - no wonder it's quicker.

If you use ORDER BY - what do you expect? To order all rows the database has to know all rows first.

And sorting such a large result set is not a quick operation either.
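The asymmetry Lars points out can be sketched in a few lines of Python: an unordered TOP N can stop after N rows, while an ordered TOP N must still examine every row before the first result is known, even if only N rows are ever kept (e.g. in a heap).

```python
import heapq
import random

# 200,000 synthetic "rows" standing in for the join result.
random.seed(0)
rows = [random.randrange(1_000_000) for _ in range(200_000)]

def top_n_unordered(rows, n):
    # Without ORDER BY: stop after n rows - touches only n of them.
    return rows[:n]

def top_n_ordered(rows, n):
    # With ORDER BY: every row must be examined before any can be
    # returned, even though only the n smallest are kept.
    return heapq.nsmallest(n, rows)

assert len(top_n_unordered(rows, 100)) == 100
assert top_n_ordered(rows, 100) == sorted(rows)[:100]
```

This is only a model of the principle, not of MaxDB's actual sort implementation, but it shows why TOP 100 alone is cheap while TOP 100 with ORDER BY still pays for the whole result.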

Anyhow - you're describing many problems at once here. Be much more specific!

How does the query look like?

What are the db parameter settings? Have you implemented the parameter recommendations from the SDN MaxDB Wiki?

How do the tables look like?

What indexes are in place?

Are the statistics in place?

Have you enabled time measurement and started the db analyzer? Are there any warnings?

regards,

Lars