cancel
Showing results for 
Search instead for 
Did you mean: 

Semi-join doesn't trigger

Former Member
0 Kudos

Hi all,

I'm creating a multisource universe that will require to join tables of different sources with great difference in number of rows. Thus, I was trying to get advantage of the Data Federator semi-joins when joining a big table from source A to a small table of source B. I have tuned the parameters just to always force the semi-join (just for testing) and however I've found that in some cases it never triggers, retrieving a huge record set from the big table when it shouldn't be necessary.

Check the next queries:

SELECT

  Table__3."Field_A",

  Table__1."Field_X",

  Table__1."Field_Y",

  Table__3."Field_Z"

FROM

Table__3 INNER JOIN Table__1 ON (Table__1."Field_A"=Table__3."Field_A")

 

WHERE

  Table__1."Field_X"  =  'Value_X'

  AND

   Table__3."Field_Z"='Value_Z'

Using this query the semi-join doesn't trigger and so the big table Table__3 receives a select where Table__3."Field_Z"="Value_Z" which returns a huge amount of data. However if I remove the additional filter on Table__3 it works:

SELECT

  Table__3."Field_A",

  Table__1."Field_X",

  Table__1."Field_Y",

  Table__3."Field_Z"

FROM

Table__3 INNER JOIN Table__1 ON (Table__1."Field_A"=Table__3."Field_A")

 

WHERE

  Table__1."Field_X"  =  'Value_X'

This way the semi-join triggers and Table__3 is selected using as a filter in the where the returned values from Table__1, reducing greatly the total number of records returned.

Anyone knows why should this condition Table__3."Field_Z"='Value_Z' might be preventing the semi-join?

Both tables have statistics computed. Table__3 (which is a view in fact) has a total amount of 70 million records (each field is also computed). Table__1 has just 3 records (this difference was made on purpose for the testing).

System parameters are configured like follows:

ACTIVATE_SEMI_JOIN_RULE = true

MIN_SOURCE_CARDINALITY_THRESHOLD_FOR_SEMI_JOIN_RULE = 15000

MIN_ACTIVATION_THRESHOLD_FOR_SEMI_JOIN_RULE = 1 (this should always force the semi-join!)

Both tables are linked in my Data Foundation by Field_A of both tables, with a 1:1 relationship, though in fact it should be 0-1:1 (all the records in the small table must be related to the big one, but one record of the big one may or not be related to the small one) but that relationship cannot be configured.

Any advice will be welcome. Thanks in advance.

Accepted Solutions (1)

Accepted Solutions (1)

abhinav_shrivastava1
Active Participant
0 Kudos

Hi Lopez,

Just a small clarification, where these System parameters are configured? In Data Federation Administration Tools or at Business Layer (MSU).

-Abhi

Former Member
0 Kudos

It's in data federation administration tool. I'm not aware that those can be configured in any other place. Could you explain about that configuration you talk about in Business Layer?

Anyway, I answer myself. Looks like statistics of one of the filter fields of the big table was not computed. Even if I used the compute with children option, it seems it failed at some point and didn't compute all of the fields statistics.

In this case Data Federation seems to just ignore all the rest of statistics and will never trigger a semi-join (so policy seems to be, when in doubt don't trigger semi-join).

So bottom line is, always be sure that all statistics have been computed or Data Federation may never trigger a semi-join.

Answers (0)