on 10-01-2013 8:40 AM
Hi all,
I'm creating a multisource universe that will require to join tables of different sources with great difference in number of rows. Thus, I was trying to get advantage of the Data Federator semi-joins when joining a big table from source A to a small table of source B. I have tuned the parameters just to always force the semi-join (just for testing) and however I've found that in some cases it never triggers, retrieving a huge record set from the big table when it shouldn't be necessary.
Check the next queries:
SELECT
Table__3."Field_A",
Table__1."Field_X",
Table__1."Field_Y",
Table__3."Field_Z"
FROM
Table__3 INNER JOIN Table__1 ON (Table__1."Field_A"=Table__3."Field_A")
WHERE
Table__1."Field_X" = 'Value_X'
AND
Table__3."Field_Z"='Value_Z'
Using this query the semi-join doesn't trigger and so the big table Table__3 receives a select where Table__3."Field_Z"="Value_Z" which returns a huge amount of data. However if I remove the additional filter on Table__3 it works:
SELECT
Table__3."Field_A",
Table__1."Field_X",
Table__1."Field_Y",
Table__3."Field_Z"
FROM
Table__3 INNER JOIN Table__1 ON (Table__1."Field_A"=Table__3."Field_A")
WHERE
Table__1."Field_X" = 'Value_X'
This way the semi-join triggers and Table__3 is selected using as a filter in the where the returned values from Table__1, reducing greatly the total number of records returned.
Anyone knows why should this condition Table__3."Field_Z"='Value_Z' might be preventing the semi-join?
Both tables have statistics computed. Table__3 (which is a view in fact) has a total amount of 70 million records (each field is also computed). Table__1 has just 3 records (this difference was made on purpose for the testing).
System parameters are configured like follows:
ACTIVATE_SEMI_JOIN_RULE = true
MIN_SOURCE_CARDINALITY_THRESHOLD_FOR_SEMI_JOIN_RULE = 15000
MIN_ACTIVATION_THRESHOLD_FOR_SEMI_JOIN_RULE = 1 (this should always force the semi-join!)
Both tables are linked in my Data Foundation by Field_A of both tables, with a 1:1 relationship, though in fact it should be 0-1:1 (all the records in the small table must be related to the big one, but one record of the big one may or not be related to the small one) but that relationship cannot be configured.
Any advice will be welcome. Thanks in advance.
Hi Lopez,
Just a small clarification, where these System parameters are configured? In Data Federation Administration Tools or at Business Layer (MSU).
-Abhi
You must be a registered user to add a comment. If you've already registered, sign in. Otherwise, register and sign in.
It's in data federation administration tool. I'm not aware that those can be configured in any other place. Could you explain about that configuration you talk about in Business Layer?
Anyway, I answer myself. Looks like statistics of one of the filter fields of the big table was not computed. Even if I used the compute with children option, it seems it failed at some point and didn't compute all of the fields statistics.
In this case Data Federation seems to just ignore all the rest of statistics and will never trigger a semi-join (so policy seems to be, when in doubt don't trigger semi-join).
So bottom line is, always be sure that all statistics have been computed or Data Federation may never trigger a semi-join.
User | Count |
---|---|
88 | |
10 | |
10 | |
9 | |
6 | |
6 | |
6 | |
5 | |
4 | |
3 |
You must be a registered user to add a comment. If you've already registered, sign in. Otherwise, register and sign in.