cancel
Showing results for 
Search instead for 
Did you mean: 

Question about default Fast Projection indexes

Former Member
0 Kudos

Hi

As far as i could understand, IQ (15.4) documentation states that all standard columns are in fact implemented as Fast Projection indexes.

My question is : Does Fast projection indexes allow a keyed (or range) look up over the data? Or, Does all searches of FP indexes traverse all bitmaps up to the end of the bitmap list (or all bitmaps since begining till the last value matching bitmap) ?

We are currently evaluating the need of an additional HG index on a IQ stage table which stores a great amount of analytical (atomic) data extracted and processed from the main warehouse tables. Those extractions are 'grouped' (not by GROUP BY, but each day for a given month, using BETWEEN ranges) by date and version and there should be no more than 10 groups/'extractions' of data in this table (and not more than 2 groups on the same day). The extraction not being used for some time and the ones which are 'correct' or 'final' are pruned from this table.

The issue is that the current effort to read each block of data (each month from day 1 to 31) is taking more time than expected (2 hours) on a

very capable server. (There is a join on the SELECT query, but its a simple one)

The table currently doesn't have a primary key or any additional index

The referred Date column is a datetime column, but the time information is being stored as 'zeroed' (00:00:00)

While checking about the need of addtional index(es) on this case with the current DBA in charge , we end up in doubt about creating a new HG index , since the cardinality for unique value for the given date column is not that big and the worst we could expect is about 3560 unique values (365 days x 10 years), and the amount of rows per day is about 6 million. The table does have about 180 columns ( I could positively guess that each 'row' is 4k wide)

We use to look for the most recent extraction, so, our current test environemnt have about 10 "extractions" from the last 5 months (2 per month - one 'original' and another one 'fixed').

We ran the query over Index Advisor, and so far it indeed recommended an HG or LF for both Date and Version columns, separately.

Also, given that there could happen more than one version per month, would a composite HG index be advisable?

And one last question. I have been looking over the following doc and its great bout talking about Fast Projection indexes. Is there any place where i could find details about physical layout for FP indexes ? The best info i could find about FP indexes are on the links or references below

https://sdm.lbl.gov/~kewu/ps/LBNL-62756.pdf

P. O’Neil and D. Quass. Improved Query Performance with Variant Indexes. In SIGMOD, Tucson, AR, USA,May 1997. ACM Press.

The Sybase IQ Survival Guide, page 173

Really thanks

Alexandre

Accepted Solutions (1)

Accepted Solutions (1)

tayeb_hadjou
Advisor
Advisor
0 Kudos

Hi Alexandre,

I would suggest

Blog post by Mark Mumy

SAP Sybase IQ Indexes and Indexing Techniques

Wiki page by Christine Zhong

SAP IQ Index Internal Data Structures - SAP IQ - SCN Wiki

From IQ15.4 Manuals :

Choosing an Index type

In case you plan upgrade to IQ16 , consider changes introduced in FP indexes.

Also in IQ 16, LF indexes are no longer recommended. Always choose HG in place of LF.

Regards, Tayeb

tayeb_hadjou
Advisor
Advisor
0 Kudos

I add

Customer Virtual Coffee Corner slides by Saroj Bagai :

Troubleshooting SAP IQ Query Performance - Part 1, 2 and 3 in

Customer Virtual Coffee Corner for SAP IQ - SAP IQ - SCN Wiki

Answers (1)

Answers (1)

Former Member
0 Kudos

Hi Tayeb

Really thanks for the information provided

I came up with the following conclusion.

Fast Projection 'indexes' are really not an index in the sense that it would provide a fast (no 'wasted' IO) or keyed path to a specific given value or range of values which are very small in quantity in the referred column structure storage (low cardinality representation in the 'column'). Any search done over a FP index is in fact a full 'column' scan in order to find all occurences of the given binary symbol occuring in all bitmaps which belong to the column. Values being used as SARG at the end of the column's data structure will only be read after reading all bitmap pages..

Is this correct?

On the other hand, columns with very low cardinality have a proportionally really small storage footprint. FP1 indexes could in theory leapfrog many very tuned b-tree indexes, even the ones with just one column/attribute. Another good thing for FP 'indexes' is that they have a great tendency of allowing parallel simultaneous scans over the same column, given their lower complexity.

My feeling is that since im reading a month range of data, in a table with many months (datetime) and 1-2 versions, i should add a HG index...   ill try an HNG index and make a comparison between it and the HG index ,since HNG does have a smaller footprint and processing overhead..

Thanks!

Alexandre

tayeb_hadjou
Advisor
Advisor
0 Kudos

Hi Alexandre,

For non-flat, FP uses Sargs to search date. Not only read. In particular for String search.

Here is a list of queries types where FP is the most used:

1- Projection (subset of columns)

2- String searchs Like (non-flat FP)

3- Calculation eg. sum(col1*col2), Avg(col1+col2), etc

4- adhoc joins

5- FP any type can be done in Parallel

6- Evaluation of search conditions.

Yes, Non-flat FP (low or middle cardinality) optimizes storage.

Also FP indexes provide columns metadata needed by optimizer.

For full dates ranges or month ranges using DATEPART function, the Date or DTTM (datetime) indexes are recommended :

Queries with range predicates (>, <, >=, <=, BETWEEN)-

Queries with DATEPART (equality, non-equality, range, in list predicates).

Regards,

Tayeb.

Former Member
0 Kudos

Hi Alexandre,

I made some searches on this same topic some time ago and came to the same conclusion as yours regarding FP indexes. FP indexes are not the same kind of indexes we are used to work with (B-trees).They are not optimized for selecting a subset of rows out of a larger set of rows . However,  FP indexes are really fast when rows need to be projected/materialized (join the RowID from several columns to construct a row). As the number of values into a page is constant, when you have the RowID (page#+row#) from one column of a table, it is very easy/fast to find the corresponding RowID into another column of the same table using the FP index. It is just a matter of arithmetic.

Hope this helps.

Marco

PS: These are my personal findings on this topic, and, may be proven to be wrong.