
Setting up semantic partitions in ASE

james_morrison
Explorer

Running ASE 15.5.

We have a non-partitioned table with 640 million rows, and we are looking at setting up range semantic partitions by create date.

There are 200+ million rows for 2013 and 380+ million rows for 2014.

I am thinking about setting up the following partitions by create date (a rough DDL sketch follows below):

Partitions 1 - 4: 2015, by quarter

Partition 5: 2014

Partition 6: 2013 and earlier

New partitions would be added for each new year ...

We only update current data -- i.e. any data more than a month old is no longer updated.

Is this a viable breakdown?
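
For illustration, a minimal DDL sketch of that scheme, assuming a hypothetical table sales_data with a create_date column. Repartitioning an existing table in place requires the 'select into/bulkcopy/pllsort' database option, and the boundary dates below ignore time-of-day components:

-- hypothetical names; values <= ('Dec 31 2013') includes only midnight of that day,
-- so real boundaries need to account for the time portion of create_date
alter table sales_data partition by range (create_date)
    (p2013_prior values <= ('Dec 31 2013'),
     p2014       values <= ('Dec 31 2014'),
     p2015q1     values <= ('Mar 31 2015'),
     p2015q2     values <= ('Jun 30 2015'),
     p2015q3     values <= ('Sep 30 2015'),
     p2015q4     values <= ('Dec 31 2015'))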

1st attempt at partitioning ...

james_morrison
Explorer

Thanks, Mark.

That is correct; I will set up quarterly partitions on a yearly basis.

At this point I am not sure how long we need to keep data -- part of the process we are going through will determine that.

Thanks ..

Answers (1)

james_morrison
Explorer

One more question -- before we partition, what would be the best practice for running update stats on large tables, say 300 million or more rows?

The current default command is:

update index statistics <table_name> using 25 values with sampling = 10 percent

former_member182259
Contributor

Actually, I would like to comment that there are some nuances with partitioning and stats to be aware of... but as far as your question goes, a lot depends on your version of ASE.

For pre-15.7 ASE, sampling works, but the best practice taught in classes was to run a full non-sampled stats first, then 2-3 updates with sampling, then a full non-sampled stats again, in a cycle -- so if doing update stats weekly, the first run of the month would be full and non-sampled, and the other weeks of the month would be sampled. What this cycle does is help you determine whether sampled stats work about as well as non-sampled stats: you may see performance issues in the latter weeks of the month, on sampled stats, that you don't see after the full run. How well sampling works often depends on how values are added to the different columns -- e.g. scattered around evenly versus monotonically increasing.

I personally have found that in later revs of 15.7 (e.g. SP100+), running stats with hashing is much faster than stats with sampling and generates more accurate stats. I know Mark has seen some issues -- not sure where/why -- but then I have seen problems with just update stats generically, in which we have had to delete stats before re-running update stats... so I am not sure whether the problem was caused by starting with non-hashed stats and then trying to update with hashed stats (I have always started with hashed stats).
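
As a concrete sketch of that monthly cycle (table name hypothetical; the hashing clause is the 15.7 SP100+ form and is not available on 15.5):

-- first run of the month: full, non-sampled
update index statistics sales_data using 25 values

-- remaining weeks of the month: sampled
update index statistics sales_data using 25 values with sampling = 10 percent

-- ASE 15.7 SP100+ only: hash-based stats gathering instead of sampling
update index statistics sales_data with hashing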

Now, there is an interesting nuance of update stats with partitioning. Yes, you can run update stats on a partition basis... but it doesn't update table-level stats (problem #1), and it can also lead to stats explosion. I am not saying don't run update stats on a partition basis -- I actually encourage it -- but I suggest you know what is going to happen.

For example, partitioning -- especially range partitioning -- works best as far as maintenance commands go when you get into the 30+ partition range, and especially the 50-100 partition range, assuming evenly distributed partitions. In your case, you will likely get the same effect on the 2014 and 2015 partitions, as they will be much smaller. When you run update stats on each partition (assuming the default histogram steps), you will get 20 steps PER partition... which can mean 1000+ steps for the entire table (if 50 partitions). That is not necessarily a problem unless a query needs to hit all the partitions (or some significant number of them), at which point the query will need considerable proc cache to load those stats. So, when using partitions, keep in mind that you may need to increase proc cache to handle the increased use during optimization.

On the table stats perspective, what this means is that periodically you might want to run update statistics (not update index statistics) on the table. However, in my experience this hasn't been as necessary as one would think, and might only be needed if you see the optimizer picking a table/partition scan when you think it should be choosing an index.
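
For example, with the same hypothetical table and partition names:

-- per-partition run: updates that partition's histograms, not table-level stats
update index statistics sales_data partition p2015q1

-- periodic table-level run to refresh table statistics
update statistics sales_data

-- watch procedure cache headroom as partition counts (and histogram steps) grow
sp_monitorconfig 'procedure cache size'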

In your case, you might only have 20 steps for the whonking huge partition, and then 20 steps for 2014 and 20 steps for each of the 2015 quarterly partitions. You might want to run the update stats for the 2013-and-earlier partition with a larger step count (e.g. 100), and then run it with 20 or so for the other partitions.
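
Using the hypothetical names again, that might look like:

-- larger step count for the big 2013-and-earlier partition
update index statistics sales_data partition p2013_prior using 100 values

-- smaller step count for the smaller, newer partitions
update index statistics sales_data partition p2015q1 using 20 values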

Using partitions the way you are doing is interesting from a different perspective. The current data sits in extremely small partitions, and therefore access is fast (fewer index levels), yet you don't get quite the penalty for queries that span a lot of partitions -- e.g. a 5-year query doesn't have to hit 20 partitions the way it would with complete quarterly partitioning. However, this assumes the scheme is:

Partition 1 = data from 2+ previous years

Partition 2 = data for previous year

Partitions 3-6 = data for current year by quarter

This means that at the end (or so) of each year, you will need to merge partitions. Whenever you merge partitions, you will then need to run update stats again.
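
On ASE 15.5 there is no merge-partition command (that arrives in much later releases), so the year-end roll would mean re-issuing the full partition clause, which copies the data. A sketch with the same hypothetical names (database name mydb is also an assumption):

-- repartitioning copies data and requires this database option
sp_dboption mydb, 'select into/bulkcopy/pllsort', true

-- roll the scheme forward at the end of 2015
alter table sales_data partition by range (create_date)
    (p2014_prior values <= ('Dec 31 2014'),
     p2015       values <= ('Dec 31 2015'),
     p2016q1     values <= ('Mar 31 2016'),
     p2016q2     values <= ('Jun 30 2016'),
     p2016q3     values <= ('Sep 30 2016'),
     p2016q4     values <= ('Dec 31 2016'))

-- then re-run update stats on the new partitions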

If the scheme instead is just to keep historical partitions as-is, with each year going forward simply holding its data by quarter, you might want to see what the impact on queries is -- especially for reports using non-unique indexes where the date range spans a lot of partitions, or where the date range is not part of the query.