[Info-ingres] Generating Stats with Optimizedb

Steve s.anderson.au at gmail.com
Fri Aug 30 05:58:39 UTC 2019


On Thursday, 29 August 2019 18:59:12 UTC+10, Roy Hann  wrote:
> Steve wrote:
> 
> > Are there any general rules regarding generating statistics for a table with 
> > optimizedb? For example, should a table have at least X rows?
> 
> There're a couple of things to consider. How long will it take
> optimizedb to generate the statistics? And if it will take a long time
> how much benefit will you gain from taking the time to think more
> carefully about which statistics to gather?
> 
> If it is going to be quick--and quite what "quick" means is up to
> you--there is no point spending time thinking about it. Just run
> optimizedb with the -zk flag and be done with it.
> 
> If optimizedb could take a long time because you have some huge tables,
> and you are aware that specific queries are excessively slow, then you
> can reasonably invest the time to work out which tables and which
> columns need statistics.
> 
> To do that you will need to look at the query plans for the slow
> queries. Without getting into what is a massive topic in detail,
> one thing to look for would be K-joins where the optimizer has
> under-estimated the number of rows in the outer source (the left-hand
> source). Try to work out which columns it mistakenly thought would be
> highly selective, then run optimizedb on that column(s).
> 
> Another big win often comes from gathering stats on non-key columns
> used with constant conditions in the query where the values are highly
> skewed. Status indicator columns are a classsic example, e.g. 
> 
>   WHERE status = 'FINISHED'
> 
> versus 
> 
>   WHERE status = 'PROCESSING'
> 
> The first will usually be unselective, the latter will usually be very
> selective. The optimizer needs to know that. In that case you'd run
> optmizedb using -rtable_name and -astatus.
> 
> Roy

Great information, thanks Roy!

Steve



More information about the Info-ingres mailing list