[Info-ingres] Generating Stats with Optimizedb
Steve
s.anderson.au at gmail.com
Fri Aug 30 05:58:39 UTC 2019
On Thursday, 29 August 2019 18:59:12 UTC+10, Roy Hann wrote:
> Steve wrote:
>
> > Are there any general rules regarding generating statistics for a table with
> > optimizedb? For example, should a table have at least X rows?
>
> There're a couple of things to consider. How long will it take
> optimizedb to generate the statistics? And if it will take a long time
> how much benefit will you gain from taking the time to think more
> carefully about which statistics to gather?
>
> If it is going to be quick--and quite what "quick" means is up to
> you--there is no point spending time thinking about it. Just run
> optimizedb with the -zk flag and be done with it.
>
> If optimizedb could take a long time because you have some huge tables,
> and you are aware that specific queries are excessively slow, then you
> can reasonably invest the time to work out which tables and which
> columns need statistics.
>
> To do that you will need to look at the query plans for the slow
> queries. Without getting into what is a massive topic in detail,
> one thing to look for would be K-joins where the optimizer has
> under-estimated the number of rows in the outer source (the left-hand
> source). Try to work out which columns it mistakenly thought would be
> highly selective, then run optimizedb on that column(s).
>
> Another big win often comes from gathering stats on non-key columns
> used with constant conditions in the query where the values are highly
> skewed. Status indicator columns are a classsic example, e.g.
>
> WHERE status = 'FINISHED'
>
> versus
>
> WHERE status = 'PROCESSING'
>
> The first will usually be unselective, the latter will usually be very
> selective. The optimizer needs to know that. In that case you'd run
> optmizedb using -rtable_name and -astatus.
>
> Roy
Great information, thanks Roy!
Steve
More information about the Info-ingres
mailing list