[Info-ingres] Generating Stats with Optimizedb
Roy Hann
specially at processed.almost.meat
Thu Aug 29 08:59:11 UTC 2019
s.anderson.au at gmail.com wrote:
> Are there any general rules regarding generating statistics for a table with
> optimizedb? For example, should a table have at least X rows?
There're a couple of things to consider. How long will it take
optimizedb to generate the statistics? And if it will take a long time
how much benefit will you gain from taking the time to think more
carefully about which statistics to gather?
If it is going to be quick--and quite what "quick" means is up to
you--there is no point spending time thinking about it. Just run
optimizedb with the -zk flag and be done with it.
If optimizedb could take a long time because you have some huge tables,
and you are aware that specific queries are excessively slow, then you
can reasonably invest the time to work out which tables and which
columns need statistics.
To do that you will need to look at the query plans for the slow
queries. Without getting into what is a massive topic in detail,
one thing to look for would be K-joins where the optimizer has
under-estimated the number of rows in the outer source (the left-hand
source). Try to work out which columns it mistakenly thought would be
highly selective, then run optimizedb on that column(s).
Another big win often comes from gathering stats on non-key columns
used with constant conditions in the query where the values are highly
skewed. Status indicator columns are a classsic example, e.g.
WHERE status = 'FINISHED'
versus
WHERE status = 'PROCESSING'
The first will usually be unselective, the latter will usually be very
selective. The optimizer needs to know that. In that case you'd run
optmizedb using -rtable_name and -astatus.
Roy
More information about the Info-ingres
mailing list