[Info-ingres] stats on tables with more than 1 million rows
douglas.inkster at gmail.com
Fri Jan 27 21:56:46 UTC 2017
Yeah - I guess that's a perennial problem when sampling is used (which it is under the covers, given the implicit sampling you're describing). It's actually a chronic issue in query optimization: extrapolating from a sample to estimate the number of unique values in the whole table. We use a statistical heuristic that I dug out of a paper somewhere to do this (the jackknife estimation technique, I'm sure you'll be glad to know), but it is just that - a heuristic - and it isn't always very accurate.
In a case like this, where the sample clearly contains all unique values, I suppose it would be pretty trivial to infer that the column is in all likelihood unique across the whole table. I can easily do something like that.
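For illustration, here's roughly the shape of a first-order jackknife distinct-value estimate, including that short-circuit. This is just a Python sketch - the names are made up and the exact formula we use in the optimizer may differ:

    def estimate_distinct(n, N, d, f1):
        # n  = sample size (rows sampled)
        # N  = total rows in the table
        # d  = distinct values seen in the sample
        # f1 = values that occurred exactly once in the sample
        if d == n:
            # Every sampled value was unique: the column is almost
            # certainly unique across the whole table.
            return N
        q = n / N  # sampling fraction
        return d / (1.0 - (1.0 - q) * f1 / n)

Note that when every sampled value is a singleton, d == f1 == n and the formula itself collapses to n / q == N anyway, so the short-circuit mostly just makes the intent explicit.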
Doug.
On Friday, January 27, 2017 at 8:32:08 AM UTC-5, Martin Bowes wrote:
> Hi All,
>
> It appears that optimizedb automatically switches to sampling once a table passes 1 million rows, which is cool.
>
> But now I have a table with a unique key which the stats insist is a non-unique key with an average count per value of 1.6.
>
> Is this a problem?
>
> Martin Bowes