签到成功

知道了

CNDBA社区CNDBA社区

oracle12c新类型的优化器统计信息-top frequency histogram

2017-08-01 09:54 2395 0 原创 Oracle 12C
作者: Expect-乐

说明

官方文档:

http://docs.oracle.com/database/121/TGSQL/tgsql_histo.htm#TGSQL-GUID-DA1B97DA-DFE5-47CA-B8A0-57AB248B10EF

Oracle creates histograms on columns that have a data skew to improve cardinality estimates. Two additional types of histograms have been introduced for columns which have more than 254 distinct values to improve the cardinality estimates generated using histograms. A top frequency histogram is created if a small number of distinct values occupy most of the data (greater than 99% of the data). The histogram is created using the small number of extremely popular distinct values. By ignoring the unpopular values, which are statistically insignificant, a better quality histogram for the highly popular values can be produced. Alternatively, a hybrid histogram can be created which combines a height-based histogram and frequency histogram. It is a height-based histogram where frequent values always become the endpoint values and a value never spans more than one bucket. By recording the frequency of each end value, we record the frequency of the frequent values.

top frequency histogram它忽略了那些“非流行数据”(即出现频率低的数值)。例如,1000枚硬币中只有一枚面值1分的硬币,那在创建柱状图分组时,它就就可以被忽略。TopN频率柱状图能产生一个更利于“流行数据”(高频率数据)的柱状图。http://www.cndba.cn/Expect-le/article/2034

 

A top frequency histogram provides more accurate cardinality estimates for columns that have more than 254 distinct values but contain a small number of extremely popular distinct values (greater than 99% of the data has one of those values).

12c中,优化器的一个新特性就是新增了新的柱状图数据- top frequency histogram(TOP N频率柱状图)hybrid histogram(混合柱状图)优化器利用这两个柱状图,可以更准确的计算列的基数估计,从而生成更好的执行计划。

产生Top Frequency Histograms的条件

NDV(Distinct Value Number,不同值的总数) is greater than n, where n is the number of histogram buckets (default 254).http://www.cndba.cn/Expect-le/article/2034

柱状图的分组数量一定要小于NDV

The percentage of rows occupied by the top n frequent values is equal to or greater than threshold p, where p is (1-(1/n))*100.

出现频率最高的数值所占比例大于等于阈值Pp=(1-(1/n))*100。如果n=254p=99.6http://www.cndba.cn/Expect-le/article/2034

The estimate_percent parameter in the DBMS_STATS statistics gathering procedure is set to AUTO_SAMPLE_SIZE.

 DBMS_STATSestimate_percent必须设置为默认比例(AUTO_SAMPLE_SIZE)

对于上面几个参数,官网解释如下:

NDVhttp://www.cndba.cn/Expect-le/article/2034

This represents the number of distinct values in a column. For example, if a column only contains the values 100, 200, and 300, then the NDV for this column is 3.

n

This variable represents the number of histogram buckets. The default is 254.

p

This variable represents an internal percentage threshold that is equal to (1(1/n)) * 100. For example, if n = 254, then p is 99.6.

实验-如何产生Top Frequency Histogram

3.1   创建表并插入合适的数据

--创建表

http://www.cndba.cn/Expect-le/article/2034http://www.cndba.cn/Expect-le/article/2034

create table topfreq_t(id number);

--插入数据

insert into topfreq_t values(52792);

insert into topfreq_t values(52793);

insert into topfreq_t values(52793);

insert into topfreq_t values(52793);

insert into topfreq_t values(52793);

insert into topfreq_t values(52793);

insert into topfreq_t values(52794);

insert into topfreq_t values(52794);

insert into topfreq_t values(52795);

insert into topfreq_t values(52796);

insert into topfreq_t values(52797);

insert into topfreq_t values(52797);

insert into topfreq_t values(52798);

insert into topfreq_t values(52798);

insert into topfreq_t values(52799);

insert into topfreq_t values(52799);

insert into topfreq_t values(52799);

insert into topfreq_t values(52799);

insert into topfreq_t values(52799);

insert into topfreq_t values(52799);

insert into topfreq_t values(52799);

insert into topfreq_t values(52799);

insert into topfreq_t values(52799);

commit;

topfreq_t中有23条数据,与8种不同的值。如下:

SQL> select t.id, count(*) from TOPFREQ_T t group by t.id order by 1
  2  ;
 
ID   COUNT(*)
---------- ----------
     52792	    1
     52793	    5
     52794	    2
     52795	    1
     52796	    1
     52797	    2
     52798	    2
     52799	    9
 
8 rows selected.

3.2   生成 top frequency histogram

3.2.1  收集统计信息

http://www.cndba.cn/Expect-le/article/2034http://www.cndba.cn/Expect-le/article/2034

BEGIN

  DBMS_STATS.GATHER_TABLE_STATS (

    ownname    => 'LEI'

,   tabname    => 'TOPFREQ_T'

,   method_opt => 'FOR COLUMNS id SIZE 7'

);

END;

/

PL/SQL procedure successfully completed.

注意:id为表的列名,7是分组数要小于不同值的总数(8)

3.2.2  查看id列的统计信息

http://www.cndba.cn/Expect-le/article/2034

SQL> col table_name for a10

SQL> col COLUMN_NAME for a20

SQL> SELECT TABLE_NAME, COLUMN_NAME, NUM_DISTINCT, HISTOGRAM

FROM   USER_TAB_COL_STATISTICS

WHERE  TABLE_NAME='TOPFREQ_T'

AND    COLUMN_NAME='ID';  2    3    4  
 

TABLE_NAME COLUMN_NAME	NUM_DISTINCT HISTOGRAM
----------------------  ----------------- --------------
TOPFREQ_T  	ID	   8 TOP-FREQUENCY

产生的原因:首先肯定是满足了前面的条件

分组数7小于了不同值的总数8

由于n=7,所以p=85.7,而7个出现最频繁的值所占的比例是95.65>85.7

3.2.3  查看Oracle所收集的7个频率最高的值

http://www.cndba.cn/Expect-le/article/2034

SQL> SELECT ENDPOINT_NUMBER, ENDPOINT_VALUE

FROM   USER_HISTOGRAMS

WHERE  TABLE_NAME='TOPFREQ_T'

AND    COLUMN_NAME='ID';  2    3    4  

ENDPOINT_NUMBER ENDPOINT_VALUE
--------------- --------------
      1  52792

      6  52793

      8  52794

      9  52796

     11  52797

     13  52798

     22  52799

7 rows selected.

可以看到,少了52795(只有一条记录)ENDPOINT_VALUE表示值出现频率的总和。

下图表示Top Frequency Histogram


版权声明:本文为博主原创文章,未经博主允许不得转载。

top frequency histogram

用户评论
* 以下用户言论只代表其个人观点,不代表CNDBA社区的观点或立场
Expect-乐

Expect-乐

关注

Without the continuous bitter cold, there can be no fragrant plum blossom

  • 336
    原创
  • 6
    翻译
  • 100
    转载
  • 41
    评论
  • 访问:1571919次
  • 积分:1957
  • 等级:核心会员
  • 排名:第4名
精华文章
    最新问题
    查看更多+
    热门文章
      热门用户
      推荐用户
        Copyright © 2016 All Rights Reserved. Powered by CNDBA · 皖ICP备2022006297号-1·

        QQ交流群

        注册联系QQ