openGauss ANALYZE | ANALYSE 说明

1 ANALYZE | ANALYSE 说明

1.1 ANALYZE 背景

openGauss是基于代价估算生成最优执行计划。优化器需要根据analyze收集的统计信息估算代价，因此统计信息对优化器行数估算和代价估算起着至关重要的作用，准确的统计信息可以生成最有效的查询执行计划，提高查询性能。

因此在表中数据发生明显变化的时候，建议执行ANALYZE语句生成最新的表统计信息。统计结果存储在系统表PG_STATISTIC 和PG_CLASS中。

pg_class表：relpages和reltuples；
pg_statistic 表： stadistinct 、 stanullfrac 、 stanumbersN 、 stavaluesN 。

[dave@www.cndba.cn cndba]$ gsql -h localhost -p 15500 -d postgres -U omm -W omm@123456 -r 
gsql ((openGauss 5.0.0 build a07d57c3) compiled at 2023-03-29 03:07:56 commit 0 last mr  )
SSL connection (cipher: ECDHE-RSA-AES128-GCM-SHA256, bits: 128)
Type "help" for help.

openGauss=# /dS+ pg_class 
                               Table "pg_catalog.pg_class"
      Column      |       Type       | Modifiers | Storage  | Stats target | Description 
------------------+------------------+-----------+----------+--------------+-------------
……
 relpages         | double precision | not null  | plain    |              
 reltuples        | double precision | not null  | plain    |             
 ……
openGauss=# /dS+ pg_statistic
                       Table "pg_catalog.pg_statistic"
    Column     |   Type   | Modifiers | Storage  | Stats target | Description 
---------------+----------+-----------+----------+--------------+-------------
 starelid      | oid      | not null  | plain    |              | 
 starelkind    | "char"   | not null  | plain    |              | 
 staattnum     | smallint | not null  | plain    |              | 
 stainherit    | boolean  | not null  | plain    |              | 
 stanullfrac   | real     | not null  | plain    |              | 
 stawidth      | integer  | not null  | plain    |              | 
 stadistinct   | real     | not null  | plain    |              | 
 stakind1      | smallint | not null  | plain    |              | 
 stakind2      | smallint | not null  | plain    |              | 
 stakind3      | smallint | not null  | plain    |              | 
 stakind4      | smallint | not null  | plain    |              | 
 stakind5      | smallint | not null  | plain    |              | 
……
 stanumbers1   | real[]   |           | extended |              | 
 stanumbers2   | real[]   |           | extended |              | 
 stanumbers3   | real[]   |           | extended |              | 
 stanumbers4   | real[]   |           | extended |              | 
 stanumbers5   | real[]   |           | extended |              | 
 stavalues1    | anyarray |           | extended |              | 
 stavalues2    | anyarray |           | extended |              | 
……

关于这些字段的说明，参考官方手册：

https://docs.opengauss.org/zh/docs/5.0.0/docs/DatabaseReference/PG_STATISTIC.htmlhttp://www.cndba.cn/dave/article/116556

当然也可以通过PG_STATS系统视图查看，pg_statistic 表更接近于底层，信息查看不是很方便，可以通过pg_stats封装的视图更直接的查看。

https://docs.opengauss.org/zh/docs/5.0.0/docs/DatabaseReference/PG_STATS.html

1.2 表自动分析

在openGauss5.0.0中通过参数autovacuum控制数据库自动清理功能的启动。

当autovacuum设置为on时，系统定时启动autovacuum线程来进行表自动分析，如果表中数据量发生较大变化达到阈值时，会触发表自动分析，即autoanalyze。

对于空表而言，当表中插入数据的行数大于50时，会触发表自动进行ANALYZE。
对于表中已有数据的情况，阈值设定为50+10%*reltuples，其中reltuples是表的总行数。

autovacuum可以进行表自动分析功能的生效还依赖于下面几个参数：

track_counts参数需要设置为on，表示开启收集收据库统计数据功能。
autovacuum_max_workers参数需要大于0，该参数表示能同时运行的自动清理线程的最大数量。
autovacuum_mode参需要配置允许进行analyze。

这几个参数默认都是启用的，所以默认会自动进行表分析：http://www.cndba.cn/dave/article/116556

[dave@www.cndba.cn cndba]$ gsql -h localhost -p 15500 -d postgres -U omm -W omm@123456 -r 
gsql ((openGauss 5.0.0 build a07d57c3) compiled at 2023-03-29 03:07:56 commit 0 last mr  )
SSL connection (cipher: ECDHE-RSA-AES128-GCM-SHA256, bits: 128)
Type "help" for help.

openGauss=# show autovacuum;
 autovacuum 
------------
 on
(1 row)
openGauss=# show track_counts ;
 track_counts 
--------------
 on
(1 row)
openGauss=# show autovacuum_max_workers;
 autovacuum_max_workers 
------------------------
 3
(1 row)
openGauss=# show autovacuum_mode;
 autovacuum_mode 
-----------------
 mix
(1 row)

2 操作示例

2.1 语法说明

在执行ANALYZE时，如果没有指定参数，ANALYZE会分析当前数据库中的每个表和分区表。同时也可以通过指定table_name、column和partition_name参数把分析限定在特定的表、列或分区表中。

也可以执行ANALYZE|ANALYSE VERIFY用于检测数据库中普通表（行存表、列存表）的数据文件是否损坏。

收集表的统计信息：

{ ANALYZE | ANALYSE } [ VERBOSE ]
    [ table_name [ ( column_name [, ...] ) ] ];

收集分区表的统计信息：

{ ANALYZE | ANALYSE } [ VERBOSE ]
    [ table_name [ ( column_name [, ...] ) ] ]
    PARTITION ( patrition_name ) ;

普通分区表目前支持针对某个分区的统计信息的语法，但功能上不支持针对某个分区的统计信息收集。

收集多列统计信息：

{ANALYZE | ANALYSE} [ VERBOSE ]
    table_name (( column_1_name, column_2_name [, ...] ));

收集多列统计信息时，需要设置GUC参数default_statistics_target为负数，以使用百分比采样方式。

openGauss=# show default_statistics_target;
 default_statistics_target 
---------------------------
 100
(1 row)

另外需要注意：如果关闭GUC参数enable_functional_dependency，每组多列统计信息最多支持32列；如果开启GUC参数enable_functional_dependency，每组多列统计信息最多支持4列。http://www.cndba.cn/dave/article/116556

openGauss=# show enable_functional_dependency;
 enable_functional_dependency 
------------------------------
 off
(1 row)

不支持收集多列统计信息的表：系统表。

检测当前库的数据文件：

{ANALYZE | ANALYSE} VERIFY {FAST|COMPLETE};

Fast模式校验时，需要对校验的表有并发的DML操作，会导致校验过程中有误报的问题，因为当前Fast模式是直接从磁盘上读取，并发有其他线程修改文件时，会导致获取的数据不准确，建议离线操作。http://www.cndba.cn/dave/article/116556

支持对全库进行操作，由于涉及的表较多，建议以重定向保存结果：

gsql -d database -p port -f “verify.sql”> verify_warning.txt 2>&1。
http://www.cndba.cn/dave/article/116556

对于全库操作时，当关键系统表出现损坏则直接报错，不再继续执行。

检测表和索引的数据文件：

{ANALYZE | ANALYSE} VERIFY {FAST|COMPLETE} table_name|index_name [CASCADE];

支持对普通表的操作和索引表的操作，但不支持对索引表index使用CASCADE操作。原因是由于CASCADE模式用于处理主表的所有索引表，当单独对索引表进行检测时，无需使用CASCADE模式。http://www.cndba.cn/dave/article/116556

对于主表的检测会同步检测主表的内部表，例如toast表、cudesc表等。

检测表分区的数据文件：

{ANALYZE | ANALYSE} VERIFY {FAST|COMPLETE} table_name PARTITION {(patrition_name)}[CASCADE];
支持对表的单独分区进行检测操作，但不支持对索引表index使用CASCADE操作。http://www.cndba.cn/dave/article/116556

2.2 操作示例

2.2.1 分析单表

openGauss=# analyze cndba;
ANALYZE
openGauss=# analyze verbose cndba;
INFO:  analyzing "public.cndba"(dn_6001_6002_6003 pid=3297)
INFO:  ANALYZE INFO : "cndba": scanned 1 of 1 pages, containing 2 live rows and 0 dead rows; 2 rows in sample, 2 estimated total rows(dn_6001_6002_6003 pid=3297)
ANALYZE
openGauss=# 

openGauss=# select oid,relname,relfilenode,reltuples,relpages from pg_class where relname='cndba';
  oid  | relname | relfilenode | reltuples | relpages 
-------+---------+-------------+-----------+----------
 25117 | cndba   |       25117 |         2 |        1
(1 row)

openGauss=# /x
Expanded display is on.
openGauss=# select * from pg_stats where tablename='cndba';
-[ RECORD 1 ]----------+-----------------------
schemaname             | public
tablename              | cndba
attname                | id
inherited              | f
null_frac              | 0
avg_width              | 4
n_distinct             | -1
n_dndistinct           | 0
most_common_vals       | 
most_common_freqs      | 
histogram_bounds       | {1,2}
correlation            | 1
most_common_elems      | 
most_common_elem_freqs | 
elem_count_histogram   | 
-[ RECORD 2 ]----------+-----------------------
schemaname             | public
tablename              | cndba
attname                | website
inherited              | f
null_frac              | 0
avg_width              | 21
n_distinct             | -.5
n_dndistinct           | 0
most_common_vals       | {https://www.cndba.cn}
most_common_freqs      | {1}
histogram_bounds       | 
correlation            | 1
most_common_elems      | 
most_common_elem_freqs | 
elem_count_histogram   | 

openGauss=#

2.2.2 分析所有表

不加参数，表示分析所有表：

openGauss=# analyze verbose;
INFO:  analyzing "pg_catalog.pg_type"(dn_6001_6002_6003 pid=3297)
INFO:  ANALYZE INFO : "pg_type": scanned 21 of 21 pages, containing 897 live rows and 6 dead rows; 897 rows in sample, 897 estimated total rows(dn_6001_6002_6003 pid=3297)
INFO:  analyzing "pg_catalog.gs_column_keys_args"(dn_6001_6002_6003 pid=3297)
INFO:  ANALYZE INFO : "gs_column_keys_args": scanned 0 of 0 pages, containing 0 live rows and 0 dead rows; 0 rows in sample, 0 estimated total rows(dn_6001_6002_6003 pid=3297)
……

2.2.3 分析多列

openGauss=# analyze verbose cndba(id,website);
INFO:  analyzing "public.cndba"(dn_6001_6002_6003 pid=3297)
INFO:  ANALYZE INFO : "cndba": scanned 1 of 1 pages, containing 2 live rows and 0 dead rows; 2 rows in sample, 2 estimated total rows(dn_6001_6002_6003 pid=3297)
ANALYZE
openGauss=#

2.2.4 分析某个分区

openGauss=# analyze customer_par partition(p1);
ANALYZE
openGauss=# analyze verbose customer_par partition(p1);
ANALYZE
openGauss=#

2.2.5 检测表和索引的数据文件

openGauss=# analyze verify complete cndba cascade;
ANALYZE VERIFY
openGauss=# analyze verify fast cndba cascade;
ANALYZE VERIFY
openGauss=#

签到成功

CNDBA社区