签到成功

知道了

CNDBA社区CNDBA社区

GreenplumDB 迁移报错解决

2018-12-07 11:05 6204 1 原创 PostgreSQL
作者: Marvinn

GreenplumDB 迁移报错http://www.cndba.cn/Marvinn/article/3191

http://www.cndba.cn/Marvinn/article/3191
http://www.cndba.cn/Marvinn/article/3191http://www.cndba.cn/Marvinn/article/3191

2018.12.06晚,帮助客户迁移GreenplumDB,由于不了解之前的环境如何,所以不采用GP自带的gp_dump、gp_restore并行备份以及恢复,而采用pg_dump postgres备份方式备份,pg_restore恢复,但是还是出现了报错,以下是迁移报错解决:

数据全部在某个schema下,所以-n 参数指定schema,目标端环境、数据库、用户、schema已提前创建好….

http://www.cndba.cn/Marvinn/article/3191

备份命令:http://www.cndba.cn/Marvinn/article/3191

$ pg_dump marvin -h 172.16.10.21 -U marvin -p 5432 -W -Fc -n marvin -b --gp-syntax >>marvin.dump
Password: 
pg_dump: query to obtain distribution policy of table "food_type" returned no data

报错如上:主要因为Greenplum的元数据错误也会影响到数据备份过程,在使用pg_dump进行数据结构备份时,会由于分布策略的缺失导致备份失败,本文介绍此错误的解决办法,参考博客链接:https://blog.csdn.net/cloudguru/article/details/45718535,但是该解决方案只能少数表重建。并按照GP语法带有分布键才可以,而我通过以下SQL查询,
pg_dump是个讨厌的家伙,它一遇到错误就停止,解决了它才会继续向下,如果再遇到错误再停止,为了避免此类型的错误继续出现,可使用如下命令查询ns1中没有分布键值的表,此输出中可能包含视图,挑选需要处理的用户表按照上面的步骤进行处理:

psql databasename -c "select oid,relnamespace from pg_class where relname='table1';"

databasename=# select relname from pg_class where relnamespace=5678 and oid not in (select localoid from gp_distribution_policy);

发现数据库中所有表都缺乏分布键策略,可能跟之前管理创建表时未指定分布键导致的,所以手动重建所有表不现实,所以发现另外一种解决方案,就是pg_dump时不适用gp-syntax语法,这样导出的就是postgres语法,后续在导入gp是采用随机分布策略,它自动分布,一般以第一个字段为分布键,这样会造成源库与目标端分布键不一致,并且主键无法建立,详细看后续.

参数--gp-syntax 默认是持有的,使用下面的命令备份即可:
$ pg_dump unidbc -h 10.17.4.81 -U admin -p 5432 -W -Fc -n zygx -b --no-gp-syntax >>marvin.dump

恢复命令http://www.cndba.cn/Marvinn/article/3191

$ pg_restore -d marvin -h 172.16.10.21 -U marvin -p 5432 -W -Fc -n marvin marvin.dump
或者
$ pg_restore -d marvin -h 172.16.10.21 -U marvin -p 5432 -W -Fc  marvin1.dump

恢复过程报错

pg_restore: [archiver (db)] Error from TOC entry 2100; 1259 46034 TABLE zjfda2017062601t02 admin
pg_restore: [archiver (db)] could not execute query: ERROR:  role "admin" does not exist
    Command was: ALTER TABLE zygx.zjfda2017062601t02 OWNER TO admin;
pg_restore: [archiver (db)] Error from TOC entry 2101; 1259 46062 TABLE zjfda2017081801t01 admin
pg_restore: [archiver (db)] could not execute query: ERROR:  role "admin" does not exist
    Command was: ALTER TABLE zygx.zjfda2017081801t01 OWNER TO admin;
pg_restore: [archiver (db)] Error from TOC entry 2102; 1259 46090 TABLE zjfda2018040901t01 admin
pg_restore: [archiver (db)] could not execute query: ERROR:  role "admin" does not exist
    Command was: ALTER TABLE zygx.zjfda2018040901t01 OWNER TO admin;
pg_restore: [archiver (db)] Error from TOC entry 2085; 1259 44894 TABLE zl_log admin
pg_restore: [archiver (db)] could not execute query: ERROR:  role "admin" does not exist
    Command was: ALTER TABLE zygx.zl_log OWNER TO admin;
pg_restore: [archiver (db)] Error from TOC entry 2086; 1259 44940 SEQUENCE zl_log_logid_seq admin
pg_restore: [archiver (db)] could not execute query: ERROR:  role "admin" does not exist
    Command was: ALTER TABLE zygx.zl_log_logid_seq OWNER TO admin;

该类型报错可以忽略,主要是因为当前并不存在admin用户,而源库是使用该数据库用户,当前使用marvin用户,pg数据库采用当前数据库用户连接并导入到当前用户,再修改数据字典到其他用户,所以若用户不存在就会报错,但是表还是存在marvin用户下

接下来类似报错:涉及到该表主键建立不上问题(源库所有表都报这个错,以下截取部分)
.................................................
pg_restore: [archiver (db)] Error from TOC entry 3200; 0 55288 TABLE DATA map_arealist admin
pg_restore: [archiver (db)] COPY failed: ERROR:  duplicate key violates unique constraint "map_arealist_arealist_id_key"  (seg0 gpdb1:40000 pid=10185)
CONTEXT:  COPY map_arealist, line 1: "846       330211002210    清水浦村委会    /N      /N      /N"

pg_restore: [archiver (db)] Error from TOC entry 2670; 2606 32733 CONSTRAINT an_alterstockinfo_pkey admin
pg_restore: [archiver (db)] could not execute query: ERROR:  PRIMARY KEY must contain all columns in the distribution key of relation "an_alterstockinfo"
    Command was: ALTER TABLE ONLY an_alterstockinfo
    ADD CONSTRAINT an_alterstockinfo_pkey PRIMARY KEY (nbxfxh);
pg_restore: [archiver (db)] Error from TOC entry 2673; 2606 32852 CONSTRAINT an_asset_send_pkey admin
pg_restore: [archiver (db)] could not execute query: ERROR:  PRIMARY KEY must contain all columns in the distribution key of relation "an_asset_send"
    Command was: ALTER TABLE ONLY an_asset_send
    ADD CONSTRAINT an_asset_send_pkey PRIMARY KEY (nbxfxh);
pg_restore: [archiver (db)] Error from TOC entry 2676; 2606 32988 CONSTRAINT an_baseinfo_send_pkey admin
pg_restore: [archiver (db)] could not execute query: ERROR:  PRIMARY KEY must contain all columns in the distribution key of relation "an_baseinfo_send"
    Command was: ALTER TABLE ONLY an_baseinfo_send
    ADD CONSTRAINT an_baseinfo_send_pkey PRIMARY KEY (nbxfxh);
pg_restore: [archiver (db)] Error from TOC entry 2679; 2606 33109 CONSTRAINT an_contact_send_pkey admin
pg_restore: [archiver (db)] could not execute query: ERROR:  PRIMARY KEY must contain all columns in the distribution key of relation "an_contact_send"
    Command was: ALTER TABLE ONLY an_contact_send
    ADD CONSTRAINT an_contact_send_pkey PRIMARY KEY (nbxfxh);
pg_restore: [archiver (db)] Error from TOC entry 2682; 2606 33230 CONSTRAINT an_forguaranteeinfo_pkey admin
pg_restore: [archiver (db)] could not execute query: ERROR:  PRIMARY KEY must contain all columns in the distribution key of relation "an_forguaranteeinfo"
    Command was: ALTER TABLE ONLY an_forguaranteeinfo
    ADD CONSTRAINT an_forguaranteeinfo_pkey PRIMARY KEY (nbxfxh);
pg_restore: [archiver (db)] Error from TOC entry 2685; 2606 33351 CONSTRAINT an_forinvestment_pkey admin
pg_restore: [archiver (db)] could not execute query: ERROR:  PRIMARY KEY must contain all columns in the distribution key of relation "an_forinvestment"
    Command was: ALTER TABLE ONLY an_forinvestment
    ADD CONSTRAINT an_forinvestment_pkey PRIMARY KEY (nbxfxh);
pg_restore: [archiver (db)] Error from TOC entry 2688; 2606 33476 CONSTRAINT an_pb_baseinfo_send_pkey admin
pg_restore: [archiver (db)] could not execute query: ERROR:  PRIMARY KEY must contain all columns in the distribution key of relation "an_pb_baseinfo_send"
    Command was: ALTER TABLE ONLY an_pb_baseinfo_send
    ADD CONSTRAINT an_pb_baseinfo_send_pkey PRIMARY KEY (nbxfxh);
pg_restore: [archiver (db)] Error from TOC entry 2691; 2606 33597 CONSTRAINT an_pb_licenceinfo_pkey admin
pg_restore: [archiver (db)] could not execute query: ERROR:  PRIMARY KEY must contain all columns in the distribution key of relation "an_pb_licenceinfo"
    Command was: ALTER TABLE ONLY an_pb_licenceinfo
    ADD CONSTRAINT an_pb_licenceinfo_pkey PRIMARY KEY (nbxfxh);
pg_restore: [archiver (db)] Error from TOC entry 2694; 2606 33718 CONSTRAINT an_pb_websiteinfo_send_pkey admin
pg_restore: [archiver (db)] could not execute query: ERROR:  PRIMARY KEY must contain all columns in the distribution key of relation "an_pb_websiteinfo_send"
    Command was: ALTER TABLE ONLY an_pb_websiteinfo_send
    ADD CONSTRAINT an_pb_websiteinfo_send_pkey PRIMARY KEY (nbxfxh);
pg_restore: [archiver (db)] Error from TOC entry 2697; 2606 33843 CONSTRAINT an_sfc_baseinfo_send_pkey admin
pg_restore: [archiver (db)] could not execute query: ERROR:  PRIMARY KEY must contain all columns in the distribution key of relation "an_sfc_baseinfo_send"
    Command was: ALTER TABLE ONLY an_sfc_baseinfo_send
    ADD CONSTRAINT an_sfc_baseinfo_send_pkey PRIMARY KEY (nbxfxh);
pg_restore: [archiver (db)] Error from TOC entry 2700; 2606 33964 CONSTRAINT an_sfc_branchinfo_pkey admin
pg_restore: [archiver (db)] could not execute query: ERROR:  PRIMARY KEY must contain all columns in the distribution key of relation "an_sfc_branchinfo"
    Command was: ALTER TABLE ONLY an_sfc_branchinfo
    ADD CONSTRAINT an_sfc_branchinfo_pkey PRIMARY KEY (nbxfxh);

这个原因上面备份有说过,主要因为后续在导入gp是采用随机分布策略,它自动分布,一般以第一个字段为分布键,这样会造成源库与目标端分布键不一致,并且主键无法建立,报错:
 ERROR:  PRIMARY KEY must contain all columns in the distribution key of relation "an_pb_websiteinfo_send"

 这是因但当前存在分布键,而且随机分布的分布键不是主键,所以在添加主键的时候报错,除非添加联合主键即主键内带有分布键字段即可,但是这显然是不行的,因为跟源库不一致,故采用以下方式解决

报错解决http://www.cndba.cn/Marvinn/article/3191http://www.cndba.cn/Marvinn/article/3191http://www.cndba.cn/Marvinn/article/3191

1、--gp 查询数据库所有表名 

select relname as TABLE_NAME ,col_description(c.oid, 0) as COMMENTS from pg_class c
where relkind = 'r' and relname not like 'pg_%' and relname not like 'gp_%'  and relname not like 'sql_%' order by relname

2、--查看某模式下所有表的分布键信息,并且生成批量修改分布键脚本  (源库执行)

Greenplum是关系型的分布式数据库,需要存储的数据库在进入数据库时,将先进行数据分布的处理工作,讲一个表的数据平均分不到每个节点上,并为每个表指定一个分发列(distribute Column),之后根据Hash来分布数据。

查询某模式下所有表的分布键信息,适用于:

  1、排查是否有分布键创建不合理的表,如果分布键是重复率比较高的字段会造成数据分布不均匀,存储过于倾斜。
    2、排查经常做表关联的表是否是相同分布键,这样会提高执行效率。

SELECT
aaa.nspname AS "模式名",
aaa.table_comment AS "中文表明",
'alter table '|| aaa.relname || ' set distributed by(' || ccc.attname ||');' AS Update_Distinct
FROM
 (
  SELECT   aa.oid,
    obj_description (aa.oid) AS table_comment,
    aa.relname,
    bb.localoid,
    bb.attrnums,
    regexp_split_to_table (
     array_to_string (bb.attrnums, ','),
     ','  
  ) att,
    dd.nspname   FROM   pg_class aa --原数据信息 最重要的表!
  LEFT
 JOIN gp_distribution_policy bb ON bb.localoid = aa.oid --分布键表
LEFT
 JOIN pg_namespace dd ON dd.oid = aa.relnamespace --模式
LEFT
 JOIN pg_inherits hh ON aa.oid = hh.inhrelid --继承表 
WHERE   dd.nspname = 'dim'  -- 替换成需要的模式
AND hh.inhrelid IS NULL 
 ) aaa
LEFT JOIN pg_attribute ccc ON ccc.attrelid = aaa.oid
AND ccc.attnum = aaa.att
WHERE
ccc.attnum > 0
ORDER BY
aaa.relname;

3、目标端批量执行该批量修改分布键脚本

4、再将上面所有报错信息,单独提取ALTER TABLE命令增加主键即可,有些是本身就是的报错即可忽略
pg_restore: [archiver (db)] Error from TOC entry 2691; 2606 33597 CONSTRAINT an_pb_licenceinfo_pkey admin
pg_restore: [archiver (db)] could not execute query: ERROR:  PRIMARY KEY must contain all columns in the distribution key of relation "an_pb_licenceinfo"
    Command was: ALTER TABLE ONLY an_pb_licenceinfo
    ADD CONSTRAINT an_pb_licenceinfo_pkey PRIMARY KEY (nbxfxh);
pg_restore: [archiver (db)] Error from TOC entry 2694; 2606 33718 CONSTRAINT an_pb_websiteinfo_send_pkey admin
pg_restore: [archiver (db)] could not execute query: ERROR:  PRIMARY KEY must contain all columns in the distribution key of relation "an_pb_websiteinfo_send"
    Command was: ALTER TABLE ONLY an_pb_websiteinfo_send
    ADD CONSTRAINT an_pb_websiteinfo_send_pkey PRIMARY KEY (nbxfxh);
pg_restore: [archiver (db)] Error from TOC entry 2697; 2606 33843 CONSTRAINT an_sfc_baseinfo_send_pkey admin
pg_restore: [archiver (db)] could not execute query: ERROR:  PRIMARY KEY must contain all columns in the distribution key of relation "an_sfc_baseinfo_send"
    Command was: ALTER TABLE ONLY an_sfc_baseinfo_send
    ADD CONSTRAINT an_sfc_baseinfo_send_pkey PRIMARY KEY (nbxfxh);
pg_restore: [archiver (db)] Error from TOC entry 2700; 2606 33964 CONSTRAINT an_sfc_branchinfo_pkey admin
pg_restore: [archiver (db)] could not execute query: ERROR:  PRIMARY KEY must contain all columns in the distribution key of relation "an_sfc_branchinfo"
    Command was: ALTER TABLE ONLY an_sfc_branchinfo
    ADD CONSTRAINT an_sfc_branchinfo_pkey PRIMARY KEY (nbxfxh);


提取执行命令:
ALTER TABLE ONLY an_pb_licenceinfo
    ADD CONSTRAINT an_pb_licenceinfo_pkey PRIMARY KEY (nbxfxh);
ALTER TABLE ONLY an_pb_websiteinfo_send
    ADD CONSTRAINT an_pb_websiteinfo_send_pkey PRIMARY KEY (nbxfxh);
ALTER TABLE ONLY an_sfc_baseinfo_send
    ADD CONSTRAINT an_sfc_baseinfo_send_pkey PRIMARY KEY (nbxfxh);
 ALTER TABLE ONLY an_sfc_branchinfo
    ADD CONSTRAINT an_sfc_branchinfo_pkey PRIMARY KEY (nbxfxh);

5、最后在目标端迁移库中批量执行提取的ALTER TABLE创建主键脚本即可

6、数据核对(源库与目标端schema空间占用可能会有些差距,因为涉及到表碎片以及回收处理)

--批量生成表数据核对脚本
select 'select count(*) from  '|| tablename ||';' from pg_tables where tableowner='dbcuser';

--查看源库数据库各Schema占用空间
select pg_size_pretty(cast( sum(pg_relation_size( schemaname  || '.' || tablename)) as bigint)), schemaname
from pg_tables t inner join pg_namespace d on t.schemaname=d.nspname  group by schemaname;

最后,Greenplum迁移完成,后续通知业务应用进行验证.

版权声明:本文为博主原创文章,未经博主允许不得转载。

用户评论
* 以下用户言论只代表其个人观点,不代表CNDBA社区的观点或立场
Marvinn

Marvinn

关注

路漫漫其修远兮、吾将上下而求索

  • 99
    原创
  • 0
    翻译
  • 2
    转载
  • 36
    评论
  • 访问:458451次
  • 积分:449
  • 等级:中级会员
  • 排名:第12名
精华文章
    最新问题
    查看更多+
    热门文章
      热门用户
      推荐用户
        Copyright © 2016 All Rights Reserved. Powered by CNDBA · 皖ICP备2022006297号-1·

        QQ交流群

        注册联系QQ