CNDBA Community

TiDB Data Validation Tool: sync-diff-inspector

2019-03-05 17:52 | Original | Tag: TiDB
Author: Marvinn

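Introduction to sync-diff-inspector
sync-diff-inspector is a tool for checking whether two copies of data in MySQL/TiDB are consistent. It can also repair data, which is suitable for fixing a small amount of inconsistent data.
Main features:
Compares table schemas and data
Generates SQL to repair the data if it is inconsistent
Supports comparing data from multiple tables against a single table (for scenarios where sharded databases and tables are synced into one merged table)
Supports comparing tables with different database or table names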

http://www.cndba.cn/Marvinn/article/3323

For downloads and usage instructions, see: https://www.pingcap.com/docs-cn/tools/sync-diff-inspector/

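The test environment here is TiDB vs. MySQL (a single database). Goal: check whether the data in every table of the marvin database is consistent between TiDB and MySQL. The marvin database contains three tables: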
mysql> show tables;
+------------------+
| Tables_in_marvin |
+------------------+
| t1               |
| t2               |
| tidb             |
+------------------+

Of these, tables t1 and tidb are inconsistent between the two sides.

Source:

mysql> select * from tidb;
+----+---------+
| id | name    |
+----+---------+
|  1 | PingCAP |
|  2 | dd      |
|  3 | FF      |
|  4 | ggg     |
|  5 | ggg     |
|  6 | test    |
+----+---------+
6 rows in set (0.00 sec)

mysql> select * from t2;
+----+------+
| id | name |
+----+------+
|  1 | FF   |
|  2 | FF   |
+----+------+
2 rows in set (0.01 sec)

mysql> select * from t1;
+----+------+
| id | name |
+----+------+
|  1 | DD   |
|  2 | GG   |
+----+------+
2 rows in set (0.00 sec)

Target:


mysql> select * from tidb;
+----+---------+
| id | name    |
+----+---------+
|  1 | PingCAP |
|  2 | dd      |
|  5 | ggg     |
|  6 | test    |
+----+---------+
4 rows in set (0.00 sec)

mysql> select * from t2;
+----+------+
| id | name |
+----+------+
|  1 | FF   |
|  2 | FF   |
+----+------+
2 rows in set (0.00 sec)

mysql> select * from t1;
+----+------+
| id | name |
+----+------+
|  1 | DD   |
+----+------+
1 row in set (0.00 sec)


Comparing the two sides: relative to the source, the target is missing rows (3, FF) and (4, ggg) in table tidb, and row (2, GG) in table t1.
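The comparison above can be reproduced mechanically. Below is a minimal Python sketch that models each table as a dict keyed by the primary key id, with the data copied from the query output above. It is only an illustration of the comparison, not how sync-diff-inspector works internally, and the helper missing_rows is hypothetical.

```python
# Rows copied from the SELECT output above, keyed by the primary key id.
source = {
    "tidb": {1: "PingCAP", 2: "dd", 3: "FF", 4: "ggg", 5: "ggg", 6: "test"},
    "t1": {1: "DD", 2: "GG"},
    "t2": {1: "FF", 2: "FF"},
}
target = {
    "tidb": {1: "PingCAP", 2: "dd", 5: "ggg", 6: "test"},
    "t1": {1: "DD"},
    "t2": {1: "FF", 2: "FF"},
}


def missing_rows(src, dst):
    """Return (id, name) pairs that are absent from dst or differ there."""
    return sorted((pk, name) for pk, name in src.items() if dst.get(pk) != name)


for table in source:
    print(table, missing_rows(source[table], target[table]))
# tidb [(3, 'FF'), (4, 'ggg')]
# t1 [(2, 'GG')]
# t2 []
```

This only reports rows that are missing or different on the target; rows that exist only on the target would need the reverse comparison.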

The config.toml configuration file is as follows:
[tidb@ip-172-16-30-86 bin]$ cat config.toml

# diff Configuration.

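# Log level; can be set to info or debug
log-level = "debug"

# sync-diff-inspector splits the data into multiple chunks based on the primary key/unique key/index
# and compares the data chunk by chunk. chunk-size sets the size of each chunk
chunk-size = 1000

# Number of threads used to check the data
check-thread-count = 4

# Sampling percentage; set to 100 to check all data
sample-percent = 100

# Compare data by computing a checksum per chunk; if disabled, data is compared row by row
use-checksum = true

# If true, skip the data comparison
ignore-data-check = false

# If true, skip the table schema comparison
ignore-struct-check = false

# Name of the file in which the SQL for fixing the data is saved
fix-sql-file = "fix.sql"

# To split chunks using TiDB statistics, set tidb-instance-id to the instance-id of one of the instances configured in source-db or target-db
# tidb-instance-id = "target-1"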

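# To compare many tables whose schema or table names differ, use table-rules to define the mapping. You can map only schemas, only tables, or both
[[table-rules]]
# schema-pattern and table-pattern support regular expressions
schema-pattern = "marvin"
#table-pattern = "test"
target-schema = "marvin"
#target-table = "test"

# Tables to check in the target database
[[check-tables]]
# Name of the schema in the target database
schema = "marvin"

# Tables to check
#tables = ["test1", "test2", "test3"]

# Tables can also be specified with regular expressions, which must start with '~'.
# The following configuration checks all tables whose names start with 'test'
# tables = ["~^test.*"]
# The following configuration checks all tables in the configured schema
tables = ["~^"]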

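# Special configuration for some tables; these tables must also be included in check-tables
[[table-config]]
# Name of the schema in the target database
schema = "marvin"

# Table name
table = "t1"

# Column used to split chunks; if not set, sync-diff-inspector picks a suitable column (primary key/unique key/index)
#index-field = "id"

# Range of data to check; must follow the syntax of a SQL WHERE condition
#range = "age > 10 AND age < 20"

# Set to true when comparing multiple shard tables against a merged table
is-sharding = false

# In some cases character data sorts differently on the two sides; specify a collation to keep the ordering consistent.
# It must correspond to the charset configured in the database
# collation = "latin1_bin"

# Ignore some columns during the check; these columns can still be used to split chunks and to sort the checked data
# ignore-columns = ["name"]

# Remove some columns: they are dropped from the table schema during the check, so their data is not checked
# and they are not used to split chunks or to sort the data
# remove-columns = ["name"]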

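# Below is an example configuration for comparing two tables with different schema and table names
#[[table-config]]
# Target schema name
#schema = "marvin"

# Target table name
#table = "test2"

# Not a sharding scenario, so set to false
#is-sharding = false

# Source data configuration
[[table-config.source-tables]]
# Instance id of the source database
instance-id = "source-1"
# Name of the source schema
schema = "marvin"
# Name of the source table
table = "t1"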

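# Source database instance configuration
[[source-db]]
host = "172.16.30.86"
port = 5000
user = "root"
password = "123456"
# id of the source database instance; uniquely identifies a database instance
instance-id = "source-1"
# Use TiDB's snapshot feature; if enabled, historical data is used for the comparison
# snapshot = "2016-10-08 16:45:26"

# Target database instance configuration
[target-db]
host = "172.16.30.89"
port = 3308
user = "root"
password = "123456"
# Use TiDB's snapshot feature; if enabled, historical data is used for the comparison
# snapshot = "2016-10-08 16:45:26"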
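The chunk-size and index-field comments describe splitting a table into chunks on an index column. The following Python sketch is a simplified model, not the tool's actual splitting algorithm (which may sample the data or use TiDB statistics). It cuts the sorted key values into contiguous ranges of at most chunk_size keys and adds two open-ended ranges below the minimum and above the maximum. For these small tables, that matches the layout shown by the DEBU chunks lines in the log further down. The function name split_chunks and the hard-coded column id are assumptions for illustration.

```python
def split_chunks(keys, chunk_size):
    """Split index values into contiguous (where, args) ranges on column `id`.

    Simplified model: bounded ranges cover the observed keys, and two
    open-ended ranges catch rows that exist only below the minimum or
    above the maximum key (for example, rows present on just one side).
    """
    keys = sorted(set(keys))
    if not keys:
        return [("TRUE", [])]  # nothing observed: check the whole table at once
    # Every chunk_size-th key starts a new chunk.
    bounds = keys[::chunk_size]
    chunks = [("`id` >= ? AND `id` < ?", [lo, hi]) for lo, hi in zip(bounds, bounds[1:])]
    # The last bounded chunk is closed at the maximum key.
    chunks.append(("`id` >= ? AND `id` <= ?", [bounds[-1], keys[-1]]))
    chunks.append(("`id` < ?", [keys[0]]))
    chunks.append(("`id` > ?", [keys[-1]]))
    return chunks


# Table tidb has ids 1..6 and chunk-size is 1000, as in config.toml.
for where, args in split_chunks(range(1, 7), 1000):
    print(where, args)
# `id` >= ? AND `id` <= ? [1, 6]
# `id` < ? [1]
# `id` > ? [6]
```

The open-ended ranges matter. For t1, the log shows bounded chunk [1, 1], so the extra source row id = 2 falls into the `id > 1` range. That range's checksums then differ (4129747790 vs. 0).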

When checking whether all tables in an entire single database are consistent, note the following points:

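1. Specify the corresponding source and target schemas and leave the table patterns unset: schema-pattern refers to the source, target-schema to the target.
# To compare many tables whose schema or table names differ, use table-rules to define the mapping. You can map only schemas, only tables, or both
[[table-rules]]
# schema-pattern and table-pattern support regular expressions
schema-pattern = "marvin"
#table-pattern = "test"
target-schema = "marvin"
#target-table = "test"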
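To illustrate what a table-rules entry does, here is a hypothetical Python sketch that routes a source (schema, table) name to its target name. It assumes that each pattern must match the whole name as a regular expression and that an unset target-table keeps the original table name. The tool's exact matching rules may differ. The dict keys (schema_pattern and so on) are made up for the sketch.

```python
import re


def route(schema, table, rules):
    """Map a source (schema, table) to a target (schema, table).

    Assumptions: patterns are whole-name regular expressions; a rule without
    table_pattern matches any table; a missing target_table keeps the name.
    """
    for rule in rules:
        if not re.fullmatch(rule["schema_pattern"], schema):
            continue
        table_pattern = rule.get("table_pattern")
        if table_pattern is not None and not re.fullmatch(table_pattern, table):
            continue
        return rule["target_schema"], rule.get("target_table", table)
    return schema, table  # no rule matched: names are compared as-is


# The rule from config.toml: schema marvin maps to schema marvin, tables unchanged.
rules = [{"schema_pattern": "marvin", "target_schema": "marvin"}]
print(route("marvin", "tidb", rules))  # ('marvin', 'tidb')

# A hypothetical sharding rule: shard_01.t3, shard_02.t7, ... all map to merged.t_all.
shard_rules = [{"schema_pattern": r"shard_\d+", "table_pattern": r"t\d+",
                "target_schema": "merged", "target_table": "t_all"}]
print(route("shard_02", "t7", shard_rules))  # ('merged', 't_all')
```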


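2. check-tables is required; the configuration below checks all tables in the configured schema.
# Tables to check in the target database
[[check-tables]]
# Name of the schema in the target database
schema = "marvin"

# Tables to check
#tables = ["test1", "test2", "test3"]

# Tables can also be specified with regular expressions, which must start with '~'.
# The following configuration checks all tables whose names start with 'test'
# tables = ["~^test.*"]
# The following configuration checks all tables in the configured schema
tables = ["~^"]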
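The '~' prefix marks an entry in tables as a regular expression. This Python sketch models the selection under one assumption not confirmed by this article: the regex is searched anywhere in the table name, as Go's regexp.MatchString does, so "~^" matches every table. Entries without '~' are assumed to be exact names. select_tables is a hypothetical helper.

```python
import re


def select_tables(patterns, tables):
    """Return the tables selected by check-tables style entries.

    Assumption: an entry starting with '~' is a regular expression searched
    anywhere in the name (like Go's regexp.MatchString); any other entry
    must equal the table name exactly.
    """
    selected = []
    for name in tables:
        for pattern in patterns:
            if pattern.startswith("~"):
                matched = re.search(pattern[1:], name) is not None
            else:
                matched = pattern == name
            if matched:
                selected.append(name)
                break
    return selected


tables = ["t1", "t2", "tidb"]  # the tables in schema marvin
print(select_tables(["~^"], tables))       # ['t1', 't2', 'tidb']
print(select_tables(["~^t\\d"], tables))   # ['t1', 't2']
print(select_tables(["tidb"], tables))     # ['tidb']
```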


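3. table-config is required (one entry is enough; multiple entries are for sharding scenarios), and its table must match the table name in the source-tables configuration. (I tried mismatched names, i.e. an arbitrary existing target table in table-config and an arbitrary existing source table in source-tables, and the generated fix.sql was incorrect.)
[[table-config]]
# Name of the schema in the target database
schema = "marvin"

# Table name
table = "t1"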


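4. instance-id is required in the source configuration. The name itself is arbitrary (here it is source-1, the same value as the instance-id under [[source-db]]), and table corresponds to the table in table-config.
# Source data configuration
[[table-config.source-tables]]
# Instance id of the source database
instance-id = "source-1"
# Name of the source schema
schema = "marvin"
# Name of the source table
table = "t1"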
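Points 2 to 4 can be checked before running the tool. Below is a hypothetical Python pre-flight check over the parsed configuration. The dict shape is what a TOML parser such as Python 3.11's tomllib would produce for the config.toml above. Two of its rules are assumptions:

- every source-tables instance-id must name a [[source-db]] instance;
- following the author's observation for this same-name setup, the source table name must match the table-config table.

The tool itself supports different names (see the commented-out example in the config), so drop the second rule when mapping tables.

```python
def check_config(cfg):
    """Return a list of problems found in a parsed sync-diff-inspector config.

    cfg is the dict a TOML parser would produce for config.toml. The checks
    encode points 2-4 above plus one assumption: each source-tables
    instance-id must name a [[source-db]] entry.
    """
    problems = []
    if not cfg.get("check-tables"):
        problems.append("check-tables is missing")
    source_ids = {db.get("instance-id") for db in cfg.get("source-db", [])}
    for tc in cfg.get("table-config", []):
        name = tc.get("table")
        for src in tc.get("source-tables", []):
            instance_id = src.get("instance-id")
            if not instance_id:
                problems.append(f"{name}: source-tables has no instance-id")
            elif instance_id not in source_ids:
                problems.append(f"{name}: unknown instance-id {instance_id!r}")
            # Author's observation for same-name checks; drop when mapping tables.
            if src.get("table") != name:
                problems.append(f"{name}: source table {src.get('table')!r} differs")
    return problems


# Shape of the config.toml above after parsing (connection details omitted).
cfg = {
    "check-tables": [{"schema": "marvin", "tables": ["~^"]}],
    "table-config": [{
        "schema": "marvin", "table": "t1", "is-sharding": False,
        "source-tables": [{"instance-id": "source-1", "schema": "marvin", "table": "t1"}],
    }],
    "source-db": [{"host": "172.16.30.86", "port": 5000, "instance-id": "source-1"}],
}
print(check_config(cfg))  # []
```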

Running sync-diff-inspector
Run the following command:

$ ./bin/sync_diff_inspector --config=./config.toml

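The command eventually writes a check report to the log describing the result for each table. If any data is inconsistent, sync-diff-inspector generates SQL to repair it and saves those statements to fix.sql.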


The contents of fix.sql:


[tidb@ip-172-16-30-86 bin]$ cat fix.sql
REPLACE INTO `marvin`.`tidb`(`id`,`name`) VALUES (3,'FF');

REPLACE INTO `marvin`.`tidb`(`id`,`name`) VALUES (4,'ggg');

REPLACE INTO `marvin`.`t1`(`id`,`name`) VALUES (2,'GG');


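These statements correspond exactly to the rows identified earlier as missing from tables tidb and t1 in the marvin database. The fixes use REPLACE, which deletes an existing row with the same key and then inserts the new row.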
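As a sanity check, the following Python sketch applies the three fix.sql statements to an in-memory copy of the target tables and shows that the result matches the source. The regex only parses this exact two-column REPLACE INTO form; it is not a SQL parser. Modeling REPLACE as delete-then-insert on a single primary key is also a simplification: on a table with additional unique keys, MySQL's REPLACE can delete more than one conflicting row.

```python
import re

# The three statements from fix.sql above.
FIX_SQL = """
REPLACE INTO `marvin`.`tidb`(`id`,`name`) VALUES (3,'FF');
REPLACE INTO `marvin`.`tidb`(`id`,`name`) VALUES (4,'ggg');
REPLACE INTO `marvin`.`t1`(`id`,`name`) VALUES (2,'GG');
"""

# Matches only this exact (`id`,`name`) form; it is not a general SQL parser.
STMT = re.compile(
    r"REPLACE INTO `(\w+)`\.`(\w+)`\(`id`,`name`\) VALUES \((\d+),'([^']*)'\);"
)


def apply_fix(tables, sql):
    """Apply REPLACE INTO statements to in-memory tables keyed by id."""
    for _schema, table, pk, name in STMT.findall(sql):
        rows = tables[table]
        rows.pop(int(pk), None)  # REPLACE: delete a row with the same key, if any...
        rows[int(pk)] = name     # ...then insert the new row
    return tables


# Target data as queried above, before the fix.
target = {
    "tidb": {1: "PingCAP", 2: "dd", 5: "ggg", 6: "test"},
    "t1": {1: "DD"},
    "t2": {1: "FF", 2: "FF"},
}
apply_fix(target, FIX_SQL)
print(sorted(target["tidb"].items()))
# [(1, 'PingCAP'), (2, 'dd'), (3, 'FF'), (4, 'ggg'), (5, 'ggg'), (6, 'test')]
print(sorted(target["t1"].items()))
# [(1, 'DD'), (2, 'GG')]
```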

The log output is as follows:

[tidb@ip-172-16-30-86 bin]$ ./sync_diff_inspector -config=./config.toml
DEBU[0000] chunks: [{begin:1 end:6 containBegin:true containEnd:true noBegin:false noEnd:false} {begin:{} end:1 containBegin:false containEnd:false noBegin:true noEnd:false} {begin:6 end:{} containBegin:false containEnd:false noBegin:false noEnd:true}] 
DEBU[0000] marvin.tidb create dump job, where: (`id` >= ? AND `id` <= ? AND TRUE), begin: 1, end: 6 
DEBU[0000] marvin.tidb create dump job, where: (TRUE AND `id` < ? AND TRUE), begin: {}, end: 1 
DEBU[0000] marvin.tidb create dump job, where: (`id` > ? AND TRUE AND TRUE), begin: 6, end: {} 
2019/03/05 14:35:42 diff.go:178: [info] total has 3 check jobs, check 3 of them
DEBU[0000] checksum sql: SELECT BIT_XOR(CAST(CRC32(CONCAT_WS(',', `id`, `name`, CONCAT(ISNULL(`id`), ISNULL(`name`))))AS UNSIGNED)) AS checksum FROM `marvin`.`tidb` WHERE (`id` >= ? AND `id` <= ? AND TRUE);, args: [1 6] 
DEBU[0000] checksum sql: SELECT BIT_XOR(CAST(CRC32(CONCAT_WS(',', `id`, `name`, CONCAT(ISNULL(`id`), ISNULL(`name`))))AS UNSIGNED)) AS checksum FROM `marvin`.`tidb` WHERE (TRUE AND `id` < ? AND TRUE);, args: [1] 
DEBU[0000] checksum sql: SELECT BIT_XOR(CAST(CRC32(CONCAT_WS(',', `id`, `name`, CONCAT(ISNULL(`id`), ISNULL(`name`))))AS UNSIGNED)) AS checksum FROM `marvin`.`tidb` WHERE (`id` > ? AND TRUE AND TRUE);, args: [6] 
DEBU[0000] checksum sql: SELECT BIT_XOR(CAST(CRC32(CONCAT_WS(',', `id`, `name`, CONCAT(ISNULL(`id`), ISNULL(`name`))))AS UNSIGNED)) AS checksum FROM `marvin`.`tidb` WHERE (TRUE AND `id` < ? AND TRUE);, args: [1] 
DEBU[0000] checksum sql: SELECT BIT_XOR(CAST(CRC32(CONCAT_WS(',', `id`, `name`, CONCAT(ISNULL(`id`), ISNULL(`name`))))AS UNSIGNED)) AS checksum FROM `marvin`.`tidb` WHERE (`id` > ? AND TRUE AND TRUE);, args: [6] 
DEBU[0000] checksum sql: SELECT BIT_XOR(CAST(CRC32(CONCAT_WS(',', `id`, `name`, CONCAT(ISNULL(`id`), ISNULL(`name`))))AS UNSIGNED)) AS checksum FROM `marvin`.`tidb` WHERE (`id` >= ? AND `id` <= ? AND TRUE);, args: [1 6] 
2019/03/05 14:35:42 diff.go:254: [info] table: tidb, range: (TRUE AND `id` < ? AND TRUE), args: [1], checksum is equal, checksum: 0
2019/03/05 14:35:42 diff.go:258: [error] table: tidb, range: (`id` >= ? AND `id` <= ? AND TRUE), args: [1 6], checksum is not equal, one is 1225055177, another is 2699779646
2019/03/05 14:35:42 diff.go:254: [info] table: tidb, range: (`id` > ? AND TRUE AND TRUE), args: [6], checksum is equal, checksum: 0
2019/03/05 14:35:42 diff.go:491: [error] find difference data in column id, data1: map[id:3 name:FF], data2: map[id:5 name:ggg]
2019/03/05 14:35:42 diff.go:393: [info] [insert] sql: REPLACE INTO `marvin`.`tidb`(`id`,`name`) VALUES (3,'FF');
2019/03/05 14:35:42 diff.go:491: [error] find difference data in column id, data1: map[id:4 name:ggg], data2: map[id:5 name:ggg]
2019/03/05 14:35:42 diff.go:393: [info] [insert] sql: REPLACE INTO `marvin`.`tidb`(`id`,`name`) VALUES (4,'ggg');
DEBU[0000] chunks: [{begin:1 end:1 containBegin:true containEnd:true noBegin:false noEnd:false} {begin:{} end:1 containBegin:false containEnd:false noBegin:true noEnd:false} {begin:1 end:{} containBegin:false containEnd:false noBegin:false noEnd:true}] 
DEBU[0000] marvin.t1 create dump job, where: (`id` >= ? AND `id` <= ? AND TRUE), begin: 1, end: 1 
DEBU[0000] marvin.t1 create dump job, where: (TRUE AND `id` < ? AND TRUE), begin: {}, end: 1 
DEBU[0000] marvin.t1 create dump job, where: (`id` > ? AND TRUE AND TRUE), begin: 1, end: {} 
2019/03/05 14:35:42 diff.go:178: [info] total has 3 check jobs, check 3 of them
DEBU[0000] checksum sql: SELECT BIT_XOR(CAST(CRC32(CONCAT_WS(',', `id`, `name`, CONCAT(ISNULL(`id`), ISNULL(`name`))))AS UNSIGNED)) AS checksum FROM `marvin`.`t1` WHERE (TRUE AND `id` < ? AND TRUE);, args: [1] 
DEBU[0000] checksum sql: SELECT BIT_XOR(CAST(CRC32(CONCAT_WS(',', `id`, `name`, CONCAT(ISNULL(`id`), ISNULL(`name`))))AS UNSIGNED)) AS checksum FROM `marvin`.`t1` WHERE (`id` > ? AND TRUE AND TRUE);, args: [1] 
DEBU[0000] checksum sql: SELECT BIT_XOR(CAST(CRC32(CONCAT_WS(',', `id`, `name`, CONCAT(ISNULL(`id`), ISNULL(`name`))))AS UNSIGNED)) AS checksum FROM `marvin`.`t1` WHERE (`id` >= ? AND `id` <= ? AND TRUE);, args: [1 1] 
DEBU[0000] checksum sql: SELECT BIT_XOR(CAST(CRC32(CONCAT_WS(',', `id`, `name`, CONCAT(ISNULL(`id`), ISNULL(`name`))))AS UNSIGNED)) AS checksum FROM `marvin`.`t1` WHERE (TRUE AND `id` < ? AND TRUE);, args: [1] 
DEBU[0000] checksum sql: SELECT BIT_XOR(CAST(CRC32(CONCAT_WS(',', `id`, `name`, CONCAT(ISNULL(`id`), ISNULL(`name`))))AS UNSIGNED)) AS checksum FROM `marvin`.`t1` WHERE (`id` >= ? AND `id` <= ? AND TRUE);, args: [1 1] 
DEBU[0000] checksum sql: SELECT BIT_XOR(CAST(CRC32(CONCAT_WS(',', `id`, `name`, CONCAT(ISNULL(`id`), ISNULL(`name`))))AS UNSIGNED)) AS checksum FROM `marvin`.`t1` WHERE (`id` > ? AND TRUE AND TRUE);, args: [1] 
2019/03/05 14:35:42 diff.go:254: [info] table: t1, range: (TRUE AND `id` < ? AND TRUE), args: [1], checksum is equal, checksum: 0
2019/03/05 14:35:42 diff.go:254: [info] table: t1, range: (`id` >= ? AND `id` <= ? AND TRUE), args: [1 1], checksum is equal, checksum: 2463841517
2019/03/05 14:35:42 diff.go:258: [error] table: t1, range: (`id` > ? AND TRUE AND TRUE), args: [1], checksum is not equal, one is 4129747790, another is 0
2019/03/05 14:35:42 diff.go:365: [info] [insert] sql: REPLACE INTO `marvin`.`t1`(`id`,`name`) VALUES (2,'GG');
DEBU[0000] chunks: [{begin:1 end:2 containBegin:true containEnd:true noBegin:false noEnd:false} {begin:{} end:1 containBegin:false containEnd:false noBegin:true noEnd:false} {begin:2 end:{} containBegin:false containEnd:false noBegin:false noEnd:true}] 
DEBU[0000] marvin.t2 create dump job, where: (`id` >= ? AND `id` <= ? AND TRUE), begin: 1, end: 2 
DEBU[0000] marvin.t2 create dump job, where: (TRUE AND `id` < ? AND TRUE), begin: {}, end: 1 
DEBU[0000] marvin.t2 create dump job, where: (`id` > ? AND TRUE AND TRUE), begin: 2, end: {} 
2019/03/05 14:35:42 diff.go:178: [info] total has 3 check jobs, check 3 of them
DEBU[0000] checksum sql: SELECT BIT_XOR(CAST(CRC32(CONCAT_WS(',', `id`, `name`, CONCAT(ISNULL(`id`), ISNULL(`name`))))AS UNSIGNED)) AS checksum FROM `marvin`.`t2` WHERE (`id` >= ? AND `id` <= ? AND TRUE);, args: [1 2] 
DEBU[0000] checksum sql: SELECT BIT_XOR(CAST(CRC32(CONCAT_WS(',', `id`, `name`, CONCAT(ISNULL(`id`), ISNULL(`name`))))AS UNSIGNED)) AS checksum FROM `marvin`.`t2` WHERE (`id` > ? AND TRUE AND TRUE);, args: [2] 
DEBU[0000] checksum sql: SELECT BIT_XOR(CAST(CRC32(CONCAT_WS(',', `id`, `name`, CONCAT(ISNULL(`id`), ISNULL(`name`))))AS UNSIGNED)) AS checksum FROM `marvin`.`t2` WHERE (TRUE AND `id` < ? AND TRUE);, args: [1] 
DEBU[0000] checksum sql: SELECT BIT_XOR(CAST(CRC32(CONCAT_WS(',', `id`, `name`, CONCAT(ISNULL(`id`), ISNULL(`name`))))AS UNSIGNED)) AS checksum FROM `marvin`.`t2` WHERE (TRUE AND `id` < ? AND TRUE);, args: [1] 
DEBU[0000] checksum sql: SELECT BIT_XOR(CAST(CRC32(CONCAT_WS(',', `id`, `name`, CONCAT(ISNULL(`id`), ISNULL(`name`))))AS UNSIGNED)) AS checksum FROM `marvin`.`t2` WHERE (`id` > ? AND TRUE AND TRUE);, args: [2] 
DEBU[0000] checksum sql: SELECT BIT_XOR(CAST(CRC32(CONCAT_WS(',', `id`, `name`, CONCAT(ISNULL(`id`), ISNULL(`name`))))AS UNSIGNED)) AS checksum FROM `marvin`.`t2` WHERE (`id` >= ? AND `id` <= ? AND TRUE);, args: [1 2] 
2019/03/05 14:35:42 diff.go:254: [info] table: t2, range: (`id` > ? AND TRUE AND TRUE), args: [2], checksum is equal, checksum: 0
2019/03/05 14:35:42 diff.go:254: [info] table: t2, range: (TRUE AND `id` < ? AND TRUE), args: [1], checksum is equal, checksum: 0
2019/03/05 14:35:42 diff.go:254: [info] table: t2, range: (`id` >= ? AND `id` <= ? AND TRUE), args: [1 2], checksum is equal, checksum: 837294749
INFO[0000] 
check result: fail!
1 tables' check passed, 2 tables' check failed.

table: marvin.tidb
table's struct equal
table's data not equal

table: marvin.t1
table's struct equal
table's data not equal

table: marvin.t2
table's struct equal
table's data equal

INFO[0000] check data finished, all cost 23.971543ms    
FATA[0000] sourceDB don't equal targetDB
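The checksum sql lines in the log show how each chunk's checksum is computed: BIT_XOR over CAST(CRC32(CONCAT_WS(',', `id`, `name`, CONCAT(ISNULL(`id`), ISNULL(`name`)))) AS UNSIGNED). Below is a Python sketch of the same formula. It assumes values render as plain strings and are encoded in UTF-8. MySQL's own string conversion for dates, floats, or other charsets may differ, so the numbers are not guaranteed to match the log. The sketch does reproduce two properties visible there: an empty range has checksum 0, and the result does not depend on row order.

```python
import zlib
from functools import reduce


def row_crc(row):
    """CRC32 of one row, built like the checksum SQL in the log:
    CRC32(CONCAT_WS(',', col1, col2, ..., CONCAT(ISNULL(col1), ISNULL(col2), ...))).
    """
    # CONCAT_WS skips NULL arguments, so the trailing ISNULL flags keep
    # NULL distinguishable from an empty string.
    values = [str(v) for v in row if v is not None]
    null_flags = "".join("1" if v is None else "0" for v in row)
    return zlib.crc32(",".join(values + [null_flags]).encode("utf-8"))


def chunk_checksum(rows):
    """BIT_XOR of the per-row CRC32 values; an empty chunk gives 0."""
    return reduce(lambda acc, row: acc ^ row_crc(row), rows, 0)


source_chunk = [(1, "PingCAP"), (2, "dd"), (3, "FF"), (4, "ggg"), (5, "ggg"), (6, "test")]
target_chunk = [(1, "PingCAP"), (2, "dd"), (5, "ggg"), (6, "test")]
print(chunk_checksum([]))  # 0, like the empty ranges in the log
print(chunk_checksum(source_chunk) == chunk_checksum(target_chunk))
```

XOR aggregation makes the checksum order-independent, so the two databases do not need to return rows in the same order. The trade-off is that two identical rows cancel each other out. That is harmless when chunks are split on a primary key, because duplicates cannot occur.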

Copyright notice: this is an original article by the author and may not be reproduced without the author's permission.
