A DM8 DMWatch cluster was logging a large number of "Get svr info time used" warnings, as shown below:
[dmdba@www.cndba.cn log]$ pwd
/data/dm/dmdbms/log
[dmdba@www.cndba.cn log]$ ls
DmAPService.log dm_dmap_202302.log dm_dmwatcher_DM1_202302.log dmsvc_sh.log install_ant.log
dm_BAKRES_202302.log dm_dmap_br_202302.log dm_SBTTRACE_202302.log dm_unknown_202302.log install.log
dm_DM1_202302.log dm_dmrman_202302.log DmServicedm1.log DmWatcherServicedm1.log
[dmdba@www.cndba.cn log]$ tail -20 dm_DM1_202302.log
2023-02-22 09:58:58.261 [WARNING] database P0000024216 T0000000000000024259 Get svr info time used:4
2023-02-22 10:00:37.520 [WARNING] database P0000024216 T0000000000000024259 Get svr info time used:3
2023-02-22 10:01:02.131 [WARNING] database P0000024216 T0000000000000024259 Get svr info time used:3
2023-02-22 10:01:05.018 [WARNING] database P0000024216 T0000000000000024259 Get svr info time used:3
2023-02-22 10:01:14.072 [INFO] database P0000024216 T0000000000000024259 utsk_get_dw_svr_info used 5 seconds
2023-02-22 10:01:14.072 [WARNING] database P0000024216 T0000000000000024259 Get svr info time used:5
2023-02-22 10:01:25.116 [INFO] database P0000024216 T0000000000000024259 utsk_get_dw_svr_info used 4 seconds
2023-02-22 10:01:25.116 [WARNING] database P0000024216 T0000000000000024259 Get svr info time used:4
2023-02-22 10:01:28.418 [WARNING] database P0000024216 T0000000000000024259 Get svr info time used:3
2023-02-22 10:01:43.656 [WARNING] database P0000024216 T0000000000000024259 Get svr info time used:3
2023-02-22 10:02:45.812 [INFO] database P0000024216 T0000000000000024297 checkpoint requested by CKPT_INTERVAL, rlog free space[4294084608], used space[874496]
2023-02-22 10:02:45.812 [INFO] database P0000024216 T0000000000000024297 checkpoint generate by ckpt_interval
2023-02-22 10:02:45.812 [INFO] database P0000024216 T0000000000000024252 checkpoint begin, used_space[874496], free_space[4294084608]...
[dmdba@www.cndba.cn log]$
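This warning means the watcher process is seeing delays when monitoring the database service. Judging by the paired utsk_get_dw_svr_info lines in the log, the number after "time used:" is in seconds.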
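To get a feel for how bad the delay is, it can help to tally the warnings by their reported value. The following is a minimal sketch, not a DM tool: the function name summarize_svr_info_warnings is made up for illustration, and it assumes the log line format shown in the excerpt above.

```shell
# Hypothetical helper (not part of DM8): tally "Get svr info time used:N"
# warnings in a DM server log. Prints one "<N> <count>" line per distinct N,
# sorted by N. Assumes the log line format shown in the excerpt above.
summarize_svr_info_warnings() {
    log_file="$1"
    grep -F 'Get svr info time used:' "$log_file" \
        | awk -F'time used:' '{ count[$2 + 0]++ }
              END { for (n in count) print n, count[n] }' \
        | sort -n
}

# Usage (path taken from the session above):
#   summarize_svr_info_warnings /data/dm/dmdbms/log/dm_DM1_202302.log
```

Run against only the 20 lines shown above, this would print "3 5", "4 2" and "5 1": most of the warnings are for 3 seconds, with occasional spikes to 5.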
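According to the official manual, a DMWatch cluster should use two NICs: one for data traffic and one for internal MAL communication. With a single NIC, set the following two parameters to 30; with two NICs, they can be set to 10. The warning shows up most often on virtual machines or on hosts with poor disk I/O.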
DW_ERROR_TIME = 30
INST_ERROR_TIME = 30
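If you would rather script the change than edit the file by hand, something like the following could work. It is only a sketch: set_watcher_error_times is a hypothetical helper, it assumes both keys already exist in the file as "KEY = number" lines, and it keeps a .bak copy next to the original.

```shell
# Hypothetical helper (not part of DM8): set DW_ERROR_TIME and
# INST_ERROR_TIME to the same value in a dmwatcher.ini file.
# Only the number is rewritten; trailing comments are left alone.
# Assumes both keys are already present as "KEY = <number>" lines.
set_watcher_error_times() {
    ini_file="$1"
    seconds="$2"
    # Refuse anything that is not a plain non-negative integer.
    case "$seconds" in
        ''|*[!0-9]*) echo "seconds must be an integer" >&2; return 1 ;;
    esac
    # Keep a backup, then rewrite the original in place from the backup.
    cp -p "$ini_file" "$ini_file.bak" || return 1
    sed -E "s/^([[:space:]]*(DW_ERROR_TIME|INST_ERROR_TIME)[[:space:]]*=[[:space:]]*)[0-9]+/\1${seconds}/" \
        "$ini_file.bak" > "$ini_file"
}

# Usage (path taken from the session in this post):
#   set_watcher_error_times /data/dm/dmdata/dm/dmwatcher.ini 30
```

Writing back through a redirect rather than moving a temp file over the original keeps the file's owner and permissions unchanged.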
After making the change, restart the watcher process:
[root@www.cndba.cn ~]# vim /data/dm/dmdata/dm/dmwatcher.ini
[dmdba@www.cndba.cn log]$ cat /data/dm/dmdata/dm/dmwatcher.ini
[GRP1]
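DW_TYPE = GLOBAL #global watcher type
DW_MODE = MANUAL #failover mode (MANUAL = manual switchover)
DW_ERROR_TIME = 30 #time before a remote watcher is considered failed
INST_RECOVER_TIME = 60 #interval at which the primary's watcher starts recovery
INST_ERROR_TIME = 30 #time before the local instance is considered failed
INST_OGUID = 453331 #unique OGUID of the watcher system
INST_INI = /data/dm/dmdata/dm/dm.ini #path to the dm.ini configuration file
INST_AUTO_RESTART = 1 #enable automatic restart of the instance
INST_STARTUP_CMD = /data/dm/dmdbms/bin/dmserver #start the instance from the command line
RLOG_SEND_THRESHOLD = 0 #time threshold for the primary sending logs to the standby; disabled by default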
RLOG_APPLY_THRESHOLD = 0
[dmdba@www.cndba.cn log]$
[root@www.cndba.cn ~]# systemctl restart DmWatcherServicedm2.service
[root@www.cndba.cn ~]#
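After the restart it is worth checking whether the warnings keep appearing. Here is a rough sketch that counts warnings logged at or after a given time. count_warnings_since is a made-up name, and the timestamp is assumed to be in the same "YYYY-MM-DD HH:MM:SS" form as the log.

```shell
# Hypothetical helper (not part of DM8): count "Get svr info time used"
# warnings logged at or after a given time.
# $1: log file, $2: timestamp as "YYYY-MM-DD HH:MM:SS" (same form as the log).
count_warnings_since() {
    log_file="$1"
    since="$2"
    awk -v since="$since" '
        /Get svr info time used:/ {
            # $1 is the date, $2 is the time with milliseconds; drop the ms.
            ts = $1 " " substr($2, 1, 8)
            # Timestamps in this form sort correctly as plain strings.
            if (ts >= since) n++
        }
        END { print n + 0 }' "$log_file"
}

# Usage: pass the time at which the watcher was restarted, e.g.
#   count_warnings_since /data/dm/dmdbms/log/dm_DM1_202302.log "2023-02-22 10:30:00"
```

The timestamp in the usage line is only a placeholder; substitute the actual restart time.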