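In a previous post we walked through building an openGauss primary/standby cluster:
openGauss 5.0 one-primary, two-standby replication environment setup guide
https://www.cndba.cn/dave/article/116528
This post covers day-to-day maintenance of that cluster.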
1 Checking cluster status
Check all nodes in the cluster:
[dave@www.cndba.cn ~]$ gs_om -t status --detail
[ CMServer State ]
node node_ip instance state
-------------------------------------------------------------------------------
1 oracle 192.168.56.105 1 /data/openGauss/data/cmserver/cm_server Primary
2 oracle2 192.168.56.106 2 /data/openGauss/data/cmserver/cm_server Standby
3 oracle3 192.168.56.107 3 /data/openGauss/data/cmserver/cm_server Standby
[ Cluster State ]
cluster_state : Normal
redistributing : No
balanced : No
current_az : AZ_ALL
[ Datanode State ]
node node_ip instance state
---------------------------------------------------------------------------------
1 oracle 192.168.56.105 6001 /data/openGauss/install/data/dn P Standby Normal
2 oracle2 192.168.56.106 6002 /data/openGauss/install/data/dn S Primary Normal
3 oracle3 192.168.56.107 6003 /data/openGauss/install/data/dn S Standby Normal
[dave@www.cndba.cn ~]$
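When this check is scripted, the useful fact is usually which node currently holds the primary datanode. The following is a minimal sketch, not part of the openGauss tooling: the function name primary_dn_ip is made up for illustration, and it assumes the whitespace-separated column layout shown in the listing above (node, node name, IP, instance ID, data path, initial role, current role, state).

```shell
# Print the IP of every datanode whose current role is Primary.
# Reads `gs_om -t status --detail` output on stdin.
# Assumed columns (as in the listing above):
#   node  node_name  node_ip  instance_id  data_path  initial_role  current_role  state
primary_dn_ip() {
    awk '
        /\[ *Datanode State *\]/ { in_dn = 1; next }   # enter the datanode section
        /^\[/                    { in_dn = 0 }         # any other section header ends it
        in_dn && $1 ~ /^[0-9]+$/ && $7 == "Primary" { print $3 }
    '
}
```

Typical use would be `gs_om -t status --detail | primary_dn_ip`. If your version prints a different column layout, the field numbers need adjusting.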
Check a single node:
[dave@www.cndba.cn ~]$ gs_om -t status -h oracle
-----------------------------------------------------------------------
cluster_state : Normal
redistributing : No
balanced : No
-----------------------------------------------------------------------
node : 1
node_name : oracle
node : 1
instance_id : 1
node_ip : 192.168.56.105
data_path : /data/openGauss/data/cmserver/cm_server
type : CMServer
instance_state : Primary
node : 1
instance_id : 6001
node_ip : 192.168.56.105
data_path : /data/openGauss/install/data/dn
type : Datanode
instance_state : Standby
dcf_role : FOLLOWER
static_connections : 2
HA_state : Normal
reason : Normal
sender_sent_location : 0/6011EA8
sender_write_location : 0/6011EA8
sender_flush_location : 0/6011EA8
sender_replay_location : 0/6011EA8
receiver_received_location: 0/6011EA8
receiver_write_location : 0/6011EA8
receiver_flush_location : 0/6011EA8
receiver_replay_location : 0/6011E08
sync_state : Async
node : 1
node_name : oracle
node : 1
instance_id : 1
node_ip : 192.168.56.105
data_path : /data/openGauss/data/cmserver/cm_server
type : CMServer
instance_state : Primary
node : 1
node_ip : 192.168.56.105
type : Fenced UDF
state : Normal
-----------------------------------------------------------------------
node_state : Normal
-----------------------------------------------------------------------
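The location fields in this output make it possible to estimate how far replay on the standby trails what it has received. The sketch below converts two of those values to byte offsets and subtracts them. It assumes the locations use the PostgreSQL-style "HI/LO" hexadecimal notation, with HI as the upper 32 bits; the function names lsn_to_bytes and lsn_gap are made up for illustration.

```shell
# Convert a location written as "HI/LO" (two hex numbers) to a byte offset.
# Assumption: HI is the upper 32 bits and LO the lower 32 bits.
lsn_to_bytes() {
    hi=${1%%/*}    # text before the slash
    lo=${1##*/}    # text after the slash
    echo $(( (0x$hi << 32) + 0x$lo ))
}

# Print how many bytes the second location trails the first.
lsn_gap() {
    echo $(( $(lsn_to_bytes "$1") - $(lsn_to_bytes "$2") ))
}
```

With the values shown above, `lsn_gap 0/6011EA8 0/6011E08` prints 160, meaning receiver_replay_location is 160 bytes behind receiver_received_location.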
2 Starting and stopping the cluster
Perform these operations as the omm user on any primary node in the cluster.
[dave@www.cndba.cn ~]$ gs_om -t stop
Stopping cluster.
=========================================
Successfully stopped cluster.
=========================================
End stop cluster.
[dave@www.cndba.cn ~]$ gs_om -t start
Starting cluster.
======================================================================
Successfully started primary instance. Wait for standby instance.
======================================================================
.
Successfully started cluster.
======================================================================
cluster_state : Normal
redistributing : No
node_count : 3
Datanode State
primary : 1
standby : 2
secondary : 0
cascade_standby : 0
building : 0
abnormal : 0
down : 0
Successfully started cluster.
[dave@www.cndba.cn ~]$ gs_om -t status --detail
[ CMServer State ]
node node_ip instance state
-------------------------------------------------------------------------------
1 oracle 192.168.56.105 1 /data/openGauss/data/cmserver/cm_server Primary
2 oracle2 192.168.56.106 2 /data/openGauss/data/cmserver/cm_server Standby
3 oracle3 192.168.56.107 3 /data/openGauss/data/cmserver/cm_server Standby
[ Cluster State ]
cluster_state : Normal
redistributing : No
balanced : No
current_az : AZ_ALL
[ Datanode State ]
node node_ip instance state
---------------------------------------------------------------------------------
1 oracle 192.168.56.105 6001 /data/openGauss/install/data/dn P Standby Normal
2 oracle2 192.168.56.106 6002 /data/openGauss/install/data/dn S Primary Normal
3 oracle3 192.168.56.107 6003 /data/openGauss/install/data/dn S Standby Normal
[dave@www.cndba.cn ~]$
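In a maintenance script it can help to wait until cluster_state is reported as Normal before continuing. The following is a sketch under stated assumptions: wait_for_normal, cluster_state and the STATUS_CMD variable are names invented here, and the parser assumes the "cluster_state : Normal" line format shown above.

```shell
# Command used to fetch the status; override STATUS_CMD to try this without a cluster.
STATUS_CMD=${STATUS_CMD:-"gs_om -t status --detail"}

# Print the value of the first "cluster_state : <value>" line read from stdin.
cluster_state() {
    awk '$1 == "cluster_state" && $2 == ":" { print $3; exit }'
}

# Poll until cluster_state is Normal.
# $1 = maximum attempts (default 30), $2 = seconds between attempts (default 2).
wait_for_normal() {
    tries=${1:-30}
    delay=${2:-2}
    state=
    while [ "$tries" -gt 0 ]; do
        state=$($STATUS_CMD 2>/dev/null | cluster_state)
        [ "$state" = "Normal" ] && return 0
        tries=$((tries - 1))
        sleep "$delay"
    done
    echo "cluster_state is still '${state:-unknown}'" >&2
    return 1
}
```

A possible use is `gs_om -t start && wait_for_normal 30 2`, which returns non-zero if the cluster has not reported Normal after roughly a minute.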
3 Switchover
First check the cluster status:
[dave@www.cndba.cn ~]$ gs_om -t status --detail
……
cluster_state : Normal
redistributing : No
balanced : No
current_az : AZ_ALL
[ Datanode State ]
node node_ip instance state
---------------------------------------------------------------------------------
1 oracle 192.168.56.105 6001 /data/openGauss/install/data/dn P Standby Normal
2 oracle2 192.168.56.106 6002 /data/openGauss/install/data/dn S Primary Normal
3 oracle3 192.168.56.107 6003 /data/openGauss/install/data/dn S Standby Normal
[dave@www.cndba.cn ~]$
Here the primary is 192.168.56.106. To promote 192.168.56.105 to primary, run the following as omm on 192.168.56.105:
[dave@www.cndba.cn ~]$ gs_ctl switchover -D /data/openGauss/install/data/dn
[2023-04-07 17:55:53.995][16727][][gs_ctl]: gs_ctl switchover ,datadir is /data/openGauss/install/data/dn
[2023-04-07 17:55:53.995][16727][][gs_ctl]: switchover term (1)
[2023-04-07 17:55:54.008][16727][][gs_ctl]: waiting for server to switchover........
[2023-04-07 17:55:59.069][16727][][gs_ctl]: done
[2023-04-07 17:55:59.069][16727][][gs_ctl]: switchover completed (/data/openGauss/install/data/dn)
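For a given database, a new switchover cannot be started until the previous one has completed. If a switchover is issued while the workload is active, threads on the primary may not stop in time and the switchover reports a timeout. It is in fact still running in the background and completes once those threads have stopped. For example, while the primary is dropping a large partitioned table, it may not respond to the signal sent by the switchover.
After a successful switchover or failover, run the following command to record the current primary/standby information: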
[dave@www.cndba.cn ~]$ gs_om -t refreshconf
Generating dynamic configuration file for all nodes.
Successfully generated dynamic configuration file.
[dave@www.cndba.cn ~]$ gs_om -t status --detail
[ CMServer State ]
node node_ip instance state
-------------------------------------------------------------------------------
1 oracle 192.168.56.105 1 /data/openGauss/data/cmserver/cm_server Primary
2 oracle2 192.168.56.106 2 /data/openGauss/data/cmserver/cm_server Standby
3 oracle3 192.168.56.107 3 /data/openGauss/data/cmserver/cm_server Standby
[ Cluster State ]
cluster_state : Normal
redistributing : No
balanced : Yes
current_az : AZ_ALL
[ Datanode State ]
node node_ip instance state
---------------------------------------------------------------------------------
1 oracle 192.168.56.105 6001 /data/openGauss/install/data/dn P Primary Normal
2 oracle2 192.168.56.106 6002 /data/openGauss/install/data/dn S Standby Normal
3 oracle3 192.168.56.107 6003 /data/openGauss/install/data/dn S Standby Normal
[dave@www.cndba.cn ~]$
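In the listings above, balanced reads No while the datanode on 192.168.56.106 (marked S) is primary, and Yes once 192.168.56.105 (marked P) is primary again. The sketch below lists the datanodes whose current role differs from that marker. It rests on an assumption: that the single-letter column is the role assigned at installation. The function name role_drift is made up for illustration.

```shell
# List datanodes whose current role differs from their initial role.
# Reads `gs_om -t status --detail` output on stdin.
# Assumption: column 6 is the initial role (P or S), column 7 the current role.
role_drift() {
    awk '
        /\[ *Datanode State *\]/ { in_dn = 1; next }   # enter the datanode section
        /^\[/                    { in_dn = 0 }         # any other section header ends it
        in_dn && $1 ~ /^[0-9]+$/ {
            # Compare "was installed as primary" with "is primary now".
            if (($6 == "P") != ($7 == "Primary"))
                print $3, "initial=" $6, "current=" $7
        }
    '
}
```

Run as `gs_om -t status --detail | role_drift`. Empty output means every datanode is back in the role its marker indicates.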
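One detail worth noting: while the cluster is healthy, killing the gaussdb process or stopping the primary with gs_ctl causes the primary role to switch to another node automatically, and the stopped instance is automatically brought back up as a standby: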
[dave@www.cndba.cn ~]$ ps -ef|grep openG
omm 4154 1 2 10:54 ? 00:11:15 /data/openGauss/install/app/bin/om_monitor -L /var/log/omm/omm/cm/om_monitor
omm 13943 4154 25 17:50 ? 00:03:07 /data/openGauss/install/app/bin/cm_agent
omm 13963 1 15 17:50 ? 00:01:50 /data/openGauss/install/app/bin/cm_server
omm 21983 1 15 18:02 ? 00:00:05 /data/openGauss/install/app/bin/gaussdb -D /data/openGauss/install/data/dn -M pending
omm 22761 1 0 18:03 ? 00:00:00 python3 /data/openGauss/install/om/script/local/CheckSshAgent.py
omm 22805 4878 0 18:03 pts/1 00:00:00 grep --color=auto openG
[dave@www.cndba.cn ~]$ kill -9 21983
[dave@www.cndba.cn ~]$ gs_ctl stop -D /data/openGauss/install/data/dn
[2023-04-07 18:06:25.733][24541][][gs_ctl]: gs_ctl stopped ,datadir is /data/openGauss/install/data/dn
waiting for server to shut down..... done
server stopped
[dave@www.cndba.cn ~]$
[dave@www.cndba.cn ~]$ ps -ef|grep openG
omm 4154 1 2 10:54 ? 00:11:15 /data/openGauss/install/app/bin/om_monitor -L /var/log/omm/omm/cm/om_monitor
omm 13943 4154 25 17:50 ? 00:03:11 /data/openGauss/install/app/bin/cm_agent
omm 13963 1 15 17:50 ? 00:01:52 /data/openGauss/install/app/bin/cm_server
omm 22968 1 57 18:03 ? 00:00:01 /data/openGauss/install/app/bin/gaussdb -D /data/openGauss/install/data/dn -M pending
omm 22991 4878 0 18:03 pts/1 00:00:00 grep --color=auto openG
[dave@www.cndba.cn ~]$
[dave@www.cndba.cn2 ~]$ gs_om -t status --detail
[ CMServer State ]
node node_ip instance state
-------------------------------------------------------------------------------
1 oracle 192.168.56.105 1 /data/openGauss/data/cmserver/cm_server Primary
2 oracle2 192.168.56.106 2 /data/openGauss/data/cmserver/cm_server Standby
3 oracle3 192.168.56.107 3 /data/openGauss/data/cmserver/cm_server Standby
[ Cluster State ]
cluster_state : Normal
redistributing : No
balanced : No
current_az : AZ_ALL
[ Datanode State ]
node node_ip instance state
---------------------------------------------------------------------------------
1 oracle 192.168.56.105 6001 /data/openGauss/install/data/dn P Standby Normal
2 oracle2 192.168.56.106 6002 /data/openGauss/install/data/dn S Primary Normal
3 oracle3 192.168.56.107 6003 /data/openGauss/install/data/dn S Standby Normal
[dave@www.cndba.cn2 ~]$
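One way to confirm that the instance really was restarted is to compare the gaussdb PID before and after, as the two ps listings above do by eye (21983, then 22968). The following sketch extracts that PID from `ps -ef` output. The function name gaussdb_pid is made up, and it assumes the command line has the form shown above, with the gaussdb binary followed directly by -D and the data directory.

```shell
# Print the PID of the gaussdb process serving a given data directory.
# Reads `ps -ef` output on stdin; $1 is the data directory passed to -D.
# Assumed ps -ef columns: UID PID PPID C STIME TTY TIME CMD...
gaussdb_pid() {
    awk -v dir="$1" '
        $8 ~ /\/gaussdb$/ && $9 == "-D" && $10 == dir { print $2 }
    '
}
```

For example, `ps -ef | gaussdb_pid /data/openGauss/install/data/dn` can be captured before and after the kill; a changed value indicates a new process.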
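4 Failover
The previous section covered the normal case. If the primary host has failed, run the failover command on a standby instead.
Running failover while the original primary is still healthy also succeeds, and high availability is restored automatically.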
[dave@www.cndba.cn ~]$ gs_ctl failover -D /data/openGauss/install/data/dn
[2023-04-07 18:37:21.152][9364][][gs_ctl]: gs_ctl failover ,datadir is /data/openGauss/install/data/dn
[2023-04-07 18:37:21.152][9364][][gs_ctl]: failover term (1)
[2023-04-07 18:37:21.163][9364][][gs_ctl]: waiting for server to failover...
.[2023-04-07 18:37:22.193][9364][][gs_ctl]: done
[2023-04-07 18:37:22.193][9364][][gs_ctl]: failover completed (/data/openGauss/install/data/dn)
[dave@www.cndba.cn ~]$ gs_om -t refreshconf
Generating dynamic configuration file for all nodes.
Successfully generated dynamic configuration file.
[dave@www.cndba.cn ~]$ gs_om -t status --detail
[ CMServer State ]
node node_ip instance state
-------------------------------------------------------------------------------
1 oracle 192.168.56.105 1 /data/openGauss/data/cmserver/cm_server Standby
2 oracle2 192.168.56.106 2 /data/openGauss/data/cmserver/cm_server Primary
3 oracle3 192.168.56.107 3 /data/openGauss/data/cmserver/cm_server Standby
[ Cluster State ]
cluster_state : Normal
redistributing : No
balanced : Yes
current_az : AZ_ALL
[ Datanode State ]
node node_ip instance state
---------------------------------------------------------------------------------
1 oracle 192.168.56.105 6001 /data/openGauss/install/data/dn P Primary Normal
2 oracle2 192.168.56.106 6002 /data/openGauss/install/data/dn S Standby Normal
3 oracle3 192.168.56.107 6003 /data/openGauss/install/data/dn S Standby Normal
[dave@www.cndba.cn ~]$
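When the cluster is otherwise running normally, the primary/standby relationship is restored automatically after the switch. If a node shows Standby Need repair(Disconnected) and does not recover on its own, that node has to be rebuilt.
Run the rebuild command on the node whose standby instance needs rebuilding: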
[dave@www.cndba.cn3 ~]$ gs_ctl build -b auto -D /data/openGauss/install/data/dn
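To find which nodes need this treatment, the status output can be filtered for the Need repair state. This is a sketch only: the function name needs_rebuild is invented here, and it assumes the state text appears on the datanode line in the same column layout as the earlier listings.

```shell
# Print the IP of each datanode whose state contains "Need repair".
# Reads `gs_om -t status --detail` output on stdin.
needs_rebuild() {
    awk '
        /\[ *Datanode State *\]/ { in_dn = 1; next }   # enter the datanode section
        /^\[/                    { in_dn = 0 }         # any other section header ends it
        in_dn && $1 ~ /^[0-9]+$/ && /Need repair/ { print $3 }
    '
}
```

Each IP printed by `gs_om -t status --detail | needs_rebuild` is a candidate for running the gs_ctl build command above on that node, once it is clear the node will not recover by itself.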
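5 Handling a dual-primary condition
If a network failure, a full disk, or a similar problem breaks the connection between the primary and standby instances during a switch and two primaries appear, the following steps can be used:
1. Query the current instance states:
gs_om -t status --detail
If the result shows two instances in the Primary state, the cluster is in an abnormal state.
2. Decide which node to demote to standby, and stop the service on that node (substitute your own data directory for the example path):
gs_ctl stop -D /home/omm/cluster/dn1/
3. Start that node in standby mode:
gs_ctl start -D /home/omm/cluster/dn1/ -M standby
4. Save the primary/standby information:
gs_om -t refreshconf
5. Check the database status and confirm that the instance states have recovered.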
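The check in steps 1 and 5 amounts to counting datanodes in the Primary role. A minimal sketch follows, assuming the status column layout from the earlier listings; the function names primary_count and check_single_primary are made up for illustration.

```shell
# Count datanodes whose current role is Primary.
# Reads `gs_om -t status --detail` output on stdin.
primary_count() {
    awk '
        /\[ *Datanode State *\]/ { in_dn = 1; next }   # enter the datanode section
        /^\[/                    { in_dn = 0 }         # any other section header ends it
        in_dn && $1 ~ /^[0-9]+$/ && $7 == "Primary" { n++ }
        END { print n + 0 }                            # print 0 when none matched
    '
}

# Succeed only if exactly one primary is reported on stdin.
check_single_primary() {
    n=$(primary_count)
    if [ "$n" -eq 1 ]; then
        echo "OK: exactly one primary"
    else
        echo "ABNORMAL: $n primaries" >&2
        return 1
    fi
}
```

Running `gs_om -t status --detail | check_single_primary` before step 2 and again at step 5 gives a simple pass/fail signal for the procedure.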
Copyright notice: this is an original article by the author and may not be reproduced without the author's permission.