在之前的博客我们看了PG 主从复制的相关概念,如下:
PostgreSQL 主从复制(Standby)相关概念 说明
https://www.cndba.cn/dave/article/116391
本篇我们来看下一主两从同步复制的搭建。
1 环境说明
我们这里搭建一主两从的架构,主库101,从库102 和 103, PG 版本14.6, 操作系统版本Redhat 7.9。
[dave@www.cndba.cn ~]# cat /etc/hosts
127.0.0.1 localhost
192.168.56.101 db1
192.168.56.102 db2
192.168.56.103 db3
[dave@www.cndba.cn ~]# cat /etc/redhat-release
Red Hat Enterprise Linux Server release 7.9 (Maipo)
[dave@www.cndba.cn ~]#
[dave@www.cndba.cn data]# su - postgres
Last login: Tue Dec 13 18:46:46 CST 2022 on pts/2
-bash-4.2$ psql -V
psql (PostgreSQL) 14.6
-bash-4.2$
2 主库操作
2.1 安装并初始化实例
直接参考之前的博客,这里不再单独描述。
PostgreSQL 学习笔记(3) — PG 单实例安装手册
https://www.cndba.cn/dave/article/116374
2.2 配置复制槽
standby端如果长时间停机,重启后standby可能因缺少相应的WAL日志无法连接primary。此时可以通过启用max_replication_slots参数启用复制槽来解决此问题。
primary端实例会一直保留预写日志(WAL)文件,直到所有备库所需的插槽都确认已接收到特定段为止。只有完成此操作后,主库实例才会移除相应的WAL文件。
主库端复制槽要和从库数量对应,我这里有2个从库,创建2个复制槽:
cndba=# select * from pg_create_physical_replication_slot('pgsql_db2');
-[ RECORD 1 ]--------
slot_name | pgsql_db2
lsn |
cndba=# select * from pg_create_physical_replication_slot('pgsql_db3');
-[ RECORD 1 ]--------
slot_name | pgsql_db3
lsn |
查看复制槽:
postgres=# select * from pg_replication_slots;
2.3 开启归档模式
具体操作参考之前的博客,这里不再演示:
PostgreSQL 配置 归档模式
https://www.cndba.cn/dave/article/116385
postgres=# select name,setting from pg_settings where name in ('archive_mode','archive_command','archive_timeout');
name | setting
-----------------+--------------------------------------
archive_command | /var/lib/pgsql/14/data/arch.sh %f %p
archive_mode | on
archive_timeout | 0
(3 rows)
2.4 其他相关配置
创建复制用户
postgres=# create role repl login replication encrypted password 'repl';
CREATE ROLE
配置权限文件pg_hba.conf,添加如下内容:
host replication repl 0.0.0.0/0 md5
配置参数文件postgresql.conf,添加如下内容:
listen_addresses = '*'
wal_level = replica
hot_standby = on #standby时为可读模式
max_wal_senders = 10 #同步最大的进程数量
wal_sender_timeout = 60s #流复制主机发送数据的超时时间
max_connections = 200 #最大连接数,从库的max_connections必须要大于主库的,否则启动报错
上面这个是异步复制的配置,如果配置同步复制,还需要添加如下2个参数:
synchronous_commit = on #配置同步模式
synchronous_standby_names = 'ANY 1 (db1, db2)'
#只要primary端wal传到任意一台standby并commit就OK,这里填写的是备库primary_conninfo中application_name参数值。。
注意,如果配置同步复制,建议是一主二从,防止从库同步异常导致主库hang住。
重启PG使参数生效
-bash-4.2$ ps -ef|grep pg
postgres 3088 1940 0 05:10 pts/3 00:00:00 grep --color=auto pg
postgres 10620 1 0 Dec13 ? 00:00:08 /usr/pgsql-14/bin/postgres -D /var/lib/pgsql/14/data/
-bash-4.2$ pg_ctl restart -D /var/lib/pgsql/14/data/
waiting for server to shut down.... done
server stopped
waiting for server to start....2022-12-15 05:10:59.367 CST [3104] LOG: redirecting log output to logging collector process
2022-12-15 05:10:59.367 CST [3104] HINT: Future log output will appear in directory "pg_log".
done
server started
-bash-4.2$
3 从库操作
3.1 安装数据库
从库只需要安装软件,不需要进行初始化,直接参考之前的博客,这里不再单独描述。
PostgreSQL 学习笔记(3) — PG 单实例安装手册
https://www.cndba.cn/dave/article/116374
如果是旧环境,先 删除实例数据:
-bash-4.2$ pg_ctl stop -D /var/lib/pgsql/14/data/
waiting for server to shut down.... done
server stopped
-bash-4.2$
-bash-4.2$ rm -rf /var/lib/pgsql/14/data/*
3.2 同步数据
使用pg_basebackup 命令在从库同步主库数据。
-bash-4.2$ pg_basebackup -h 192.168.56.101 -U repl -D /var/lib/pgsql/14/data/ -X stream -P -R
Password:
WARNING: skipping special file "./.s.PGSQL.5432"
WARNING: skipping special file "./.s.PGSQL.5432"
68298/68298 kB (100%), 2/2 tablespaces
-bash-4.2$ pwd
/var/lib/pgsql/14/data
-bash-4.2$ ll
total 352
drwx------. 3 postgres postgres 22 Dec 15 20:48 archivelog
-rwxr-xr-x. 1 postgres postgres 179 Dec 15 20:48 arch.sh
-rw-------. 1 postgres postgres 227 Dec 15 20:48 backup_label
-rw-------. 1 postgres postgres 293822 Dec 15 20:48 backup_manifest
drwx------. 8 postgres postgres 80 Dec 15 20:48 base
-rw-------. 1 postgres postgres 33 Dec 15 20:48 current_logfiles
drwx------. 2 postgres postgres 4096 Dec 15 20:48 global
drwx------. 2 postgres postgres 162 Dec 15 20:48 log
drwx------. 2 postgres postgres 6 Dec 15 20:48 pg_commit_ts
drwx------. 2 postgres postgres 6 Dec 15 20:48 pg_dynshmem
-rw-------. 1 postgres postgres 383 Dec 15 20:48 pg_hba.conf
-rw-------. 1 postgres postgres 1636 Dec 15 20:48 pg_ident.conf
drwx------. 2 postgres postgres 84 Dec 15 20:48 pg_log
drwx------. 4 postgres postgres 68 Dec 15 20:48 pg_logical
drwx------. 4 postgres postgres 36 Dec 15 20:48 pg_multixact
drwx------. 2 postgres postgres 6 Dec 15 20:48 pg_notify
drwx------. 2 postgres postgres 6 Dec 15 20:48 pg_replslot
drwx------. 2 postgres postgres 6 Dec 15 20:48 pg_serial
drwx------. 2 postgres postgres 6 Dec 15 20:48 pg_snapshots
drwx------. 2 postgres postgres 6 Dec 15 20:48 pg_stat
drwx------. 2 postgres postgres 6 Dec 15 20:48 pg_stat_tmp
drwx------. 2 postgres postgres 6 Dec 15 20:48 pg_subtrans
drwx------. 2 postgres postgres 19 Dec 15 20:48 pg_tblspc
drwx------. 2 postgres postgres 6 Dec 15 20:48 pg_twophase
-rw-------. 1 postgres postgres 3 Dec 15 20:48 PG_VERSION
drwx------. 3 postgres postgres 60 Dec 15 20:48 pg_wal
drwx------. 2 postgres postgres 18 Dec 15 20:48 pg_xact
-rw-------. 1 postgres postgres 483 Dec 15 20:48 postgresql.auto.conf
-rw-------. 1 postgres postgres 29389 Dec 15 20:48 postgresql.conf
-rw-------. 1 postgres postgres 0 Dec 15 20:48 standby.signal
-bash-4.2$
注意:
-R选项用于创建用于replication的配置文件,生成$PGDATA/standby.signal文件,该文件必须存在,用来标识这是一个备库。
3.3 修改从库参数
在postgresql.conf文件中添加如下内容。
102 从库:
primary_conninfo = 'host=192.168.56.101 port=5432 user=repl password= repl application_name=db2'
recovery_target_timeline = 'latest'
primary_slot_name = 'pgsql_db2'
hot_standby = on #配置standby端可读
103 从库:
primary_conninfo = 'host=192.168.56.101 port=5432 user=repl password= repl application_name=db3'
recovery_target_timeline = 'latest'
primary_slot_name = 'pgsql_db3'
hot_standby = on #配置standby端可读
注意两点:
- application_name=db2 每个从库要设置不同。
- primary_conninfo 这里也可以不用配置,在初始化备库的时候,会自动在postgresql.auto.conf 文件中添加这块的信息,但要补充application_name=db2。
3.4 启动从库
-bash-4.2$ pg_ctl start -D /var/lib/pgsql/14/data
waiting for server to start....2022-12-15 21:01:21.489 CST [2682] LOG: redirecting log output to logging collector process
2022-12-15 21:01:21.489 CST [2682] HINT: Future log output will appear in directory "pg_log".
done
server started
-bash-4.2$
4 查看主从状态
4.1 主库
postgres=# /x
Expanded display is on.
cndba=# select * from pg_stat_replication;
-[ RECORD 1 ]----+------------------------------
pid | 19491
usesysid | 24947
usename | repl
application_name | db3
client_addr | 192.168.56.103
client_hostname |
client_port | 35298
backend_start | 2022-12-15 09:03:45.606295+08
backend_xmin |
state | streaming
sent_lsn | 0/29000060
write_lsn | 0/29000060
flush_lsn | 0/29000060
replay_lsn | 0/29000060
write_lag |
flush_lag |
replay_lag |
sync_priority | 1
sync_state | quorum
reply_time | 2022-12-15 23:24:39.924794+08
-[ RECORD 2 ]----+------------------------------
pid | 19505
usesysid | 24947
usename | repl
application_name | db2
client_addr | 192.168.56.102
client_hostname |
client_port | 60748
backend_start | 2022-12-15 09:03:59.556951+08
backend_xmin |
state | streaming
sent_lsn | 0/29000060
write_lsn | 0/29000060
flush_lsn | 0/29000060
replay_lsn | 0/29000060
write_lag | 00:00:00.249779
flush_lag | 00:00:00.250127
replay_lag | 00:00:00.250297
sync_priority | 1
sync_state | quorum
reply_time | 2022-12-15 23:24:33.949931+08
cndba=#
注意这里sync_state 状态不对,可能是synchronous_standby_names 参数配置得不对。
-bash-4.2$ pg_controldata /var/lib/pgsql/14/data
pg_control version number: 1300
Catalog version number: 202107181
Database system identifier: 7129316018383056294
Database cluster state: in production # 主库
pg_control last modified: Thu 15 Dec 2022 06:44:05 AM CST
Latest checkpoint location: 0/19000028
Latest checkpoint's REDO location: 0/19000028
Latest checkpoint's REDO WAL file: 000000010000000000000019
Latest checkpoint's TimeLineID: 1
Latest checkpoint's PrevTimeLineID: 1
Latest checkpoint's full_page_writes: on
……
主库查看备库落后主库wal的字节数:
postgres=# select pg_wal_lsn_diff(pg_current_wal_lsn(),replay_lsn) from pg_stat_replication;
pg_wal_lsn_diff
-----------------
(0 rows)
之前复制槽没值,现在也有值了:
postgres=# /x
Expanded display is on.
postgres=# select * from pg_replication_slots;
-[ RECORD 1 ]-------+-----------
slot_name | pg_primary
plugin |
slot_type | physical
datoid |
database |
temporary | f
active | t
active_pid | 9052
xmin |
catalog_xmin |
restart_lsn | 0/190001C0
confirmed_flush_lsn |
wal_status | reserved
safe_wal_size |
two_phase | f
4.2 从库
-bash-4.2$ psql -h 127.0.0.1
psql (14.6)
Type "help" for help.
postgres=# select pg_is_in_recovery();
pg_is_in_recovery
-------------------
t
(1 row)
查看是否处于recovery状态,值为“t”说明是当前数据库是备库,值为“f”说明当前数据库是主库(也可能复制流出现异常出现双主)
-bash-4.2$ pg_controldata /var/lib/pgsql/14/data
pg_control version number: 1300
Catalog version number: 202107181
Database system identifier: 7129316018383056294
Database cluster state: in archive recovery #从库
pg_control last modified: Thu 15 Dec 2022 09:06:21 PM CST
Latest checkpoint location: 0/19000028
Latest checkpoint's REDO location: 0/19000028
Latest checkpoint's REDO WAL file: 000000010000000000000019
Latest checkpoint's TimeLineID: 1
Latest checkpoint's PrevTimeLineID: 1
Latest checkpoint's full_page_writes: on
Latest checkpoint's NextXID: 0:1486
Latest checkpoint's NextOID: 24948
Latest checkpoint's NextMultiXactId: 1
Latest checkpoint's NextMultiOffset: 0
Latest checkpoint's oldestXID: 727
Latest checkpoint's oldestXID's DB: 1
……
查看wal接收情况:
postgres=# /x
Expanded display is on.
postgres=# select * from pg_stat_wal_receiver;
-[ RECORD 1 ]---------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
pid | 2329
status | streaming
receive_start_lsn | 0/29000000
receive_start_tli | 1
written_lsn | 0/29000148
flushed_lsn | 0/29000148
received_tli | 1
last_msg_send_time | 2022-12-15 09:06:20.804418+08
last_msg_receipt_time | 2022-12-15 23:26:54.969142+08
latest_end_lsn | 0/29000148
latest_end_time | 2022-12-15 09:04:50.687171+08
slot_name | pgsql_db3
sender_host | 192.168.56.101
sender_port | 5432
conninfo | user=repl password=******** channel_binding=prefer dbname=replication host=192.168.56.101 port=5432 application_name=db3 fallback_application_name=walreceiver sslmode=prefer sslcompression=0 sslsni=1 ssl_min_protocol_version=TLSv1.2 gssencmode=prefer krbsrvname=postgres target_session_attrs=any
postgres=#
5 验证数据同步
5.1 主库
postgres=# /c cndba
You are now connected to database "cndba" as user "postgres".
cndba=# select * from dave;
id | url
----+----------------------
1 | https://www.cndba.cn
2 | https://www.cndba.cn
3 | https://www.cndba.cn
4 | https://www.cndba.cn
(4 rows)
cndba=# insert into dave values(5,'https://www.cndba.cn');
INSERT 0 1
cndba=# select * from dave;
id | url
----+----------------------
1 | https://www.cndba.cn
2 | https://www.cndba.cn
3 | https://www.cndba.cn
4 | https://www.cndba.cn
5 | https://www.cndba.cn
(5 rows)
5.2 从库
cndba=# select * from dave;
id | url
----+----------------------
1 | https://www.cndba.cn
2 | https://www.cndba.cn
3 | https://www.cndba.cn
4 | https://www.cndba.cn
5 | https://www.cndba.cn
(5 rows)
6 主从切换验证
6.1 查看当前状态
主库:
-bash-4.2$ pg_controldata | grep "Database cluster state:"
Database cluster state: in production
从库102:
-bash-4.2$ pg_controldata | grep "Database cluster state:"
Database cluster state: in archive recovery
-bash-4.2$
从库103:
-bash-4.2$ pg_controldata | grep "Database cluster state:"
Database cluster state: in archive recovery
-bash-4.2$
6.2 激活备库
我们这里直接关闭主库,来模拟故障:
-bash-4.2$ pg_ctl stop -D /var/lib/pgsql/14/data/
waiting for server to shut down.... done
server stopped
-bash-4.2$
将从库102 激活为主库:
-bash-4.2$ pg_ctl status -D /var/lib/pgsql/14/data
pg_ctl: server is running (PID: 4489)
/usr/pgsql-14/bin/postgres "-D" "/var/lib/pgsql/14/data"
-bash-4.2$ pg_ctl promote -D /var/lib/pgsql/14/data
waiting for server to promote.... done
server promoted
-bash-4.2$
提升备库为主库之后,后台进程中不再有startup recovering进程了,同时,多了postgres: walwriter 写进程。
-bash-4.2$ ps -ef|grep postgres
root 2060 1969 0 20:15 pts/1 00:00:00 su - postgres
postgres 2061 2060 0 20:15 pts/1 00:00:00 -bash
root 3498 1760 0 22:11 pts/0 00:00:00 su - postgres
postgres 3499 3498 0 22:11 pts/0 00:00:00 -bash
postgres 3924 2061 0 22:39 pts/1 00:00:00 psql -h 127.0.0.1
postgres 4489 1 0 23:24 ? 00:00:00 /usr/pgsql-14/bin/postgres -D /var/lib/pgsql/14/data
postgres 4490 4489 0 23:24 ? 00:00:00 postgres: logger
postgres 4492 4489 0 23:24 ? 00:00:00 postgres: checkpointer
postgres 4493 4489 0 23:24 ? 00:00:00 postgres: background writer
postgres 4494 4489 0 23:24 ? 00:00:00 postgres: stats collector
postgres 4532 4489 0 23:28 ? 00:00:00 postgres: postgres cndba 127.0.0.1(60386) idle
postgres 4684 4489 0 23:42 ? 00:00:00 postgres: walwriter
postgres 4685 4489 0 23:42 ? 00:00:00 postgres: autovacuum launcher
postgres 4686 4489 0 23:42 ? 00:00:00 postgres: archiver failed on 00000002.history
postgres 4687 4489 0 23:42 ? 00:00:00 postgres: logical replication launcher
postgres 4709 3499 0 23:43 pts/0 00:00:00 ps -ef
postgres 4710 3499 0 23:43 pts/0 00:00:00 grep --color=auto postgres
-bash-4.2$
切换成功后, $PGDATA/standby.signal文件自动消失了,表示该库现在是主库,可以对位提供服务:
-bash-4.2$ pg_controldata | grep "Database cluster state:"
Database cluster state: in production
-bash-4.2$
6.3 恢复主从架构
我们这里是同步复制,并且配置了复制槽,所以会稍微麻烦一些。
6.3.1 新主库操作
创建复制槽
cndba=# select * from pg_create_physical_replication_slot('pgsql_db1');
-[ RECORD 1 ]--------
slot_name | pgsql_db1
lsn |
cndba=# select * from pg_create_physical_replication_slot('pgsql_db3');
-[ RECORD 1 ]--------
slot_name | pgsql_db3
lsn |
查看复制槽:
postgres=# select * from pg_replication_slots;
修改postgresql.conf文件中得如下内容。
synchronous_standby_names = 'ANY 1 (db1, db3)'
6.3.2 恢复原从库103
这里只需要修改配置文件中得信息,指向102即可。
修改postgresql.conf文件中得如下内容。
primary_conninfo = 'host=192.168.56.102 port=5432 user=repl password= repl application_name=db3'
recovery_target_timeline = 'latest'
primary_slot_name = 'pgsql_db3'
hot_standby = on #配置standby端可读
重新加载参数
-bash-4.2$ pg_ctl reload -D /var/lib/pgsql/14/data
server signaled
-bash-4.2$
postgres=# /x
Expanded display is on.
postgres=# select * from pg_stat_wal_receiver;
-[ RECORD 1 ]---------+----------------------------------------------------------------------------
pid | 4324
status | streaming
receive_start_lsn | 0/2A000000
receive_start_tli | 2
written_lsn | 0/2A0001B8
flushed_lsn | 0/2A0001B8
received_tli | 2
last_msg_send_time | 2022-12-15 23:55:47.454425+08
last_msg_receipt_time | 2022-12-15 23:55:47.485687+08
latest_end_lsn | 0/2A0001B8
latest_end_time | 2022-12-15 23:55:17.453162+08
slot_name | pgsql_db3
sender_host | 192.168.56.102
sender_port | 5432
conninfo | host=192.168.56.102 port=5432 user=repl password= repl application_name=db3
postgres=#
6.3.3 将原主库改成从库
创建$PGDATA/standby.signal文件,直接touch即可,本文件本身也是个空文件。
-bash-4.2$ pwd
/var/lib/pgsql/14/data
-bash-4.2$ touch standby.signal
-bash-4.2$ ll standby.signal
-rw-r--r--. 1 postgres postgres 0 Dec 15 09:38 standby.signal
-bash-4.2$
修改postgresql.conf文件中得如下内容。
primary_conninfo = 'host=192.168.56.102 port=5432 user=repl password= repl application_name=db1'
recovery_target_timeline = 'latest'
primary_slot_name = 'pgsql_db1'
hot_standby = on #配置standby端可读
启动新从库:
-bash-4.2$ pg_ctl start -D /var/lib/pgsql/14/data
waiting for server to start....2022-12-15 09:40:49.456 CST [21790] LOG: redirecting log output to logging collector process
2022-12-15 09:40:49.456 CST [21790] HINT: Future log output will appear in directory "pg_log".
done
server started
-bash-4.2$
postgres=# /x
Expanded display is on.
postgres=# select * from pg_stat_wal_receiver;
-[ RECORD 1 ]---------+---------------------------------------------------------------------------
pid | 21796
status | streaming
receive_start_lsn | 0/2A000000
receive_start_tli | 2
written_lsn | 0/2A0001B8
flushed_lsn | 0/2A0001B8
received_tli | 2
last_msg_send_time | 2022-12-16 00:01:53.730101+08
last_msg_receipt_time | 2022-12-15 09:41:19.592767+08
latest_end_lsn | 0/2A0001B8
latest_end_time | 2022-12-16 00:01:23.636035+08
slot_name | pgsql_db1
sender_host | 192.168.56.102
sender_port | 5432
conninfo | host=192.168.56.102 port=5432 user=repl password=repl application_name=db1
postgres=#
6.3.4 验证新环境同步情况
主库:
cndba=# /x
Expanded display is on.
cndba=# select * from pg_stat_replication;
-[ RECORD 1 ]----+------------------------------
pid | 4952
usesysid | 24947
usename | repl
application_name | db3
client_addr | 192.168.56.103
client_hostname |
client_port | 49426
backend_start | 2022-12-15 23:55:17.432406+08
backend_xmin |
state | streaming
sent_lsn | 0/2A0001B8
write_lsn | 0/2A0001B8
flush_lsn | 0/2A0001B8
replay_lsn | 0/2A0001B8
write_lag |
flush_lag |
replay_lag |
sync_priority | 1
sync_state | quorum
reply_time | 2022-12-16 00:02:38.000231+08
-[ RECORD 2 ]----+------------------------------
pid | 5102
usesysid | 24947
usename | repl
application_name | db1
client_addr | 192.168.56.101
client_hostname |
client_port | 39138
backend_start | 2022-12-16 00:01:23.617207+08
backend_xmin |
state | streaming
sent_lsn | 0/2A0001B8
write_lsn | 0/2A0001B8
flush_lsn | 0/2A0001B8
replay_lsn | 0/2A0001B8
write_lag |
flush_lag |
replay_lag |
sync_priority | 0
sync_state | async
reply_time | 2022-12-15 09:41:59.725968+08
cndba=#
cndba=# /c cndba
You are now connected to database "cndba" as user "postgres".
cndba=# select * from dave;
-[ RECORD 1 ]-------------
id | 1
url | https://www.cndba.cn
-[ RECORD 2 ]-------------
id | 2
url | https://www.cndba.cn
-[ RECORD 3 ]-------------
id | 3
url | https://www.cndba.cn
-[ RECORD 4 ]-------------
id | 4
url | https://www.cndba.cn
cndba=# delete from dave where id=4;
DELETE 1
cndba=# select * from dave;
-[ RECORD 1 ]-------------
id | 1
url | https://www.cndba.cn
-[ RECORD 2 ]-------------
id | 2
url | https://www.cndba.cn
-[ RECORD 3 ]-------------
id | 3
url | https://www.cndba.cn
cndba=#
从库101:
postgres=# /c cndba
You are now connected to database "cndba" as user "postgres".
cndba=# select * from dave;
-[ RECORD 1 ]-------------
id | 1
url | https://www.cndba.cn
-[ RECORD 2 ]-------------
id | 2
url | https://www.cndba.cn
-[ RECORD 3 ]-------------
id | 3
url | https://www.cndba.cn
cndba=#
从库103:
postgres=# /c cndba
You are now connected to database "cndba" as user "postgres".
cndba=# select * from dave;
-[ RECORD 1 ]-------------
id | 1
url | https://www.cndba.cn
-[ RECORD 2 ]-------------
id | 2
url | https://www.cndba.cn
-[ RECORD 3 ]-------------
id | 3
url | https://www.cndba.cn
cndba=#
版权声明:本文为博主原创文章,未经博主允许不得转载。