REPMGR-PG高可用搭建(三)

太阳2年前 (2023-09-08)技术文章1258

2.2.2repmgr安装

兼容性

3节点均安装repmgr

1.安装依赖
# yum install flex
2.下载解压
# wget -c https://repmgr.org/download/repmgr-5.2.1.tar.gz
# tar xf repmgr-5.2.1.tar.gz -C /usr/local
3.编译安装
# cd /usr/local/repmgr-5.2.1
# ./configure && make install
4.创建数据目录
# mkdir /data/repmgr
# chown -R postgres:postgres /data/repmgr
5. 创建配置文件、日志文件
# touch /data/repmgr/repmgr.conf
# touch /data/repmgr/repmgrd.log
6.修改配置文件，路径、IP等按需修改
# cat repmgr.conf
node_id=1                                       # 节点ID，高可用集群各节点标识
node_name='172.16.104.5'                        # 节点IP或名称
conninfo='host=172.16.104.5 user=repmgr dbname=repmgr connect_timeout=2'    # 本节点数据库连接信息
data_directory='/data/pgsql12/data'          # pg数据目录
replication_user='repmgr'                       # 流复制数据库用户，默认使用repmgr
repmgr_bindir='/usr/local/repmgr-5.2.1'          # repmgr软件目录
pg_bindir='/usr/local/pgsql/bin'              # pg软件目录
#shutdown_check_timeout=10
# pg、repmgr服务管理命令
service_start_command='sudo systemctl start postgres12'
service_stop_command='sudo systemctl stop postgres12'
service_restart_command='sudo systemctl restart postgres12'
service_reload_command='sudo systemctl reload postgres12'
repmgrd_service_start_command='sudo systemctl start repmgr12'
repmgrd_service_stop_command='sudo systemctl stop repmgr12'
# 日志管理
log_level=INFO
log_file='/data/repmgr/repmgrd.log'
log_status_interval=10
# failover设置
failover='automatic'
promote_command='/usr/local/pgsql/bin/repmgr standby promote -f /data/repmgr/repmgr.conf --log-to-file'
follow_command='/usr/local/pgsql/bin/repmgr standby follow -f /data/repmgr/repmgr.conf --log-to-file --upstream-node-id=%n'
# 高可用参数设置
location='location1'
priority=100
monitoring_history=yes
reconnect_interval=5
reconnect_attempts=3
monitor_interval_secs=5
use_replication_slots=true

2.2.3systemctl设置

repmgr、PG服务的systemctl启动设置（3节点均配置）

systemctl管理脚本：/usr/lib/systemd/system目录下postgres12.service或repmgr12.service

systemctl管理命令：systemctl {status|restart|start|stop|reload} postgres12或repmgr12.service

1.在/usr/lib/systemd/system目录下，创建postgres12.service文件，内容如下，路径按需修改：
# It's not recommended to modify this file in-place, because it will be
# overwritten during package upgrades.  It is recommended to use systemd
# "dropin" feature;  i.e. create file with suffix .conf under
# /etc/systemd/system/postgresql-12.service.d directory overriding the
# unit's defaults. You can also use "systemctl edit postgresql-12"
# Look at systemd.unit(5) manual page for more info.
# Note: changing PGDATA will typically require adjusting SELinux
# configuration as well.
# Note: do not use a PGDATA pathname containing spaces, or you will
# break postgresql-setup.
[Unit]
Description=PostgreSQL 12 database server
Documentation=https://www.postgresql.org/docs/12/static/
After=syslog.target
After=network.target
[Service]
Type=forking
User=postgres
Group=postgres
# Note: avoid inserting whitespace in these Environment= lines, or you may
# break postgresql-setup.
# Location of database directory
Environment=PGDATA=/data/pgsql12/data
Environment=PGPORT=5432
# Where to send early-startup messages from the server (before the logging
# options of postgresql.conf take effect)
# This is normally controlled by the global default set by systemd
# StandardOutput=syslog
# Disable OOM kill on the postmaster
#OOMScoreAdjust=-1000
#Environment=PG_OOM_ADJUST_FILE=/proc/self/oom_score_adj
#Environment=PG_OOM_ADJUST_VALUE=0
#ExecStartPre=/usr/pgsql-12/bin/postgresql-12-check-db-dir ${PGDATA}
#ExecStart=/usr/pgsql-12/bin/postmaster -D ${PGDATA}
#ExecReload=/bin/kill -HUP $MAINPID
ExecStart=/usr/local/pgsql/bin/pg_ctl start -D /data/pgsql12/data
ExecStop=/usr/local/pgsql/bin/pg_ctl stop -D /data/pgsql12/data
ExecReload=/usr/local/pgsql/bin/pg_ctl reload -D /data/pgsql12/data
KillMode=mixed
KillSignal=SIGINT
# Do not set any timeout value, so that systemd will not kill postmaster
# during crash recovery.
TimeoutSec=0
[Install]
WantedBy=multi-user.target
2.在/usr/lib/systemd/system目录下，创建repmgr12.service文件，内容如下，路径按需修改：
# It's not recommended to modify this file in-place, because it will be
# overwritten during package upgrades.  If you want to customize, the
# best way is to create a file "/etc/systemd/system/repmgr.service",
# containing
#       .include /lib/systemd/system/repmgr.service
#       ...make your changes here...
# For more info about custom unit files, see
# http://fedoraproject.org/wiki/Systemd#How_do_I_customize_a_unit_file.2F_add_a_custom_unit_file.3F
[Unit]
Description=A replication manager, and failover management tool for PostgreSQL
After=syslog.target
After=network.target
After=postgresql-12.service
[Service]
Type=forking
User=postgres
Group=postgres
# PID file
PIDFile=/data/repmgr/repmgrd-12.pid
# Location of repmgr conf file:
Environment=REPMGRDCONF=/data/repmgr/repmgr.conf
Environment=PIDFILE=/data/repmgr/repmgrd-12.pid
# Where to send early-startup messages from the server
# This is normally controlled by the global default set by systemd
# StandardOutput=syslog
ExecStart=/usr/local/pgsql/bin/repmgrd -f ${REPMGRDCONF} -p ${PIDFILE} -d --verbose
ExecStop=/usr/bin/kill -TERM $MAINPID
ExecReload=/usr/bin/kill -HUP $MAINPID
# Give a reasonable amount of time for the server to start up/shut down
TimeoutSec=300
[Install]
WantedBy=multi-user.target

2.3注册主、备节点

2.3.1配置主节点

repmgr注册主节点：

[root@172-16-104-5 repmgr]# su - postgres
上一次登录：四 8月 19 16:14:09 CST 2021pts/1 上
[postgres@172-16-104-5 ~]$ repmgr -f /data/repmgr/repmgr.conf primary register
INFO: connecting to primary database...
NOTICE: attempting to install extension "repmgr"
NOTICE: "repmgr" extension successfully installed
NOTICE: primary node record (ID: 1) registered

查看集群状态：

$ repmgr -f /data/repmgr/repmgr.conf cluster show
 ID | Name         | Role    | Status    | Upstream | Location  | Priority | Timeline | Connection string                                            
----+--------------+---------+-----------+----------+-----------+----------+----------+---------------------------------------------------------------
 1  | 172.16.104.5 | primary | * running |          | location1 | 100      | 1        | host=172.16.104.5 user=repmgr dbname=repmgr connect_timeout=2

2.3.2克隆并注册从节点

备服务器节点克隆

备节点服务器不需初始化，可用rempgr将主节点数据拷贝至备节点

备节点测试数据库连通性：

$ psql 'host=${主服务器_ip} user=repmgr dbname=repmgr connect_timeout=2'

克隆命令测试：-- dry-run表示命令测试，并不会实际执行，可用于验证是否会出现一些基本错误

$ repmgr -h ${主服务器_ip} -U repmgr -d repmgr -f /data/repmgr/repmgr.conf standby clone --dry-run
NOTICE: destination directory "/data/pgsql12/data" provided
INFO: connecting to source node
DETAIL: connection string is: host=172.16.104.5 user=repmgr dbname=repmgr
DETAIL: current installation size is 31 MB
INFO: "repmgr" extension is installed in database "repmgr"
INFO: parameter "max_replication_slots" set to 10
INFO: parameter "max_wal_senders" set to 10
NOTICE: checking for available walsenders on the source node (2 required)
INFO: sufficient walsenders available on the source node
DETAIL: 2 required, 10 available
NOTICE: checking replication connections can be made to the source server (2 required)
INFO: required number of replication connections could be made to the source server
DETAIL: 2 replication connections required
INFO: replication slots will be created by user "repmgr"
NOTICE: standby will attach to upstream node 1
HINT: consider using the -c/--fast-checkpoint option
INFO: all prerequisites for "standby clone" are met

实际克隆数据：

$ repmgr -h ${主服务器_ip} -U repmgr -d repmgr -f /data/repmgr/repmgr.conf standby clone
NOTICE: destination directory "/data/pgsql12/data" provided
INFO: connecting to source node
DETAIL: connection string is: host=172.16.104.5 user=repmgr dbname=repmgr
DETAIL: current installation size is 31 MB
NOTICE: checking for available walsenders on the source node (2 required)
NOTICE: checking replication connections can be made to the source server (2 required)
INFO: checking and correcting permissions on existing directory "/data/pgsql12/data"
INFO: creating replication slot as user "repmgr"
NOTICE: starting backup (using pg_basebackup)...
HINT: this may take some time; consider using the -c/--fast-checkpoint option
INFO: executing:
  /usr/local/pgsql/bin/pg_basebackup -l "repmgr base backup"  -D /data/pgsql12/data -h 172.16.104.5 -p 5432 -U repmgr -X stream -S repmgr_slot_2 
NOTICE: standby clone (using pg_basebackup) complete
NOTICE: you can now start your PostgreSQL server
HINT: for example: sudo systemctl start postgres12
HINT: after starting the server, you need to register this standby with "repmgr standby register"

启动备节点PG：

$ pg_ctl -D /data/pgsql12/data start
waiting for server to start....2021-08-20 16:19:05.633 CST [69753] LOG:  starting PostgreSQL 12.6 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-44), 64-bit
2021-08-20 16:19:05.635 CST [69753] LOG:  listening on IPv4 address "0.0.0.0", port 5432
2021-08-20 16:19:05.635 CST [69753] LOG:  listening on IPv6 address "::", port 5432
2021-08-20 16:19:05.641 CST [69753] LOG:  listening on Unix socket "/tmp/.s.PGSQL.5432"
2021-08-20 16:19:05.671 CST [69753] LOG:  redirecting log output to logging collector process
2021-08-20 16:19:05.671 CST [69753] HINT:  Future log output will appear in directory "log".
 done
server started

注册备用服务器：

将该服务器注册为集群的备节点：

$ repmgr -f /data/repmgr/repmgr.conf standby register
INFO: connecting to local node "172.16.104.6" (ID: 2)
INFO: connecting to primary database
WARNING: --upstream-node-id not supplied, assuming upstream node is primary (node ID 1)
INFO: standby registration complete
NOTICE: standby node "172.16.104.6" (ID: 2) successfully registered

查看当前集群状态：

$ repmgr -f /data/repmgr/repmgr.conf cluster show
 ID | Name         | Role    | Status    | Upstream     | Location  | Priority | Timeline | Connection string                                            
----+--------------+---------+-----------+--------------+-----------+----------+----------+---------------------------------------------------------------
 1  | 172.16.104.5 | primary | * running |              | location1 | 100      | 1        | host=172.16.104.5 user=repmgr dbname=repmgr connect_timeout=2
 2  | 172.16.104.6 | standby |   running | 172.16.104.5 | location1 | 100      | 1        | host=172.16.104.6 user=repmgr dbname=repmgr connect_timeout=2

2.3.3添加见证服务器

-- 见证服务器注册
$ repmgr -f /tpsys/repmgr/conf/repmgr.conf witness register -h${pg_primary_ip} -Urepmgr -drepmgr
-- 检查当前集群各节点状态
$ repmgr -f /tpsys/repmgr/conf/repmgr.conf cluster show

2.4手动故障切换

将standby主服务器节点提升为primary备用服务器节点

--查看集群状态
[postgres@172-16-104-5 ~]$ repmgr -f /data/repmgr/repmgr.conf cluster show
 ID | Name         | Role    | Status    | Upstream     | Location  | Priority | Timeline | Connection string                                            
----+--------------+---------+-----------+--------------+-----------+----------+----------+---------------------------------------------------------------
 1  | 172.16.104.5 | primary | * running |              | location1 | 100      | 1        | host=172.16.104.5 user=repmgr dbname=repmgr connect_timeout=2
 2  | 172.16.104.6 | standby |   running | 172.16.104.5 | location1 | 100      | 1        | host=172.16.104.6 user=repmgr dbname=repmgr connect_timeout=2
 
 -- 手动关闭primary节点数据库
 [postgres@172-16-104-5 ~]$ sudo systemctl stop postgres12
 
 --查看当前集群状态
 [postgres@172-16-104-6 ~]$ repmgr -f /data/repmgr/repmgr.conf cluster show
 ID | Name         | Role    | Status        | Upstream       | Location  | Priority | Timeline | Connection string                                            
----+--------------+---------+---------------+----------------+-----------+----------+----------+---------------------------------------------------------------
 1  | 172.16.104.5 | primary | ? unreachable | ?              | location1 | 100      |          | host=172.16.104.5 user=repmgr dbname=repmgr connect_timeout=2
 2  | 172.16.104.6 | standby |   running     | ? 172.16.104.5 | location1 | 100      | 1        | host=172.16.104.6 user=repmgr dbname=repmgr connect_timeout=2
WARNING: following issues were detected
  - unable to connect to node "172.16.104.5" (ID: 1)
  - node "172.16.104.5" (ID: 1) is registered as an active primary but is unreachable
  - unable to connect to node "172.16.104.6" (ID: 2)'s upstream node "172.16.104.5" (ID: 1)
  - unable to determine if node "172.16.104.6" (ID: 2) is attached to its upstream node "172.16.104.5" (ID: 1)
HINT: execute with --verbose option to see connection error messages
--手动提升备为主节点
$ repmgr -f /data/repmgr/repmgr.conf standby promote
-- 查看集群状态
$ repmgr -f /data/repmgr/repmgr.conf cluster show

2.5自动failover

-- 模拟宕机
$ sudo systemctl stop postgres
-- 备服务器节点强制重新注册
$ repmgr -f /data/repmgr/repmgr.conf node rejoin -d'host=${primary_host} user=repmgr dbname=repmgr connect_timeout=2' --force-rewind
-- 备份节点重新注册 
$ repmgr -f /data/repmgr/repmgr.conf  standby register --force

标签: PostgreSQL 数据库高可用

返回列表

上一篇：REPMGR-PG高可用搭建(二)

下一篇：某网络环境下访问业务异常问题排查

REPMGR-PG高可用搭建(一)

PG高可用对比数据库复制的术语和定义这些术语和定义应该有助于讨论复制。在与其他Postgres开发人员进行了大量讨论之后，我编译了它们，但是这些定义应该是普遍可用的，并且也应该适用于其他RDBMS。复...

REPMGR-PG高可用搭建(二)

REPMGR搭建步骤一、介绍repmgr是第二象限开源的一套流复制集群管理工具，用于管理PostgreSQL服务器群集中的复制和故障转移。支持故障自动转移和手动切换；支持分布式管理集群节点，易扩展，...

REPMGR-PG高可用搭建(三)

2.2.2repmgr安装

2.2.3systemctl设置

2.3注册主、备节点

2.3.1配置主节点

2.3.2克隆并注册从节点

2.3.3添加见证服务器

2.4手动故障切换

2.5自动failover

相关文章

REPMGR-PG高可用搭建(一)

REPMGR-PG高可用搭建(二)

发表评论

©Copyrights 2016-2022 YUNCHE 浙ICP备2021017017号

REPMGR-PG高可用搭建(三)

2.2.2repmgr安装

2.2.3systemctl设置

2.3注册主、备节点

2.3.1配置主节点

2.3.2克隆并注册从节点

2.3.3添加见证服务器

2.4手动故障切换

2.5自动failover

相关文章

REPMGR-PG高可用搭建(一)

REPMGR-PG高可用搭建(二)

发表评论 取消回复

©Copyrights 2016-2022 YUNCHE 浙ICP备2021017017号var _hmt = _hmt || [];(function() { var hm = document.createElement("script"); hm.src = "https://hm.baidu.com/hm.js?dcf8139ce75b768b71dccc5e589b983c"; var s = document.getElementsByTagName("script")[0]; s.parentNode.insertBefore(hm, s);})();

发表评论

©Copyrights 2016-2022 YUNCHE 浙ICP备2021017017号