REPMGR-PG高可用搭建(三)
2.2.2repmgr安装
兼容性
3节点均安装repmgr
1.安装依赖 # yum install flex 2.下载解压 # wget -c https://repmgr.org/download/repmgr-5.2.1.tar.gz # tar xf repmgr-5.2.1.tar.gz -C /usr/local 3.编译安装 # cd /usr/local/repmgr-5.2.1 # ./configure && make install 4.创建数据目录 # mkdir /data/repmgr # chown -R postgres:postgres /data/repmgr 5. 创建配置文件、日志文件 # touch /data/repmgr/repmgr.conf # touch /data/repmgr/repmgrd.log 6.修改配置文件,路径、IP等按需修改 # cat repmgr.conf node_id=1 # 节点ID,高可用集群各节点标识 node_name='172.16.104.5' # 节点IP或名称 conninfo='host=172.16.104.5 user=repmgr dbname=repmgr connect_timeout=2' # 本节点数据库连接信息 data_directory='/data/pgsql12/data' # pg数据目录 replication_user='repmgr' # 流复制数据库用户,默认使用repmgr repmgr_bindir='/usr/local/repmgr-5.2.1' # repmgr软件目录 pg_bindir='/usr/local/pgsql/bin' # pg软件目录 #shutdown_check_timeout=10 # pg、repmgr服务管理命令 service_start_command='sudo systemctl start postgres12' service_stop_command='sudo systemctl stop postgres12' service_restart_command='sudo systemctl restart postgres12' service_reload_command='sudo systemctl reload postgres12' repmgrd_service_start_command='sudo systemctl start repmgr12' repmgrd_service_stop_command='sudo systemctl stop repmgr12' # 日志管理 log_level=INFO log_file='/data/repmgr/repmgrd.log' log_status_interval=10 # failover设置 failover='automatic' promote_command='/usr/local/pgsql/bin/repmgr standby promote -f /data/repmgr/repmgr.conf --log-to-file' follow_command='/usr/local/pgsql/bin/repmgr standby follow -f /data/repmgr/repmgr.conf --log-to-file --upstream-node-id=%n' # 高可用参数设置 location='location1' priority=100 monitoring_history=yes reconnect_interval=5 reconnect_attempts=3 monitor_interval_secs=5 use_replication_slots=true
2.2.3systemctl设置
repmgr、PG服务的systemctl启动设置(3节点均配置)
systemctl管理脚本:/usr/lib/systemd/system目录下postgres12.service或repmgr12.service
systemctl管理命令:systemctl {status|restart|start|stop|reload} postgres12或repmgr12.service
1.在/usr/lib/systemd/system目录下,创建postgres12.service文件,内容如下,路径按需修改: # It's not recommended to modify this file in-place, because it will be # overwritten during package upgrades. It is recommended to use systemd # "dropin" feature; i.e. create file with suffix .conf under # /etc/systemd/system/postgresql-12.service.d directory overriding the # unit's defaults. You can also use "systemctl edit postgresql-12" # Look at systemd.unit(5) manual page for more info. # Note: changing PGDATA will typically require adjusting SELinux # configuration as well. # Note: do not use a PGDATA pathname containing spaces, or you will # break postgresql-setup. [Unit] Description=PostgreSQL 12 database server Documentation=https://www.postgresql.org/docs/12/static/ After=syslog.target After=network.target [Service] Type=forking User=postgres Group=postgres # Note: avoid inserting whitespace in these Environment= lines, or you may # break postgresql-setup. # Location of database directory Environment=PGDATA=/data/pgsql12/data Environment=PGPORT=5432 # Where to send early-startup messages from the server (before the logging # options of postgresql.conf take effect) # This is normally controlled by the global default set by systemd # StandardOutput=syslog # Disable OOM kill on the postmaster #OOMScoreAdjust=-1000 #Environment=PG_OOM_ADJUST_FILE=/proc/self/oom_score_adj #Environment=PG_OOM_ADJUST_VALUE=0 #ExecStartPre=/usr/pgsql-12/bin/postgresql-12-check-db-dir ${PGDATA} #ExecStart=/usr/pgsql-12/bin/postmaster -D ${PGDATA} #ExecReload=/bin/kill -HUP $MAINPID ExecStart=/usr/local/pgsql/bin/pg_ctl start -D /data/pgsql12/data ExecStop=/usr/local/pgsql/bin/pg_ctl stop -D /data/pgsql12/data ExecReload=/usr/local/pgsql/bin/pg_ctl reload -D /data/pgsql12/data KillMode=mixed KillSignal=SIGINT # Do not set any timeout value, so that systemd will not kill postmaster # during crash recovery. TimeoutSec=0 [Install] WantedBy=multi-user.target 2.在/usr/lib/systemd/system目录下,创建repmgr12.service文件,内容如下,路径按需修改: # It's not recommended to modify this file in-place, because it will be # overwritten during package upgrades. If you want to customize, the # best way is to create a file "/etc/systemd/system/repmgr.service", # containing # .include /lib/systemd/system/repmgr.service # ...make your changes here... # For more info about custom unit files, see # http://fedoraproject.org/wiki/Systemd#How_do_I_customize_a_unit_file.2F_add_a_custom_unit_file.3F [Unit] Description=A replication manager, and failover management tool for PostgreSQL After=syslog.target After=network.target After=postgresql-12.service [Service] Type=forking User=postgres Group=postgres # PID file PIDFile=/data/repmgr/repmgrd-12.pid # Location of repmgr conf file: Environment=REPMGRDCONF=/data/repmgr/repmgr.conf Environment=PIDFILE=/data/repmgr/repmgrd-12.pid # Where to send early-startup messages from the server # This is normally controlled by the global default set by systemd # StandardOutput=syslog ExecStart=/usr/local/pgsql/bin/repmgrd -f ${REPMGRDCONF} -p ${PIDFILE} -d --verbose ExecStop=/usr/bin/kill -TERM $MAINPID ExecReload=/usr/bin/kill -HUP $MAINPID # Give a reasonable amount of time for the server to start up/shut down TimeoutSec=300 [Install] WantedBy=multi-user.target
2.3注册主、备节点
2.3.1配置主节点
repmgr注册主节点:
[root@172-16-104-5 repmgr]# su - postgres 上一次登录:四 8月 19 16:14:09 CST 2021pts/1 上 [postgres@172-16-104-5 ~]$ repmgr -f /data/repmgr/repmgr.conf primary register INFO: connecting to primary database... NOTICE: attempting to install extension "repmgr" NOTICE: "repmgr" extension successfully installed NOTICE: primary node record (ID: 1) registered
查看集群状态:
$ repmgr -f /data/repmgr/repmgr.conf cluster show ID | Name | Role | Status | Upstream | Location | Priority | Timeline | Connection string ----+--------------+---------+-----------+----------+-----------+----------+----------+--------------------------------------------------------------- 1 | 172.16.104.5 | primary | * running | | location1 | 100 | 1 | host=172.16.104.5 user=repmgr dbname=repmgr connect_timeout=2
2.3.2克隆并注册从节点
备服务器节点克隆
备节点服务器不需初始化,可用rempgr将主节点数据拷贝至备节点
备节点测试数据库连通性:
$ psql 'host=${主服务器_ip} user=repmgr dbname=repmgr connect_timeout=2'
克隆命令测试:-- dry-run表示命令测试,并不会实际执行,可用于验证是否会出现一些基本错误
$ repmgr -h ${主服务器_ip} -U repmgr -d repmgr -f /data/repmgr/repmgr.conf standby clone --dry-run NOTICE: destination directory "/data/pgsql12/data" provided INFO: connecting to source node DETAIL: connection string is: host=172.16.104.5 user=repmgr dbname=repmgr DETAIL: current installation size is 31 MB INFO: "repmgr" extension is installed in database "repmgr" INFO: parameter "max_replication_slots" set to 10 INFO: parameter "max_wal_senders" set to 10 NOTICE: checking for available walsenders on the source node (2 required) INFO: sufficient walsenders available on the source node DETAIL: 2 required, 10 available NOTICE: checking replication connections can be made to the source server (2 required) INFO: required number of replication connections could be made to the source server DETAIL: 2 replication connections required INFO: replication slots will be created by user "repmgr" NOTICE: standby will attach to upstream node 1 HINT: consider using the -c/--fast-checkpoint option INFO: all prerequisites for "standby clone" are met
实际克隆数据:
$ repmgr -h ${主服务器_ip} -U repmgr -d repmgr -f /data/repmgr/repmgr.conf standby clone NOTICE: destination directory "/data/pgsql12/data" provided INFO: connecting to source node DETAIL: connection string is: host=172.16.104.5 user=repmgr dbname=repmgr DETAIL: current installation size is 31 MB NOTICE: checking for available walsenders on the source node (2 required) NOTICE: checking replication connections can be made to the source server (2 required) INFO: checking and correcting permissions on existing directory "/data/pgsql12/data" INFO: creating replication slot as user "repmgr" NOTICE: starting backup (using pg_basebackup)... HINT: this may take some time; consider using the -c/--fast-checkpoint option INFO: executing: /usr/local/pgsql/bin/pg_basebackup -l "repmgr base backup" -D /data/pgsql12/data -h 172.16.104.5 -p 5432 -U repmgr -X stream -S repmgr_slot_2 NOTICE: standby clone (using pg_basebackup) complete NOTICE: you can now start your PostgreSQL server HINT: for example: sudo systemctl start postgres12 HINT: after starting the server, you need to register this standby with "repmgr standby register"
启动备节点PG:
$ pg_ctl -D /data/pgsql12/data start waiting for server to start....2021-08-20 16:19:05.633 CST [69753] LOG: starting PostgreSQL 12.6 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-44), 64-bit 2021-08-20 16:19:05.635 CST [69753] LOG: listening on IPv4 address "0.0.0.0", port 5432 2021-08-20 16:19:05.635 CST [69753] LOG: listening on IPv6 address "::", port 5432 2021-08-20 16:19:05.641 CST [69753] LOG: listening on Unix socket "/tmp/.s.PGSQL.5432" 2021-08-20 16:19:05.671 CST [69753] LOG: redirecting log output to logging collector process 2021-08-20 16:19:05.671 CST [69753] HINT: Future log output will appear in directory "log". done server started
注册备用服务器:
将该服务器注册为集群的备节点:
$ repmgr -f /data/repmgr/repmgr.conf standby register INFO: connecting to local node "172.16.104.6" (ID: 2) INFO: connecting to primary database WARNING: --upstream-node-id not supplied, assuming upstream node is primary (node ID 1) INFO: standby registration complete NOTICE: standby node "172.16.104.6" (ID: 2) successfully registered
查看当前集群状态:
$ repmgr -f /data/repmgr/repmgr.conf cluster show ID | Name | Role | Status | Upstream | Location | Priority | Timeline | Connection string ----+--------------+---------+-----------+--------------+-----------+----------+----------+--------------------------------------------------------------- 1 | 172.16.104.5 | primary | * running | | location1 | 100 | 1 | host=172.16.104.5 user=repmgr dbname=repmgr connect_timeout=2 2 | 172.16.104.6 | standby | running | 172.16.104.5 | location1 | 100 | 1 | host=172.16.104.6 user=repmgr dbname=repmgr connect_timeout=2
2.3.3添加见证服务器
-- 见证服务器注册 $ repmgr -f /tpsys/repmgr/conf/repmgr.conf witness register -h${pg_primary_ip} -Urepmgr -drepmgr -- 检查当前集群各节点状态 $ repmgr -f /tpsys/repmgr/conf/repmgr.conf cluster show
2.4手动故障切换
将standby主服务器节点提升为primary备用服务器节点
--查看集群状态 [postgres@172-16-104-5 ~]$ repmgr -f /data/repmgr/repmgr.conf cluster show ID | Name | Role | Status | Upstream | Location | Priority | Timeline | Connection string ----+--------------+---------+-----------+--------------+-----------+----------+----------+--------------------------------------------------------------- 1 | 172.16.104.5 | primary | * running | | location1 | 100 | 1 | host=172.16.104.5 user=repmgr dbname=repmgr connect_timeout=2 2 | 172.16.104.6 | standby | running | 172.16.104.5 | location1 | 100 | 1 | host=172.16.104.6 user=repmgr dbname=repmgr connect_timeout=2 -- 手动关闭primary节点数据库 [postgres@172-16-104-5 ~]$ sudo systemctl stop postgres12 --查看当前集群状态 [postgres@172-16-104-6 ~]$ repmgr -f /data/repmgr/repmgr.conf cluster show ID | Name | Role | Status | Upstream | Location | Priority | Timeline | Connection string ----+--------------+---------+---------------+----------------+-----------+----------+----------+--------------------------------------------------------------- 1 | 172.16.104.5 | primary | ? unreachable | ? | location1 | 100 | | host=172.16.104.5 user=repmgr dbname=repmgr connect_timeout=2 2 | 172.16.104.6 | standby | running | ? 172.16.104.5 | location1 | 100 | 1 | host=172.16.104.6 user=repmgr dbname=repmgr connect_timeout=2 WARNING: following issues were detected - unable to connect to node "172.16.104.5" (ID: 1) - node "172.16.104.5" (ID: 1) is registered as an active primary but is unreachable - unable to connect to node "172.16.104.6" (ID: 2)'s upstream node "172.16.104.5" (ID: 1) - unable to determine if node "172.16.104.6" (ID: 2) is attached to its upstream node "172.16.104.5" (ID: 1) HINT: execute with --verbose option to see connection error messages --手动提升备为主节点 $ repmgr -f /data/repmgr/repmgr.conf standby promote -- 查看集群状态 $ repmgr -f /data/repmgr/repmgr.conf cluster show
2.5自动failover
-- 模拟宕机 $ sudo systemctl stop postgres -- 备服务器节点强制重新注册 $ repmgr -f /data/repmgr/repmgr.conf node rejoin -d'host=${primary_host} user=repmgr dbname=repmgr connect_timeout=2' --force-rewind -- 备份节点重新注册 $ repmgr -f /data/repmgr/repmgr.conf standby register --force