PG的锁(三)
六、锁的维护
6.1 锁相关参数
deadlock_timeout(integer):默认1s,表示pg数据库仅对锁超时大于1s的情况进行死锁检测。
log_lock_waits : 默认关闭,若打开该参数则表示会将锁超时超过deadlock_timeout的信息记录到日志中。
lock_timeout : 锁等待超时,默认为0表示禁用锁超时。若该参数设置>0则表示会话等待锁资源Ns后如果仍无法获取到相关锁资源则终止相关语句执行。
idle_in_transaction_session_timeout : 对于一个空闲的事务,超过多少时间后自动终止,单位为毫秒。默认为0表示禁用该参数。会话终止会释放该会话所有锁资源。
6.2 锁监控
1、PG中锁相关的两个重要的视图
2、查看谁锁了谁
当一个进程处于等待(被堵塞)状态时,是谁干的?可以使用如下函数,快速得到捣蛋(堵塞别人)的PID。
1)请求锁时被堵,是哪些PID堵的?
pg_blocking_pids(int)int[]Process ID(s) that are blocking specified server process ID from acquiring a lock
2)请求safe快照时被堵(SSI隔离级别,请求安全快照冲突),是哪些PID堵的?
pg_safe_snapshot_blocking_pids(int)int[]Process ID(s) that are blocking specified server process ID from acquiring a safe snapshot
示例:
select pid,pg_blocking_pids(pid),wait_event_type,wait_event,query from pg_stat_activity ; - pid : 当前会话,被阻塞者 - pg_blocking_pids(pid) : 阻塞者 - wait_event_type : 会话状态,Lock表示被阻塞无法获取锁资源 - wait_event : 等待事件 - query : 会话执行相关查询
事务一 | 事务二 | 事务三 |
begin; | begin | - |
update t1 set info ='aaaaaaaaabbbbbbbbb' where id=2; | - | - |
- | update t1 set info ='aaaaaaaaa' where id=2;//被夯住 | - |
- | - | select pid,pg_blocking_pids(pid),wait_event_type,wait_event,query from pg_stat_activity ; |
- | update执行成功, info ='aaaaaaaaa' | - |
commit//执行报错 | commit | - |
## 事务三查看锁等待 ## 可以看到28363会话正在等待27152会话 db1=# select pid,pg_blocking_pids(pid),wait_event_type,wait_event,query from pg_stat_activity ; pid | pg_blocking_pids | wait_event_type | wait_event | query -------+------------------+-----------------+---------------------+------------------------------------------------------------------------------------------- 9020 | {} | Activity | LogicalLauncherMain | 9017 | {} | Activity | AutoVacuumMain | 28363 | {27152} | Lock | transactionid | update t1 set info ='aaaaaaaaa' where id=2; 27219 | {} | | | select pid,pg_blocking_pids(pid),wait_event_type,wait_event,query from pg_stat_activity ; 27152 | {} | Client | ClientRead | update t1 set info ='aaaaaaaaabbbbbbbbb' where id=2; 9015 | {} | Activity | BgWriterHibernate | 9014 | {} | Activity | CheckpointerMain | 9016 | {} | Activity | WalWriterMain | (8 rows) ## 事务一commit相关报错 db1=# commit; FATAL: terminating connection due to administrator command server closed the connection unexpectedly This probably means the server terminated abnormally before or while processing the request. The connection to the server was lost. Attempting reset: Succeeded.
2、通过数据库日志(开启lock_timeout, log_lockwait参数)(csvlog)跟踪锁等待信息
3、通过SQL查看锁冲突具体信息
1)锁冲突查询SQL
with t_wait as ( select a.mode,a.locktype,a.database,a.relation,a.page,a.tuple,a.classid,a.granted, a.objid,a.objsubid,a.pid,a.virtualtransaction,a.virtualxid,a.transactionid,a.fastpath, b.state,b.query,b.xact_start,b.query_start,b.usename,b.datname,b.client_addr,b.client_port,b.application_name from pg_locks a,pg_stat_activity b where a.pid=b.pid and not a.granted ), t_run as ( select a.mode,a.locktype,a.database,a.relation,a.page,a.tuple,a.classid,a.granted, a.objid,a.objsubid,a.pid,a.virtualtransaction,a.virtualxid,a.transactionid,a.fastpath, b.state,b.query,b.xact_start,b.query_start,b.usename,b.datname,b.client_addr,b.client_port,b.application_name from pg_locks a,pg_stat_activity b where a.pid=b.pid and a.granted ), t_overlap as ( select r.* from t_wait w join t_run r on ( r.locktype is not distinct from w.locktype and r.database is not distinct from w.database and r.relation is not distinct from w.relation and r.page is not distinct from w.page and r.tuple is not distinct from w.tuple and r.virtualxid is not distinct from w.virtualxid and r.transactionid is not distinct from w.transactionid and r.classid is not distinct from w.classid and r.objid is not distinct from w.objid and r.objsubid is not distinct from w.objsubid and r.pid <> w.pid ) ), t_unionall as ( select r.* from t_overlap r union all select w.* from t_wait w ) select locktype,datname,relation::regclass,page,tuple,virtualxid,transactionid::text,classid::regclass,objid,objsubid, string_agg( 'Pid: '||case when pid is null then 'NULL' else pid::text end||chr(10)|| 'Lock_Granted: '||case when granted is null then 'NULL' else granted::text end||' , Mode: '||case when mode is null then 'NULL' else mode::text end||' , FastPath: '||case when fastpath is null then 'NULL' else fastpath::text end||' , VirtualTransaction: '||case when virtualtransaction is null then 'NULL' else virtualtransaction::text end||' , Session_State: '||case when state is null then 'NULL' else state::text end||chr(10)|| 'Username: '||case when usename is null then 'NULL' else usename::text end||' , Database: '||case when datname is null then 'NULL' else datname::text end||' , Client_Addr: '||case when client_addr is null then 'NULL' else client_addr::text end||' , Client_Port: '||case when client_port is null then 'NULL' else client_port::text end||' , Application_Name: '||case when application_name is null then 'NULL' else application_name::text end||chr(10)|| 'Xact_Start: '||case when xact_start is null then 'NULL' else xact_start::text end||' , Query_Start: '||case when query_start is null then 'NULL' else query_start::text end||' , Xact_Elapse: '||case when (now()-xact_start) is null then 'NULL' else (now()-xact_start)::text end||' , Query_Elapse: '||case when (now()-query_start) is null then 'NULL' else (now()-query_start)::text end||chr(10)|| 'SQL (Current SQL in Transaction): '||chr(10)|| case when query is null then 'NULL' else query::text end, chr(10)||'--------'||chr(10) order by ( case mode when 'INVALID' then 0 when 'AccessShareLock' then 1 when 'RowShareLock' then 2 when 'RowExclusiveLock' then 3 when 'ShareUpdateExclusiveLock' then 4 when 'ShareLock' then 5 when 'ShareRowExclusiveLock' then 6 when 'ExclusiveLock' then 7 when 'AccessExclusiveLock' then 8 else 0 end ) desc, (case when granted then 0 else 1 end) ) as lock_conflict from t_unionall group by locktype,datname,relation,page,tuple,virtualxid,transactionid::text,classid,objid,objsubid ;
2)如果觉得写SQL麻烦,可以将它创建为视图,直接查询视图
create view v_locks_monitor as with t_wait as ( select a.mode,a.locktype,a.database,a.relation,a.page,a.tuple,a.classid,a.granted, a.objid,a.objsubid,a.pid,a.virtualtransaction,a.virtualxid,a.transactionid,a.fastpath, b.state,b.query,b.xact_start,b.query_start,b.usename,b.datname,b.client_addr,b.client_port,b.application_name from pg_locks a,pg_stat_activity b where a.pid=b.pid and not a.granted ), t_run as ( select a.mode,a.locktype,a.database,a.relation,a.page,a.tuple,a.classid,a.granted, a.objid,a.objsubid,a.pid,a.virtualtransaction,a.virtualxid,a.transactionid,a.fastpath, b.state,b.query,b.xact_start,b.query_start,b.usename,b.datname,b.client_addr,b.client_port,b.application_name from pg_locks a,pg_stat_activity b where a.pid=b.pid and a.granted ), t_overlap as ( select r.* from t_wait w join t_run r on ( r.locktype is not distinct from w.locktype and r.database is not distinct from w.database and r.relation is not distinct from w.relation and r.page is not distinct from w.page and r.tuple is not distinct from w.tuple and r.virtualxid is not distinct from w.virtualxid and r.transactionid is not distinct from w.transactionid and r.classid is not distinct from w.classid and r.objid is not distinct from w.objid and r.objsubid is not distinct from w.objsubid and r.pid <> w.pid ) ), t_unionall as ( select r.* from t_overlap r union all select w.* from t_wait w ) select locktype,datname,relation::regclass,page,tuple,virtualxid,transactionid::text,classid::regclass,objid,objsubid, string_agg( 'Pid: '||case when pid is null then 'NULL' else pid::text end||chr(10)|| 'Lock_Granted: '||case when granted is null then 'NULL' else granted::text end||' , Mode: '||case when mode is null then 'NULL' else mode::text end||' , FastPath: '||case when fastpath is null then 'NULL' else fastpath::text end||' , VirtualTransaction: '||case when virtualtransaction is null then 'NULL' else virtualtransaction::text end||' , Session_State: '||case when state is null then 'NULL' else state::text end||chr(10)|| 'Username: '||case when usename is null then 'NULL' else usename::text end||' , Database: '||case when datname is null then 'NULL' else datname::text end||' , Client_Addr: '||case when client_addr is null then 'NULL' else client_addr::text end||' , Client_Port: '||case when client_port is null then 'NULL' else client_port::text end||' , Application_Name: '||case when application_name is null then 'NULL' else application_name::text end||chr(10)|| 'Xact_Start: '||case when xact_start is null then 'NULL' else xact_start::text end||' , Query_Start: '||case when query_start is null then 'NULL' else query_start::text end||' , Xact_Elapse: '||case when (now()-xact_start) is null then 'NULL' else (now()-xact_start)::text end||' , Query_Elapse: '||case when (now()-query_start) is null then 'NULL' else (now()-query_start)::text end||chr(10)|| 'SQL (Current SQL in Transaction): '||chr(10)|| case when query is null then 'NULL' else query::text end, chr(10)||'--------'||chr(10) order by ( case mode when 'INVALID' then 0 when 'AccessShareLock' then 1 when 'RowShareLock' then 2 when 'RowExclusiveLock' then 3 when 'ShareUpdateExclusiveLock' then 4 when 'ShareLock' then 5 when 'ShareRowExclusiveLock' then 6 when 'ExclusiveLock' then 7 when 'AccessExclusiveLock' then 8 else 0 end ) desc, (case when granted then 0 else 1 end) ) as lock_conflict from t_unionall group by locktype,datname,relation,page,tuple,virtualxid,transactionid::text,classid,objid,objsubid ;
3)锁冲突视图查询事务查询
-[ RECORD 1 ]-+------------------------------------------------------------------------------------------------------------------------------------------------------ locktype | transactionid datname | db1 relation | page | tuple | virtualxid | transactionid | 1738627 classid | objid | objsubid | lock_conflict | Pid: 28363 + | Lock_Granted: true , Mode: ExclusiveLock , FastPath: false , VirtualTransaction: 3/1372985 , Session_State: idle in transaction + | Username: postgres , Database: db1 , Client_Addr: NULL , Client_Port: -1 , Application_Name: psql + | Xact_Start: 2020-10-19 22:22:40.71729+08 , Query_Start: 2020-10-19 22:22:53.991278+08 , Xact_Elapse: 00:00:57.879438 , Query_Elapse: 00:00:44.60545 + | SQL (Current SQL in Transaction): + | select pg_backend_pid(); + | -------- + | Pid: 31024 + | Lock_Granted: false , Mode: ShareLock , FastPath: false , VirtualTransaction: 6/1244302 , Session_State: active + | Username: postgres , Database: db1 , Client_Addr: NULL , Client_Port: -1 , Application_Name: psql + | Xact_Start: 2020-10-19 22:23:02.130503+08 , Query_Start: 2020-10-19 22:23:24.715491+08 , Xact_Elapse: 00:00:36.466225 , Query_Elapse: 00:00:13.881237+ | SQL (Current SQL in Transaction): + | update t1 set info ='cccccc' where id=2;
4)通过锁冲突视图查询,已经清晰看到每一个发生了锁等待的对象,按锁的大小排序展示出来。只需要找到terminate最大的锁对应的PID即可。
db1=# select pg_terminate_backend(28363); //找到terminate最大的PID并进行kill pg_terminate_backend ---------------------- t (1 row) db1=# \x 1 Expanded display is on. db1=# select * from v_locks_monitor ; //锁冲突恢复 (0 rows)
4、Lock trace,通过审计日志分析事务锁等待情况
1)开启审计日志
log_destination = 'csvlog' logging_collector = on log_truncate_on_rotation = on log_statement = 'all'
2)psql登录到对应数据库中,挂一个打印锁等待的窗口
select * from v_locks_monitor; //锁冲突查询视图 \watch 0.2 //查询间隔
3)tail 挂一个日志观测窗口
for ((i=1;i>0;i=1)); do grep RowExclusiveLock *.csv ; sleep 0.2; done 或 for ((i=1;i>0;i=1)); do grep acquired *.csv ; sleep 0.2; done
4)通过步骤二发现阻塞会话PID,然后根据PID信息在步骤三中的审计日志查看具体的相关操作,对锁阻塞进行具体分析