SQL Server优化入门系列(二)—— 等待事件
在上一篇文章中(SQL Server优化入门系列(一)——快速定位阻塞SQL),我们介绍了如何快速定位SQL Server中当前正在执行的SQL,以及被阻塞的SQL。
这里,我们将介绍如何通过等待事件来优化数据库。
说明
用户的SQL提交给服务器后,以TASK的形式运行。Task运行过程中,可能在多个状态之间转换,可能会等待某些资源(如IO、网络请求、锁、内存分配),通过等待事件,可以分析服务器在不同资源上消耗的时间,有助于了解系统运行状况,进行针对性的优化。
本文将介绍:
如何查看系统中的当前top等待事件
常见等待事件的含义和分析
SQL Server Task状态转换
SQL Server Task运行过程中,会在如下状态之间转换。
Running:运行中
Suspended:等待某个资源(如IO、锁、Latch、内存、网络请求)
等待事物锁
等待Latch
等待IO请求
等待网络请求
等待内存分配
等待checkpoint完成
其他
Runnable: 等等执行(等待CPU)。表现为signal wait比较高。
Pending:等待worker线程
通过等待事件,可以了解SQL Server在不同的状态上消耗的时间。
等待事件分析
SQL Server提供了DMV视图,可用于分析等待事件
sys.dm_os_wait_stats:累计的各等待事件的等待时间。需要注意的是,这个DMV不包含当前处于等待中的事件。只有一个事件完成的时候,才会记录到该DMV。
sys.dm_os_waiting_tasks:当前处于等待状态的事件。
获取累计的等待事件信息
可通过如下SQL获取数据库的等待事件。需要注意的是,这里查到的是实例启动以来的累计值。
SELECT SUM (waiting_tasks_count) AS waiting_tasks_count, SUM (signal_wait_time_ms) AS signal_wait_time_ms, SUM (wait_time_ms) AS wait_time_ms, SUM (raw_wait_time_ms) AS raw_wait_time_ms FROM ( -- global server wait stats (completed waits only) SELECT wait_type, waiting_tasks_count, (wait_time_ms - signal_wait_time_ms) AS wait_time_ms, signal_wait_time_ms, wait_time_ms AS raw_wait_time_ms FROM sys.dm_os_wait_stats WHERE waiting_tasks_count > 0 OR wait_time_ms > 0 OR signal_wait_time_ms > 0 UNION ALL -- threads in an in-progress wait (not yet completed waits) SELECT wait_type, 1 AS waiting_tasks_count, wait_duration_ms AS wait_time_ms, 0 AS signal_wait_time_ms, wait_duration_ms AS raw_wait_time_ms FROM sys.dm_os_waiting_tasks -- Very brief waits quickly will roll into dm_os_wait_stats; we only need to -- query dm_os_waiting_tasks to handle longer-lived waits. WHERE wait_duration_ms > 1000 ) AS merged_wait_stats GROUP BY merged_wait_stats.wait_type;
获取增量的等待事件信息
一般情况下,我们需要查看数据库当前的等待情况,可以通过如下的方式获取:
drop table #wait_stat_start; drop table #wait_stat_end; SELECT wait_type, CURRENT_TIMESTAMP as check_time, SUM (waiting_tasks_count) AS waiting_tasks_count, SUM (signal_wait_time_ms) AS signal_wait_time_ms, SUM (wait_time_ms) AS wait_time_ms, SUM (raw_wait_time_ms) AS raw_wait_time_ms into #wait_stat_start FROM ( -- global server wait stats (completed waits only) SELECT wait_type, waiting_tasks_count, (wait_time_ms - signal_wait_time_ms) AS wait_time_ms, signal_wait_time_ms, wait_time_ms AS raw_wait_time_ms FROM sys.dm_os_wait_stats WHERE waiting_tasks_count > 0 OR wait_time_ms > 0 OR signal_wait_time_ms > 0 UNION ALL -- threads in an in-progress wait (not yet completed waits) SELECT wait_type, 1 AS waiting_tasks_count, wait_duration_ms AS wait_time_ms, 0 AS signal_wait_time_ms, wait_duration_ms AS raw_wait_time_ms FROM sys.dm_os_waiting_tasks -- Very brief waits quickly will roll into dm_os_wait_stats; we only need to -- query dm_os_waiting_tasks to handle longer-lived waits. WHERE wait_duration_ms > 1000 ) AS merged_wait_stats GROUP BY merged_wait_stats.wait_type; waitfor delay '00:00:10'; SELECT wait_type, CURRENT_TIMESTAMP as check_time, SUM (waiting_tasks_count) AS waiting_tasks_count, SUM (signal_wait_time_ms) AS signal_wait_time_ms, SUM (wait_time_ms) AS wait_time_ms, SUM (raw_wait_time_ms) AS raw_wait_time_ms into #wait_stat_end FROM ( -- global server wait stats (completed waits only) SELECT wait_type, waiting_tasks_count, (wait_time_ms - signal_wait_time_ms) AS wait_time_ms, signal_wait_time_ms, wait_time_ms AS raw_wait_time_ms FROM sys.dm_os_wait_stats WHERE waiting_tasks_count > 0 OR wait_time_ms > 0 OR signal_wait_time_ms > 0 UNION ALL -- threads in an in-progress wait (not yet completed waits) SELECT wait_type, 1 AS waiting_tasks_count, wait_duration_ms AS wait_time_ms, 0 AS signal_wait_time_ms, wait_duration_ms AS raw_wait_time_ms FROM sys.dm_os_waiting_tasks -- Very brief waits quickly will roll into dm_os_wait_stats; we only need to -- query dm_os_waiting_tasks to handle longer-lived waits. WHERE wait_duration_ms > 1000 ) AS merged_wait_stats GROUP BY merged_wait_stats.wait_type;
这里的等待事件中,有一部分是后台等待事件,一般不需要关注。需要关注的是前台等待事件。
当然,在实际运维过程中,我们会通过监控系统定时采集这些信息。
常见等待事件
下表列举了比较常见的等待事件,以及相关的优化方法
等待事件 | 描述 | (可能的)优化方法 |
CXPACKET | 出现CXPACKET说明有并行查询。需要同时查看其他top等待事件。 |
|
SOS_SCHEDULER_YIELD | 可能存在CPU资源不足的情况 |
|
THREADPOOL | 等待worker线程 |
|
LCK_* | 事物锁引起 |
|
PAGEIOLATCH_*, IO_COMPLETION, WRITELOG | 等待IO传输 |
|
PAGELATCH_* | Page争用 |
|
LATCH_* | ||
ASYNC_NETWORK_IO | ||
OLEDB |