Oracle等待事件 - xuwang.online

已索引

等待事件是衡量数据库运行状况的重要依据及指标，优化的时候越来越关注等待事件。

Oracle 提供了大量的等待事件用来做故障或者性能分析，V$EVENT_NAME 视图记录了所有的等待事件（包括事件的NAME、PARAMETER[123]、WAIT_CLASS等）。在10g中有800多个等待事件，在11g中有1000多个等待事件。尝试把每个等待事件的来龙去脉都搞清楚是不太可能的，但是处理方法是固定的。

官方文档

Reference（展开即可）
Performance Tuning Guide > Instance Tuning Using Performance Views > Wait Events Statistics > 搜索

等待事件的分类

我们将oracle的等待事件从大的方向上分为两大类：空闲（idle）等待事件和非空闲（non-idle）等待事件。

空闲等待事件例，如session已经建立，但不做任何事情，此时该session就处于SQL*Net message from/to client等待事件状态，等待用户发出命令。
非空闲等待事件，是指会话因为资源不能得到而产生的等待。

Oracle 官方把所有的等待事件进行了分类，可以通过 v$event_name 查询到：

SQL> select WAIT_CLASS,COUNT(*) 
from v$EVENT_NAME 
group by WAIT_CLASS 
order by COUNT(*);

WAIT_CLASS                       COUNT(*)                                       
------------------------------ ----------                                       
Commit                                  4                                       
Queueing                                9                                       
Scheduler                               9                                       
Application                            17                                       
Configuration                          26                                       
Network                                28                                       
System I/O                             35                                       
Concurrency                            38                                       
User I/O                               56                                       
Administrative                         57                                       
Cluster                                64                                       
Idle                                  121                                       
Other                                1186

13 rows selected.

Oracle官方文档《Reference: Oracle Wait Events》中对这些分类做了解释说明：

Administrative
Waits resulting from DBA commands that cause users to wait (for example, an index rebuild)

Application
Waits resulting from user application code (for example, lock waits caused by row level locking or explicit lock commands)

Cluster
Waits related to Real Application Clusters resources (for example, global cache resources such as 'gc cr block busy')

Commit
This wait class only comprises one wait event - wait for redo log write confirmation after a commit (that is, 'log file sync')

Concurrency
Waits for internal database resources (for example, latches)

Configuration
Waits caused by inadequate configuration of database or instance resources (for example, undersized log file sizes, shared pool size)

Idle
Waits that signify the session is inactive, waiting for work (for example, 'SQL*Net message from client')

Network
Waits related to network messaging (for example, 'SQL*Net more data to dblink')

Other
Waits which should not typically occur on a system (for example, 'wait for EMON to spawn')

Queue
Contains events that signify delays in obtaining additional data in a pipelined environment. The time spent on these wait events indicates inefficiency or other problems in the pipeline. It affects features such as Oracle Streams, parallel queries, or DBMS_PIPE PL/SQL packages.

Scheduler
Resource Manager related waits (for example, 'resmgr: become active')

System I/O
Waits for background process I/O (for example, DBWR wait for 'db file parallel write')

User I/O
Waits for user I/O (for example 'db file sequential read')

找出等待事件

我们可以：

通过 AWR 报告对某一段时间的等待事件进行统计分析。
通过 V$SYSTEM_EVENT 视图(实例级别的累积值)对实例整个生命周期的等待事件进行统计分析。
通过 V$SESSION_EVENT 对每个会话在实例生命周期的等待事件进行统计分析（累积值）。
通过 V$SESSION 或者 V$SESSION_WAIT 观察当前所有会话的等待事件情况(正在等待或者刚刚完成等待)。
通过 10046 event 来分析某个会话在执行某条 SQL 时遭遇的等待事件。

当定位到一个等待事件后(知道了事件名及其P1,P2,P3)，我们可以先查询其所属分类，以有一个大致的方向。
再结合官方文档或者v$event_name来查看P1,P2,P3的含义以及等待事件产生的原因和解决办法。

处理等待事件

找出等待事件后，我们就可以在MOS中搜索该等待事件名，找标题有 troubleshooting 或者 resolve/resolving 关键字的文章，查看官方的解释办法。（我们以后文的'buffer busy waits'做一个示例）

可以使用如下语句查看与等待事件相关的会话、SQL、对象。

找出导致某等待事件高的会话

select t1.sid, t2.name, t1.value
  from v$sesstat t1, v$statname t2
  where lower(t2.name) like '%user commits%'
   and t1.statistic# = t2.statistic#
   and t1.value >= 10000
  order by t1.value desc;

v$sesstat存储所有session从login到logout期间的Statistic number(编号)与Statistic value(值)的使用统计。
v$statname 记录了the name of the statistic associated with each statistic number。
注意，Statistics numbers 在各版中并不保证是不变的. 因此，应该使用 statistics name rather than its number.

找出了会话，就可以进一步去分析会话的SQL语句。

找出某段时间导致某等待事件高且执行次数多的SQL语句

select sql_id, count(*) cnt 
from dba_hist_active_sess_history
where snap_id between 172 and 173
and event_id in (1478861578)
group by sql_id
having count(*) > 1000
order by 2;

查询导致等待事件的对象

这儿的方法并不是通用的，只是一种思路。具体还是要根据具体的等待事件来进行分析。

方法一：

col owner for a10;
col object_name for a20;
col object_type for a20;
select owner, object_name, object_type, statistic_name, value
from v$segment_statistics
where statistic_name = 'byffer busy waits' order by value;

方法二：

第1步：根据等待事件的 P1,P2,P3 找到对象特征（如文件号与块号）
第2步：根据对象特征（如文件号与块号）定位对象

找到对象特征（如文件号与块号）：
select inst_id, event, p1 file_number, p2 block_number, wait_time
from gv$session_wait
where event in ('buffer busy global cr','global cache busy','buffer busy global cache');

根据对象特征定位对象：
SELECT owner, segment_name, file_id, block_id starting_block_id, block_id + blocks ending_block_id, blocks
FROM dba_extents
WHERE file_id = &file_num AND ( block_id <= &block_id AND (&block_id < (block_id + blocks)) )
 
 OWNER      SEGMENT_NAME            FILE_ID STARTING_BLOCK_ID ENDING_BLOCK_ID     BLOCKS
---------- -------------------- ---------- ----------------- --------------- ----------
SCOTT      STOCK_PRICES                  4             78385           78393          8

几个典型的等待事件

Idle 类的等待
DB CPU
rdbms ipc message
SQL*Net message from client
SQL*Net more data from client
control file sequential read
db file sequential read
db file scattered read
direct path read
direct path write
log file sync
free buffer waits
buffer busy waits
gc buffer busy
latch free