首页 > 代码库 > 人人都是 DBA(XI)I/O 信息收集脚本汇编

人人都是 DBA(XI)I/O 信息收集脚本汇编

什么?有个 SQL 执行了 8 秒!

哪里出了问题?臣妾不知道啊,得找 DBA 啊。

DBA 人呢?离职了!!擦!!!

程序员在无处寻求帮助时,就得想办法自救,努力让自己变成 "伪 DBA"。

索引

  1. 数据文件和日志文件位置和大小
  2. 查看指定数据库文件的大小和可用空间
  3. 服务器 Disk 容量和挂载信息
  4. 查看 Disk 剩余空间
  5. 查询数据库设置的 Recovery Model
  6. 查看最近的 Full Backup 信息
  7. 获取所有数据库的 VLF 数量
  8. SQL Server 的错误日志位置
  9. 查询近期的 Error Log 信息
  10. 在错误日志中查询 I/O 超过 15s 的请求
  11. 查询 Disk 的性能指标
  12. 查看哪个数据库文件 I/O 瓶颈最严重
  13. 按照 Write I/O 进行排名
  14. 获取数据库的 I/O 使用率
  15. 查看指定数据库文件的 I/O 状况
  16. 找出 I/O 平均使用最多的语句
  17. 查询正在等待 I/O 的请求等待时间

数据文件和日志文件位置和大小

SELECT DB_NAME([database_id]) AS [Database Name]    ,[file_id]    ,[name]    ,physical_name    ,type_desc    ,state_desc    ,is_percent_growth    ,growth    ,CONVERT(BIGINT, growth / 128.0) AS [Growth in MB]    ,CONVERT(BIGINT, size / 128.0) AS [Total Size in MB]FROM sys.master_files WITH (NOLOCK)WHERE [database_id] > 4    AND [database_id] <> 32767    OR [database_id] = 2ORDER BY DB_NAME([database_id])OPTION (RECOMPILE);

通常会查看:

  • 数据文件是否与日志文件放到了不同的磁盘上?
  • 难道都装到了 C 盘上?
  • tempdb 是否指定了独立的磁盘?
  • 有几个 tempdb 文件?
  • 这些 tempdb 都多大了?
  • 数据库是否包含多个文件?
  • 数据文件的增长步长合适吗?

查看指定数据库文件的大小和可用空间

需要指定数据库,或使用 use 指定,例如 use TEST。

SELECT f.[name] AS [File Name]    ,f.physical_name AS [Physical Name]    ,CAST((f.size / 128.0) AS DECIMAL(15, 2)) AS [Total Size in MB]    ,CAST(f.size / 128.0 - CAST(FILEPROPERTY(f.[name], SpaceUsed) AS INT) / 128.0 AS DECIMAL(15, 2)) AS [Available Space In MB]    ,[file_id]    ,fg.[name] AS [Filegroup Name]FROM sys.database_files AS f WITH (NOLOCK)LEFT OUTER JOIN sys.data_spaces AS fg WITH (NOLOCK) ON f.data_space_id = fg.data_space_idOPTION (RECOMPILE);

服务器 Disk 容量和挂载信息

SELECT DISTINCT vs.volume_mount_point    ,vs.file_system_type    ,vs.logical_volume_name    ,CONVERT(DECIMAL(18, 2), vs.total_bytes / 1073741824.0) AS [Total Size (GB)]    ,CONVERT(DECIMAL(18, 2), vs.available_bytes / 1073741824.0) AS [Available Size (GB)]    ,CAST(CAST(vs.available_bytes AS FLOAT) / CAST(vs.total_bytes AS FLOAT) AS DECIMAL(18, 2)) * 100 AS [Space Free %]FROM sys.master_files AS f WITH (NOLOCK)CROSS APPLY sys.dm_os_volume_stats(f.database_id, f.[file_id]) AS vsOPTION (RECOMPILE);

SELECT db_name(vs.database_id) AS DatabaseName    ,vs.file_id    ,vs.volume_mount_point    ,vs.volume_id    ,vs.logical_volume_name    ,vs.file_system_type    ,(vs.total_bytes / 1024 / 1024 / 1024) AS [TotalSize(GB)]    ,(vs.available_bytes / 1024 / 1024 / 1024) AS [AvailableSize(GB)]    ,vs.supports_compression    ,vs.supports_alternate_streams    ,vs.supports_sparse_files    ,vs.is_read_only    ,vs.is_compressedFROM sys.master_files mfCROSS APPLY sys.dm_os_volume_stats(mf.database_id, mf.file_id) vs;

查看 Disk 剩余空间

EXEC master.dbo.xp_fixeddrives

SELECT DISTINCT SUBSTRING(volume_mount_point, 1, 1) AS Volume_mount_point    ,total_bytes / 1024 / 1024 AS Total_MB    ,available_bytes / 1024 / 1024 AS Available_MBFROM sys.master_files AS fCROSS APPLY sys.dm_os_volume_stats(f.database_id, f.file_id);

查询数据库设置的 Recovery Model

SELECT db.[name] AS [Database Name]    ,db.recovery_model_desc AS [Recovery Model]    ,db.state_desc    ,db.log_reuse_wait_desc AS [Log Reuse Wait Description]    ,CONVERT(DECIMAL(18, 2), ls.cntr_value / 1024.0) AS [Log Size (MB)]    ,CONVERT(DECIMAL(18, 2), lu.cntr_value / 1024.0) AS [Log Used (MB)]    ,CAST(CAST(lu.cntr_value AS FLOAT) / CAST(ls.cntr_value AS FLOAT) AS DECIMAL(18, 2)) * 100 AS [Log Used %]    ,db.[compatibility_level] AS [DB Compatibility Level]    ,db.page_verify_option_desc AS [Page Verify Option]    ,db.is_auto_create_stats_on    ,db.is_auto_update_stats_on    ,db.is_auto_update_stats_async_on    ,db.is_parameterization_forced    ,db.snapshot_isolation_state_desc    ,db.is_read_committed_snapshot_on    ,db.is_auto_close_on    ,db.is_auto_shrink_on    ,db.target_recovery_time_in_seconds    ,db.is_cdc_enabledFROM sys.databases AS db WITH (NOLOCK)INNER JOIN sys.dm_os_performance_counters AS lu WITH (NOLOCK) ON db.NAME = lu.instance_nameINNER JOIN sys.dm_os_performance_counters AS ls WITH (NOLOCK) ON db.NAME = ls.instance_nameWHERE lu.counter_name LIKE NLog File(s) Used Size (KB)%    AND ls.counter_name LIKE NLog File(s) Size (KB)%    AND ls.cntr_value > 0OPTION (RECOMPILE);

通常会关注:

  • 数据库实例(Instance)上建立了多少个数据库?
  • 它们都分别使用了什么恢复模型(Recovery Model)?
  • Log 重用是如何设置的?
  • 事务日志(Transaction Log)现在多大了?
  • 兼容等级(Compatibility Level)是怎么配置的?
  • 页验证选项(Page Verify Option)的设置是什么?通常为 CHECKSUM。
  • 是否设置了 Auto Update Statistics Asynchronously 选项?
  • 确保未开启 auto_shrink 和 auto_close 选项。

查看最近的 Full Backup 信息

SELECT TOP (30) bs.machine_name    ,bs.server_name    ,bs.database_name AS [Database Name]    ,bs.recovery_model    ,CONVERT(BIGINT, bs.backup_size / 1048576) AS [Uncompressed Backup Size (MB)]    ,CONVERT(BIGINT, bs.compressed_backup_size / 1048576) AS [Compressed Backup Size (MB)]    ,CONVERT(NUMERIC(20, 2), (CONVERT(FLOAT, bs.backup_size) / CONVERT(FLOAT, bs.compressed_backup_size))) AS [Compression Ratio]    ,DATEDIFF(SECOND, bs.backup_start_date, bs.backup_finish_date) AS [Backup Elapsed Time (sec)]    ,bs.backup_finish_date AS [Backup Finish Date]FROM msdb.dbo.backupset AS bs WITH (NOLOCK)WHERE DATEDIFF(SECOND, bs.backup_start_date, bs.backup_finish_date) > 0    AND bs.backup_size > 0    AND bs.type = D -- Change to L if you want Log backups    AND database_name = DB_NAME(DB_ID())ORDER BY bs.backup_finish_date DESCOPTION (RECOMPILE);

获取所有数据库的 VLF 数量

VLF :Virtual Log File

SQL Server 将日志文件 LDF 划分为多个 VLF 以提高日志处理效率。VLF 的数量和大小不能通过配置指定,SQL Server 会按情况自行判断处理,而如果生成了过多的 VLF 则会产生性能问题。通过指定合理的日志文件初始大小和增长步长,可以有效的防止过多 VLF 的产生。

  • 1M-64M   4
  • 64M-1GB     8
  • >1GB          16
CREATE TABLE #VLFInfo (    RecoveryUnitID INT    ,FileID INT    ,FileSize BIGINT    ,StartOffset BIGINT    ,FSeqNo BIGINT    ,[Status] BIGINT    ,Parity BIGINT    ,CreateLSN NUMERIC(38)    );CREATE TABLE #VLFCountResults (    DatabaseName SYSNAME    ,VLFCount INT    );EXEC sp_MSforeachdb NUse [?];                 INSERT INTO #VLFInfo                 EXEC sp_executesql N‘‘DBCC LOGINFO([?])‘‘;                      INSERT INTO #VLFCountResults                 SELECT DB_NAME(), COUNT(*)                 FROM #VLFInfo;                 TRUNCATE TABLE #VLFInfo;SELECT DatabaseName    ,VLFCountFROM #VLFCountResultsORDER BY VLFCount DESC;DROP TABLE #VLFInfo;DROP TABLE #VLFCountResults;

较高的 VLF 数量会影响写入性能,并且会使数据库的恢复过程变慢,通常需要保持 VLF Counts 在 200 以下。

SQL Server 的错误日志位置

SELECT is_enabled    ,[path]    ,max_size    ,max_filesFROM sys.dm_os_server_diagnostics_log_configurations WITH (NOLOCK)OPTION (RECOMPILE);

查询近期的 Error Log 信息

DECLARE @Time_Start DATETIME;DECLARE @Time_End DATETIME;SET @Time_Start = getdate() - 2;SET @Time_End = getdate();-- Create the temporary tableCREATE TABLE #ErrorLog (    logdate DATETIME    ,processinfo VARCHAR(255)    ,Message VARCHAR(500)    )-- Populate the temporary tableINSERT #ErrorLog (    logdate    ,processinfo    ,Message    )EXEC master.dbo.xp_readerrorlog 0    ,1    ,NULL    ,NULL    ,@Time_Start    ,@Time_End    ,Ndesc;-- Filter the temporary tableSELECT LogDate    ,MessageFROM #ErrorLogWHERE (        Message LIKE %error%        OR Message LIKE %failed%        )    AND processinfo NOT LIKE logonORDER BY logdate DESC-- Drop the temporary table DROP TABLE #ErrorLog

在错误日志中查询 I/O 超过 15s 的请求

CREATE TABLE #IOWarningResults (    LogDate DATETIME    ,ProcessInfo SYSNAME    ,LogText NVARCHAR(1000)    );INSERT INTO #IOWarningResultsEXEC xp_readerrorlog 0    ,1    ,Ntaking longer than 15 seconds;INSERT INTO #IOWarningResultsEXEC xp_readerrorlog 1    ,1    ,Ntaking longer than 15 seconds;INSERT INTO #IOWarningResultsEXEC xp_readerrorlog 2    ,1    ,Ntaking longer than 15 seconds;INSERT INTO #IOWarningResultsEXEC xp_readerrorlog 3    ,1    ,Ntaking longer than 15 seconds;INSERT INTO #IOWarningResultsEXEC xp_readerrorlog 4    ,1    ,Ntaking longer than 15 seconds;SELECT LogDate    ,ProcessInfo    ,LogTextFROM #IOWarningResultsORDER BY LogDate DESC;DROP TABLE #IOWarningResults;

如果能够查询出结果,可以说明 I/O 性能存在问题,但是哪里引起的还需进一步探索。

查询 Disk 的性能指标

SELECT [Drive]    ,CASE         WHEN num_of_reads = 0            THEN 0        ELSE (io_stall_read_ms / num_of_reads)        END AS [Read Latency (ms)]    ,CASE         WHEN io_stall_write_ms = 0            THEN 0        ELSE (io_stall_write_ms / num_of_writes)        END AS [Write Latency (ms)]    ,CASE         WHEN (                num_of_reads = 0                AND num_of_writes = 0                )            THEN 0        ELSE (io_stall / (num_of_reads + num_of_writes))        END AS [Overall Latency (ms)]    ,CASE         WHEN num_of_reads = 0            THEN 0        ELSE (num_of_bytes_read / num_of_reads)        END AS [Avg Bytes/Read]    ,CASE         WHEN io_stall_write_ms = 0            THEN 0        ELSE (num_of_bytes_written / num_of_writes)        END AS [Avg Bytes/Write]    ,CASE         WHEN (                num_of_reads = 0                AND num_of_writes = 0                )            THEN 0        ELSE ((num_of_bytes_read + num_of_bytes_written) / (num_of_reads + num_of_writes))        END AS [Avg Bytes/Transfer]FROM (    SELECT LEFT(UPPER(mf.physical_name), 2) AS Drive        ,SUM(num_of_reads) AS num_of_reads        ,SUM(io_stall_read_ms) AS io_stall_read_ms        ,SUM(num_of_writes) AS num_of_writes        ,SUM(io_stall_write_ms) AS io_stall_write_ms        ,SUM(num_of_bytes_read) AS num_of_bytes_read        ,SUM(num_of_bytes_written) AS num_of_bytes_written        ,SUM(io_stall) AS io_stall    FROM sys.dm_io_virtual_file_stats(NULL, NULL) AS vfs    INNER JOIN sys.master_files AS mf WITH (NOLOCK) ON vfs.database_id = mf.database_id        AND vfs.file_id = mf.file_id    GROUP BY LEFT(UPPER(mf.physical_name), 2)    ) AS tabORDER BY [Overall Latency (ms)]OPTION (RECOMPILE);

通常 Latency 的值大于 20-25 ms 时可考虑有性能问题。

查看哪个数据库文件 I/O 瓶颈最严重

SELECT DB_NAME(fs.database_id) AS [Database Name]    ,CAST(fs.io_stall_read_ms / (1.0 + fs.num_of_reads) AS NUMERIC(10, 1)) AS [avg_read_stall_ms]    ,CAST(fs.io_stall_write_ms / (1.0 + fs.num_of_writes) AS NUMERIC(10, 1)) AS [avg_write_stall_ms]    ,CAST((fs.io_stall_read_ms + fs.io_stall_write_ms) / (1.0 + fs.num_of_reads + fs.num_of_writes) AS NUMERIC(10, 1)) AS [avg_io_stall_ms]    ,CONVERT(DECIMAL(18, 2), mf.size / 128.0) AS [File Size (MB)]    ,mf.physical_name    ,mf.type_desc    ,fs.io_stall_read_ms    ,fs.num_of_reads    ,fs.io_stall_write_ms    ,fs.num_of_writes    ,fs.io_stall_read_ms + fs.io_stall_write_ms AS [io_stalls]    ,fs.num_of_reads + fs.num_of_writes AS [total_io]FROM sys.dm_io_virtual_file_stats(NULL, NULL) AS fsINNER JOIN sys.master_files AS mf WITH (NOLOCK) ON fs.database_id = mf.database_id    AND fs.[file_id] = mf.[file_id]ORDER BY avg_io_stall_ms DESCOPTION (RECOMPILE);

考虑将数据库文件移动到不同的磁盘上,或更快的磁盘阵列上以改进性能。

按照 Write I/O 进行排名

SELECT [ReadLatency] = CASE         WHEN [num_of_reads] = 0            THEN 0        ELSE ([io_stall_read_ms] / [num_of_reads])        END    ,[WriteLatency] = CASE         WHEN [num_of_writes] = 0            THEN 0        ELSE ([io_stall_write_ms] / [num_of_writes])        END    ,[Latency] = CASE         WHEN (                [num_of_reads] = 0                AND [num_of_writes] = 0                )            THEN 0        ELSE ([io_stall] / ([num_of_reads] + [num_of_writes]))        END    ,[AvgBytesPerRead] = CASE         WHEN [num_of_reads] = 0            THEN 0        ELSE ([num_of_bytes_read] / [num_of_reads])        END    ,[AvgBytesPerWrite] = CASE         WHEN [num_of_writes] = 0            THEN 0        ELSE ([num_of_bytes_written] / [num_of_writes])        END    ,[AvgBytesPerTransfer] = CASE         WHEN (                [num_of_reads] = 0                AND [num_of_writes] = 0                )            THEN 0        ELSE (([num_of_bytes_read] + [num_of_bytes_written]) / ([num_of_reads] + [num_of_writes]))        END    ,LEFT([mf].[physical_name], 2) AS [Drive]    ,DB_NAME([vfs].[database_id]) AS [DB]    ,[mf].[physical_name]    ,[mf].file_idFROM sys.dm_io_virtual_file_stats(NULL, NULL) AS [vfs]JOIN sys.master_files AS [mf] ON [vfs].[database_id] = [mf].[database_id]    AND [vfs].[file_id] = [mf].[file_id]ORDER BY [WriteLatency] DESC;

获取数据库的 I/O 使用率

WITH Aggregate_IO_StatisticsAS (    SELECT DB_NAME(database_id) AS [Database Name]        ,CAST(SUM(num_of_bytes_read + num_of_bytes_written) / 1048576 AS DECIMAL(12, 2)) AS io_in_mb    FROM sys.dm_io_virtual_file_stats(NULL, NULL) AS [DM_IO_STATS]    GROUP BY database_id    )SELECT ROW_NUMBER() OVER (        ORDER BY io_in_mb DESC        ) AS [I/O Rank]    ,[Database Name]    ,io_in_mb AS [Total I/O (MB)]    ,CAST(io_in_mb / SUM(io_in_mb) OVER () * 100.0 AS DECIMAL(5, 2)) AS [I/O Percent]FROM Aggregate_IO_StatisticsORDER BY [I/O Rank]OPTION (RECOMPILE);

查看指定数据库文件的 I/O 状况

需要指定数据库,或使用 use 指定,例如 use TEST。

SELECT DB_NAME(DB_ID()) AS [Database Name]    ,df.[name] AS [Logical Name]    ,vfs.[file_id]    ,df.physical_name AS [Physical Name]    ,vfs.num_of_reads    ,vfs.num_of_writes    ,vfs.io_stall_read_ms    ,vfs.io_stall_write_ms    ,CAST(100. * vfs.io_stall_read_ms / (vfs.io_stall_read_ms + vfs.io_stall_write_ms) AS DECIMAL(10, 1)) AS [IO Stall Reads Pct]    ,CAST(100. * vfs.io_stall_write_ms / (vfs.io_stall_write_ms + vfs.io_stall_read_ms) AS DECIMAL(10, 1)) AS [IO Stall Writes Pct]    ,(vfs.num_of_reads + vfs.num_of_writes) AS [Writes + Reads]    ,CAST(vfs.num_of_bytes_read / 1048576.0 AS DECIMAL(10, 2)) AS [MB Read]    ,CAST(vfs.num_of_bytes_written / 1048576.0 AS DECIMAL(10, 2)) AS [MB Written]    ,CAST(100. * vfs.num_of_reads / (vfs.num_of_reads + vfs.num_of_writes) AS DECIMAL(10, 1)) AS [# Reads Pct]    ,CAST(100. * vfs.num_of_writes / (vfs.num_of_reads + vfs.num_of_writes) AS DECIMAL(10, 1)) AS [# Write Pct]    ,CAST(100. * vfs.num_of_bytes_read / (vfs.num_of_bytes_read + vfs.num_of_bytes_written) AS DECIMAL(10, 1)) AS [Read Bytes Pct]    ,CAST(100. * vfs.num_of_bytes_written / (vfs.num_of_bytes_read + vfs.num_of_bytes_written) AS DECIMAL(10, 1)) AS [Written Bytes Pct]FROM sys.dm_io_virtual_file_stats(DB_ID(), NULL) AS vfsINNER JOIN sys.database_files AS df WITH (NOLOCK) ON vfs.[file_id] = df.[file_id]OPTION (RECOMPILE);

协助从 I/O 角度来观察数据库文件所承载的压力。

找出 I/O 平均使用最多的语句

SELECT TOP (50) OBJECT_NAME(qt.objectid, dbid) AS [SP Name]    ,(qs.total_logical_reads + qs.total_logical_writes) / qs.execution_count AS [Avg IO]    ,qs.execution_count AS [Execution Count]    ,SUBSTRING(qt.[text], qs.statement_start_offset / 2 + 1, (            CASE                 WHEN qs.statement_end_offset = - 1                    THEN LEN(CONVERT(NVARCHAR(max), qt.[text])) * 2                ELSE qs.statement_end_offset                END - qs.statement_start_offset            ) / 2) AS [Query Text]FROM sys.dm_exec_query_stats AS qs WITH (NOLOCK)CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) AS qtWHERE qt.[dbid] = DB_ID()ORDER BY [Avg IO] DESCOPTION (RECOMPILE);

参考资料:

  • How To: Troubleshooting SQL Server I/O bottlenecks

查询正在等待 I/O 的请求等待时间

SELECT DB_NAME(database_id) AS [DBNAME]    ,file_id    ,io_stall    ,io_pending_ms_ticks    ,scheduler_addressFROM sys.dm_io_virtual_file_stats(NULL, NULL) iovfs    ,sys.dm_io_pending_io_requests AS iopiorWHERE iovfs.file_handle = iopior.io_handle

 

《人人都是 DBA》系列文章索引:

 序号 

 名称 

1

 人人都是 DBA(I)SQL Server 体系结构

2

 人人都是 DBA(II)SQL Server 元数据

3

 人人都是 DBA(III)SQL Server 调度器

4

 人人都是 DBA(IV)SQL Server 内存管理

5

 人人都是 DBA(V)SQL Server 数据库文件

6

 人人都是 DBA(VI)SQL Server 事务日志

7

 人人都是 DBA(VII)B 树和 B+ 树

8

 人人都是 DBA(VIII)SQL Server 页存储结构

9

 人人都是 DBA(IX)服务器信息收集脚本汇编

10

 人人都是 DBA(X)资源信息收集脚本汇编

11

 人人都是 DBA(XI)I/O 信息收集脚本汇编

12

 人人都是 DBA(XII)查询信息收集脚本汇编

13

 人人都是 DBA(XIII)索引信息收集脚本汇编

14

 人人都是 DBA(XIV)存储过程信息收集脚本汇编 

15

 人人都是 DBA(XV)锁信息收集脚本汇编

本系列文章《人人都是 DBA》由 Dennis Gao 发表自博客园,未经作者本人同意禁止任何形式的转载,任何自动或人为的爬虫转载行为均为耍流氓。

人人都是 DBA(XI)I/O 信息收集脚本汇编