6.3. What's Missing?
All the disk I/O tools on Linux provide information about the utilization of a particular disk or partition. Unfortunately, after you determine that a particular disk is a bottleneck, there are no tools that enable you to figure out which process is causing all the I/O traffic.
Usually a system administrator has a good idea about what application uses the disk, but not always. Many times, for example, I have been using my Linux system when the disks started grinding for no apparent reason. I can usually run top and look for a process that might be causing the problem. By eliminating processes that I believe are not doing I/O, I can usually find the culprit. However, this requires knowledge of what the various applications are supposed to do. It is also error-prone, because the guess about which processes are not causing the problem might be wrong. In addition, on a system with many users or many running applications, it is not always practical or easy to determine which application might be causing the problem.
Other UNIXes support the inblk and oublk parameters to ps, which show the amount of disk I/O issued on behalf of a particular process. Currently, the Linux kernel does not track the I/O of a process, so the ps tool has no way to gather this information.
You can use lsof to determine which processes are accessing files on a particular partition. After you list all PIDs accessing the files, you can then attach to each of the PIDs with strace and figure out which one is doing a significant amount of I/O. Although this method works, it is really a Band-Aid solution, because the number of processes accessing a partition could be large, and it is time-consuming to attach to each process and analyze its system calls. This approach may also miss short-lived processes, and it may unacceptably slow down the processes while they are being traced.
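The analysis step of the lsof and strace method can be partly automated. The following is a minimal Python sketch, not a definitive tool: it assumes the strace output has already been saved as text, that each line may carry a PID prefix (as strace produces when it follows child processes), and that only the read, write, pread64, and pwrite64 calls matter. The function name tally_io and the default_pid parameter are hypothetical names chosen for this illustration. Note that the byte counts include reads and writes to pipes and sockets, not just disk files, and that memory-mapped I/O does not show up at all.

```python
import re
from collections import defaultdict

# Matches strace lines such as:
#   1234  read(3, "..."..., 4096) = 4096
#   [pid 1234] write(4, "...", 512) = 512
#   read(3, "...", 4096) = 4096            (no PID prefix)
SYSCALL_RE = re.compile(
    r"^(?:\[pid\s+(?P<pid1>\d+)\]\s+|(?P<pid2>\d+)\s+)?"
    r"(?P<name>read|write|pread64|pwrite64)\(.*\)\s+=\s+(?P<ret>-?\d+)"
)


def tally_io(lines, default_pid=0):
    """Sum the bytes moved by read/write-style calls, grouped by PID.

    Lines without a PID prefix are attributed to default_pid
    (a hypothetical placeholder for the single traced process).
    """
    totals = defaultdict(lambda: {"read": 0, "write": 0})
    for line in lines:
        match = SYSCALL_RE.match(line)
        if match is None:
            continue  # some other system call, or a non-syscall line
        returned = int(match.group("ret"))
        if returned <= 0:
            continue  # failed call (-1) or end of file (0): no bytes moved
        pid = int(match.group("pid1") or match.group("pid2") or default_pid)
        kind = "read" if "read" in match.group("name") else "write"
        totals[pid][kind] += returned
    return dict(totals)
```

One way to produce the input, assuming the suspect partition is mounted at a path of your choosing, is to list the PIDs with lsof -t on that mount point, attach to each with strace -f -p PID -o FILE, and then pass the lines of each output file to tally_io. The PIDs with the largest totals are the most likely sources of the I/O traffic.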
This is an area where the Linux kernel could be improved. The ability to quickly track which processes are generating I/O would allow much faster diagnosis of I/O-related performance problems.