
Wednesday, August 13, 2014

High I/O Wait Utilization



The first indication of a high IO wait issue is usually the system load average. The load average is computed from CPU utilization and includes the number of processes using or waiting to use the CPU; importantly, on Linux it also includes processes in uninterruptible sleep. At a basic level, a single CPU core at full utilization contributes a load average of one. So, on a quad-core machine, a system load average of 4 means the machine has enough resources to handle the work it needs to do, but only barely. On the same quad-core system, a load average of 8 means the server could have handled the work if it had eight cores instead of four; as it stands, it is overloaded.
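
A quick way to check is the uptime command, which prints the load averages for the last 1, 5 and 15 minutes (the output below is illustrative):

[username@server~]$ uptime
 14:32:48 up 12 days, 3:41, 2 users, load average: 8.03, 7.84, 6.92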

If the system is showing a high load average but CPU system and user utilization are low, it is time to start looking at IO wait. IO wait shows up in the system load on Linux because one or more cores is stuck waiting for disk or network input or output to finish before it can continue. These processes can be found using ps aux and looking for processes with a state of "D", as shown below.
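
For example, the following one-liner (relying on the standard ps aux column layout, where the eighth field is the process state) prints only processes currently in uninterruptible sleep; the sample output is illustrative:

[username@server~]$ ps aux | awk '$8 ~ /^D/ {print $1, $2, $8, $11}'
db2fenc1 1067 D db2fmp
db2fenc1 2560 D db2fmp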

Finding processes waiting for IO is one thing; verifying that they are the cause of high IO wait is a separate task. For this, start with iostat -x 1, which shows the physical storage devices and how heavily each one is being used.

[username@server~]$ iostat -x 1

Device: rrqm/s wrqm/s r/s w/s rsec/s wsec/s avgrq-sz avgqu-sz await svctm %util

cciss/c0d0 0.08 5.94 1.28 2.75 17.34 69.52 21.60 0.11 26.82 4.12 1.66
cciss/c0d0p1 0.00 0.00 0.00 0.00 0.00 0.00 5.30 0.00 8.76 5.98 0.00
cciss/c0d0p2 0.00 0.00 0.00 0.00 0.00 0.00 58.45 0.00 7.79 3.21 0.00
cciss/c0d0p3 0.08 5.94 1.28 2.75 17.34 69.52 21.60 0.11 26.82 4.12 1.66


From the output above, it is clear that the device /dev/cciss/c0d0p3 is waiting far too long. However, that particular device is not mounted directly; it is an underlying physical volume for LVM. If you are using LVM for your disks, and you should be, the output of iostat can be a bit confusing. LVM uses the device mapper subsystem to map the mounted filesystems back to the physical devices, so iostat may show several device paths like /dev/dm-0 and /dev/dm-1. Since the output of df -h does not show the device mapper paths and instead prints the LVM paths, the easiest way to see the LVM device names in iostat is to add the -N flag.

[username@server~]$ iostat -xN 1

Device: rrqm/s wrqm/s r/s w/s rsec/s wsec/s avgrq-sz avgqu-sz await svctm %util
vg1-root 0.00 0.00 0.09 3.01 0.85 24.08 8.05 0.08 24.69 1.79 0.55
vg1-home 0.00 0.00 0.05 1.46 0.97 11.69 8.36 0.03 19.89 3.76 0.57
vg1-opt 0.00 0.00 0.03 1.56 0.46 12.48 8.12 0.05 29.89 3.53 0.56
vg1-tmp 0.00 0.00 0.00 0.06 0.00 0.45 8.00 0.00 24.85 4.90 0.03
vg1-usr 0.00 0.00 0.63 1.41 5.85 11.28 8.38 0.07 32.48 3.11 0.63
vg1-var 0.00 0.00 0.55 1.19 9.21 9.54 10.74 0.04 24.10 4.24 0.74
vg1-swaplv 0.00 0.00 0.00 0.00 0.00 0.00 8.00 0.00 3.98 1.88 0.00
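
If your version of iostat does not support -N and only shows dm-0 style names, they can be matched back to logical volumes with dmsetup ls, since the minor number in its output corresponds to the N in dm-N (the listing below is illustrative):

[root@server~]# dmsetup ls
vg1-root (253, 0)
vg1-usr (253, 4)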


Parts of the output from the iostat commands above were clipped for brevity. Each of the filesystems listed shows an unacceptable level of IO wait, seen in the tenth column, labeled "await": the average time, in milliseconds, that IO requests to the device took to be served, including time spent in the queue. Both /usr and /opt show somewhat higher wait times than the others, so let's start with /opt. Running fuser -vm /opt will show a list of processes accessing the filesystem, along with each process's owner.

[root@server~]# fuser -vm /opt

USER PID ACCESS COMMAND
/opt: db2fenc1 1067 ....m db2fmp
db2fenc1 1071 ....m db2fmp
db2fenc1 2560 ....m db2fmp
db2fenc1 5221 ....m db2fmp
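
To confirm how much IO one of these suspect processes is actually doing, the kernel's per-process counters in /proc/<pid>/io can be read directly (PID 1067 is taken from the fuser output above; the byte counts shown are illustrative):

[root@server~]# cat /proc/1067/io
rchar: 58378821
wchar: 25453817
syscr: 154202
syscw: 79290
read_bytes: 2317494272
write_bytes: 871829504
cancelled_write_bytes: 0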
