A kernel panic is an action taken by an operating system upon detecting an
internal fatal error from which it cannot safely recover.
kdump is the Linux Kernel's built-in crash dump mechanism. In the
event of a kernel crash, kdump creates a memory image (also known as vmcore) that
can be analyzed for the purposes of debugging and determining the
cause of a crash.
General causes of
kernel panics
Kernel panics are often
caused by one or more of the following issues.
· Hard disk
corruption, including bad sectors, directory corruption, and other hard-disk
ills.
· Incorrect
permissions on System-related files or folders.
· Insufficient
RAM and available hard disk space.
· Improperly
installed hardware or software.
· Defective
hardware or software. Hardware failures, including a defective CPU, or
programming errors can result in kernel panics.
· Incompatible
hardware. While rare, this is generally the result of a third-party hardware
vendor’s product failing to properly respond to the kernel or a kernel
extension in an expected way.
[root@vinodhkumar ~]#
uname -r
2.6.32-279.19.1.el6.x86_64
[root@vinodhkumar ~]#
cat /etc/redhat-release
CentOS release 6.3 (Final)
First let's install all the necessary packages:
[root@vinodhkumar
~]# yum -y install kexec-tools
[root@vinodhkumar ~]# yum -y
install kernel-debug
[root@vinodhkumar ~]# yum -y
install kernel-debug-devel
[root@vinodhkumar ~]# wget
http://debuginfo.centos.org/6/x86_64/kernel-debuginfo-2.6.32-279.19.1.el6.x86_64.rpm
[root@vinodhkumar ~]# wget
http://debuginfo.centos.org/6/x86_64/kernel-debuginfo-common-x86_64-2.6.32-279.19.1.el6.x86_64.rpm
[root@vinodhkumar ~]# wget
http://debuginfo.centos.org/6/x86_64/kernel-debug-debuginfo-2.6.32-279.19.1.el6.x86_64.rpm
[root@vinodhkumar ~]# rpm
-Uvh kernel-debuginfo-common-x86_64-2.6.32-279.19.1.el6.x86_64.rpm
[root@vinodhkumar ~]# rpm
-Uvh kernel-debuginfo-2.6.32-279.19.1.el6.x86_64.rpm
[root@vinodhkumar ~]# rpm
-Uvh kernel-debug-debuginfo-2.6.32-279.19.1.el6.x86_64.rpm
The debuginfo packages provide the uncompressed vmlinux kernel that the
crash utility depends on. It should be present at
/usr/lib/debug/lib/modules/2.6.32-279.19.1.el6.x86_64/vmlinux.
Next make sure that the kernel is compiled with support for saving crash
dumps etc:
[root@vinodhkumar
~]# egrep
'KEXEC|CRASH|CAL_S' /boot/config-2.6.32-279.el6.x86_64
CONFIG_KEXEC=y
CONFIG_KEXEC_AUTO_RESERVE=y
CONFIG_CRASH_DUMP=y
CONFIG_KEXEC_JUMP=y
CONFIG_PHYSICAL_START=0x1000000
CONFIG_CRASH=y
This is already the case with CentOS 6.3, but if the modules are not
enabled you'll have to get the kernel source and enable the options as shown
above.
Another prerequisite for the crash kernel to be loaded at the time of
the problem is to reserve some memory area for it. This needs to be configured
in the original kernel's bootloader and can be done with the bootloader option
crashkernel=XM[ @YM], where X is the size of the reserved area in megabytes and
Y specifies the start address. If you omit the second parameter or set it to
0M, the system automatically selects an appropriate location. This is already
done on CentOS and to verify it run:
[root@vinodhkumar
~]# grep -i crash /proc/iomem
03000000-0b0fffff : Crash kernel
If this is not the case make sure you append crashkernel=auto (or
specify memory as suggested above) in the kernel parameters string in grub.conf
and reboot:
[root@vinodhkumar ~]# cat /etc/grub.conf
[truncated]
kernel
/vmlinuz-2.6.32-279.19.1.el6.x86_64.debug ro root=/dev/mapper/vg_testbed-lv_root rd_NO_LUKS LANG=en_US.UTF-8 rd_NO_MD SYSFONT=latarcyrheb-sun16 crashkernel=auto rd_LVM_LV=vg_testbed/lv_swap KEYBOARDTYPE=pc KEYTABLE=us rd_NO_DM rd_LVM_LV=vg_testbed/lv_root
rhgb quiet
[truncated]
Run this to verify that the kernel has reserved memory for the crash
kernel (note the crashkernel=129M@0M parameter):
[root@vinodhkumar ~]# grep crash /proc/cmdline
ro root=/dev/mapper/vg_testbed-lv_root rd_NO_LUKS LANG=en_US.UTF-8 rd_NO_MD SYSFONT=latarcyrheb-sun16 crashkernel=129M@0M rd_LVM_LV=vg_testbed/lv_swap KEYBOARDTYPE=pc KEYTABLE=us rd_NO_DM rd_LVM_LV=vg_testbed/lv_root rhgb quiet
Start kdump and make it persist reboots:
[root@vinodhkumar ~]# /etc/init.d/kdump start
[root@vinodhkumar ~]# chkconfig kdump on
Kdump has many usefull configuration options allowing the dump to be
copied using NFS or ssh to a remote host etc, but for this example the defaults
are sufficient. To see the configs cat the following files:
[root@vinodhkumar ~]# cat /etc/kdump.conf
[root@vinodhkumar ~]# cat /etc/sysconfig/kdump
Time to install the crash system utility for diagnosing the core dumps:
[root@vinodhkumar ~]# yum install -y crash
Now that all the prerequisites have been installed and the kdump daemon
is running with the crash kernel resident in memory, let's crash the production
kernel artificially by using the Magic SysRequest Keys.
[root@vinodhkumar ~]# echo 1 > /proc/sys/kernel/sysrq
[root@vinodhkumar ~]# echo c > /proc/sysrq-trigger
Line 1 enables the Magic SysRequest Keys and line 2 triggers kernel
panic.
Now observe the console and you should see the crash kernel dumping the
memory of the production kernel and creating the vmcore file on the specified
file system (by default the file will go in a sub-directory in /var/crash/).
To analyse what caused the crash we can run the crash utility like so:
[root@vinodhkumar ~]# crash /usr/lib/debug/lib/modules/2.6.32-279.19.1.el6.x86_64/vmlinux vmcore
crash 6.0.4-2.el6
KERNEL: /usr/lib/debug/lib/modules/2.6.32-279.19.1.el6.x86_64/vmlinux
DUMPFILE: vmcore [PARTIAL DUMP]
CPUS: 2
DATE: Tue Jan 29 06:12:56 2013
UPTIME: 00:15:58
LOAD AVERAGE: 0.01, 0.13, 0.08
TASKS: 111
NODENAME: testbed
RELEASE: 2.6.32-279.19.1.el6.x86_64
VERSION: #1 SMP Wed Dec 19 07:05:20 UTC 2012
MACHINE: x86_64 (2128 Mhz)
MEMORY: 2 GB
PANIC: "Oops: 0002 [#1] SMP " (check log for details)
PID: 1295
COMMAND: "bash"
TASK: ffff8800376c3540 [THREAD_INFO: ffff88007c820000]
CPU: 1
STATE: TASK_RUNNING (PANIC)
crash>
The three interesting lines are line 15, 16 and 17. The PANIC line shows
the reason for the panic, the PID of the process that caused the panic and the
COMMAND that was running.
We can see a backtrace and the processes by running bt and ps
respectively:
crash> bt
PID: 1295 TASK: ffff8800376c3540 CPU: 1 COMMAND: "bash"
#0 [ffff88007c8219e0] machine_kexec at ffffffff81031f7b
#1 [ffff88007c821a40] crash_kexec at ffffffff810b8c22
#2 [ffff88007c821b10] oops_end at ffffffff814ed6b0
#3 [ffff88007c821b40] no_context at ffffffff81042a0b
#4 [ffff88007c821b90] __bad_area_nosemaphore at ffffffff81042c95
#5 [ffff88007c821be0] bad_area at ffffffff81042dbe
#6 [ffff88007c821c10] __do_page_fault at ffffffff81043570
#7 [ffff88007c821d30] do_page_fault at ffffffff814ef68e
#8 [ffff88007c821d60] page_fault at ffffffff814eca45
[exception RIP: sysrq_handle_crash+22]
RIP: ffffffff81321a66 RSP: ffff88007c821e18 RFLAGS: 00010096
RAX: 0000000000000010 RBX: 0000000000000063 RCX: 000000000000abf5
RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000063
RBP: ffff88007c821e18 R8: 0000000000000000 R9: 0000000000000004
R10: ffffffff8163cac0 R11: ffff8800373d269f R12: 0000000000000000
R13: ffffffff81afb7a0 R14: 0000000000000286 R15: 0000000000000004
ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018
#9 [ffff88007c821e20] __handle_sysrq at ffffffff81321d22
#10 [ffff88007c821e70] write_sysrq_trigger at ffffffff81321dde
#11 [ffff88007c821ea0] proc_reg_write at ffffffff811db48e
#12 [ffff88007c821ef0] vfs_write at ffffffff81176588
#13 [ffff88007c821f30] sys_write at ffffffff81176e81
#14 [ffff88007c821f80] system_call_fastpath at ffffffff8100b072
RIP: 00007f61ecd44e60 RSP: 00007fff8c5f04f0 RFLAGS: 00010202
RAX: 0000000000000001 RBX: ffffffff8100b072 RCX: 0000000000000063
RDX: 0000000000000002 RSI: 00007f61ed63e000 RDI: 0000000000000001
RBP: 00007f61ed63e000 R8: 000000000000000a R9: 00007f61ed63a700
R10: 00000000ffffffff R11: 0000000000000246 R12: 0000000000000002
R13: 00007f61ecff7780 R14: 0000000000000002 R15: 00007f61ecff7780
ORIG_RAX: 0000000000000001 CS: 0033 SS: 002b
crash>
As you can see this tells us that the kernel panic was caused by the
bash process with id of 1295, executing the sysrq call.
0 comments:
Post a Comment