Configure kdump with SSH
Test System
    NAME="Rocky Linux"
    VERSION="9.4 (Blue Onyx)"
    ID="rocky"
    ID_LIKE="rhel centos fedora"
    VERSION_ID="9.4"
    PLATFORM_ID="platform:el9"
    PRETTY_NAME="Rocky Linux 9.4 (Blue Onyx)"
    ANSI_COLOR="0;32"
    LOGO="fedora-logo-icon"
    CPE_NAME="cpe:/o:rocky:rocky:9::baseos"
    HOME_URL="https://rockylinux.org/"
    BUG_REPORT_URL="https://bugs.rockylinux.org/"
    SUPPORT_END="2032-05-31"
    ROCKY_SUPPORT_PRODUCT="Rocky-Linux-9"
    ROCKY_SUPPORT_PRODUCT_VERSION="9.4"
    REDHAT_SUPPORT_PRODUCT="Rocky Linux"
    REDHAT_SUPPORT_PRODUCT_VERSION="9.4"
    Rocky Linux release 9.4 (Blue Onyx)
    Rocky Linux release 9.4 (Blue Onyx)
    Rocky Linux release 9.4 (Blue Onyx)
Setup for SSH
On the Crashing System
- Install kdump:
    dnf install -y kexec-tools
- To be sure you are at the default config, run:
    kdumpctl reset-crashkernel --kernel=ALL
- Configure kdump by editing the config with vim /etc/kdump.conf and add the below:
    # Specify the path where the vmcore should be saved on the remote machine
    path /var/crash
    # Specify the SSH target
    ssh root@172.16.192.128
    # Specify the SSH key (optional, if not using the default key)
    sshkey /root/.ssh/id_rsa
    # Core collector to capture the dump, with the -F option
    core_collector makedumpfile -F -l --message-level 7 -d 31
- Make sure the grub config is set up correctly in /etc/default/grub. You should see something like the below.
    GRUB_CMDLINE_LINUX="crashkernel=1G-4G:192M,4G-64G:256M,64G-:512M rd.lvm.lv=rl/root"
What this means
  - crashkernel: This parameter reserves memory for the crash kernel, which kdump uses to capture a memory dump after a crash.
  - 1G-4G:192M: For systems with at least 1 GB but less than 4 GB of total RAM, 192 MB is reserved for the crash kernel.
  - 4G-64G:256M: For systems with at least 4 GB but less than 64 GB of total RAM, 256 MB is reserved for the crash kernel.
  - 64G-:512M: For systems with 64 GB of RAM or more, 512 MB is reserved for the crash kernel.
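To make the range rules above concrete, here is a minimal sketch that walks a crashkernel range list and reports which reservation applies to a given amount of RAM. The helper names `to_mib` and `crashkernel_for_ram` are made up for this illustration; the kernel does the real parsing at boot, so treat this only as a way to sanity-check a spec by hand.

```shell
# Illustration only: to_mib and crashkernel_for_ram are hypothetical helpers.

# Convert a size such as 192M or 4G to mebibytes.
to_mib() {
    case "$1" in
        *G) echo $(( ${1%G} * 1024 )) ;;
        *M) echo "${1%M}" ;;
        *)  return 1 ;;
    esac
}

# Usage: crashkernel_for_ram SPEC RAM   (RAM written like 8G or 2048M)
# Prints the reservation of the first range containing RAM.
# The body runs in a subshell so the IFS change stays local.
crashkernel_for_ram() (
    ram_mib=$(to_mib "$2") || exit 1
    IFS=,                       # split the spec on commas
    set -- $1
    for entry in "$@"; do
        range=${entry%%:*}      # e.g. 4G-64G
        size=${entry##*:}       # e.g. 256M
        start=$(to_mib "${range%%-*}") || exit 1
        end=${range#*-}         # empty means no upper bound
        if [ "$ram_mib" -ge "$start" ]; then
            if [ -z "$end" ] || [ "$ram_mib" -lt "$(to_mib "$end")" ]; then
                echo "$size"
                exit 0
            fi
        fi
    done
    exit 1                      # RAM below the first range: nothing reserved
)

crashkernel_for_ram "1G-4G:192M,4G-64G:256M,64G-:512M" 8G   # prints 256M
```

A machine with less than 1 GB of RAM matches none of the ranges, so the function returns non-zero, mirroring the case where no memory is reserved.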
- You can make sure kdump is running with:
    systemctl status kdump
- Next we need to make sure SSH is set up. Do the following on the machine you are debugging:
    ssh-keygen -t rsa -b 2048
    ssh-copy-id root@172.16.192.128  # Update this with your information
    ssh root@172.16.192.128  # Run a quick test to make sure it worked
Test kdump
I triggered a kernel dump with:
    echo c > /proc/sysrq-trigger
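Since the trigger crashes the machine immediately, it can be worth checking first that a crash kernel is actually loaded and that the key-based login works without a prompt. The sketch below is one way to do that; the function names are hypothetical, and the address and key path are the ones used in this article, so adjust them to your setup. If the crashkernel reservation was only just changed, a reboot is typically needed before the crash kernel can load.

```shell
# Hypothetical helpers; TARGET and KEY mirror the values in /etc/kdump.conf above.
TARGET="root@172.16.192.128"
KEY="/root/.ssh/id_rsa"

# Succeeds if a crash kernel is loaded. The kernel reports this as 0 or 1 in
# /sys/kernel/kexec_crash_loaded; the optional argument only exists so the
# check can be tried against a different file.
crash_kernel_loaded() {
    [ "$(cat "${1:-/sys/kernel/kexec_crash_loaded}" 2>/dev/null)" = "1" ]
}

# Succeeds if the key logs in non-interactively. BatchMode makes ssh fail
# instead of asking for a password, which is what an unattended dump needs.
ssh_key_works() {
    ssh -i "$KEY" -o BatchMode=yes -o ConnectTimeout=5 "$TARGET" true
}

preflight() {
    crash_kernel_loaded || { echo "crash kernel not loaded; try: kdumpctl restart" >&2; return 1; }
    ssh_key_works || { echo "cannot log in to $TARGET with $KEY" >&2; return 1; }
    echo "ready: echo c > /proc/sysrq-trigger will now crash this machine"
}
```

Run `preflight` as root on the crashing system; only trigger the crash once it reports ready.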
Interpreting the Files
- Install crash with:
    dnf install -y crash
- The dumps show up on the remote host:
    [root@patches 172.16.192.129-2024-07-23-15:13:11]# ls
    download.sh  kernel-debuginfo-5.14.0-427.24.1.el9_4.x86_64.rpm  kexec-dmesg.log  vmcore-dmesg.txt  vmcore.flat
- What did suck a bit is that Rocky appears to have a bug in their build system where kernel-debuginfo-common is missing from their build platform, so I haven't had the chance to go through the dumps. See this bug.
- Also, unfortunately, the fix the dev in that post mentioned doesn't work; I tried it. Even manually searching the repository I couldn't find the right package, so I'll need to test on RHEL or something.
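For reference, here is a sketch of the steps that would normally follow once matching debuginfo is available; I have not run the last one on this system for the reason above. `vmcore-dmesg.txt` is plain text and readable right away. Because the dump was sent with `makedumpfile -F`, `vmcore.flat` has to be rearranged with `makedumpfile -R` before `crash` can open it. The helper name `dump_commands` is made up, and the vmlinux path is the usual kernel-debuginfo install location, which is an assumption here. The function only prints the commands so they can be reviewed before running.

```shell
# Kernel version taken from the debuginfo RPM name in the listing above.
KVER="5.14.0-427.24.1.el9_4.x86_64"

# Hypothetical helper: print the analysis commands for one dump directory.
dump_commands() {
    dir=$1
    # 1. Kernel log from the crash; needs no debuginfo.
    printf 'less "%s/vmcore-dmesg.txt"\n' "$dir"
    # 2. Rebuild a regular vmcore from the flattened stream.
    printf 'makedumpfile -R "%s/vmcore" < "%s/vmcore.flat"\n' "$dir" "$dir"
    # 3. Open it in crash (assumed vmlinux path; requires kernel-debuginfo).
    printf 'crash /usr/lib/debug/lib/modules/%s/vmlinux "%s/vmcore"\n' "$KVER" "$dir"
}

dump_commands .
```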