Instances reboot and go to kernel panic

On Icehouse, I have an environment where instances randomly reboot and go into a kernel panic.

Nova correctly spawns instances from qcow2 images (Linux 3.0.75 x86_64) but, unpredictably, libvirtd gets a signal 15 from time to time and restarts them.

nova.conf:

virt_type=kvm

/var/log/libvirt/qemu/instance-<id>.log:

qemu: terminating on signal 15 from pid 38461
2014-10-27 15:05:45.524+0000: shutting down
2014-10-27 15:06:15.337+0000: starting up

No other relevant information from libvirt logs.
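One way to follow up on the log excerpt above is to pull out the PID that sent the signal and check which process it belongs to. The following is only a minimal sketch: the helper names `find_terminations` and `describe_pid` are hypothetical, the pattern is based solely on the single log line quoted above, and the `/proc` lookup only works on the compute host while the sending process is still alive.

```python
import re

# Pattern derived from the one termination line quoted in the log excerpt.
TERM_RE = re.compile(r"qemu: terminating on signal (\d+) from pid (\d+)")


def find_terminations(log_text):
    """Return a list of (signal, pid) tuples found in a qemu instance log."""
    return [(int(sig), int(pid)) for sig, pid in TERM_RE.findall(log_text)]


def describe_pid(pid):
    """Best-effort lookup of a running process's command line via /proc.

    Returns None if the process no longer exists or /proc is unavailable.
    """
    try:
        with open(f"/proc/{pid}/cmdline", "rb") as fh:
            raw = fh.read()
    except OSError:
        return None
    return raw.replace(b"\0", b" ").decode(errors="replace").strip()


if __name__ == "__main__":
    sample = (
        "qemu: terminating on signal 15 from pid 38461\n"
        "2014-10-27 15:05:45.524+0000: shutting down\n"
        "2014-10-27 15:06:15.337+0000: starting up\n"
    )
    for sig, pid in find_terminations(sample):
        owner = describe_pid(pid) or "process not found"
        print(f"signal {sig} sent by pid {pid}: {owner}")
```

If the PID turns out to belong to libvirtd itself, that would suggest the shutdown was requested through libvirt rather than by an unrelated process, but that is something to verify on the host, not something the log alone proves.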

There is no external API intervention, nor any human one.

I inspected the logs of the OpenStack services and couldn't find any relevant information about crashes or problems.

Do you have suggestions on how to further troubleshoot such an issue?

/var/log/kern.log of a crashed instance:
Oct 27 13:08:46 lc-20 kernel: ctx4008000f: no IPv6 routers present
Oct 27 13:10:03 lc-20 kernel: tipc: Resetting link <1.1.20:ethSw0-1.1.10:ethSw0>, peer not responding
Oct 27 13:10:03 lc-20 kernel: tipc: Lost link <1.1.20:ethSw0-1.1.10:ethSw0> on network plane A
Oct 27 13:10:03 lc-20 kernel: tipc: Lost contact with <1.1.10>
Oct 27 13:10:14 lc-20 kernel: tipc: Established link <1.1.20:ethSw0-1.1.10:ethSw0> on network plane A
Oct 27 13:11:48 lc-20 kernel: tipc: Resetting link <1.1.20:ethSw0-1.1.10:ethSw0>, peer not responding
Oct 27 13:11:48 lc-20 kernel: tipc: Lost link <1.1.20:ethSw0-1.1.10:ethSw0> on network plane A
Oct 27 13:11:48 lc-20 kernel: tipc: Lost contact with <1.1.10>
Oct 27 13:11:58 lc-20 kernel: tipc: Established link <1.1.20:ethSw0-1.1.10:ethSw0> on network plane A
Oct 27 13:13:56 lc-20 kernel: tipc: Resetting link <1.1.20:ethSw0-1.1.10:ethSw0>, peer not responding
Oct 27 13:13:56 lc-20 kernel: tipc: Lost link <1.1.20:ethSw0-1.1.10:ethSw0> on network plane A
Oct 27 13:13:56 lc-20 kernel: tipc: Lost contact with <1.1.10>
Oct 27 13:14:06 lc-20 kernel: tipc: Established link <1.1.20:ethSw0-1.1.10:ethSw0> on network plane A
Oct 27 13:17:26 lc-20 kernel: tipc: Resetting link <1.1.20:ethSw0-1.1.11:ethSw0>, peer not responding
Oct 27 13:17:26 lc-20 kernel: tipc: Lost link <1.1.20:ethSw0-1.1.11:ethSw0> on network plane A
Oct 27 13:17:26 lc-20 kernel: tipc: Lost contact with <1.1.11>
Oct 27 13:17:36 lc-20 kernel: tipc: Established link <1.1.20:ethSw0-1.1.11:ethSw0> on network plane A
Oct 27 14:00:14 lc-20 kernel: klogd 1.4.1, log source = /proc/kmsg started.
Oct 27 14:00:14 lc-20 kernel: Initializing cgroup subsys cpuset
Oct 27 14:00:14 lc-20 kernel: Linux version 3.0.75-1263-g337a2d1 (gcc version 4.3.2) #2 SMP PREEMPT Wed Oct 1 16:55:16 PDT 2014
Oct 27 14:00:14 lc-20 kernel: Command line: early_printk=serial console=hvc0 console=tty0 console=ttyS0,115200n8divider=10 e1000.disable_vlan_offload=1 sim_mode=0 initrd /boot/initramfs.cpio
Oct 27 14:00:14 lc-20 kernel: BIOS-provided physical RAM map:
Oct 27 14:00:14 lc-20 kernel:  BIOS-e820: 0000000000000000 - 000000000009fc00 (usable)
Oct 27 14:00:14 lc-20 kernel:  BIOS-e820: 000000000009fc00 - 00000000000a0000 (reserved)
Oct 27 14:00:14 lc-20 kernel:  BIOS-e820: 00000000000f0000 - 0000000000100000 (reserved)
Oct 27 14:00:14 lc-20 kernel:  BIOS-e820: 0000000000100000 - 00000000bfffe000 (usable)
Oct 27 14:00:14 lc-20 kernel:  BIOS-e820: 00000000bfffe000 - 00000000c0000000 (reserved)
Oct 27 14:00:14 lc-20 kernel:  BIOS-e820: 00000000feffc000 - 00000000ff000000 (reserved)
Oct 27 14:00:14 lc-20 kernel:  BIOS-e820: 00000000fffc0000 - 0000000100000000 (reserved)
Oct 27 14:00:14 lc-20 kernel:  BIOS-e820: 0000000100000000 - 0000000140000000 (usable)
Oct 27 14:00:14 lc-20 kernel: NX (Execute Disable) protection: active
Oct 27 14:00:14 lc-20 kernel: SMBIOS 2.4 present.
Oct 27 14:00:14 lc-20 kernel: DMI: OpenStack Foundation OpenStack Nova, BIOS Bochs 01/01/2011
Oct 27 14:00:14 lc-20 kernel: e820 update range: 0000000000000000 - 0000000000010000 (usable) ==> (reserved)
Oct 27 14:00:14 lc-20 kernel: e820 remove range: 00000000000a0000 - 0000000000100000 (usable)
Oct 27 14:00:14 lc-20 kernel: No AGP bridge found
Oct 27 14:00:14 lc-20 kernel: last_pfn = 0x140000 max_arch_pfn = 0x400000000
Oct 27 14:00:14 lc-20 kernel: last_pfn = 0xbfffe max_arch_pfn = 0x400000000
Oct 27 14:00:14 lc-20 kernel: found SMP MP-table at [ffff8800000f0980] f0980
Oct 27 14:00:14 lc-20 kernel: initial memory mapped : 0 - 20000000
Oct 27 14:00:14 lc-20 kernel: Base memory trampoline at [ffff88000009a000] 9a000 size 20480
Oct 27 14:00:14 lc-20 kernel: init_memory_mapping: 0000000000000000-00000000bfffe000
Oct 27 14:00:14 lc-20 kernel:  0000000000 - 00bfe00000 page 2M
Oct 27 14:00:14 lc-20 kernel:  00bfe00000 - 00bfffe000 page 4k
Oct 27 14:00:14 lc-20 kernel: kernel direct mapping tables up to 0xbfffdfff @ [mem 0x1fffb000-0x1fffffff]
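To line up guest reboots with host-side events, it can help to extract from the guest's kern.log the last message written before each boot and the timestamp of the boot itself. The sketch below assumes the syslog line layout shown in the excerpt above and treats the `klogd ... started.` line as the marker of a fresh boot; both the function name `find_boots` and that choice of marker are assumptions for illustration.

```python
import re

# Layout assumed from the excerpt: "<Mon> <day> <HH:MM:SS> <host> kernel: <message>"
LINE_RE = re.compile(r"^(\w{3}\s+\d+ \d\d:\d\d:\d\d) (\S+) kernel: (.*)$")


def find_boots(lines):
    """Return (last_timestamp_before_boot, boot_timestamp) pairs.

    A boot is assumed to start at a line whose message begins with "klogd ".
    The first element is None if no kernel line precedes the boot marker.
    """
    boots = []
    prev_ts = None
    for line in lines:
        match = LINE_RE.match(line)
        if not match:
            continue
        ts, _host, msg = match.groups()
        if msg.startswith("klogd "):
            boots.append((prev_ts, ts))
        prev_ts = ts
    return boots


if __name__ == "__main__":
    sample = [
        "Oct 27 13:17:36 lc-20 kernel: tipc: Established link <1.1.20:ethSw0-1.1.11:ethSw0> on network plane A",
        "Oct 27 14:00:14 lc-20 kernel: klogd 1.4.1, log source = /proc/kmsg started.",
        "Oct 27 14:00:14 lc-20 kernel: Initializing cgroup subsys cpuset",
    ]
    for before, boot in find_boots(sample):
        print(f"last message before boot: {before}; boot logged at: {boot}")
```

On the excerpt above this points at the gap between the last tipc message (13:17:36) and the boot messages (14:00:14), which is the window to compare against the timestamps in the libvirt and Nova logs, keeping in mind that the guest and host logs may use different time zones.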
