ssh to a VM causes kernel panic on Icehouse Neutron host

asked 2014-05-08 11:58:42 -0500

Napo Mokoetle

updated 2014-05-08 12:21:53 -0500

darragh-oreilly

Hi Everyone,

I'm running OpenStack Icehouse on Ubuntu Trusty in a lab. The setup consists of Keystone, Glance, Neutron and 3 compute nodes, all running on HP ProLiant servers (DL360 G5, BIOS P58 08/03/2008).

OpenStack Icehouse components installed: Controller, Keystone, Glance, Neutron, Compute.

  • Keystone and Glance are on the same host
  • Neutron is on its own host
  • The 3 compute nodes are on their own hosts

Everything seems fine after the installation: I can instantiate VMs and they get an internal IP without a problem, and I can associate external IPs with the VMs. I can ping or telnet successfully to and from the VMs, and SSH from one VM instance to another works just fine.

The problem starts when I try to SSH from an external machine to an OpenStack-hosted VM, or from an OpenStack VM to an external machine: the Neutron server gets a kernel panic. I've tried all sorts of things that are getting me nowhere fast, including upgrading the kernel. Has anyone dealt successfully with a similar problem, or does anyone have an idea I can try out to resolve it? I've pasted the syslog output from the Neutron host at the time of the kernel crash below.

Start Of syslog Trace ============================+
ig-file=/etc/neutron/dhcp_agent.ini >/dev/null 2>&1; fi)
May  8 18:00:01 ts036945 CRON[3449]: (neutron) CMD (if [ -x /usr/bin/neutron-netns-cleanup ] ; then /usr/bin/neutron-netns-cleanup --config-file=/etc/neutron/neutron.conf --config-file=/etc/neutron/l3_agent.ini >/dev/null 2>&1; fi)
May  8 18:02:07 ts036945 kernel: [55501.391556] ------------[ cut here ]------------
May  8 18:02:07 ts036945 kernel: [55501.391643] kernel BUG at /build/buildd/linux-3.13.0/net/core/skbuff.c:2903!
May  8 18:02:07 ts036945 kernel: [55501.391755] invalid opcode: 0000 [#1] SMP
May  8 18:02:07 ts036945 kernel: [55501.391828] Modules linked in: xt_nat xt_conntrack xt_REDIRECT xt_tcpudp ip6table_filter ip6_tables iptable_filter iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack ip_tables x_tables openvswitch gre vxlan ip_tunnel libcrc32c radeon ttm drm_kms_helper drm gpio_ich serio_raw lpc_ich hpwdt i2c_algo_bit coretemp kvm_intel kvm hpilo i5000_edac edac_core i5k_amb ipmi_si shpchp mac_hid lp parport hpsa hid_generic usbhid hid bnx2 cciss
May  8 18:02:07 ts036945 kernel: [55501.393060] CPU: 3 PID: 0 Comm: swapper/3 Not tainted 3.13.0-24-generic #47-Ubuntu
May  8 18:02:07 ts036945 kernel: [55501.393175] Hardware name: HP ProLiant DL360 G5, BIOS P58 08/03/2008
May  8 18:02:07 ts036945 kernel: [55501.393277] task: ffff8802245cc7d0 ti: ffff8802245d4000 task.ti: ffff8802245d4000
May  8 18:02:07 ts036945 kernel: [55501.393389] RIP: 0010:[<ffffffff8160e9ba>]  [<ffffffff8160e9ba>] skb_segment+0x95a/0x980
May  8 18:02:07 ts036945 kernel: [55501.393531] RSP: 0018:ffff88022fac34f8  EFLAGS: 00010206
May  8 18:02:07 ts036945 kernel: [55501.393618] RAX: 0000000000000000 RBX: ffff880221bdaa00 RCX: ffff8800cae7b4f0
May  8 18:02:07 ts036945 kernel: [55501.393715] RDX: 0000000000000050 RSI: ffff8800cae7b400 RDI: ffff8800cae7ae00
May  8 18:02:07 ts036945 kernel: [55501.393814] RBP: ffff88022fac35c0 R08: 0000000000000042 R09: 0000000000000000
May  8 18:02 ...

Comments

Just a couple of quick things to try: first, try setting ovs_use_veth=True in l3_agent.ini and reboot/retest. If that doesn't work, try renaming /usr/bin/neutron-netns-cleanup. Other things, like disabling offloading on the interfaces, might help.

darragh-oreilly (2014-05-08 12:25:43 -0500)
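For reference, the first suggestion above amounts to a one-line change in the L3 agent's configuration file. Here is a minimal sketch, assuming the option belongs under [DEFAULT] in l3_agent.ini and that GNU sed is available (as on Ubuntu). The helper name set_ovs_use_veth is hypothetical, and it takes the file path as an argument so it can be tried on a copy first:

```shell
#!/bin/sh
# Hypothetical helper: force "ovs_use_veth = True" in a Neutron agent ini file.
# Assumption: the option belongs in the [DEFAULT] section of the file.
set_ovs_use_veth() {
    conf="$1"
    if grep -Eq '^[#[:space:]]*ovs_use_veth[[:space:]]*=' "$conf"; then
        # Option already present (possibly commented out): rewrite that line.
        sed -i -E 's/^[#[:space:]]*ovs_use_veth[[:space:]]*=.*/ovs_use_veth = True/' "$conf"
    else
        # Option absent: append it directly below the [DEFAULT] header.
        # If the file has no [DEFAULT] header, nothing is changed.
        sed -i '/^\[DEFAULT\]/a ovs_use_veth = True' "$conf"
    fi
}
```

It could be called as set_ovs_use_veth /etc/neutron/l3_agent.ini (run as root), followed by the reboot and retest suggested above.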

Hi darragh-oreilly,

Thanks for your response. I set ovs_use_veth=True on the Controller, Neutron and the 3 Compute Nodes, and also renamed /usr/bin/neutron-netns-cleanup, but to no avail.

The kernel panic still occurs when I SSH to a VM from the Controller, or from a VM to the Controller, even after the proposed change to ovs_use_veth and the renaming of /usr/bin/neutron-netns-cleanup. At least I have a work-around for that part.

Some further discoveries I made earlier: the kernel panic occurs only when I SSH to a VM from the Controller host, or from the VM to the Controller host. When I SSH to the VMs from my laptop, Neutron doesn't crash! I wonder what's going on there and how I can get to the bottom of it.

Moreover, I can successfully SSH into cirros instances from my laptop using key files. When I ...

Napo Mokoetle (2014-05-08 15:12:08 -0500)

I would need more info about connectivity between the controller and the router, and how br-ex is configured. Is the l3-agent running on the controller?

darragh-oreilly (2014-05-09 02:50:42 -0500)

No, the neutron-l3-agent is only running on the server hosting Neutron. Is it supposed to run on the controller too?

The server hosting Neutron has two NICs, eth0 used for Management Network and External Network, and eth1 used for internal/data network. Below is ifconfig -a output for Neutron...

root@ts036945:/home/pssuser# ifconfig -a
br-ex     Link encap:Ethernet  HWaddr 00:22:64:9f:04:a2
          inet addr:196.13.145.184  Bcast:196.13.145.255  Mask:255.255.255.0
          inet6 addr: fe80::3827:54ff:fe48:c1f/64 Scope:Link
          UP BROADCAST RUNNING  MTU:1500  Metric:1
          RX packets:12725 errors:0 dropped:1333 overruns:0 frame:0
          TX packets:963 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:7266031 (7.2 MB)  TX bytes:159325 (159.3 KB)

br-int    Link encap:Ethernet  HWaddr d6:ea:51:d4:dc:45
          inet6 addr ...

Napo Mokoetle (2014-05-09 04:57:24 -0500)

I have the same issue as the OP. We are also using HP ProLiant DL 380 G5 servers for our nodes, running OpenStack Icehouse on Ubuntu 14.04. I tried the suggestions made by darragh-oreilly, to no avail.

The kernel panic occurs when:

  • SSH into or from a VM from a controller / compute / external node
  • Running apt-get update
  • Using wget to download a file from our external mirror server

So basically, whenever the VM tries to contact something outside its own network. I could not find a specific cause in either the neutron logs or the system logs.

When the server crashes I can see a bunch of "Failed reporting state" errors in the neutron logs, which I assume are a direct consequence of my controller being taken down. The actual syslog messages from when the kernel panic occurs:

hp1 kernel: [    9.408031] [drm] ib test succeeded in 0 ...

KT (2014-05-27 09:05:50 -0500)

1 answer

answered 2014-06-08 13:47:21 -0500

jason-bishop

updated 2014-06-08 13:49:27 -0500

I experienced this crash as well. As darragh-oreilly had suggested, disabling offloading fixed it, using these two settings:

ethtool -K eth3 gro off
ethtool -K eth3 gso off

There seems to be an Ubuntu bug open for this: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1313591?comments=all
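
The two ethtool commands can be wrapped in a small helper so they are easy to repeat on each affected interface. This is only a sketch: the function name disable_offloads and the DRY_RUN switch are hypothetical, and the interface name (eth3 here) depends on the host.

```shell
#!/bin/sh
# Sketch: turn off GRO and GSO on one interface.
# Set DRY_RUN=1 to print the commands instead of running them;
# actually changing offload settings requires root.
disable_offloads() {
    iface="$1"
    for feature in gro gso; do
        if [ "${DRY_RUN:-0}" = "1" ]; then
            echo "ethtool -K $iface $feature off"
        else
            ethtool -K "$iface" "$feature" off
        fi
    done
}
```

Running DRY_RUN=1 disable_offloads eth3 prints the two commands without touching the interface, and ethtool -k eth3 (lowercase k) lists the current offload settings for checking the result. Settings changed with ethtool -K do not survive a reboot, so they need to be reapplied at boot.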

