ross-golder's profile - activity

2016-10-12 07:32:59 -0500 received badge  Teacher (source)
2016-04-28 14:09:24 -0500 answered a question Horizon slow --> causes ?

Also considering the same question myself at the moment...

3 nodes (DELL R420s w/16GB RAM):

  • Controller/Network : running nova/neutron/glance/cinder etc, plus horizon, mysql, rabbitmq and memcache
  • 2x Compute nodes: running nova/neutron/cinder

I believe the problem is that MySQL and RabbitMQ are using inordinate amounts of CPU and memory, causing the controller node to swap constantly. Horizon's page requests query APIs that are backed by MySQL, so I suspect it is down to slow DB queries.

The compute nodes and the VMs themselves are running along nicely. It's just Horizon, which can take 15-30 seconds between page clicks on some occasions. We're only running 5-6 VMs, so I can't figure out how/why the MySQL and RabbitMQ processes can be using the kind of CPU/memory that the process list reports:

9595 999 20 0 6690688 1.537g 4408 S 56.4 9.8 35:37.97 beam.smp
23598 999 20 0 13.317g 412044 12116 S 1.3 2.5 21:13.65 mysqld

It's been like this since Icehouse (on Mitaka now), although I suspect it's less to do with OpenStack itself, and more to do with the services on which it relies. I've read other articles that all suggest it's related to stale Keystone tokens, but I've cleared them and it's still the same.

Here's a typical Nova API call from the logs... (15 secs!?)

2016-04-28 06:33:17.438 16517 INFO nova.osapi_compute.wsgi.server [req-4e774ef7-447f-4ec7-9035-4456d1035e95 47ed0175d27e4485b91fee5d076e8aae 9809424358874fe189b6392b8468f177 - - -] "GET /v2/9809424358874fe189b6392b8468fabc/servers/detail?project_id=9809424358874fe189b6392b8468f177 HTTP/1.1" status: 200 len: 3581 time: 15.7520719

Surely this isn't 'standard' performance for all OpenStack deployments?
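To see how often requests are this slow, the `time:` field at the end of each access line can be pulled out of the log. The following is a minimal sketch, assuming the access-line layout shown above; the function name `slow_requests` and the 5-second threshold are illustrative choices, not part of Nova.

```python
import re

# Matches the trailing fields of a wsgi access line such as:
#   "GET /v2/... HTTP/1.1" status: 200 len: 3581 time: 15.7520719
ACCESS_RE = re.compile(
    r'"(?P<method>[A-Z]+) (?P<path>\S+) HTTP/[\d.]+" '
    r'status: (?P<status>\d+) len: (?P<length>\d+) time: (?P<seconds>[\d.]+)'
)


def slow_requests(lines, threshold=5.0):
    """Yield (seconds, method, path) for requests slower than `threshold`."""
    for line in lines:
        m = ACCESS_RE.search(line)
        if m and float(m.group("seconds")) > threshold:
            yield float(m.group("seconds")), m.group("method"), m.group("path")


# The log line quoted above, used as sample input.
sample = (
    '2016-04-28 06:33:17.438 16517 INFO nova.osapi_compute.wsgi.server '
    '[req-4e774ef7-447f-4ec7-9035-4456d1035e95 47ed0175d27e4485b91fee5d076e8aae '
    '9809424358874fe189b6392b8468f177 - - -] '
    '"GET /v2/9809424358874fe189b6392b8468fabc/servers/detail'
    '?project_id=9809424358874fe189b6392b8468f177 HTTP/1.1" '
    'status: 200 len: 3581 time: 15.7520719'
)

for seconds, method, path in slow_requests([sample]):
    print(f"{seconds:.1f}s {method} {path}")
```

Passing an open log file instead of `[sample]` would scan the whole log, since the function only needs an iterable of lines.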

EDIT: Seems like I've been barking up the wrong tree. The majority of the memory was actually being consumed by the various API worker processes. Most API services were spawning one worker process for each CPU core. We don't need 70+ nova-api processes to serve a handful of VMs for a handful of staff, so I looked into the '*_workers' configuration parameters for the nova-api, nova-conductor, glance-api, cinder-api and neutron-api/metadata (IIRC) services. Setting these to more modest values led to fewer unnecessary processes being spawned, and considerably less memory (and swap) being consumed. The Horizon dashboard, and things in general on the controller node, are now running a lot more happily.
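One way to spot this kind of problem is to total memory per command name rather than look at individual processes, since many small workers do not stand out in a process list. The following is a rough sketch, assuming input in the form produced by `ps -eo rss=,comm=` (resident size in KiB, then command name); the sample figures are made up for illustration.

```python
from collections import defaultdict


def rss_by_command(ps_output):
    """Return (command, process count, total RSS in KiB), largest total first.

    Expects text in the form produced by `ps -eo rss=,comm=` (no header row).
    """
    totals = defaultdict(lambda: [0, 0])  # command -> [count, total RSS KiB]
    for line in ps_output.splitlines():
        parts = line.split(None, 1)
        if len(parts) != 2 or not parts[0].isdigit():
            continue  # skip blank or unexpected lines
        rss_kib, command = int(parts[0]), parts[1].strip()
        totals[command][0] += 1
        totals[command][1] += rss_kib
    return sorted(
        ((cmd, count, rss) for cmd, (count, rss) in totals.items()),
        key=lambda row: row[2],
        reverse=True,
    )


# Made-up sample values, for illustration only.
sample = """\
 120000 nova-api
 118000 nova-api
 115000 nova-api
 121000 nova-api
 412044 mysqld
  90000 glance-api
"""

for cmd, count, rss in rss_by_command(sample):
    print(f"{cmd:<12} procs={count:<3} rss={rss / 1024:.0f} MiB")
```

On a live host the input could come from `subprocess.check_output(["ps", "-eo", "rss=,comm="], text=True)`. The worker options themselves have names such as `osapi_compute_workers` (nova.conf) and `api_workers` (neutron.conf); the exact names vary per service and release, so check each service's configuration reference.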

-- Ross

2014-04-06 05:33:30 -0500 received badge  Good Question (source)
2014-04-05 21:37:40 -0500 commented answer Neutron/OVS VLAN-tagging of DHCP requests?

FTR, I found this article helpful too, in understanding how the ports on the bridges interact with each other...

2014-04-05 21:34:33 -0500 commented question Neutron/OVS VLAN-tagging of DHCP requests?

Andrew Kinney: The situation described in that bug appears to be specific to (XenServer?) cases where two interfaces are configured per VM. In my configuration, as far as I can see, only the tap interface is being created, and it is being tagged successfully according to ovs-vsctl. FTR, I'm using QEMU/KVM.

2014-04-05 21:34:04 -0500 commented question Neutron/OVS VLAN-tagging of DHCP requests?

Thanks, darrah-oreilly. You helped nudge me in the right direction there.

2014-03-05 01:23:54 -0500 answered a question Neutron/OVS VLAN-tagging of DHCP requests?

In the end, I saw that you can see the VLAN (802.1Q) tag for a packet in tcpdump. I was able to see tag information for the DHCP packets I was looking for using, e.g.

# tcpdump -i em2 -n -e port 67
tcpdump: WARNING: em2: no IPv4 address assigned
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on em2, link-type EN10MB (Ethernet), capture size 65535 bytes
02:28:00.596374 fa:16:3e:9e:d6:26 > ff:ff:ff:ff:ff:ff, ethertype 802.1Q (0x8100), length 346: vlan 1003, p 0, ethertype IPv4, > BOOTP/DHCP, Request from fa:16:3e:9e:d6:26, length 300
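When there are many such packets, the tag can be extracted from each line rather than read by eye. This is a small sketch, assuming the `tcpdump -e` line layout shown above; `vlan_tags` is an illustrative helper name.

```python
import re

# With `tcpdump -e`, an 802.1Q frame is printed with the source and
# destination MAC addresses, followed later on the line by "vlan <id>".
VLAN_RE = re.compile(
    r"^(?P<ts>\S+) (?P<src>[0-9a-f:]{17}) > (?P<dst>[0-9a-f:]{17}), "
    r".*?\bvlan (?P<vlan>\d+)\b"
)


def vlan_tags(lines):
    """Yield (source MAC, VLAN id) for each tagged frame in tcpdump output."""
    for line in lines:
        m = VLAN_RE.match(line)
        if m:
            yield m.group("src"), int(m.group("vlan"))


# The packet line quoted above, used as sample input.
sample = (
    "02:28:00.596374 fa:16:3e:9e:d6:26 > ff:ff:ff:ff:ff:ff, "
    "ethertype 802.1Q (0x8100), length 346: vlan 1003, p 0, ethertype IPv4, "
    "> BOOTP/DHCP, Request from fa:16:3e:9e:d6:26, length 300"
)

for mac, vlan in vlan_tags([sample]):
    print(mac, vlan)
```

Untagged frames and tcpdump's own warning lines simply do not match and are skipped.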

At this point, I began to realise that I'd somehow managed to get the meanings/uses of the 'br-ex' and 'br-int' bridges confused. So I created a couple of new bridges for each host, named after the physical interfaces (i.e. 'br-em1' and 'br-em2'), and moved the physical ports to those bridges. This clarified things a little.

Most importantly, in doing so (and having read a ton more docs by then), I started to figure out what the 'int-*' ports were about and realised that I'd put the physical 'em2' port on the 'integration' bridge, which was causing two DHCP packets to be sent out per request.

Anyway, the situation now is that it is sending correctly tagged DHCP packets out via the em2 port. The problem seems to be that the switch (NetGear GS724T v3) is not forwarding tagged packets to the other two hosts, as they are not receiving them on their 'em2' interfaces.

Unfortunately, in trying to configure 'VLAN trunking' between the ports on that switch, I've inadvertently locked myself out of it, and am waiting for some on-site staff to unwind a paperclip and do a factory reset :)

2014-01-19 01:14:31 -0500 received badge  Famous Question (source)
2014-01-18 00:04:45 -0500 answered a question Openstack installation problem

Also check your 'enabled_apis' value. In my case, I'd converted a controller node to a compute node and forgotten to remove the 'enabled_apis=metadata' setting, so the other two services were not being started where they were expected to be.

2014-01-18 00:03:29 -0500 received badge  Enthusiast
2014-01-12 21:41:35 -0500 received badge  Notable Question (source)
2014-01-10 15:05:00 -0500 received badge  Nice Question (source)
2014-01-09 20:38:28 -0500 received badge  Popular Question (source)
2014-01-07 00:10:26 -0500 received badge  Student (source)
2014-01-06 21:38:58 -0500 asked a question Neutron/OVS VLAN-tagging of DHCP requests?

Hi guys,

I'm having trouble figuring out why a Cirros test VM on my compute node is not obtaining a DHCP address from my controller node, over a Neutron/OVS/VLAN arrangement. So, the scenario in more detail...

2x DELL PowerEdge R420s, running stock Ubuntu 13.10 Saucy (OpenStack Havana). 2x NetGear GS724 switches

Primary switch is for traffic, with gateway router at 1.1. Both DELLs connected via their primary (em0) interface.

Secondary switch is for traffic, with no gateway. Both DELLs connected via their secondary (em1) interface.

First DELL is called 'controller1'. Second DELL is called 'compute1'. I'll focus on the compute node's configuration for now, as I'm having trouble tracing where the DHCP request packet goes within this domain.

First, the underlying /etc/network/interfaces network config...

# The loopback network interface
auto lo
iface lo inet loopback

# The primary network interface
auto em1
iface em1 inet static

# The primary network interface
auto br-ex
iface br-ex inet static

auto em2
iface em2 inet static

auto br-int
iface br-int inet static

OVS bridges configured as per 'ovs-vsctl show'...

    Bridge br-int
        Port int-br-ex
            Interface int-br-ex
        Port "em2"
            Interface "em2"
        Port phy-br-int
            Interface phy-br-int
        Port br-int
            Interface br-int
                type: internal
        Port "tapc015b393-e7"
            tag: 3
            Interface "tapc015b393-e7"
        Port int-br-int
            Interface int-br-int
    Bridge br-ex
        Port phy-br-ex
            Interface phy-br-ex
        Port "em1"
            Interface "em1"
        Port br-ex
            Interface br-ex
                type: internal
    ovs_version: "1.10.2"
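To see at a glance which ports carry a local tag, the `ovs-vsctl show` text can be reduced to a bridge/port/tag mapping. This is a rough sketch that assumes the indented layout shown above and only looks at `Bridge`, `Port` and `tag:` lines; `port_tags` is an illustrative name, not an OVS tool.

```python
def port_tags(ovs_show_output):
    """Return {bridge: {port: tag or None}} from `ovs-vsctl show` text."""
    bridges = {}
    bridge = port = None
    for raw in ovs_show_output.splitlines():
        line = raw.strip()
        if line.startswith("Bridge "):
            bridge = line.split(None, 1)[1].strip('"')
            bridges[bridge] = {}
            port = None
        elif line.startswith("Port ") and bridge is not None:
            port = line.split(None, 1)[1].strip('"')
            bridges[bridge][port] = None  # untagged until a tag line is seen
        elif line.startswith("tag:") and port is not None:
            bridges[bridge][port] = int(line.split(":", 1)[1])
    return bridges


# A shortened excerpt of the output above.
sample = '''
    Bridge br-int
        Port "em2"
            Interface "em2"
        Port "tapc015b393-e7"
            tag: 3
            Interface "tapc015b393-e7"
    Bridge br-ex
        Port "em1"
            Interface "em1"
    ovs_version: "1.10.2"
'''

for bridge, ports in port_tags(sample).items():
    for port, tag in ports.items():
        print(bridge, port, tag)
```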

The tap interface belongs to my 'test1' Cirros guest, and I can see DHCP request packets on it if I tcpdump it as the VM boots.

I have the OpenVSwitch (OVS) plugin configured as follows:

tenant_network_type = vlan
network_vlan_ranges = physnet2:1000:2999
local_ip =
bridge_mappings = physnet2:br-int


firewall_driver = neutron.agent.linux.iptables_firewall.OVSHybridIptablesFirewallDriver


FTR, the controller node is the same, with the local_ip being

On Neutron, there is a virtual 'testnet':

| Field                     | Value                                |
| admin_state_up            | True                                 |
| id                        | c7497b77-a716-4ad6-8d44-3fa9b2dcfaf0 |
| name                      | testnet                              |
| provider:network_type     | vlan                                 |
| provider:physical_network | physnet2                             |
| provider:segmentation_id  | 1001                                 |
| router:external           | False                                |
| shared                    | False                                |
| status                    | ACTIVE                               |
| subnets                   | 8821440b-839b-4e91-85ca-1f2651ed6896 |
| tenant_id                 | 83b3ca4a4ec94070904d5112aeb1baab     |

And a virtual 'testsubnet':

| Field            | Value                                            |
| allocation_pools | {"start": "", "end": ""} |
| cidr             |                                   |
| dns_nameservers  |                                          |
| enable_dhcp      | True                                             |
| gateway_ip       |                                      |
| host_routes      |                                                  |
| id               | 8821440b-839b-4e91-85ca-1f2651ed6896             |
| ip_version       | 4                                                |
| name             | testsubnet                                       |
| network_id       | c7497b77-a716-4ad6-8d44-3fa9b2dcfaf0             |
| tenant_id        | 83b3ca4a4ec94070904d5112aeb1baab                 |
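The plugin settings and the network's provider attributes can be cross-checked against each other. The sketch below assumes every `network_vlan_ranges` entry has the `physnet:min:max` form used above; the function names are illustrative. It reports whether a segmentation ID falls inside the configured range, and which bridge the physical network maps to.

```python
def parse_vlan_ranges(value):
    """Parse 'physnet:min:max[,...]' into {physnet: [(min, max), ...]}."""
    ranges = {}
    for entry in value.split(","):
        entry = entry.strip()
        if not entry:
            continue
        physnet, low, high = entry.split(":")
        ranges.setdefault(physnet, []).append((int(low), int(high)))
    return ranges


def parse_bridge_mappings(value):
    """Parse 'physnet:bridge[,...]' into {physnet: bridge}."""
    mappings = {}
    for entry in value.split(","):
        entry = entry.strip()
        if entry:
            physnet, bridge = entry.split(":")
            mappings[physnet] = bridge
    return mappings


def check_network(physnet, segmentation_id, vlan_ranges, bridge_mappings):
    """Return (id is within a configured range, mapped bridge or None)."""
    in_range = any(
        low <= segmentation_id <= high
        for low, high in vlan_ranges.get(physnet, [])
    )
    return in_range, bridge_mappings.get(physnet)


# Values taken from the plugin config and the 'testnet' network above.
vlan_ranges = parse_vlan_ranges("physnet2:1000:2999")
bridge_mappings = parse_bridge_mappings("physnet2:br-int")
print(check_network("physnet2", 1001, vlan_ranges, bridge_mappings))
```

This only checks that the values are consistent with each other; it says nothing about whether the bridge chosen in `bridge_mappings` is the right one.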

My confusion seems to arise from not being quite clear about where the VLAN id is being set and what to exactly, so I can trace the packet beyond the tap interface.

It seems I am not quite grokking the relationship between the 'provider:segmentation_id', and the OVS 'tag id'. The segmentation_id seems to have been correctly taken from ... (more)