
Nova-Compute Ceph backend issues

Hello everyone,

Currently I'm setting up an OpenStack Queens test environment with a Ceph Jewel backend (I had some configuration issues with Ceph Luminous) on Ubuntu 16.04. The Glance and Cinder services are both successfully connected to Ceph, but Nova-Compute is giving me issues. My nova.conf and nova-compute.conf on the compute node are as follows:

nova.conf

[DEFAULT]
log_dir = /var/log/nova
lock_path = /var/lock/nova
state_path = /var/lib/nova

my_ip = 192.168.223.185

use_neutron = True
firewall_driver = nova.virt.firewall.NoopFirewallDriver

transport_url = rabbit://openstack:xxxxxxxx@192.168.223.182

[api]
auth_strategy = keystone

[glance]
api_servers = http://192.168.223.182:9292

[keystone_authtoken]
auth_uri = http://192.168.223.184:5000
auth_url = http://192.168.223.184:5000
memcached_servers = 192.168.223.184:11211
region_name = RegionOne
auth_type = password
project_domain_name = default
user_domain_name = default
project_name = service
username = nova
password = xxxxxxx

[oslo_concurrency]
lock_path = /var/lib/nova/tmp

[placement]
os_region_name = RegionOne
project_domain_name = Default
project_name = service
auth_type = password
user_domain_name = Default
auth_url = http://192.168.223.184:5000/v3
username = placement
password = xxxxxxx

[vnc]
enabled = true
server_listen = 0.0.0.0
server_proxyclient_address = 192.168.223.185
novncproxy_base_url = http://192.168.223.182:6080/vnc_auto.html

nova-compute.conf

[DEFAULT]
compute_driver=libvirt.LibvirtDriver

[libvirt]
images_type = rbd
images_rbd_pool = vm_01
images_rbd_ceph_conf = /etc/ceph/ceph.conf
rbd_user = cinderusr01
rbd_secret_uuid = xxxxxxx
disk_cachemodes="network=writeback"
inject_password = false
inject_key = false
inject_partition = -2
live_migration_flag="VIR_MIGRATE_UNDEFINE_SOURCE,VIR_MIGRATE_PEER2PEER,VIR_MIGRATE_LIVE,VIR_MIGRATE_PERSIST_DEST,VIR_MIGRATE_TUNNELLED"
hw_disk_discard = unmap
virt_type=kvm

The configuration option rbd_user is the problem here. Each time I comment this line out and restart the nova-compute service, everything works just fine; as soon as I uncomment it and restart the service, an issue with RabbitMQ occurs.
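
For what it's worth, the same cephx user can be checked directly from the compute node (a minimal sketch, assuming the keyring for client.cinderusr01 is under /etc/ceph and the pool and user names match the config above):

# connect to the cluster as the same user nova-compute uses
ceph -s --id cinderusr01
# list the images in the nova pool with that user
rbd -p vm_01 --id cinderusr01 ls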

When running the command openstack hypervisor list, the state of the hypervisor keeps flipping between up and down. This only happens when the rbd_user line is uncommented. The nova-compute log states the following:

2018-04-11 16:01:19.970 29943 ERROR nova.compute.manager [req-f6a76a06-df19-46ed-938b-baa7807a9ee4 - - - - -] Error updating resources for node srv-hrln-hyp-01.: TimedOut: [errno 110] error connecting to the cluster
2018-04-11 16:01:19.970 29943 ERROR nova.compute.manager Traceback (most recent call last):
2018-04-11 16:01:19.970 29943 ERROR nova.compute.manager   File "/usr/lib/python2.7/dist-packages/nova/compute/manager.py", line 7254, in update_available_resource_for_node
2018-04-11 16:01:19.970 29943 ERROR nova.compute.manager     rt.update_available_resource(context, nodename)
2018-04-11 16:01:19.970 29943 ERROR nova.compute.manager   File "/usr/lib/python2.7/dist-packages/nova/compute/resource_tracker.py", line 664, in update_available_resource
2018-04-11 16:01:19.970 29943 ERROR nova.compute.manager     resources = self.driver.get_available_resource(nodename)
2018-04-11 16:01:19.970 29943 ERROR nova.compute.manager   File "/usr/lib/python2.7/dist-packages/nova/virt/libvirt/driver.py", line 6321, in get_available_resource
2018-04-11 16:01:19.970 29943 ERROR nova.compute.manager     disk_info_dict = self._get_local_gb_info()
2018-04-11 16:01:19.970 29943 ERROR nova.compute.manager   File "/usr/lib/python2.7/dist-packages/nova/virt/libvirt/driver.py", line 5599, in _get_local_gb_info
2018-04-11 16:01:19.970 29943 ERROR nova.compute.manager     info = LibvirtDriver._get_rbd_driver().get_pool_info()
2018-04-11 16:01:19.970 29943 ERROR nova.compute.manager   File "/usr/lib/python2.7/dist-packages/nova/virt/libvirt/storage/rbd_utils.py", line 368, in get_pool_info
2018-04-11 16:01:19.970 29943 ERROR nova.compute.manager     with RADOSClient(self) as client:
2018-04-11 16:01:19.970 29943 ERROR nova.compute.manager   File "/usr/lib/python2.7/dist-packages/nova/virt/libvirt/storage/rbd_utils.py", line 102, in __init__
2018-04-11 16:01:19.970 29943 ERROR nova.compute.manager     self.cluster, self.ioctx = driver._connect_to_rados(pool)
2018-04-11 16:01:19.970 29943 ERROR nova.compute.manager   File "/usr/lib/python2.7/dist-packages/nova/virt/libvirt/storage/rbd_utils.py", line 133, in _connect_to_rados
2018-04-11 16:01:19.970 29943 ERROR nova.compute.manager     client.connect()
2018-04-11 16:01:19.970 29943 ERROR nova.compute.manager   File "rados.pyx", line 875, in rados.Rados.connect (/build/ceph-7hHYeL/ceph-12.2.4/obj-x86_64-linux-gnu/src/pybind/rados/pyrex/rados.c:10952)
2018-04-11 16:01:19.970 29943 ERROR nova.compute.manager TimedOut: [errno 110] error connecting to the cluster
2018-04-11 16:01:19.970 29943 ERROR nova.compute.manager
2018-04-11 16:01:19.976 29943 ERROR oslo.messaging._drivers.impl_rabbit [req-f6a76a06-df19-46ed-938b-baa7807a9ee4 - - - - -] [b026ed3f-209d-45d2-a601-2d9210ec326b] AMQP server on 192.168.223.182:5672 is unreachable: [Errno 104] Connection reset by peer. Trying again in 1 seconds. Client port: 38830: error: [Errno 104] Connection reset by peer
2018-04-11 16:01:19.976 29943 ERROR oslo.messaging._drivers.impl_rabbit [req-f6a76a06-df19-46ed-938b-baa7807a9ee4 - - - - -] [676acf8d-723e-4840-8ec7-523cd8825d42] AMQP server on 192.168.223.182:5672 is unreachable: [Errno 104] Connection reset by peer. Trying again in 1 seconds. Client port: 38828: error: [Errno 104] Connection reset by peer
2018-04-11 16:01:19.977 29943 ERROR oslo.messaging._drivers.impl_rabbit [req-f6a76a06-df19-46ed-938b-baa7807a9ee4 - - - - -] [effa0b1f-2dee-43fe-895c-603cc2118774] AMQP server on 192.168.223.182:5672 is unreachable: [Errno 104] Connection reset by peer. Trying again in 1 seconds. Client port: 38826: error: [Errno 104] Connection reset by peer
2018-04-11 16:01:19.979 29943 ERROR oslo.messaging._drivers.impl_rabbit [req-f6a76a06-df19-46ed-938b-baa7807a9ee4 - - - - -] [7d79ba61-5821-4a9d-b2c3-36fac583ab3e] AMQP server on 192.168.223.182:5672 is unreachable: [Errno 104] Connection reset by peer. Trying again in 1 seconds. Client port: 38832: error: [Errno 104] Connection reset by peer
2018-04-11 16:01:21.020 29943 INFO oslo.messaging._drivers.impl_rabbit [req-f6a76a06-df19-46ed-938b-baa7807a9ee4 - - - - -] [b026ed3f-209d-45d2-a601-2d9210ec326b] Reconnected to AMQP server on 192.168.223.182:5672 via [amqp] client with port 38834.
2018-04-11 16:01:21.025 29943 INFO oslo.messaging._drivers.impl_rabbit [req-f6a76a06-df19-46ed-938b-baa7807a9ee4 - - - - -] [7d79ba61-5821-4a9d-b2c3-36fac583ab3e] Reconnected to AMQP server on 192.168.223.182:5672 via [amqp] client with port 38840.
2018-04-11 16:01:21.030 29943 INFO oslo.messaging._drivers.impl_rabbit [req-f6a76a06-df19-46ed-938b-baa7807a9ee4 - - - - -] [effa0b1f-2dee-43fe-895c-603cc2118774] Reconnected to AMQP server on 192.168.223.182:5672 via [amqp] client with port 38838.
2018-04-11 16:01:21.036 29943 INFO oslo.messaging._drivers.impl_rabbit [req-f6a76a06-df19-46ed-938b-baa7807a9ee4 - - - - -] [676acf8d-723e-4840-8ec7-523cd8825d42] Reconnected to AMQP server on 192.168.223.182:5672 via [amqp] client with port 38836.

The RabbitMQ log on the controller node states the following:

=INFO REPORT==== 11-Apr-2018::16:13:23 ===
connection <0.18264.0> (192.168.223.185:40080 -> 192.168.223.182:5672 - nova-compute:29943:7d79ba61-5821-4a9d-b2c3-36fac583ab3e): user 'openstack' authenticated and granted access to vhost '/'

=INFO REPORT==== 11-Apr-2018::16:13:23 ===
connection <0.18261.0> (192.168.223.185:40078 -> 192.168.223.182:5672 - nova-compute:29943:effa0b1f-2dee-43fe-895c-603cc2118774): user 'openstack' authenticated and granted access to vhost '/'

=INFO REPORT==== 11-Apr-2018::16:13:23 ===
connection <0.18258.0> (192.168.223.185:40076 -> 192.168.223.182:5672 - nova-compute:29943:b026ed3f-209d-45d2-a601-2d9210ec326b): user 'openstack' authenticated and granted access to vhost '/'

=ERROR REPORT==== 11-Apr-2018::16:13:32 ===
closing AMQP connection <0.18246.0> (192.168.223.185:40068 -> 192.168.223.182:5672):
{handshake_timeout,frame_header}

=ERROR REPORT==== 11-Apr-2018::16:13:32 ===
closing AMQP connection <0.18243.0> (192.168.223.185:40066 -> 192.168.223.182:5672):
{handshake_timeout,frame_header}

=ERROR REPORT==== 11-Apr-2018::16:13:32 ===
closing AMQP connection <0.18249.0> (192.168.223.185:40070 -> 192.168.223.182:5672):
{handshake_timeout,frame_header}

=ERROR REPORT==== 11-Apr-2018::16:13:32 ===
closing AMQP connection <0.18252.0> (192.168.223.185:40072 -> 192.168.223.182:5672):
{handshake_timeout,frame_header}

=ERROR REPORT==== 11-Apr-2018::16:16:23 ===
closing AMQP connection <0.18255.0> (192.168.223.185:40074 -> 192.168.223.182:5672 - nova-compute:29943:676acf8d-723e-4840-8ec7-523cd8825d42):
missed heartbeats from client, timeout: 60s

=ERROR REPORT==== 11-Apr-2018::16:16:23 ===
closing AMQP connection <0.18264.0> (192.168.223.185:40080 -> 192.168.223.182:5672 - nova-compute:29943:7d79ba61-5821-4a9d-b2c3-36fac583ab3e):
missed heartbeats from client, timeout: 60s

=ERROR REPORT==== 11-Apr-2018::16:16:23 ===
closing AMQP connection <0.18261.0> (192.168.223.185:40078 -> 192.168.223.182:5672 - nova-compute:29943:effa0b1f-2dee-43fe-895c-603cc2118774):
missed heartbeats from client, timeout: 60s

=ERROR REPORT==== 11-Apr-2018::16:16:23 ===
closing AMQP connection <0.18258.0> (192.168.223.185:40076 -> 192.168.223.182:5672 - nova-compute:29943:b026ed3f-209d-45d2-a601-2d9210ec326b):
missed heartbeats from client, timeout: 60s

The Ceph keyring files are present on both the controller and the hypervisor, the correct permissions have been set on the ceph.conf and keyring files, and a secret has been added to libvirt.
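
For completeness, this is roughly how the keyring and the libvirt secret can be verified on the compute node (a sketch, assuming the usual /etc/ceph/ceph.client.cinderusr01.keyring path; the UUID and key are redacted as in the config above):

# keyring and ceph.conf readable by the nova/libvirt user
ls -l /etc/ceph/ceph.conf /etc/ceph/ceph.client.cinderusr01.keyring
# the secret referenced by rbd_secret_uuid in nova-compute.conf
virsh secret-list
virsh secret-get-value xxxxxxx
# the caps granted to the client on the Ceph side
ceph auth get client.cinderusr01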