
Can't start nova-compute after configuring ceph

asked 2015-03-12 04:37:17 -0500 by AndrewH

I am building a Juno cluster on Debian Wheezy that uses Ceph for storage. It is a 4-node cluster: 1 controller and 3 combined compute/storage nodes (just for testing).

I have configured the systems based on the docs at:

  • docs.openstack.org/juno/install-guide/install/apt-debian/content/
  • ceph.com/docs/master/rbd/rbd-openstack/

Since configuring nova.conf on the compute nodes to use rbd, the nova-compute service will not start:

# /etc/nova/nova.conf
[libvirt]
images_type = rbd
images_rbd_pool = vms
images_rbd_ceph_conf = /etc/ceph/ceph.conf
rbd_user = cinder
rbd_secret_uuid = 603ce210-facb-45cd-bf79-5fac2c583f3b

inject_password = false
inject_key = false
inject_partition = -2 
live_migration_flag="VIR_MIGRATE_UNDEFINE_SOURCE,VIR_MIGRATE_PEER2PEER,VIR_MIGRATE_LIVE,VIR_MIGRATE_PERSIST_DEST"

I have configured the secret in libvirt:

root@os-host1:~# virsh secret-list
 UUID                                  Usage
--------------------------------------------------------------------------------
 603ce210-facb-45cd-bf79-5fac2c583f3b  ceph client.cinder secret
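
For reference, the secret would normally have been defined along the lines of the Ceph/OpenStack guide. The secret.xml contents and the client.cinder.key file below are a sketch of that procedure, not output from this cluster:

# cat > secret.xml <<EOF
<secret ephemeral='no' private='no'>
  <uuid>603ce210-facb-45cd-bf79-5fac2c583f3b</uuid>
  <usage type='ceph'>
    <name>client.cinder secret</name>
  </usage>
</secret>
EOF
# virsh secret-define --file secret.xml
# virsh secret-set-value --secret 603ce210-facb-45cd-bf79-5fac2c583f3b --base64 $(cat client.cinder.key)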

Ceph is running:

root@os-host1:~# ceph -w
    cluster 47d83b75-8023-45dc-a5eb-b823f860ac43
     health HEALTH_WARN clock skew detected on mon.os-host2, mon.os-host3
     monmap e3: 3 mons at {os-host1=172.50.2.101:6789/0,os-host2=172.50.2.102:6789/0,os-host3=172.50.2.103:6789/0}, election epoch 18, quorum 0,1,2 os-host1,os-host2,os-host3
     osdmap e44: 3 osds: 3 up, 3 in
      pgmap v92: 576 pgs, 5 pools, 0 bytes data, 0 objects
            15473 MB used, 884 GB / 899 GB avail
                 576 active+clean

On startup nova-compute.log shows these messages:

2015-03-12 09:16:30.328 4439 ERROR nova.openstack.common.threadgroup [-] error calling connect
Traceback (most recent call last):
  File "/usr/lib/python2.7/dist-packages/nova/openstack/common/threadgroup.py", line 125, in wait
    x.wait()
  File "/usr/lib/python2.7/dist-packages/nova/openstack/common/threadgroup.py", line 47, in wait
    return self.thread.wait()
  File "/usr/lib/python2.7/dist-packages/eventlet/greenthread.py", line 173, in wait
    return self._exit_event.wait()
  File "/usr/lib/python2.7/dist-packages/eventlet/event.py", line 121, in wait
    return hubs.get_hub().switch()
  File "/usr/lib/python2.7/dist-packages/eventlet/hubs/hub.py", line 293, in switch
    return self.greenlet.switch()
  File "/usr/lib/python2.7/dist-packages/eventlet/greenthread.py", line 212, in main
    result = function(*args, **kwargs)
  File "/usr/lib/python2.7/dist-packages/nova/openstack/common/service.py", line 490, in run_service
    service.start()
  File "/usr/lib/python2.7/dist-packages/nova/service.py", line 181, in start
    self.manager.pre_start_hook()
  File "/usr/lib/python2.7/dist-packages/nova/compute/manager.py", line 1155, in pre_start_hook
    self.update_available_resource(nova.context.get_admin_context())
  File "/usr/lib/python2.7/dist-packages/nova/compute/manager.py", line 5972, in update_available_resource
    nodenames = set(self.driver.get_available_nodes())
  File "/usr/lib/python2.7/dist-packages/nova/virt/driver.py", line 1237, in get_available_nodes
    stats = self.get_host_stats(refresh=refresh)
  File "/usr/lib/python2.7/dist-packages/nova/virt/libvirt/driver.py", line 5803, in get_host_stats
    return self.host_state.get_host_stats(refresh=refresh)
  File "/usr/lib/python2.7/dist-packages/nova/virt/libvirt/driver.py", line 475, in host_state
    self._host_state = HostState(self)
  File "/usr/lib/python2.7/dist-packages/nova/virt/libvirt/driver.py", line 6369, in __init__
    self.update_status()
  File "/usr/lib/python2.7/dist-packages/nova/virt/libvirt/driver.py", line 6400, in update_status
    disk_info_dict = self.driver._get_local_gb_info()
  File ...
(traceback truncated)

2 answers


answered 2015-03-26 20:47:04 -0500 by Tusker

Hi AndrewH,

I had the exact same issue, as we probably followed the same instructions.

To track down the issue, I wrote a small test client and tailed the Ceph logs on each Ceph server while running it:

import rados

# Connect with the same credentials nova uses (rados_id 'cinder' -> client.cinder)
cluster = rados.Rados(rados_id='cinder', conffile='/etc/ceph/ceph.conf')
print "\nlibrados version: " + str(cluster.version())
print "Will attempt to connect to: " + str(cluster.conf_get('mon initial members'))

# This is the step that fails if cephx authentication or the keyring is wrong
cluster.connect()
print "\nCluster ID: " + cluster.get_fsid()

print "\n\nCluster Statistics"
print "=================="
cluster_stats = cluster.get_cluster_stats()

for key, value in cluster_stats.iteritems():
    print key, value
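
Save the script as, say, ceph_test.py (the filename is just an example) and run it on the compute node; if cephx and the keyring are set up correctly, it prints the cluster ID and stats instead of raising an exception on connect():

# python ceph_test.py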

The error that I saw in the ceph logs was:

cephx server client.cinder:  unexpected key: req.key=XXXXXXXXXXXX expected_key=YYYYYYYYYYYYY

That led me to look at the keyring configuration for the different users. In essence, I added the following to ceph.conf on all of the client nodes (i.e. the cinder-volume and compute nodes):

[client.cinder]
        keyring = /etc/ceph/ceph.client.cinder.keyring

[client.cinder-backup]
        keyring = /etc/ceph/ceph.client.cinder-backup.keyring

[client.glance]
        keyring = /etc/ceph/ceph.client.glance.keyring

Each of these keyring files contains the key for that user:

[client.cinder]
        key = XYXYXYXYXYXYXYXYXYX==

If you don't know the correct key for a specific user, you can list the configured users and their keys to find out what to put there:

# ceph auth list
client.cinder
        key: XYXYXYXYXYXYXYXYXYX==
        caps: [mon] allow r
        caps: [osd] allow class-read object_prefix rbd_children, allow rwx pool=volumes, allow rwx pool=vms, allow rx pool=images
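
As a shortcut, ceph auth get-key prints just the key for a user, and ceph auth get-or-create (with no caps arguments) returns the existing entry in keyring format, which you can redirect into the keyring file; the output path below should match whatever you configured in ceph.conf:

# ceph auth get-key client.cinder
XYXYXYXYXYXYXYXYXYX==
# ceph auth get-or-create client.cinder | tee /etc/ceph/ceph.client.cinder.keyring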

Cheers,

Damien


Comments

Nice test script. You should execute it as the nova user to uncover permission issues that are not revealed when running as root. In my case, the nova user did not have permission to read the /etc/ceph/ceph.client.cinder.keyring file on the compute node, which was causing the error.
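
For example (assuming the test script from the answer above was saved as ceph_test.py), the following reproduces the permission problem and shows the offending file mode:

# sudo -u nova python ceph_test.py
# ls -l /etc/ceph/ceph.client.cinder.keyring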

dale-aavang ( 2016-02-05 12:57:58 -0500 )

answered 2015-03-13 16:36:32 -0500 by wp_bengleman

Hi AndrewH,

Make sure that you have ceph-common installed on your compute node, and that a valid copy of your ceph.conf is available there so that rados can communicate with the cluster. I wasn't able to tell from your question whether os-host1 is the same node as your compute node.

For example, on our compute nodes, we install both ceph-common and python-ceph and replicate /etc/ceph/ceph.conf for cluster config.

compute:~# dpkg -l | grep ceph
ii  ceph                                  0.80.8-1trusty                        amd64        distributed storage and file system
ii  ceph-common                           0.80.8-1trusty                        amd64        common utilities to mount and interact with a ceph storage cluster
ii  python-ceph                           0.80.8-1trusty                        amd64        Python libraries for the Ceph distributed filesystem
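
To get those prerequisites onto a compute node, the steps are roughly as follows; the controller hostname is just a placeholder for wherever your ceph.conf lives:

# apt-get install ceph-common python-ceph
# scp controller:/etc/ceph/ceph.conf /etc/ceph/ceph.conf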

Comments

The os-hostx servers are both compute and storage. All of the listed packages are installed.

AndrewH ( 2015-03-22 14:47:26 -0500 )
