[SOLVED] Pike: Cinder Volume services are down on Controller Node without any apparent error in logs

asked 2018-08-31 07:56:34 -0500 by sebastien, updated 2018-09-04 03:30:06 -0500

Greetings,

I am experiencing issues regarding Cinder. Here are the configuration elements:

From storage node:

[DEFAULT]
osapi_volume_listen = 192.168.10.60
api_paste_config = /etc/cinder/api-paste.ini
glance_host = 192.168.10.60
auth_strategy = keystone
debug = False
use_syslog = False
my_ip = 192.168.10.59
transport_url = rabbit://openstack:0e932761f8ddf9d9f175@192.168.10.60:5672//openstack
log_dir = /var/log/cinder
state_path = /var/lib/cinder
volumes_dir = /var/lib/cinder/volumes/
rootwrap_config = /etc/cinder/rootwrap.conf
default_volume_type = lvm-192.168.10.59
glance_api_servers = http://192.168.10.60:9292
enabled_backends = lvm-192.168.10.59,nfs-192.168.10.59
volume_clear = zero
volume_clear_size = 50
volume_clear_ionice = -c3
nas_secure_file_operations = false
nas_secure_file_permissions = false
nova_catalog_info = compute:nova:internalURL
nova_catalog_admin_info = compute:nova:adminURL
os_region_name = Bagneux
notification_driver = messagingv2
control_exchange = cinder
[backend]
[backend_defaults]
[barbican]
[brcd_fabric_example]
[cisco_fabric_example]
[coordination]
[cors]
[database]
connection = mysql+pymysql://cinderdbuser:680b5b9b4b557830bfdd@192.168.10.60:3306/cinderdb
retry_interval = 10
idle_timeout = 3600
min_pool_size = 1
max_pool_size = 10
max_retries = 100
pool_timeout = 10
[fc-zone-manager]
[healthcheck]
[key_manager]
[keystone_authtoken]
auth_uri = http://192.168.10.60:5000
auth_url = http://192.168.10.60:35357
auth_type = password
memcached_servers = 192.168.10.60:11211
project_domain_name = default
user_domain_name = default
project_name = services
username = cinder
password = *snipped*
region_name = Bagneux
project_domain_id = default
user_domain_id = default
[matchmaker_redis]
[nova]
[oslo_concurrency]
lock_path = /var/oslock/cinder
[oslo_messaging_amqp]
[oslo_messaging_kafka]
[oslo_messaging_notifications]
driver = messagingv2
[oslo_messaging_rabbit]
[oslo_messaging_zmq]
[oslo_middleware]
[oslo_policy]
[oslo_reports]
[oslo_versionedobjects]
[profiler]
[ssl]
[lvm-192.168.10.59]
volume_group = centos
volume_driver = cinder.volume.drivers.lvm.LVMVolumeDriver
iscsi_protocol = iscsi
iscsi_helper = tgtadm
iscsi_ip_address = 192.168.10.59
volume_backend_name = LVM_iSCSI-192.168.10.59
volume_clear = zero
volume_clear_size = 50
volume_clear_ionice = -c3
lvm_type = default
[nfs-192.168.10.59]
volume_driver = cinder.volume.drivers.nfs.NfsDriver
nfs_shares_config = /etc/cinder/nfs_shares
nfs_mount_point_base = /var/lib/cinder/nfs
nsf_disk_util = df
nfs_sparsed_volumes = True
nfs_mount_options = rw,hard,intr,timeo=90,bg,vers=3,proto=tcp,rsize=32768,wsize=32768
volume_backend_name = NFS-192.168.10.59
nfs_qcow2_volumes = True
nfs_snapshot_support = True
nas_secure_file_operations = false
nas_secure_file_permissions = false
volume_clear = zero
volume_clear_size = 50
volume_clear_ionice = -c3

From the controller:

[DEFAULT]
osapi_volume_listen = 192.168.10.60
api_paste_config = /etc/cinder/api-paste.ini
glance_host = 192.168.10.60
auth_strategy = keystone
debug = False
use_syslog = False
my_ip = 192.168.10.60
transport_url = rabbit://openstack:0e932761f8ddf9d9f175@192.168.10.60:5672//openstack
log_dir = /var/log/cinder
state_path = /var/lib/cinder
volumes_dir = /var/lib/cinder/volumes/
rootwrap_config = /etc/cinder/rootwrap.conf
default_volume_type = lvm-192.168.10.60
glance_api_servers = http://192.168.10.60:9292
enabled_backends =
nova_catalog_info = compute:nova:internalURL
nova_catalog_admin_info = compute:nova:adminURL
os_region_name = Bagneux
notification_driver = messagingv2
control_exchange = cinder
[backend]
[backend_defaults]
[barbican]
[brcd_fabric_example]
[cisco_fabric_example]
[coordination]
[cors]
[database]
connection = mysql+pymysql://cinderdbuser:680b5b9b4b557830bfdd@192.168.10.60:3306/cinderdb
retry_interval = 10
idle_timeout = 3600
min_pool_size = 1
max_pool_size = 10
max_retries = 100
pool_timeout = 10
[fc-zone-manager]
[healthcheck]
[key_manager]
[keystone_authtoken]
auth_uri = http://192.168.10.60:5000
auth_url = http://192.168.10.60:35357
auth_type = password
memcached_servers = 192.168.10.60:11211
project_domain_name = default
user_domain_name = default
project_name = services
username = cinder
password = *snipped*
region_name = Bagneux
project_domain_id = default
user_domain_id = default
[matchmaker_redis]
[nova]
[oslo_concurrency]
lock_path = /var/oslock/cinder
[oslo_messaging_amqp]
[oslo_messaging_kafka]
[oslo_messaging_notifications]
driver = messagingv2
[oslo_messaging_rabbit]
[oslo_messaging_zmq]
[oslo_middleware]
[oslo_policy]
[oslo_reports]
[oslo_versionedobjects]
[profiler]
[ssl]

Here is the output from the controller:

$ openstack volume service list
+------------------+-----------------------------------------------------+------+---------+-------+----------------------------+
| Binary           | Host                                                | Zone | Status  | State | Updated At                 |
+------------------+-----------------------------------------------------+------+---------+-------+----------------------------+
| cinder-volume    | srv-heb-stack101@lvm-192.168.10.59 ...
(output truncated)

Comments

Enable debug logging and check the logs again.

Assuming that you use systemd to manage services, try systemctl status on the relevant Cinder services.

Bernd Bausch ( 2018-08-31 09:27:45 -0500 )
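
For reference, a minimal sketch of those checks, assuming systemd and the RDO/CentOS unit names, which may differ on other distributions:

$ systemctl status openstack-cinder-api openstack-cinder-scheduler    # on the controller
$ systemctl status openstack-cinder-volume                            # on the storage node

To get DEBUG logs, set debug = True in the [DEFAULT] section of /etc/cinder/cinder.conf and restart the services, e.g.:

$ systemctl restart openstack-cinder-volume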

Thank you for your advice, Bernd!

I have added the systemctl output from both servers, and here are pastebins of the DEBUG logs:

http://paste.openstack.org/show/729309/
http://paste.openstack.org/show/729311/
http://paste.openstack.org/show/729312/

Unfortunately, I still can't see the issue here.

sebastien ( 2018-09-03 04:38:55 -0500 )

Has this setup worked before, or is it a fresh install? Are the services still up and running? 27 seconds is not much to wait for a service to fail; just double-check that. Can you run cinder-manage db sync? Does it work?

eblock ( 2018-09-03 05:25:59 -0500 )
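
A minimal sketch of the db sync check mentioned above, run on the controller and assuming the usual cinder service user:

$ su -s /bin/sh -c "cinder-manage db sync" cinder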

The Cinder volume processes are running, yet Cinder thinks they are down. This points towards some communication problem, e.g. the volume services being unable to contact the Cinder API. I would have thought that this leaves some trace in the log files, though. All I see is startup messages.

Bernd Bausch ( 2018-09-03 06:29:58 -0500 )
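
One way to rule out a messaging problem, assuming the RabbitMQ host and port from the transport_url shown above:

$ nc -zv 192.168.10.60 5672                                # from the storage node
$ rabbitmqctl list_connections user peer_host peer_port    # on the controller, as root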

@eblock and @Bernd, I am sorry; it seems the OpenStack pastebin has a line limit. I ran the services for more than 27 seconds, obviously. Here are the "complete" logs:

https://pastebin.com/QmWMzk76
https://pastebin.com/KrGxHntS
https://pastebin.com/7wwdqkdD

It's a fresh install; it has never worked so far.

sebastien ( 2018-09-03 08:47:07 -0500 )

1 answer

answered 2018-08-31 10:41:35 -0500 by game-on

I have seen this on a Pike installation; there are a few things to check.

NTP problems, i.e. time being out of sync between the nodes, can cause the volume service to die.
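
A quick way to verify this, assuming chrony, is to run the following on both nodes and compare the output:

$ date
$ chronyc tracking
$ chronyc sources -v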

If you're running Ceph (your config suggests that you are not), then the calculation of RBD sizes, which loops through all volumes, can also cause the service to die. In that case, a cron job to stagger restarts of the Cinder volume service is a workaround.
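
A rough sketch of such a cron entry, assuming the RDO unit name and an arbitrary interval, placed in e.g. /etc/cron.d/cinder-volume-restart:

0 */6 * * * root systemctl restart openstack-cinder-volume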


Comments

Thank you!

The issue was NTP-related: despite chronyd having started successfully on both servers, the date/time on the two nodes was 5 minutes apart.

That's the second time I have missed checking the date/time for an OpenStack-related issue. The third time, I will get it tattooed on my forearm!

Cheers!

sebastien ( 2018-09-04 03:33:09 -0500 )
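
For future readers, a sketch of forcing the clocks back in sync with chrony and restarting the volume service, assuming the RDO unit name:

$ chronyc makestep                              # step the clock immediately (run as root)
$ systemctl restart openstack-cinder-volume
$ openstack volume service list                 # the service should now report "up"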
