OpenStack instance rebuild is failing [closed]

asked 2019-04-21 10:34:15 -0500

updated 2019-04-23 03:42:09 -0500

Hello
OpenStack has suddenly started returning Unknown Error (HTTP 504) on VM rebuild. The rebuild command is:

openstack --os-auth-url=http://{URL}:5000/v2.0 --os-project-name=proj_qa --os-username={USER} --os-password=* server rebuild --image SA-3.0-27.qcow2 qa-sa-quintet-sa --debug --wait

I tried to hard-reboot the instance with nova reboot --hard qa-sa-quintet-sa, which fails with the following error:

Unknown Error (HTTP 504)
ERROR (CommandError): Unable to reboot the specified server(s).

The instance is now stuck in the Hard Rebooting state.

nova reset-state --active qa-sa-quintet-sa also fails:

Reset state for server qa-sa-quintet-sa failed: Unknown Error (HTTP 504)
ERROR (CommandError): Unable to reset the state for the specified server(s).

Client versions:

nova --version: 7.1.2
openstack --version: openstack 3.8.1

sudo pcs status shows the following errors:

Cluster name: tripleo_cluster

Failed Actions:
* galera_monitor_10000 on controller-56 'unknown error' (1): call=444, status=Timed Out, exitreason='none', last-rc-change='Sun Apr 14 19:02:39 2019', queued=0ms, exec=0ms
* rabbitmq_monitor_10000 on controller-56 'unknown error' (1): call=389, status=Timed Out, exitreason='none', last-rc-change='Sat Apr 13 19:03:00 2019', queued=0ms, exec=0ms
* galera_monitor_10000 on controller-55 'unknown error' (1): call=219, status=Timed Out, exitreason='none', last-rc-change='Sun Apr 21 18:50:56 2019', queued=0ms, exec=0ms
* rabbitmq_monitor_10000 on controller-55 'unknown error' (1): call=147, status=Timed Out, exitreason='none', last-rc-change='Sun Apr 21 18:50:57 2019', queued=0ms, exec=0ms
* redis_monitor_60000 on controller-55 'unknown error' (1): call=172, status=Timed Out, exitreason='none', last-rc-change='Fri Apr 19 18:51:18 2019', queued=0ms, exec=0ms
* galera_monitor_10000 on controller-57 'unknown error' (1): call=398, status=Timed Out, exitreason='none', last-rc-change='Sat Apr 20 15:23:35 2019', queued=0ms, exec=0ms
* rabbitmq_monitor_10000 on controller-57 'unknown error' (1): call=281, status=Timed Out, exitreason='none', last-rc-change='Sat Apr 20 15:23:36 2019', queued=0ms, exec=0ms
* redis_monitor_45000 on controller-57 'unknown error' (1): call=374, status=Timed Out, exitreason='none', last-rc-change='Fri Apr 19 15:22:38 2019', queued=0ms, exec=0ms

Daemon Status:
corosync: active/enabled
pacemaker: active/enabled
pcsd: active/enabled
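A Failed Actions list like the one above is easier to read once it is grouped by resource and node. The following is only a minimal Python sketch: the regular expression is an assumption derived from the line format shown in this post, and `parse_failed_actions` and `nodes_by_resource` are hypothetical helper names, not part of pcs or Pacemaker.

```python
import re
from collections import defaultdict

# Assumed line format, taken from the pcs status output quoted in this post:
# * <resource>_<action>_<interval> on <node> '<error>' (<rc>): call=N, status=<status>, ...
FAILED_ACTION = re.compile(
    r"\*\s+(?P<op>\S+)\s+on\s+(?P<node>\S+)\s+'(?P<error>[^']*)'\s+\((?P<rc>\d+)\):"
    r"\s+call=(?P<call>\d+),\s+status=(?P<status>[^,]+),"
)


def parse_failed_actions(text):
    """Return one dict per failed action found in the given pcs status text."""
    records = []
    for match in FAILED_ACTION.finditer(text):
        # "galera_monitor_10000" -> resource "galera", action "monitor", interval 10000 ms
        resource, action, interval_ms = match.group("op").rsplit("_", 2)
        records.append({
            "resource": resource,
            "action": action,
            "interval_ms": int(interval_ms),
            "node": match.group("node"),
            "status": match.group("status").strip(),
        })
    return records


def nodes_by_resource(records):
    """Map each resource name to the sorted list of nodes where it failed."""
    grouped = defaultdict(set)
    for record in records:
        grouped[record["resource"]].add(record["node"])
    return {resource: sorted(nodes) for resource, nodes in grouped.items()}


# Three entries copied from the output above, used as sample input.
SAMPLE = (
    "* galera_monitor_10000 on controller-56 'unknown error' (1): call=444, "
    "status=Timed Out, exitreason='none', queued=0ms, exec=0ms\n"
    "* rabbitmq_monitor_10000 on controller-56 'unknown error' (1): call=389, "
    "status=Timed Out, exitreason='none', queued=0ms, exec=0ms\n"
    "* redis_monitor_60000 on controller-55 'unknown error' (1): call=172, "
    "status=Timed Out, exitreason='none', queued=0ms, exec=0ms\n"
)
```

Calling `nodes_by_resource(parse_failed_actions(SAMPLE))` gives a per-resource view of the failing nodes. Run against the full list above, it shows galera and rabbitmq monitors timing out on all three controllers, which suggests a cluster-wide problem rather than a single bad node.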

nova-conductor.log:

ERROR oslo_messaging.notify.messaging [req-df784075-1c86-4a48-8350-52a75b89beb3 e11e6766c40a4f1fb8dac4557bace3c2 290fad30002e46b6ad3b7fcee37645e2 - - -] Could not send notification to versioned_notifications

nova-novncproxy.log, nova-api.log and nova-conductor.log all contain the following error:

AMQP server on controller-57.internalapi.localdomain:5672 is unreachable: [Errno 113] EHOSTUNREACH. Trying again in 1 seconds. Client port: None

nova-consoleauth.log:

ERROR nova.servicegroup.drivers.db DBError: Can't reconnect until invalid transaction is rolled back

nova-rowsflush.log:

IOError: [Errno 13] Permission denied: '/var/log/nova/nova-manage.log'
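The EHOSTUNREACH error for port 5672 above can be reproduced outside of Nova with a plain TCP connection attempt. This is a minimal sketch using only the Python standard library; `amqp_port_reachable` is a hypothetical helper name, and the check only tells you whether the TCP port accepts connections, not whether RabbitMQ itself is healthy.

```python
import socket


def amqp_port_reachable(host, port=5672, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        # create_connection resolves the host name and opens a TCP socket.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError as exc:
        # Covers refused connections, timeouts, unreachable hosts and DNS failures.
        print(f"{host}:{port} is not reachable: {exc}")
        return False
```

For example, `amqp_port_reachable("controller-57.internalapi.localdomain")` could be run from each controller to see which nodes cannot reach the AMQP port on the others.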

Thanks in advance

Closed by Bernd Bausch for the following reason: the question is answered, right answer was accepted
Close date: 2019-05-15 19:24:43.707386

Comments

This huge wall of unformatted text seems to be debug output from the CLI command. My suggestion: Edit your question to make it readable, include the command(s) that generated this output, and find clues in the Nova logs. Also provide info about the version, deployment method and cloud architecture.

Bernd Bausch (2019-04-21 17:25:06 -0500)

Hello, the changes were made. Thanks.

arslankhan (2019-04-23 01:49:14 -0500)

Fine, but I don't see anything that helps. It's obviously a TripleO cluster. There seems to be a problem with the DB cluster, but I am not versed enough to solve this and can't tell whether it has anything to do with your rebuild problem. What about the Nova logs?

Bernd Bausch (2019-04-23 02:12:09 -0500)

Updated the description with the different errors found in the Nova logs.

arslankhan (2019-04-23 03:42:29 -0500)

1 answer

answered 2019-05-15 11:21:37 -0500

The problem was with the RabbitMQ cluster: the nodes were not able to communicate with each other, and I had to kill the RabbitMQ instances. Instead of waiting for Pacemaker to restart RabbitMQ, I restarted the rabbitmq service myself, which caused the RabbitMQ cluster to go into a stopped state. To get out of it, I reset the failcount for rabbitmq, then put the nodes in standby and rebooted them one by one. After the reboots, the cluster started again on all 3 nodes and the services were back to normal.
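The recovery order described above (reset the failcount, then standby and reboot one node at a time) can be written down as a dry-run plan. This is only a sketch under assumptions: the answer does not give the exact commands, so the pcs subcommands below follow the older pcs 0.9 syntax (`pcs cluster standby`), which newer pcs releases replace with `pcs node standby`. The unstandby step is also an assumption, since the answer does not mention it. `rabbitmq_recovery_plan` is a hypothetical helper that only builds command strings and executes nothing.

```python
import shlex


def rabbitmq_recovery_plan(nodes, resource="rabbitmq"):
    """Build an ordered list of command strings; nothing is run."""
    # Step 1: clear the failcount so Pacemaker will manage the resource again.
    steps = [shlex.join(["pcs", "resource", "failcount", "reset", resource])]
    # Step 2: handle the controllers strictly one at a time.
    for node in nodes:
        steps.append(shlex.join(["pcs", "cluster", "standby", node]))
        # The reboot itself is left as a manual step (placeholder comment line).
        steps.append(f"# reboot {node} and wait for it to rejoin the cluster")
        # Assumption: the node has to be taken out of standby after it returns.
        steps.append(shlex.join(["pcs", "cluster", "unstandby", node]))
    return steps
```

Printing `rabbitmq_recovery_plan(["controller-55", "controller-56", "controller-57"])` line by line gives a checklist to review against your own pcs version before running anything by hand.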

Thanks, Bernd, for pointing me in the right direction.
