
Why are nova and neutron services going down from time to time?

asked 2014-03-06 20:03:16 -0500 by spcla1
updated 2014-03-07 15:09:09 -0500 by smaffulli

OS: Red Hat Enterprise Linux 6.5
OpenStack: Havana

I keep hitting a problem where the nova-compute service is reported as down: the node still has nova-compute running, but nova service-list shows its state as down. Sometimes only a few nodes are affected, but most of the time they all go down at the same time.

When this happens, the conductor log shows the error below, and restarting openstack-nova-conductor brings everything back to normal (the commands I use are sketched after the traceback). It looks like an issue with the qpid connection.

conductor.log

2014-03-05 17:18:51.896 42263 ERROR root [-] Unexpected exception occurred 1 time(s)... retrying.
2014-03-05 17:18:51.896 42263 TRACE root Traceback (most recent call last):
2014-03-05 17:18:51.896 42263 TRACE root   File "/usr/lib/python2.6/site-packages/nova/openstack/common/excutils.py", line 78, in inner_func
2014-03-05 17:18:51.896 42263 TRACE root     return infunc(*args, **kwargs)
2014-03-05 17:18:51.896 42263 TRACE root   File "/usr/lib/python2.6/site-packages/nova/openstack/common/rpc/impl_qpid.py", line 709, in _consumer_thread
2014-03-05 17:18:51.896 42263 TRACE root     self.consume()
2014-03-05 17:18:51.896 42263 TRACE root   File "/usr/lib/python2.6/site-packages/nova/openstack/common/rpc/impl_qpid.py", line 700, in consume
2014-03-05 17:18:51.896 42263 TRACE root     it.next()
2014-03-05 17:18:51.896 42263 TRACE root   File "/usr/lib/python2.6/site-packages/nova/openstack/common/rpc/impl_qpid.py", line 617, in iterconsume
2014-03-05 17:18:51.896 42263 TRACE root     yield self.ensure(_error_callback, _consume)
2014-03-05 17:18:51.896 42263 TRACE root   File "/usr/lib/python2.6/site-packages/nova/openstack/common/rpc/impl_qpid.py", line 551, in ensure
2014-03-05 17:18:51.896 42263 TRACE root     return method(*args, **kwargs)
2014-03-05 17:18:51.896 42263 TRACE root   File "/usr/lib/python2.6/site-packages/nova/openstack/common/rpc/impl_qpid.py", line 608, in _consume
2014-03-05 17:18:51.896 42263 TRACE root     nxt_receiver = self.session.next_receiver(timeout=timeout)
2014-03-05 17:18:51.896 42263 TRACE root   File "<string>", line 6, in next_receiver
2014-03-05 17:18:51.896 42263 TRACE root   File "/usr/lib/python2.6/site-packages/qpid/messaging/endpoints.py", line 660, in next_receiver
2014-03-05 17:18:51.896 42263 TRACE root     if self._ecwait(lambda: self.incoming, timeout):
2014-03-05 17:18:51.896 42263 TRACE root   File "/usr/lib/python2.6/site-packages/qpid/messaging/endpoints.py", line 50, in _ecwait
2014-03-05 17:18:51.896 42263 TRACE root     result = self._ewait(lambda: self.closed or predicate(), timeout)
2014-03-05 17:18:51.896 42263 TRACE root   File "/usr/lib/python2.6/site-packages/qpid/messaging/endpoints.py", line 566, in _ewait
2014-03-05 17:18:51.896 42263 TRACE root     result = self.connection._ewait(lambda: self.error or predicate(), timeout)
2014-03-05 17:18:51.896 42263 TRACE root   File "/usr/lib/python2.6/site-packages/qpid/messaging/endpoints.py", line 209, in _ewait
2014-03-05 17:18:51.896 42263 TRACE root     self.check_error()
2014-03-05 17:18:51.896 42263 TRACE root   File "/usr/lib/python2.6/site-packages ...
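For reference, the checks and the workaround I'm describing look roughly like this on RHEL 6.5 with sysvinit (standard RDO/Havana service names; the log path assumes the default location):

# on the controller, see which services nova reports as down
nova service-list

# confirm the qpid broker itself is still running
service qpidd status

# restarting the conductor re-establishes its qpid connection;
# the compute nodes show "up" again shortly afterwards
service openstack-nova-conductor restart

# watch for further qpid consumer errors
tail -f /var/log/nova/conductor.log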

1 answer


answered 2014-03-12 15:47:55 -0500 by cloudssky
updated 2014-03-12 16:01:26 -0500

From my experience with one controller and two compute nodes, this happens to me from time to time too (I have been testing the environment for about 10 weeks).

My guess is that a reasonably stable environment needs at least four nodes: a controller, a Neutron network node, and two compute nodes. And if I leave the environment idle for a couple of days with all the VMs asleep, the hosts and the services seem to go into sleep mode too :-)

For the last 7 days, though, I have been running a virtualized RDO Foreman installation on top of my base environment and everything has been working well.

How many nodes do you have?


Comments

Thanks for your response. I have more than 10 nodes. I actually haven't seen the problem in the last 3 weeks.

spcla1 (2014-03-25 18:56:12 -0500)
