
CRITICAL: 2500 instances down

asked 2015-10-15 08:29:17 -0500

anonymous user


updated 2015-10-15 10:41:14 -0500

Hi, we're having a big problem with the OpenStack network. We have a cluster with 170 hosts as compute nodes and 10 hosts as network nodes. Suddenly some instances stopped responding, and then more and more instances stopped responding. This is the log:

2015-10-15 09:45:52.426 31044 ERROR neutron.agent.dhcp_agent [req-f37d335e-2cd6-4ab4-9528-dbcde3e78e96 None] Failed reporting state!
2015-10-15 09:45:52.426 31044 TRACE neutron.agent.dhcp_agent Traceback (most recent call last):
2015-10-15 09:45:52.426 31044 TRACE neutron.agent.dhcp_agent File "/usr/lib/python2.7/dist-packages/neutron/agent/dhcp_agent.py", line 577, in _report_state
2015-10-15 09:45:52.426 31044 TRACE neutron.agent.dhcp_agent self.state_rpc.report_state(ctx, self.agent_state, self.use_call)
2015-10-15 09:45:52.426 31044 TRACE neutron.agent.dhcp_agent File "/usr/lib/python2.7/dist-packages/neutron/agent/rpc.py", line 72, in report_state
2015-10-15 09:45:52.426 31044 TRACE neutron.agent.dhcp_agent return self.call(context, msg, topic=self.topic)
2015-10-15 09:45:52.426 31044 TRACE neutron.agent.dhcp_agent File "/usr/lib/python2.7/dist-packages/neutron/openstack/common/rpc/proxy.py", line 129, in call
2015-10-15 09:45:52.426 31044 TRACE neutron.agent.dhcp_agent exc.info, real_topic, msg.get('method'))
2015-10-15 09:45:52.426 31044 TRACE neutron.agent.dhcp_agent Timeout: Timeout while waiting on RPC response - topic: "q-plugin", RPC method: "report_state" info: "<unknown>"
2015-10-15 09:45:52.426 31044 TRACE neutron.agent.dhcp_agent
2015-10-15 09:45:52.426 31044 WARNING neutron.openstack.common.loopingcall [-] task run outlasted interval by 90.031249 sec
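For context on the last log line: the DHCP agent sends its state report from a periodic fixed-interval task, so when the report_state RPC blocks until it times out, that run lasts longer than the interval and the loop logs the overrun. The following is a simplified, hypothetical model of that behaviour, not Neutron's actual code; run_fixed_interval and FakeClock are illustrative names, and the numbers in the demo are made up.

```python
import time


class FakeClock:
    """Deterministic clock for demonstrating the loop without real waiting."""

    def __init__(self):
        self.now = 0.0

    def __call__(self):
        return self.now

    def sleep(self, seconds):
        self.now += seconds


def run_fixed_interval(task, interval, iterations, clock=time.monotonic, sleep=time.sleep):
    """Call `task` every `interval` seconds, `iterations` times.

    Returns the overrun (in seconds) of every run that took at least as long
    as the interval. Hypothetical helper, not Neutron's loopingcall module.
    """
    overruns = []
    for _ in range(iterations):
        start = clock()
        task()  # e.g. a state report that blocks until its RPC times out
        delay = interval - (clock() - start)
        if delay <= 0:
            # No time left to sleep: this is the situation behind
            # "task run outlasted interval by N sec".
            overruns.append(-delay)
        else:
            sleep(delay)
    return overruns


if __name__ == "__main__":
    clock = FakeClock()
    # Illustrative numbers: a 30 s interval and a task that blocks for 120 s.
    print(run_fixed_interval(lambda: clock.sleep(120), 30, 2, clock=clock, sleep=clock.sleep))
    # prints [90.0, 90.0]
```

Because the loop only starts the next run after the current one returns, a report that blocks on an RPC timeout also delays every report that follows it.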

And this appears every 2 minutes:

ERROR neutron.agent.dhcp_agent [-] Unable to sync network state
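To check whether the RPC timeouts and the sync failures rise together, one option is to count those messages per minute in the DHCP agent log. This is a minimal sketch, assuming every log line starts with a timestamp in the format shown in the traceback excerpt (the single message quoted just above was posted without one); the function name count_errors_per_minute and the log path passed on the command line are hypothetical.

```python
import re
import sys
from collections import Counter

# Message fragments taken from the log excerpts in the question.
PATTERNS = {
    "report_state_failed": "Failed reporting state!",
    "rpc_timeout": "Timeout while waiting on RPC response",
    "sync_failed": "Unable to sync network state",
}

# Assumed line prefix, e.g. "2015-10-15 09:45:52.426 ..."; captures up to the minute.
TIMESTAMP_RE = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}):\d{2}\.\d+")


def count_errors_per_minute(lines):
    """Return a Counter keyed by (minute, message kind)."""
    counts = Counter()
    for line in lines:
        match = TIMESTAMP_RE.match(line)
        if match is None:
            continue  # skip lines without the assumed timestamp prefix
        minute = match.group(1)
        for kind, needle in PATTERNS.items():
            if needle in line:
                counts[(minute, kind)] += 1
    return counts


if __name__ == "__main__":
    # Usage: python count_errors.py /path/to/dhcp-agent.log  (path is site-specific)
    with open(sys.argv[1]) as log:
        for (minute, kind), n in sorted(count_errors_per_minute(log).items()):
            print(minute, kind, n)
```

Grouping by minute keeps the output compact enough to line up against the time the instances began to stop responding.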

Can you please give me some ideas? We have tried everything.


1 answer


answered 2015-10-15 13:12:35 -0500

ameya

> we try everything

What did you try? What version of Neutron are you running?


Comments

In the end we decided to shut down all the network nodes and start them one by one. Then we shut down all the instances and restarted the compute nodes one by one. The service is in production and we cannot stay offline any longer.

emc (2015-10-15 13:27:17 -0500)


Stats


Seen: 154 times
