
Openstack neutron l3 agent dead

asked 2017-05-30 11:40:22 -0500 by Vikash Kathirvel, updated 2017-05-31 07:15:40 -0500

I am trying a 3-node setup with 1 controller, 1 compute, and 1 network node. I have configured the Keystone, Glance, and Nova services.

I am now trying to configure the networking service, using Open vSwitch as the backend. I am running the DHCP agent, metadata agent, Open vSwitch agent, and L3 agent on the network node.

All the services are running; however, the L3 agent log shows:

2017-05-30 21:51:32.827 4362 ERROR neutron.common.rpc [req-a32bb78f-972b-4f77-9562-f6c223bf0013 - - - - -] Timeout in RPC method get_service_plugin_list. Waiting for 16 seconds before next attempt. If the server is not down, consider increasing the rpc_response_timeout option as Neutron server(s) may be overloaded and unable to respond quickly enough.
2017-05-30 21:51:32.828 4362 WARNING neutron.common.rpc [req-a32bb78f-972b-4f77-9562-f6c223bf0013 - - - - -] Increasing timeout for get_service_plugin_list calls to 240 seconds. Restart the agent to restore it to the default value.
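
(For reference, the rpc_response_timeout option the message points at lives in neutron.conf on the agent host. A minimal sketch, assuming the stock file layout; 120 is only an illustrative value, the default being 60:)

# /etc/neutron/neutron.conf
[DEFAULT]
rpc_response_timeout = 120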

And the openstack network agent list command shows the L3 agent as dead; all the other agents are alive and up.

I am only setting two options in l3_agent.ini:

interface_driver = openvswitch
external_network_bridge =
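
(Those two options sit in the [DEFAULT] section; a minimal sketch of the full file as I understand it for an OVS backend on Newton, offered as an assumption rather than a verified reference:)

# /etc/neutron/l3_agent.ini
[DEFAULT]
interface_driver = openvswitch
# Left empty on purpose: the OVS agent wires up the external bridge instead.
external_network_bridge =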

Is there something else required? Any help will be appreciated.

EDIT 1:

I have checked the versions; both neutron-server and the L3 agent are version 9.2.0.

Also, I remember it working the first time after install, but then I had to restart the services to change the configuration to use Open vSwitch, and that is when the problem started.
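
(For reference, on an RDO-style install like this one, as the neutron-dist.conf path in the logs suggests, the restart would be something like the line below; the exact service names are an assumption:)

# Hypothetical restart on the network node (RDO/CentOS service names)
systemctl restart neutron-openvswitch-agent neutron-dhcp-agent neutron-metadata-agent neutron-l3-agent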

EDIT 2:

Here are the debug logs:

2017-05-31 17:13:45.004 12612 INFO neutron.common.config [-] Logging enabled!
2017-05-31 17:13:45.005 12612 INFO neutron.common.config [-] /usr/bin/neutron-l3-agent version 9.2.0
2017-05-31 17:13:45.006 12612 DEBUG neutron.common.config [-] command line: /usr/bin/neutron-l3-agent --config-file /usr/share/neutron/neutron-dist.conf --config-dir /usr/share/neutron/l3_agent --config-file /etc/neutron/neutron.conf --config-dir /etc/neutron/conf.d/common --config-dir /etc/neutron/conf.d/neutron-l3-agent --log-file /var/log/neutron/l3-agent.log setup_logging /usr/lib/python2.7/site-packages/neutron/common/config.py:107
2017-05-31 17:13:45.085 12612 DEBUG oslo_messaging._drivers.amqpdriver [req-cfd85cea-7516-4ee3-9d05-3e7436846b68 - - - - -] CALL msg_id: da645eba066444cc8e1de41485b9922f exchange 'neutron' topic 'q-l3-plugin' _send /usr/lib/python2.7/site-packages/oslo_messaging/_drivers/amqpdriver.py:448
2017-05-31 17:14:45.054 12612 DEBUG oslo_concurrency.lockutils [-] Lock "_check_child_processes" acquired by "neutron.agent.linux.external_process._check_child_processes" :: waited 0.000s inner /usr/lib/python2.7/site-packages/oslo_concurrency/lockutils.py:270
2017-05-31 17:14:45.056 12612 DEBUG oslo_concurrency.lockutils [-] Lock "_check_child_processes" released by "neutron.agent.linux.external_process._check_child_processes" :: held 0.001s inner /usr/lib/python2.7/site-packages/oslo_concurrency/lockutils.py:282
2017-05-31 17:14:45.089 12612 ERROR neutron.common.rpc [req-cfd85cea-7516-4ee3-9d05-3e7436846b68 - - - - -] Timeout in RPC method get_service_plugin_list. Waiting for 41 seconds before next attempt. If the server is not down, consider increasing the rpc_response_timeout option as Neutron server(s) may be overloaded and unable to respond quickly enough.
2017-05-31 17:14:45.090 12612 WARNING neutron.common.rpc [req-cfd85cea-7516-4ee3-9d05-3e7436846b68 - - - - -] Increasing timeout for get_service_plugin_list calls to 120 seconds. Restart the agent to restore it to the default value.
2017-05-31 17:15:25.863 12612 WARNING neutron ...

Comments

Could be a version mismatch - L3 agent code and Neutron server code might not be compatible. Compare the output of:

/usr/bin/neutron-server --version
/usr/bin/neutron-l3-agent --version
Bernd Bausch ( 2017-05-30 19:06:47 -0500 )

@Bernd Bausch The version is the same; both are 9.2.0. It also worked the first time after I installed, but then I had to change some configuration to switch to Open vSwitch, and it stopped working.

Vikash Kathirvel ( 2017-05-31 02:42:30 -0500 )

No idea, except generic troubleshooting tips: use debug logging in the hope that you will get more information. Is this the only RPC that fails?
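
(A sketch of enabling it, assuming the usual file location; set it on the network node and restart the agent:)

# /etc/neutron/neutron.conf on the network node
[DEFAULT]
debug = true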

Bernd Bausch ( 2017-05-31 05:59:00 -0500 )

@Bernd Bausch Yes, it is the only RPC failing. What is more baffling is that it worked the first time, and the last heartbeat even shows it was alive a few days back; I have no idea what went wrong in the meantime. I will add the debug logs; please check if something makes sense.

Vikash Kathirvel ( 2017-05-31 07:09:21 -0500 )

I have the same kind of issue, and my neutron-l3-agent sits idle too. I'm running neutron-server and neutron-l3-agent version 11.0.2, but I didn't install Open vSwitch. Any tips to solve this problem?

Francesco Lucconi ( 2018-03-13 09:15:39 -0500 )

1 answer


answered 2018-05-13 09:37:55 -0500

I had a similar problem. I noticed my q-l3-plugin queue in RabbitMQ had an ever-increasing number of messages and plenty of consumers (rabbitmqctl list_queues name consumers messages | grep q-l3-plugin). I deleted the q-l3-plugin queue, and that broke the logjam. One interesting thing I noticed: I had debug on, and the config wasn't being dumped into the L3 agent's logs until I cleared this queue. There are a number of Stack Overflow answers for how to delete a queue; I used the pika one from here: https://stackoverflow.com/questions/5313027/rabbitmq-how-do-i-delete-all-messages-from-a-single-queue
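
(A sketch of that pika approach, assuming the broker runs on localhost with the default credentials; adjust host, vhost, and credentials for a real deployment. Purging drops the backlog, while queue_delete would remove the queue entirely:)

#!/usr/bin/env python
import pika

# Assumes RabbitMQ on localhost with the default guest credentials;
# point this at your actual broker and virtual host.
connection = pika.BlockingConnection(pika.ConnectionParameters(host='localhost'))
channel = connection.channel()

# Drop the backlogged messages; channel.queue_delete(queue='q-l3-plugin')
# would remove the queue itself (agents re-declare it on reconnect).
channel.queue_purge(queue='q-l3-plugin')

connection.close()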


Comments

I'm now noticing that my 3 controllers each have their own q-l3-plugin.HOST queue, and the sum of their consumer counts now matches the consumer count for the main q-l3-plugin queue. So maybe some config change dropped my number of RPC workers, leaving phantom consumers gumming up the works.
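
(A sketch of that check, assuming rabbitmqctl access on a controller; the host-scoped queues show up under names matching q-l3-plugin.HOST:)

rabbitmqctl list_queues name consumers | grep '^q-l3-plugin'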

jomlowe ( 2018-05-13 09:42:14 -0500 )
