neutron: Error, "AgentNotFoundByTypeHost: Agent with agent_type=L3 agent and host=compute1.example.com could not be found", caused by 'l2population'
Can anybody help with this issue?
After I deployed my HA OpenStack cluster (non-production), I got the following error messages whenever I shut down a VM. Unless I restart or stop neutron-linuxbridge-agent.service on the compute node, these messages keep being logged.
/var/log/neutron/server.log on the controller node (the full error logs are in the last part):
2017-12-28 16:01:26.964 16265 INFO neutron.notifiers.nova [-] Nova event response: {u'status': u'completed', u'tag': u'd2ab84b4-8339-491b-888b-ffaede27d795', u'name': u'network-vif-unplugged', u'server_uuid': u'e6dac399-7743-46ed-a384-1cecca3ac3f4', u'code': 200}
2017-12-28 16:01:27.646 16265 ERROR oslo_messaging.rpc.server [req-edcb230d-6314-4b87-b13e-51691254391d - - - - -] Exception during message handling: AgentNotFoundByTypeHost: Agent with agent_type=L3 agent and host=compute1.example.com could not be found
/var/log/neutron/linuxbridge-agent.log on the compute node (the full error logs are in the last part):
2017-12-28 16:01:32.881 1510 ERROR neutron.plugins.ml2.drivers.agent._common_agent [u'Traceback (most recent call last):\n', u' File "/usr/lib/python2.7/site-packages/oslo_messaging/rpc/server.py", line 160, in _process_incoming\n res = self.dispatcher.dispatch(message)\n', u' File "/usr/lib/python2.7/site-packages/oslo_messaging/rpc/dispatcher.py", line 213, in dispatch\n return self._do_dispatch(endpoint, method, ctxt, args)\n', u' File "/usr/lib/python2.7/site-packages/oslo_messaging/rpc/dispatcher.py", line 183, in _do_dispatch\n result = func(ctxt, **new_args)\n', u' File "/usr/lib/python2.7/site-packages/neutron/plugins/ml2/rpc.py", line 234, in update_device_down\n n_const.PORT_STATUS_DOWN, host)\n', u' File "/usr/lib/python2.7/site-packages/neutron/plugins/ml2/rpc.py", line 331, in notify_l2pop_port_wiring\n l2pop_driver.obj.update_port_down(port_context)\n', u' File "/usr/lib/python2.7/site-packages/neutron/plugins/ml2/drivers/l2pop/mech_driver.py", line 253, in update_port_down\n admin_context, agent_host, [port[\'device_id\']]):\n', u' File "/usr/lib/python2.7/site-packages/neutron/db/l3_agentschedulers_db.py", line 303, in list_router_ids_on_host\n context, constants.AGENT_TYPE_L3, host)\n', u' File "/usr/lib/python2.7/site-packages/neutron/db/agents_db.py", line 291, in _get_agent_by_type_and_host\n host=host)\n', u'AgentNotFoundByTypeHost: Agent with agent_type=L3 agent and host=compute1.example.com could not be found\n'].
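The linuxbridge-agent entry stores the remote traceback as a Python 2 list literal, which is hard to read on one line. Below is a minimal sketch for unpacking such an entry into an ordinary traceback. The function name `format_logged_traceback` is hypothetical, and the sketch assumes the list starts at the first `[u'` on the line and ends at the last `]`, as it does in the entry above. Read that way, the call chain runs from `update_device_down` through the l2pop driver's `update_port_down` into `list_router_ids_on_host`, which is where the lookup for an L3 agent on the compute host fails.

```python
import ast


def format_logged_traceback(log_line):
    """Turn a log entry that embeds a list of traceback strings into plain text.

    Assumption: the embedded list begins at the first "[u'" and ends at the
    last "]" on the line, as in the linuxbridge-agent.log entry above.
    """
    start = log_line.index("[u'")
    end = log_line.rindex("]")
    # ast.literal_eval safely parses the list literal; the u'' prefix used by
    # Python 2 is also accepted by Python 3.
    parts = ast.literal_eval(log_line[start:end + 1])
    return "".join(parts)
```

Calling `print(format_logged_traceback(line))` on the entry should print the frames one per line, ending with the `AgentNotFoundByTypeHost` message.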
I used Pike to deploy my HA OpenStack cluster; the OS is CentOS 7.x. There are four nodes in this cluster: three controller nodes and one compute node. All four nodes are VMs on one physical host, and each node has 4 CPU cores and 8 GB of RAM. Controller and cluster services (such as pacemaker, haproxy, memcached, rabbitmq, mariadb, keystone and so on) are all deployed on the controller nodes. Host names are resolved through a DNS server, and the time on all nodes is synchronized through an NTP server.
Everything seemed to work well after I deployed the HA cluster, until I shut down a VM. Then the error messages began to appear.
This issue has confused me for several days. During that time I checked the neutron configuration files over and over, redeployed the neutron service many times, and tried many ways to find what I did wrong and fix it, but ...
Good, I can see the problem. The truth is that I'm looking for help because I have a similar problem. If I find or discover something, I will gladly share it.
https://ask.openstack.org/en/question...
You can try disabling the l2population mechanism; this may help you. I have posted more details in your own question.
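As a rough illustration of what disabling l2population can look like, here is a minimal sketch. It assumes the usual Neutron options, `mechanism_drivers` in the `[ml2]` section (typically in /etc/neutron/plugins/ml2/ml2_conf.ini) and `l2_population` in the `[vxlan]` section (typically in /etc/neutron/plugins/ml2/linuxbridge_agent.ini); your deployment may keep them elsewhere. The helper name `drop_l2population` is hypothetical. It only computes the new values from the file text and does not write anything back.

```python
import configparser


def drop_l2population(ini_text):
    """Parse Neutron ini text and return a ConfigParser with l2population off.

    Assumptions: [ml2] mechanism_drivers is a comma-separated list and
    [vxlan] l2_population is a boolean option. Options that are absent are
    left untouched.
    """
    # strict=False tolerates repeated options; interpolation=None keeps any
    # literal '%' characters in values as they are.
    cfg = configparser.ConfigParser(strict=False, interpolation=None)
    cfg.optionxform = str  # preserve option-name case
    cfg.read_string(ini_text)

    if cfg.has_option("ml2", "mechanism_drivers"):
        drivers = [d.strip() for d in cfg.get("ml2", "mechanism_drivers").split(",")]
        kept = [d for d in drivers if d and d != "l2population"]
        cfg.set("ml2", "mechanism_drivers", ",".join(kept))

    if cfg.has_option("vxlan", "l2_population"):
        cfg.set("vxlan", "l2_population", "false")

    return cfg
```

For example, text containing `mechanism_drivers = linuxbridge,l2population` yields `linuxbridge`. After editing the real files by hand, the neutron server and neutron-linuxbridge-agent.service would presumably need a restart for the change to take effect.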
It seems my cluster has the same issue. I'm curious whether, in the few months since, you have discovered another solution or filed a bug report that I can track.