
linuxbridge-agent fails to start; after a fix, it uses 100% CPU

asked 2018-05-31 13:28:26 -0500

mike99201

CentOS 7 with OpenStack Queens, one controller node and one compute node.

My Linux bridge agent service (neutron-linuxbridge-agent) was starting, running for a few seconds, failing, and then restarting on both the controller and the compute node. The openstack network agent list command showed both Linux bridge agents as down. The service wouldn't log anything on either node, even with debug=true set in the conf files; it wasn't even creating a log file. The only reason I found out what was going on was that, when I logged back into the machine, abrt-cli alerted me to a previous crash. When I checked, this is what it said:

# abrt-cli list
id 07a20b3caca84e9ba74065b2e246c76b94b3178c
reason:         __init__.py:98:wrap:AttributeError: 'module' object has no attribute 'is_coroutine_function'
time:           Mon 21 May 2018 01:49:24 PM EDT
cmdline:        /usr/bin/python2 /usr/bin/neutron-linuxbridge-agent --config-file /usr/share/neutron/neutron-dist.conf --config-file /etc/neutron/neutron.conf --config-file /etc/neutron/plugins/ml2/linuxbridge_agent.ini --config-dir /etc/neutron/conf.d/common --config-dir /etc/neutron/conf.d/neutron-linuxbridge-agent --log-file /var/log/neutron/linuxbridge-agent.log
package:        openstack-neutron-linuxbridge-12.0.1-1.el7
uid:            992 (neutron)
count:          7574
Directory:      /var/spool/abrt/Python-2018-05-21-13:49:24-26027

This error doesn't seem to be common on these forums. After a bit of digging, I found a user on a non-English message board who, as far as I could tell through Google Translate, had solved this issue by commenting out lines 98 and 99 in /usr/lib/python2.7/site-packages/tenacity/__init__.py:

def retry(*dargs, **dkw):
    """Wrap a function with a new `Retrying` object.

    :param dargs: positional arguments passed to Retrying object
    :param dkw: keyword arguments passed to the Retrying object
    """
    # support both @retry and @retry() as valid syntax
    if len(dargs) == 1 and callable(dargs[0]):
        return retry()(dargs[0])
    else:
        def wrap(f):
            if asyncio and asyncio.iscoroutinefunction(f):
                r = AsyncRetrying(*dargs, **dkw)
#            elif tornado and tornado.gen.is_coroutine_function(f):
#                r = TornadoRetrying(*dargs, **dkw)
            else:
                r = Retrying(*dargs, **dkw)
            return r.wraps(f)

        return wrap
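To see why the tornado 4.4 package triggers this, here is a minimal sketch that mimics the branch order in wrap() above. It is not tenacity's code: pick_retrying_naive, pick_retrying_guarded, some_task, and the fake old_tornado object are hypothetical names made up for illustration, with a bare module object standing in for a tornado.gen that lacks is_coroutine_function. It assumes Python 3 (types.SimpleNamespace), so the AttributeError text differs slightly from the Python 2 message in the abrt report.

```python
import types

# Hypothetical stand-in for tornado.gen from tornado 4.4: a module object
# with no is_coroutine_function attribute (the helper arrived in 4.5).
old_gen = types.ModuleType("gen")
old_tornado = types.SimpleNamespace(gen=old_gen)


def pick_retrying_naive(f, asyncio=None, tornado=None):
    """Simplified copy of the branch order in tenacity's wrap()."""
    if asyncio and asyncio.iscoroutinefunction(f):
        return "AsyncRetrying"
    elif tornado and tornado.gen.is_coroutine_function(f):
        # On an old tornado this attribute lookup raises AttributeError.
        return "TornadoRetrying"
    return "Retrying"


def pick_retrying_guarded(f, asyncio=None, tornado=None):
    """Hypothetical variant that only calls the helper if it exists."""
    if asyncio and asyncio.iscoroutinefunction(f):
        return "AsyncRetrying"
    is_coro = getattr(tornado.gen, "is_coroutine_function", None) if tornado else None
    if is_coro is not None and is_coro(f):
        return "TornadoRetrying"
    return "Retrying"


def some_task():
    return 42


try:
    pick_retrying_naive(some_task, tornado=old_tornado)
except AttributeError as exc:
    print("naive dispatch fails:", exc)

print(pick_retrying_guarded(some_task, tornado=old_tornado))  # Retrying
print(pick_retrying_naive(some_task, tornado=None))  # Retrying
```

In the sketch, tornado=None short-circuits the check, so the crash only appears when an older tornado is importable. The guarded variant is just one way to avoid the crash; it is not necessarily how tenacity itself fixed it.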

So I did this, and it did fix the error: both services started, stayed up, and reported as up in OpenStack. However, the service now uses 100% CPU time on one core. Looking at the code, it would seem that it is now stuck in an infinite loop in this retry function, in the asyncio/tornado handling. I also don't like arbitrarily commenting out lines of code, so I changed it back.

After more digging, I found this little gem:

https://github.com/jd/tenacity/issues/99

It appears that with Python 2.7 on CentOS 7, the latest tornado package in the repo is 4.4, but tornado.gen.is_coroutine_function, which is called on line 98 of __init__.py, was only added in tornado 4.5. That is the source of my error. Great. Let's just use pip to update tornado to the latest version:

# yum install python-pip python-wheel
# pip install tornado==5.0.2
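After the upgrade, it may be worth confirming that the interpreter the agent actually runs under (/usr/bin/python2, per the abrt cmdline) now sees a tornado with the helper. The sketch below is a hypothetical check, not an OpenStack or tornado tool; parse_version, version_should_have_helper, and installed_tornado_status are made-up names, and the 4.5 threshold is taken from the tenacity issue linked above. It only uses syntax that works on both Python 2.7 and Python 3.

```python
import importlib


def parse_version(version):
    """Turn '4.4.2' into (4, 4, 2), ignoring a pre-release suffix like 'b1'."""
    parts = []
    for piece in version.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
        if len(digits) != len(piece):
            break  # e.g. '0b1' keeps 0 and stops at the suffix
    return tuple(parts)


def version_should_have_helper(version):
    # Threshold from the post: is_coroutine_function first appeared in 4.5.
    return parse_version(version) >= (4, 5)


def installed_tornado_status():
    """Return (version string, helper present?) or None if tornado is missing."""
    try:
        tornado = importlib.import_module("tornado")
        gen = importlib.import_module("tornado.gen")
    except ImportError:
        return None
    return tornado.version, hasattr(gen, "is_coroutine_function")


status = installed_tornado_status()
if status is None:
    print("tornado is not installed for this interpreter")
else:
    version, present = status
    print("tornado %s, is_coroutine_function present: %s" % (version, present))
    print("expected from version number: %s" % version_should_have_helper(version))
```

Saved under a name of your choosing (say, check_tornado.py) and run with /usr/bin/python2, it reports what the agent's own interpreter would import, which matters if pip and the RPM packages target different site-packages.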

Restart all of the OpenStack services (or actually just the entire machine at this point just ...


1 answer


answered 2018-06-04 07:46:55 -0500

mike99201

Fixed with this patch. The patch was recent enough that it was not yet in the distro release:

https://review.openstack.org/#/c/554258/

