
nova-compute and nova-scheduler keep going down

asked 2013-08-02 08:45:40 -0500

shoubam

updated 2013-08-05 02:42:43 -0500

nova-compute and nova-scheduler keep going into the "down" state. After disabling and re-enabling them, they are listed as up for a short time (it looks like one refresh cycle), but then they go down and stay down, and I no longer have access to my instances.

    $ nova service-list;date
    +------------------+---------------+----------+---------+-------+----------------------------+
    | Binary           | Host          | Zone     | Status  | State | Updated_at                 |
    +------------------+---------------+----------+---------+-------+----------------------------+
    | nova-compute     | xxxxxx-0001   | nova     | enabled | down  | 2013-08-02T13:32:33.000000 |
    | nova-conductor   | osint-nova-01 | internal | enabled | up    | 2013-08-05T07:33:46.000000 |
    | nova-console     | osint-nova-01 | internal | enabled | up    | 2013-08-05T07:33:40.000000 |
    | nova-consoleauth | osint-nova-01 | internal | enabled | up    | 2013-08-05T07:33:40.000000 |
    | nova-scheduler   | osint-nova-01 | internal | enabled | down  | 2013-08-05T06:50:21.000000 |
    +------------------+---------------+----------+---------+-------+----------------------------+

The processes are running though:

    $ ps aux |grep nova-compute
    nova      8079  0.1  0.0 354668 43952 ?        S    15:28   0:00 /usr/bin/python /usr/bin/nova-compute --config-file /etc/nova/nova.conf --logfile /var/log/nova/compute.log

    $ ps aux |grep scheduler
    nova     18130  0.0  2.6 381808 50684 ?        S    08:49   0:01 /usr/bin/python /usr/bin/nova-scheduler --config-file /etc/nova/nova.conf --logfile /var/log/nova/scheduler.log

and /var/log/nova/compute.log tells me:

    2013-08-02 15:28:50.794 8079 DEBUG nova.servicegroup.api [-] ServiceGroup driver defined as an instance of db __new__ /usr/lib/python2.6/site-packages/nova/servicegroup/api.py:61
    2013-08-02 15:28:50.908 8079 INFO nova.manager [-] Skipping periodic task _periodic_update_dns because its interval is negative
    2013-08-02 15:28:51.004 8079 INFO nova.virt.driver [-] Loading compute driver 'libvirt.LibvirtDriver'
    2013-08-02 15:28:51.093 DEBUG nova.openstack.common.rpc.amqp [req-981212fc-da44-4652-a598-36f3c510d08d None None] Making synchronous call on conductor ... multicall /usr/lib/python2.6/site-packages/nova/openstack/common/rpc/amqp.py:583
    2013-08-02 15:28:51.094 DEBUG nova.openstack.common.rpc.amqp [req-981212fc-da44-4652-a598-36f3c510d08d None None] MSG_ID is c9fcef41e877470189f39d3d2eeb340a multicall /usr/lib/python2.6/site-packages/nova/openstack/common/rpc/amqp.py:586
    2013-08-02 15:28:51.095 DEBUG nova.openstack.common.rpc.amqp [req-981212fc-da44-4652-a598-36f3c510d08d None None] UNIQUE_ID is 91380e8de6c44b66965da9dc7ec3d4f0. _add_unique_id /usr/lib/python2.6/site-packages/nova/openstack/common/rpc/amqp.py:337
    2013-08-02 15:28:51.095 DEBUG nova.openstack.common.rpc.amqp [req-981212fc-da44-4652-a598-36f3c510d08d None None] Pool creating new connection create /usr/lib/python2.6/site-packages/nova/openstack/common/rpc/amqp.py:75
    2013-08-02 15:28:51.112 INFO nova.openstack.common.rpc.common [req-981212fc-da44-4652-a598-36f3c510d08d None None] Connected to AMQP server on osint-mq-01:5672

which I don't find very helpful. Also, running grep -i error * in /var/log/nova gives no output.

Scheduler and compute are not running on the same host. On the scheduler host, logs tell me:

    root@nova-01:/var/log/nova $ grep -i error * |grep 2013-08-05 |grep -v api.log
    scheduler.log:2013-08-05 08:29:41.173 17137 DEBUG nova.service [-] publish_errors : False wait /usr/lib/python2.6/site-packages/nova/service.py:205
    scheduler.log:2013-08-05 08:29:41.193 17137 DEBUG nova.service [-] fatal_exception_format_errors : False wait /usr/lib/python2.6/site-packages/nova/service.py:205

Clocks are in sync thanks to ntp. BUT: the ... (more)


2 answers

3

answered 2013-08-05 04:23:14 -0500

shoubam

Thanks to this post I stopped all services, then started all services in a specific order (database and message queue first, then identity, then quantum and nova). Now everything is working.

I would suggest that this goes either into the docs or the logs ...
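For anyone who wants to script the restart order described in this answer, here is a minimal sketch. Only the ordering of the groups (database and message queue, then identity, then quantum and nova) comes from the answer; the service names in `START_ORDER` are hypothetical and depend on your distribution, stopping in reverse order is my own assumption, and the script assumes every service is on the local host, which is not the case in the setup from the question. It only prints the commands unless `dry_run=False` is passed.

```python
import subprocess

# Hypothetical service names: adjust to whatever your distribution installs.
# Only the ordering of the groups comes from the answer above.
START_ORDER = [
    ("database", ["mysqld"]),
    ("message queue", ["qpidd"]),
    ("identity", ["openstack-keystone"]),
    ("quantum", ["quantum-server"]),
    ("nova", ["openstack-nova-conductor",
              "openstack-nova-scheduler",
              "openstack-nova-compute"]),
]


def build_plan(order):
    """Return stop commands (reverse order) followed by start commands."""
    names = [name for _group, services in order for name in services]
    # Assumption: stop dependents first, i.e. the reverse of the start order.
    stops = [["service", name, "stop"] for name in reversed(names)]
    starts = [["service", name, "start"] for name in names]
    return stops + starts


def run(plan, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for cmd in plan:
        print(" ".join(cmd))
        if not dry_run:
            subprocess.check_call(cmd)


if __name__ == "__main__":
    run(build_plan(START_ORDER))
```

In a multi-host deployment you would run the relevant part of the plan on each host, keeping the same overall order.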

1

answered 2013-08-02 10:51:59 -0500

armando-migliaccio

Are the scheduler and compute services running on the same host? Ensure their clocks are in sync. It's unlikely that the compute log has no traces of error, but the scheduler's might.
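To see why clock sync matters: as far as I know, with the db servicegroup driver a service is reported as "up" only while its last heartbeat (the Updated_at column) is no older than `service_down_time`, which I believe defaults to 60 seconds. The sketch below is not nova's code, just an illustration of that comparison applied to the `nova service-list` output from the question. The helper names and the "now" timestamp are hypothetical; I picked a time shortly after the freshest heartbeat in the table.

```python
from datetime import datetime, timedelta

# Assumption: 60 seconds, believed to be the default of nova's service_down_time.
SERVICE_DOWN_TIME = timedelta(seconds=60)

SAMPLE = """
+------------------+---------------+----------+---------+-------+----------------------------+
| Binary           | Host          | Zone     | Status  | State | Updated_at                 |
+------------------+---------------+----------+---------+-------+----------------------------+
| nova-compute     | xxxxxx-0001   | nova     | enabled | down  | 2013-08-02T13:32:33.000000 |
| nova-conductor   | osint-nova-01 | internal | enabled | up    | 2013-08-05T07:33:46.000000 |
| nova-console     | osint-nova-01 | internal | enabled | up    | 2013-08-05T07:33:40.000000 |
| nova-consoleauth | osint-nova-01 | internal | enabled | up    | 2013-08-05T07:33:40.000000 |
| nova-scheduler   | osint-nova-01 | internal | enabled | down  | 2013-08-05T06:50:21.000000 |
+------------------+---------------+----------+---------+-------+----------------------------+
"""


def parse_service_list(text):
    """Parse the ASCII table printed by `nova service-list` into dicts."""
    header, rows = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line.startswith("|"):
            continue  # skip blank lines and +----+ borders
        cells = [cell.strip() for cell in line.strip("|").split("|")]
        if header is None:
            header = cells
        else:
            rows.append(dict(zip(header, cells)))
    return rows


def stale_services(rows, now, limit=SERVICE_DOWN_TIME):
    """Return (binary, host, age) for every heartbeat older than limit."""
    stale = []
    for row in rows:
        seen = datetime.strptime(row["Updated_at"], "%Y-%m-%dT%H:%M:%S.%f")
        age = now - seen
        # abs() so that a heartbeat "from the future" (clock skew) also counts.
        if abs(age) > limit:
            stale.append((row["Binary"], row["Host"], age))
    return stale


if __name__ == "__main__":
    now = datetime(2013, 8, 5, 7, 34, 0)  # hypothetical "current" time
    for binary, host, age in stale_services(parse_service_list(SAMPLE), now):
        print("%s on %s: last heartbeat %s ago" % (binary, host, age))
```

If the process is running but its heartbeat is stale, the service is either not writing its status at all (for example because it is stuck waiting on the conductor or message queue) or the clocks disagree by more than the limit.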


Comments

I have updated my question with some details regarding your post.

shoubam (2013-08-05 02:05:07 -0500)
