Revision history

initial version

nova-compute keeps going down

nova-compute keeps going into the "down" state. After disabling and enabling nova-compute, it is listed as up for a short period of time (apparently one refresh cycle), but then stays down, and I don't have access to my instances any more.

$ nova service-list
+------------------+---------------+----------+---------+-------+----------------------------+
| Binary           | Host          | Zone     | Status  | State | Updated_at                 |
+------------------+---------------+----------+---------+-------+----------------------------+
| nova-compute     | xxxxxx-0001   | nova     | enabled | down  | 2013-08-02T13:32:33.000000 |

The process is running though:

ps aux |grep nova-compute
nova      8079  0.1  0.0 354668 43952 ?        S    15:28   0:00 /usr/bin/python /usr/bin/nova-compute --config-file /etc/nova/nova.conf --logfile /var/log/nova/compute.log

and /var/log/nova/compute.log tells me

    2013-08-02 15:28:50.794 8079 DEBUG nova.servicegroup.api [-] ServiceGroup driver defined as an instance of db __new__ /usr/lib/python2.6/site-packages/nova/servicegroup/api.py:61
    2013-08-02 15:28:50.908 8079 INFO nova.manager [-] Skipping periodic task _periodic_update_dns because its interval is negative
    2013-08-02 15:28:51.004 8079 INFO nova.virt.driver [-] Loading compute driver 'libvirt.LibvirtDriver'
    2013-08-02 15:28:51.093 DEBUG nova.openstack.common.rpc.amqp [req-981212fc-da44-4652-a598-36f3c510d08d None None] Making synchronous call on conductor ... multicall /usr/lib/python2.6/site-packages/nova/openstack/common/rpc/amqp.py:583
    2013-08-02 15:28:51.094 DEBUG nova.openstack.common.rpc.amqp [req-981212fc-da44-4652-a598-36f3c510d08d None None] MSG_ID is c9fcef41e877470189f39d3d2eeb340a multicall /usr/lib/python2.6/site-packages/nova/openstack/common/rpc/amqp.py:586
    2013-08-02 15:28:51.095 DEBUG nova.openstack.common.rpc.amqp [req-981212fc-da44-4652-a598-36f3c510d08d None None] UNIQUE_ID is 91380e8de6c44b66965da9dc7ec3d4f0. _add_unique_id /usr/lib/python2.6/site-packages/nova/openstack/common/rpc/amqp.py:337
    2013-08-02 15:28:51.095 DEBUG nova.openstack.common.rpc.amqp [req-981212fc-da44-4652-a598-36f3c510d08d None None] Pool creating new connection create /usr/lib/python2.6/site-packages/nova/openstack/common/rpc/amqp.py:75
    2013-08-02 15:28:51.112 INFO nova.openstack.common.rpc.common [req-981212fc-da44-4652-a598-36f3c510d08d None None] Connected to AMQP server on osint-mq-01:5672

which I don't find very helpful. Where else could I look for hints? What could this be?

nova-compute keeps going down

nova-compute keeps going into the "down" state. After disabling and enabling nova-compute, it is listed as up for a short period of time (apparently one refresh cycle), but then stays down, and I don't have access to my instances any more.

$ nova service-list
+------------------+---------------+----------+---------+-------+----------------------------+
| Binary           | Host          | Zone     | Status  | State | Updated_at                 |
+------------------+---------------+----------+---------+-------+----------------------------+
| nova-compute     | xxxxxx-0001   | nova     | enabled | down  | 2013-08-02T13:32:33.000000 |

The process is running though:

ps aux |grep nova-compute
nova      8079  0.1  0.0 354668 43952 ?        S    15:28   0:00 /usr/bin/python /usr/bin/nova-compute --config-file /etc/nova/nova.conf --logfile /var/log/nova/compute.log

and /var/log/nova/compute.log tells me

    2013-08-02 15:28:50.794 8079 DEBUG nova.servicegroup.api [-] ServiceGroup driver defined as an instance of db __new__ /usr/lib/python2.6/site-packages/nova/servicegroup/api.py:61
    2013-08-02 15:28:50.908 8079 INFO nova.manager [-] Skipping periodic task _periodic_update_dns because its interval is negative
    2013-08-02 15:28:51.004 8079 INFO nova.virt.driver [-] Loading compute driver 'libvirt.LibvirtDriver'
    2013-08-02 15:28:51.093 DEBUG nova.openstack.common.rpc.amqp [req-981212fc-da44-4652-a598-36f3c510d08d None None] Making synchronous call on conductor ... multicall /usr/lib/python2.6/site-packages/nova/openstack/common/rpc/amqp.py:583
    2013-08-02 15:28:51.094 DEBUG nova.openstack.common.rpc.amqp [req-981212fc-da44-4652-a598-36f3c510d08d None None] MSG_ID is c9fcef41e877470189f39d3d2eeb340a multicall /usr/lib/python2.6/site-packages/nova/openstack/common/rpc/amqp.py:586
    2013-08-02 15:28:51.095 DEBUG nova.openstack.common.rpc.amqp [req-981212fc-da44-4652-a598-36f3c510d08d None None] UNIQUE_ID is 91380e8de6c44b66965da9dc7ec3d4f0. _add_unique_id /usr/lib/python2.6/site-packages/nova/openstack/common/rpc/amqp.py:337
    2013-08-02 15:28:51.095 DEBUG nova.openstack.common.rpc.amqp [req-981212fc-da44-4652-a598-36f3c510d08d None None] Pool creating new connection create /usr/lib/python2.6/site-packages/nova/openstack/common/rpc/amqp.py:75
    2013-08-02 15:28:51.112 INFO nova.openstack.common.rpc.common [req-981212fc-da44-4652-a598-36f3c510d08d None None] Connected to AMQP server on osint-mq-01:5672

which I don't find very helpful.

Clocks are in sync thanks to ntp.

Scheduler and compute are not running on the same host.

Where else could I look for hints? What could this be?

nova-compute keeps going down

nova-compute keeps going into the "down" state. After disabling and enabling nova-compute, it is listed as up for a short period of time (apparently one refresh cycle), but then stays down, and I don't have access to my instances any more.

$ nova service-list
+------------------+---------------+----------+---------+-------+----------------------------+
| Binary           | Host          | Zone     | Status  | State | Updated_at                 |
+------------------+---------------+----------+---------+-------+----------------------------+
| nova-compute     | xxxxxx-0001   | nova     | enabled | down  | 2013-08-02T13:32:33.000000 |

The process is running though:

ps aux |grep nova-compute
nova      8079  0.1  0.0 354668 43952 ?        S    15:28   0:00 /usr/bin/python /usr/bin/nova-compute --config-file /etc/nova/nova.conf --logfile /var/log/nova/compute.log

and /var/log/nova/compute.log tells me

    2013-08-02 15:28:50.794 8079 DEBUG nova.servicegroup.api [-] ServiceGroup driver defined as an instance of db __new__ /usr/lib/python2.6/site-packages/nova/servicegroup/api.py:61
    2013-08-02 15:28:50.908 8079 INFO nova.manager [-] Skipping periodic task _periodic_update_dns because its interval is negative
    2013-08-02 15:28:51.004 8079 INFO nova.virt.driver [-] Loading compute driver 'libvirt.LibvirtDriver'
    2013-08-02 15:28:51.093 DEBUG nova.openstack.common.rpc.amqp [req-981212fc-da44-4652-a598-36f3c510d08d None None] Making synchronous call on conductor ... multicall /usr/lib/python2.6/site-packages/nova/openstack/common/rpc/amqp.py:583
    2013-08-02 15:28:51.094 DEBUG nova.openstack.common.rpc.amqp [req-981212fc-da44-4652-a598-36f3c510d08d None None] MSG_ID is c9fcef41e877470189f39d3d2eeb340a multicall /usr/lib/python2.6/site-packages/nova/openstack/common/rpc/amqp.py:586
    2013-08-02 15:28:51.095 DEBUG nova.openstack.common.rpc.amqp [req-981212fc-da44-4652-a598-36f3c510d08d None None] UNIQUE_ID is 91380e8de6c44b66965da9dc7ec3d4f0. _add_unique_id /usr/lib/python2.6/site-packages/nova/openstack/common/rpc/amqp.py:337
    2013-08-02 15:28:51.095 DEBUG nova.openstack.common.rpc.amqp [req-981212fc-da44-4652-a598-36f3c510d08d None None] Pool creating new connection create /usr/lib/python2.6/site-packages/nova/openstack/common/rpc/amqp.py:75
    2013-08-02 15:28:51.112 INFO nova.openstack.common.rpc.common [req-981212fc-da44-4652-a598-36f3c510d08d None None] Connected to AMQP server on osint-mq-01:5672

which I don't find very helpful. Also, running grep -i error * in /var/log/nova gives no output.

Clocks are in sync thanks to ntp.

Scheduler and compute are not running on the same host. On the scheduler host, logs tell me:

    /var/log/nova $ grep -i error * |grep 2013-08-05 |grep -v api.log
    scheduler.log:2013-08-05 08:29:41.173 17137 DEBUG nova.service [-] publish_errors : False wait /usr/lib/python2.6/site-packages/nova/service.py:205
    scheduler.log:2013-08-05 08:29:41.193 17137 DEBUG nova.service [-] fatal_exception_format_errors : False wait /usr/lib/python2.6/site-packages/nova/service.py:205

Where else could I look for hints? What could this be?

nova-compute and nova-scheduler keep going down

nova-compute and nova-scheduler keep going into the "down" state. After disabling and enabling them, they are listed as up for a short period of time (apparently one refresh cycle), but then stay down, and I don't have access to my instances any more.

    $ nova service-list
    +------------------+---------------+----------+---------+-------+----------------------------+
    | Binary           | Host          | Zone     | Status  | State | Updated_at                 |
    +------------------+---------------+----------+---------+-------+----------------------------+
    | nova-compute     | xxxxxx-0001   | nova     | enabled | down  | 2013-08-02T13:32:33.000000 |
    | nova-conductor   | osint-nova-01 | internal | enabled | up    | 2013-08-05T07:33:46.000000 |
    | nova-console     | osint-nova-01 | internal | enabled | up    | 2013-08-05T07:33:40.000000 |
    | nova-consoleauth | osint-nova-01 | internal | enabled | up    | 2013-08-05T07:33:40.000000 |
    | nova-scheduler   | osint-nova-01 | internal | enabled | down  | 2013-08-05T06:50:21.000000 |
    +------------------+---------------+----------+---------+-------+----------------------------+

The processes are running though:

    $ ps aux |grep nova-compute
    nova      8079  0.1  0.0 354668 43952 ?        S    15:28   0:00 /usr/bin/python /usr/bin/nova-compute --config-file /etc/nova/nova.conf --logfile /var/log/nova/compute.log

    $ ps aux |grep scheduler
    nova     18130  0.0  2.6 381808 50684 ?        S    08:49   0:01 /usr/bin/python /usr/bin/nova-scheduler --config-file /etc/nova/nova.conf --logfile /var/log/nova/scheduler.log

and /var/log/nova/compute.log tells me

    2013-08-02 15:28:50.794 8079 DEBUG nova.servicegroup.api [-] ServiceGroup driver defined as an instance of db __new__ /usr/lib/python2.6/site-packages/nova/servicegroup/api.py:61
    2013-08-02 15:28:50.908 8079 INFO nova.manager [-] Skipping periodic task _periodic_update_dns because its interval is negative
    2013-08-02 15:28:51.004 8079 INFO nova.virt.driver [-] Loading compute driver 'libvirt.LibvirtDriver'
    2013-08-02 15:28:51.093 DEBUG nova.openstack.common.rpc.amqp [req-981212fc-da44-4652-a598-36f3c510d08d None None] Making synchronous call on conductor ... multicall /usr/lib/python2.6/site-packages/nova/openstack/common/rpc/amqp.py:583
    2013-08-02 15:28:51.094 DEBUG nova.openstack.common.rpc.amqp [req-981212fc-da44-4652-a598-36f3c510d08d None None] MSG_ID is c9fcef41e877470189f39d3d2eeb340a multicall /usr/lib/python2.6/site-packages/nova/openstack/common/rpc/amqp.py:586
    2013-08-02 15:28:51.095 DEBUG nova.openstack.common.rpc.amqp [req-981212fc-da44-4652-a598-36f3c510d08d None None] UNIQUE_ID is 91380e8de6c44b66965da9dc7ec3d4f0. _add_unique_id /usr/lib/python2.6/site-packages/nova/openstack/common/rpc/amqp.py:337
    2013-08-02 15:28:51.095 DEBUG nova.openstack.common.rpc.amqp [req-981212fc-da44-4652-a598-36f3c510d08d None None] Pool creating new connection create /usr/lib/python2.6/site-packages/nova/openstack/common/rpc/amqp.py:75
    2013-08-02 15:28:51.112 INFO nova.openstack.common.rpc.common [req-981212fc-da44-4652-a598-36f3c510d08d None None] Connected to AMQP server on osint-mq-01:5672

which I don't find very helpful. Also, running grep -i error * in /var/log/nova gives no output.

Scheduler and compute are not running on the same host. On the scheduler host, logs tell me:

    /var/log/nova $ grep -i error * |grep 2013-08-05 |grep -v api.log
    scheduler.log:2013-08-05 08:29:41.173 17137 DEBUG nova.service [-] publish_errors : False wait /usr/lib/python2.6/site-packages/nova/service.py:205
    scheduler.log:2013-08-05 08:29:41.193 17137 DEBUG nova.service [-] fatal_exception_format_errors : False wait /usr/lib/python2.6/site-packages/nova/service.py:205
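
Both matches above are DEBUG lines that merely dump option names containing "errors", so grepping for the word is not a reliable filter. A minimal Python sketch that filters on the log level field instead, assuming the line layout seen in the excerpts (timestamp, optional PID, level). The helper name `problem_lines` and the set of levels treated as problems are my own choices, and the ERROR line in the sample is hypothetical, not taken from my logs:

```python
import re

# Layout assumed from the excerpts: "<date> <time> [<pid>] <LEVEL> <logger> ..."
LEVEL_RE = re.compile(
    r"^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+ (?:\d+ )?(?P<level>[A-Z]+) "
)

# Levels treated as "problems" here; this set is an assumption, adjust as needed.
PROBLEM_LEVELS = {"WARNING", "ERROR", "CRITICAL", "TRACE"}


def problem_lines(lines):
    """Return only the lines whose level field is in PROBLEM_LEVELS."""
    hits = []
    for line in lines:
        match = LEVEL_RE.match(line)
        if match and match.group("level") in PROBLEM_LEVELS:
            hits.append(line)
    return hits


sample = [
    "2013-08-05 08:29:41.173 17137 DEBUG nova.service [-] publish_errors : False wait /usr/lib/python2.6/site-packages/nova/service.py:205",
    "2013-08-02 15:28:51.112 INFO nova.openstack.common.rpc.common [req-981212fc-da44-4652-a598-36f3c510d08d None None] Connected to AMQP server on osint-mq-01:5672",
    # Hypothetical line, only here to show what a real hit would look like.
    "2013-08-05 08:30:00.000 17137 ERROR nova.service [-] hypothetical failure message",
]

for line in problem_lines(sample):
    print(line)
```

Only the hypothetical ERROR line is printed; the DEBUG line mentioning publish_errors is skipped because its level field is DEBUG.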

Clocks are in sync thanks to NTP. However, the "Updated_at" timestamps seem to be two hours behind system time:

    $ nova service-list;date
    +------------------+---------------+----------+---------+-------+----------------------------+
    | Binary           | Host          | Zone     | Status  | State | Updated_at                 |
    +------------------+---------------+----------+---------+-------+----------------------------+
    | nova-compute     | xxxxxx-0001   | nova     | enabled | down  | 2013-08-02T13:32:33.000000 |
    | nova-conductor   | osint-nova-01 | internal | enabled | up    | 2013-08-05T07:33:46.000000 |
    | nova-console     | osint-nova-01 | internal | enabled | up    | 2013-08-05T07:33:40.000000 |
    | nova-consoleauth | osint-nova-01 | internal | enabled | up    | 2013-08-05T07:33:40.000000 |
    | nova-scheduler   | osint-nova-01 | internal | enabled | down  | 2013-08-05T06:50:21.000000 |
    +------------------+---------------+----------+---------+-------+----------------------------+
    Mon Aug  5 09:33:49 CEST 2013
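
The two-hour gap matches the UTC offset of CEST, so it may be an artifact of how the timestamps are stored rather than a clock problem. A minimal sketch of that check, assuming "Updated_at" is in UTC (the output above does not confirm this) and assuming a service counts as down once its last update is older than 60 seconds (which I believe is the default of nova's service_down_time option; adjust if it is configured otherwise):

```python
from datetime import datetime, timedelta, timezone

CEST = timezone(timedelta(hours=2), "CEST")
SERVICE_DOWN_TIME = 60  # seconds; assumed default, not read from my nova.conf


def heartbeat_age(updated_at, now):
    """Seconds between an Updated_at value (assumed to be UTC) and `now`."""
    last = datetime.strptime(updated_at, "%Y-%m-%dT%H:%M:%S.%f")
    last = last.replace(tzinfo=timezone.utc)
    return (now - last).total_seconds()


# Output of `date` from above: Mon Aug  5 09:33:49 CEST 2013
now = datetime(2013, 8, 5, 9, 33, 49, tzinfo=CEST)

# Updated_at values copied from the service list above.
services = {
    "nova-compute": "2013-08-02T13:32:33.000000",
    "nova-conductor": "2013-08-05T07:33:46.000000",
    "nova-scheduler": "2013-08-05T06:50:21.000000",
}

ages = {name: heartbeat_age(ts, now) for name, ts in services.items()}

for name, age in ages.items():
    state = "up" if abs(age) <= SERVICE_DOWN_TIME else "down"
    print(f"{name}: last update {age:.0f}s ago -> {state}")
```

With these inputs only nova-conductor falls inside the threshold, which matches the State column, so under the UTC assumption the timestamps themselves look consistent and the question becomes why compute and scheduler stop updating them.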

Where else could I look for hints? What could this be?