OpenStack database corruption
Hi all, I've successfully configured an OpenStack HA cluster made up of 3 controllers and 2 compute nodes. One day, the VIP-DB switched to another host, so a MySQL connection was broken. Since then, I get an error when querying the compute services:
[root@PRCONTROLLER01 mariadb(keystone_admin)]$ openstack compute service list
+-----+------------------+-------------------------+----------+---------+-------+----------------------------+
| ID | Binary | Host | Zone | Status | State | Updated At |
+-----+------------------+-------------------------+----------+---------+-------+----------------------------+
| 3 | nova-consoleauth | PRCONTROLLER01.ftoma.mg | internal | enabled | down | 2017-08-11T11:54:45.000000 |
| 6 | nova-consoleauth | PRCONTROLLER02.ftoma.mg | internal | enabled | up | 2017-08-11T11:53:35.000000 |
| 9 | nova-consoleauth | PRCONTROLLER03.ftoma.mg | internal | enabled | up | 2017-08-11T11:53:35.000000 |
| 132 | nova-scheduler | PRCONTROLLER01.ftoma.mg | internal | enabled | down | 2017-08-11T11:54:45.000000 |
| 135 | nova-scheduler | PRCONTROLLER02.ftoma.mg | internal | enabled | up | 2017-08-11T11:53:35.000000 |
| 138 | nova-scheduler | PRCONTROLLER03.ftoma.mg | internal | enabled | up | 2017-08-11T11:53:34.000000 |
| 141 | nova-conductor | PRCONTROLLER01.ftoma.mg | internal | enabled | down | 2017-08-11T11:54:52.000000 |
| 153 | nova-conductor | PRCONTROLLER02.ftoma.mg | internal | enabled | up | 2017-08-11T11:53:43.000000 |
| 162 | nova-conductor | PRCONTROLLER03.ftoma.mg | internal | enabled | up | 2017-08-11T11:53:33.000000 |
| 164 | nova-compute | prcompute2.ftoma.mg | nova | enabled | up | 2017-08-11T11:53:40.000000 |
| 167 | nova-compute | prcompute1.ftoma.mg | nova | enabled | up | 2017-08-11T11:53:41.000000 |
+-----+------------------+-------------------------+----------+---------+-------+----------------------------+
Even though all these services are up and running well, they are always shown as DOWN. I tried restarting the whole cluster, but the error is still there. How can I reset the status of these nova services on this particular node?
thx all
Are you sure it's a DB corruption? Check the log files of the failed services on controller 1.
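One thing that can be checked straight from the listing above, before digging into the database: as far as I know, nova reports a service as up or down by comparing its last heartbeat (the "Updated At" column) with the current time, using the `service_down_time` option (60 seconds by default). In the listing, the PRCONTROLLER01 heartbeats are about 70 seconds ahead of all the other hosts, which could point to a clock difference on that node rather than to corruption. That is only a guess on my side. Here is a minimal Python sketch to compare the heartbeats per host; the function names are made up for illustration and it only parses the table text shown above:

```python
from datetime import datetime, timezone
from statistics import median


def parse_service_table(text):
    """Parse the data rows of `openstack compute service list` table output."""
    rows = []
    for line in text.splitlines():
        cells = [c.strip() for c in line.strip().strip("|").split("|")]
        # Skip borders, the header row and anything that is not a data row.
        if len(cells) != 7 or not cells[0].isdigit():
            continue
        updated = datetime.strptime(cells[6], "%Y-%m-%dT%H:%M:%S.%f")
        rows.append({
            "binary": cells[1],
            "host": cells[2],
            "state": cells[5],
            # Treat the timestamp as UTC so only differences matter.
            "updated_at": updated.replace(tzinfo=timezone.utc),
        })
    return rows


def heartbeat_offsets(rows):
    """Seconds between each host's newest heartbeat and the median host."""
    newest = {}
    for row in rows:
        ts = row["updated_at"].timestamp()
        newest[row["host"]] = max(newest.get(row["host"], ts), ts)
    mid = median(newest.values())
    return {host: ts - mid for host, ts in newest.items()}
```

Feed it the pasted output, e.g. `heartbeat_offsets(parse_service_table(output))`. A host whose offset is close to or beyond 60 seconds while the others sit within a few seconds of each other would be worth a look at its system clock and time synchronisation.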
Hi Bernd, sorry for the late answer, I was on vacation and have just come back. I don't see any errors in any of the services. But when I query with the CLI client, those services are marked down, and their states keep changing every second. It's really strange. That's why I was thinking about the database.
State changes should leave traces in the nova log files. If you enable DEBUG logging, you should find something.
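To make sure DEBUG logging is really switched on, the `debug` option in the `[DEFAULT]` section of nova.conf has to be true on each controller, and the nova services need a restart afterwards. A small Python sketch to verify the setting; the helper name is hypothetical, and I am assuming the usual `/etc/nova/nova.conf` location:

```python
import configparser


def debug_enabled(conf_text):
    """Return True if [DEFAULT] debug is set to a true value in nova.conf text."""
    # nova.conf may contain '%' characters and repeated options, so turn off
    # interpolation and strict duplicate checking.
    parser = configparser.ConfigParser(interpolation=None, strict=False)
    parser.read_string(conf_text)
    return parser.getboolean("DEFAULT", "debug", fallback=False)
```

For example: `debug_enabled(open("/etc/nova/nova.conf").read())`.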
I've enabled DEBUG logging for the nova services on the controllers, but I can't see any significant information related to the problem. Could you please tell me what I need to look for?