Overcloud compute services fluctuate

asked 2018-01-03 19:44:26 -0500

CloudEnthusiast

updated 2018-01-03 19:46:25 -0500

Hi all, I have a lab deployed with RHOSP 10 (undercloud and overcloud) on rack-mount servers (Dell PowerEdge R630) with 3 controllers, 3 compute nodes, and 3 Ceph storage nodes. Everything else works fine, but several compute services and orchestration services are unstable: they are up one moment and down the next. I see no error messages in the nova logs. Could anyone suggest how to handle this? I can share more logs, outputs, or configuration if required. Please let me know.

[ospadmin@director ~]$ openstack compute service list
+-----+------------------+------------------------------------+----------+---------+-------+----------------------------+
|  ID | Binary           | Host                               | Zone     | Status  | State | Updated At                 |
+-----+------------------+------------------------------------+----------+---------+-------+----------------------------+
|  52 | nova-consoleauth | overcloud-controller-0.            | internal | enabled | up    | 2018-01-03T15:21:20.000000 |
|  55 | nova-consoleauth | overcloud-controller-2.            | internal | enabled | up    | 2018-01-03T15:22:32.000000 |
|  70 | nova-scheduler   | overcloud-controller-0.            | internal | enabled | up    | 2018-01-03T15:21:15.000000 |
|  73 | nova-scheduler   | overcloud-controller-2.            | internal | enabled | up    | 2018-01-03T15:22:32.000000 |
|  76 | nova-conductor   | overcloud-controller-0.            | internal | enabled | up    | 2018-01-03T15:21:19.000000 |
| 115 | nova-conductor   | overcloud-controller-2.            | internal | enabled | up    | 2018-01-03T15:22:37.000000 |
| 136 | nova-compute     | overcloud-compute-0.               | nova     | enabled | up    | 2018-01-03T15:21:19.000000 |
| 139 | nova-compute     | overcloud-compute-2.               | nova     | enabled | up    | 2018-01-03T15:21:15.000000 |
| 142 | nova-compute     | overcloud-compute-1.               | nova     | enabled | up    | 2018-01-03T15:21:14.000000 |
| 145 | nova-consoleauth | overcloud-controller-1.            | internal | enabled | up    | 2018-01-03T15:21:37.000000 |
| 169 | nova-scheduler   | overcloud-controller-1.            | internal | enabled | up    | 2018-01-03T15:21:36.000000 |
| 172 | nova-conductor   | overcloud-controller-1.            | internal | enabled | up    | 2018-01-03T15:21:37.000000 |
+-----+------------------+------------------------------------+----------+---------+-------+----------------------------+


[ospadmin@director ~]$ openstack compute service list
+-----+------------------+------------------------------------+----------+---------+-------+----------------------------+
|  ID | Binary           | Host                               | Zone     | Status  | State | Updated At                 |
+-----+------------------+------------------------------------+----------+---------+-------+----------------------------+
|  52 | nova-consoleauth | overcloud-controller-0.            | internal | enabled | down  | 2018-01-03T15:21:20.000000 |
|  55 | nova-consoleauth | overcloud-controller-2.            | internal | enabled | up    | 2018-01-03T15:22:32.000000 |
|  70 | nova-scheduler   | overcloud-controller-0.            | internal | enabled | down  | 2018-01-03T15:21:25.000000 |
|  73 | nova-scheduler   | overcloud-controller-2.            | internal | enabled | up    | 2018-01-03T15:22:32.000000 |
|  76 | nova-conductor   | overcloud-controller-0.            | internal | enabled | down  | 2018-01-03T15:21:23.000000 |
| 115 | nova-conductor   | overcloud-controller-2.            | internal | enabled | up    | 2018-01-03T15:22:37.000000 |
| 136 | nova-compute     | overcloud-compute-0.               | nova     | enabled | down  | 2018-01-03T15:21:19.000000 |
| 139 | nova-compute     | overcloud-compute-2.               | nova     | enabled | down  | 2018-01-03T15:21:25.000000 |
| 142 | nova-compute     | overcloud-compute-1.               | nova     | enabled | down  | 2018-01-03T15:21:24.000000 |
| 145 | nova-consoleauth | overcloud-controller-1.            | internal | enabled | down  | 2018-01-03T15:21:37.000000 |
| 169 | nova-scheduler   | overcloud-controller-1.            | internal | enabled | down  | 2018-01-03T15:21:36.000000 |
| 172 | nova-conductor   | overcloud-controller-1.            | internal | enabled | down  | 2018-01-03T15:21:37.000000 |
+-----+------------------+------------------------------------+----------+---------+-------+----------------------------+

--Regards


Comments

"down" means that nova-api hasn't received an update from the service for a while. I think the timeout is in the area of 1 minute. This could mean that there are network or message queue problems or the services did indeed crash.

Bernd Bausch (2018-01-03 20:55:35 -0500)
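To make the "down" check described above concrete, here is a minimal sketch of the decision, not Nova's actual code. It assumes a 60-second threshold (Nova's `service_down_time` option defaults to 60 seconds) and GNU `date`; the function name `service_state` is made up for illustration. As far as I know, Nova compares the absolute age of the heartbeat, so a timestamp that appears to come from the future also counts as stale, which is how clock differences between nodes can flip services to "down".

```shell
#!/usr/bin/env bash
# Illustrative only: derive "up"/"down" from a heartbeat timestamp.
# Usage: service_state UPDATED_AT NOW [DOWN_TIME_SECONDS]
# Timestamps are treated as UTC, in the form shown in the "Updated At" column.
service_state() {
  local updated_at="$1" now="$2" down_time="${3:-60}"
  local updated_s now_s age

  # Drop fractional seconds and the "T" separator so date(1) parses the value.
  updated_at="${updated_at%%.*}"; updated_at="${updated_at/T/ }"
  now="${now%%.*}";               now="${now/T/ }"

  updated_s=$(date -u -d "$updated_at" +%s) || return 1
  now_s=$(date -u -d "$now" +%s) || return 1

  # Absolute age: a heartbeat far in the past OR far in the future is stale.
  age=$(( now_s - updated_s ))
  if [ "$age" -lt 0 ]; then age=$(( -age )); fi

  if [ "$age" -le "$down_time" ]; then
    echo "up"
  else
    echo "down"
  fi
}

# Example: a heartbeat from 15:21:20 evaluated at 15:23:00 is 100 s old.
service_state "2018-01-03T15:21:20.000000" "2018-01-03T15:23:00"   # prints "down"
```

If the evaluating host's clock runs a minute or two off from the host that wrote the heartbeat, the same arithmetic reports "down" even though the service is reporting normally.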

If the Nova api log doesn't contain anything, switch on debug logging and restart nova-api and the other affected services on the controllers. Also switch on debug logging on the compute nodes and restart the compute services there. You should see something in the logs then.

Bernd Bausch (2018-01-03 20:56:36 -0500)

Thanks for your response, Bernd! The deployment uses the zaqar service for messaging. Also, at one point the overcloud dashboard would not come up; after I restarted httpd on the director, the GUI became accessible. About 2 days later the GUI was inaccessible again, and I have been unable to figure out why. Any clues?

CloudEnthusiast (2018-01-04 04:32:46 -0500)

This time I restarted the haproxy and httpd services on all 3 controllers and tried again, yet the issue persists. There are no errors or warnings in the httpd access and error log files. Please suggest.

CloudEnthusiast (2018-01-04 04:36:53 -0500)

Finally, it's resolved. I synced the time on all nodes (controller, compute, storage) manually using the timedatectl command. No more flips in the services :)

CloudEnthusiast (2018-03-05 20:39:29 -0500)
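For anyone hitting the same symptom, a quick way to spot clock drift before services start flipping is to compare each node's clock with a reference. The sketch below is only an illustration: the helper name `report_skew` and the 5-second tolerance are arbitrary choices, and the commented loop assumes the node names from the listing above are resolvable and reachable over SSH as `heat-admin`, which may differ in your environment.

```shell
#!/usr/bin/env bash
# Illustrative only: flag a node whose clock differs from a reference clock.
# Usage: report_skew NODE REMOTE_EPOCH LOCAL_EPOCH [MAX_SKEW_SECONDS]
report_skew() {
  local node="$1" remote="$2" local_now="$3" max="${4:-5}"
  local skew abs

  skew=$(( remote - local_now ))
  abs=$skew
  if [ "$abs" -lt 0 ]; then abs=$(( -abs )); fi

  if [ "$abs" -le "$max" ]; then
    echo "$node: ok (offset ${skew}s)"
  else
    echo "$node: SKEW (offset ${skew}s exceeds ${max}s)"
  fi
}

# Possible use from the director (assumed user and hostnames, adjust as needed):
#   for node in overcloud-controller-0 overcloud-compute-0; do
#     report_skew "$node" "$(ssh heat-admin@"$node" date -u +%s)" "$(date -u +%s)"
#   done
```

SSH latency adds a second or so of noise, so this only catches gross drift. On each node, `timedatectl status` also reports whether the clock is considered NTP-synchronized.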