cpu_util meter not calculated as expected, leading to delayed scaling

asked 2019-03-05 23:59:24 -0500

arvind.kumar

We have observed what looks like a design issue in the ceilometer service of OpenStack. The setup includes multiple compute nodes and 3 controller nodes. Meters from each compute node are sent to the 3 ceilometer instances via RabbitMQ in round-robin fashion at an interval of 10 min. After the cumulative cpu meter data is transformed, cpu_util is generated by the ceilometer instance on the controller node and published to the HTTP address configured in the ceilometer pipeline configuration. The application uses cpu_util to decide whether scaling of a VM needs to be triggered.

A ceilometer instance calculates cpu_util for a VM as the difference between the cumulative CPU usage of the VM at two timestamps, divided by the difference between those timestamps. Let’s say one compute node sends the cumulative CPU usage of a VM (C1, C2, C3, C4) at timestamps T1, T2, T3, T4 (consecutive timestamps are 10 min apart). The tuples (C1,T1) and (C4,T4) are received by ceilometer instance 1, (C2,T2) by instance 2, and (C3,T3) by instance 3. Each instance only sees its own samples, so even if the CPU usage of the VM increases between T1 and T2, cpu_util is calculated over a 30 min window (T1 to T4) rather than over the expected 10 min. As a result, scaling is triggered only after T4, and only if CPU usage stays above the threshold for the whole period between T1 and T4.

Please suggest how this issue could be resolved. Is there any way to bind the meter data of a VM or compute node to a specific ceilometer instance for processing?
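To make the effect concrete, here is a small simulation of the behaviour described above. It is only a sketch and not ceilometer code: the `Instance` class, the `simulate` function, the sample values, and the hash-based "sticky" routing are all hypothetical, and the utilisation formula is simplified to a single vCPU with CPU time in seconds. It assumes each instance remembers only the previous sample it received itself.

```python
import zlib
from dataclasses import dataclass, field


@dataclass
class Instance:
    """Hypothetical stand-in for one ceilometer instance with a private cache."""
    name: str
    last: dict = field(default_factory=dict)  # vm_id -> (timestamp_s, cumulative_cpu_s)

    def handle(self, vm_id, ts, cpu_s):
        prev = self.last.get(vm_id)
        self.last[vm_id] = (ts, cpu_s)
        if prev is None:
            return None  # first sample seen by this instance: nothing to diff against
        prev_ts, prev_cpu = prev
        window = ts - prev_ts
        # Simplified cpu_util: CPU seconds consumed per wall-clock second, in percent.
        util = 100.0 * (cpu_s - prev_cpu) / window
        return window, util


def simulate(samples, n_instances, sticky=False):
    """Deliver samples round-robin, or pinned per VM by a hash when sticky=True."""
    instances = [Instance(f"instance-{i + 1}") for i in range(n_instances)]
    results = []
    for i, (vm_id, ts, cpu_s) in enumerate(samples):
        if sticky:
            idx = zlib.crc32(vm_id.encode()) % n_instances  # same VM, same instance
        else:
            idx = i % n_instances  # round-robin, as in the setup described
        out = instances[idx].handle(vm_id, ts, cpu_s)
        if out is not None:
            window, util = out
            results.append((instances[idx].name, ts, window, util))
    return results


# Made-up data: the VM is 90% busy between T1 and T2, then 10% busy afterwards.
# Tuples are (vm_id, timestamp in seconds, cumulative CPU seconds).
samples = [
    ("vm-1", 0, 0.0),       # (C1, T1)
    ("vm-1", 600, 540.0),   # (C2, T2)
    ("vm-1", 1200, 600.0),  # (C3, T3)
    ("vm-1", 1800, 660.0),  # (C4, T4)
]

round_robin = simulate(samples, n_instances=3)
pinned = simulate(samples, n_instances=3, sticky=True)

for label, rows in (("round-robin", round_robin), ("pinned", pinned)):
    for name, ts, window, util in rows:
        print(f"{label:12s} {name} t={ts:4d}s window={window:4d}s cpu_util={util:.1f}%")
```

With round-robin delivery only instance 1 ever sees two samples for the VM, so a single cpu_util value is produced at T4 over a 30 min window, and the spike between T1 and T2 is averaged away. When all samples of a VM reach the same instance, a value is produced every 10 min and the spike is visible at T2. Whether and how ceilometer can be configured to route samples this way is exactly what I am asking.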
