Troubleshooting OpenStack
Hi,
Just want to get people's opinions on how they go about troubleshooting an OpenStack infrastructure, particularly the OpenStack services.
I usually follow the VM provisioning life cycle within OpenStack to track down where there might be problems.
Pulled this from ilearnstack.com:
1. Dashboard or CLI gets the user credentials and makes a REST call to Keystone for authentication.
2. Keystone authenticates the credentials, then generates and sends back an auth-token, which is used for sending requests to other components through REST calls.
3. Dashboard or CLI converts the new instance request specified in the ‘launch instance’ or ‘nova-boot’ form into a REST API request and sends it to nova-api.
4. nova-api receives the request and sends it to Keystone for validation of the auth-token and access permissions.
5. Keystone validates the token and sends back updated auth headers with roles and permissions.
6. nova-api interacts with nova-database.
7. The initial DB entry for the new instance is created.
8. nova-api sends an rpc.call request to nova-scheduler, expecting to get back the updated instance entry with the host ID specified.
9. nova-scheduler picks the request from the queue.
10. nova-scheduler interacts with nova-database to find an appropriate host via filtering and weighing.
11. nova-scheduler returns the updated instance entry with the appropriate host ID after filtering and weighing.
12. nova-scheduler sends an rpc.cast request to nova-compute for ‘launching instance’ on the appropriate host.
13. nova-compute picks the request from the queue.
14. nova-compute sends an rpc.call request to nova-conductor to fetch the instance information such as host ID and flavor (RAM, CPU, disk).
15. nova-conductor picks the request from the queue.
16. nova-conductor interacts with nova-database.
17. nova-conductor returns the instance information.
18. nova-compute picks the instance information from the queue.
19. nova-compute makes a REST call, passing the auth-token, to glance-api to get the image URI by image ID from Glance and load the image from image storage.
20. glance-api validates the auth-token with Keystone.
21. nova-compute gets the image metadata.
22. nova-compute makes a REST call, passing the auth-token, to the Network API to allocate and configure the network so that the instance gets an IP address.
23. neutron-server validates the auth-token with Keystone.
24. nova-compute gets the network info.
25. nova-compute makes a REST call, passing the auth-token, to the Volume API to attach volumes to the instance.
26. cinder-api validates the auth-token with Keystone.
27. nova-compute gets the block storage info.
28. nova-compute generates data for the hypervisor driver and executes the request on the hypervisor (via libvirt or API).
The table represents the instance state at various steps during the provisioning:

Status   Task                   Power state   Steps
Build    scheduling             None          3-12
Build    networking             None          22-24
Build    block_device_mapping   None          25-27
Build    spawning               None          28
Active   none                   Running
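One way to use the state table when a build gets stuck is to read the instance's task state and map it back to the steps and services worth checking first. Below is a minimal Python sketch of that idea. The `STAGES` mapping and the `where_to_look` helper are made-up names for illustration, and the service lists are only my reading of the steps listed earlier, not anything official.

```python
# Hypothetical lookup built from the state table in this post.
# task state -> (step range, services that take part in those steps)
STAGES = {
    "scheduling": ("3-12", ["nova-api", "keystone", "nova-database", "nova-scheduler"]),
    "networking": ("22-24", ["nova-compute", "neutron-server", "keystone"]),
    "block_device_mapping": ("25-27", ["nova-compute", "cinder-api", "keystone"]),
    "spawning": ("28", ["nova-compute", "hypervisor (libvirt)"]),
}


def where_to_look(task_state):
    """Return a hint for an instance stuck in the given task state."""
    if task_state not in STAGES:
        # Assumption: with no better hint, nova-api is a reasonable starting point.
        return "unknown task state %r: start from the nova-api log" % task_state
    steps, services = STAGES[task_state]
    return "steps %s: check %s" % (steps, ", ".join(services))


if __name__ == "__main__":
    print(where_to_look("networking"))
```

The task state itself should be visible in the output of `nova show` for the instance, under the OS-EXT-STS:task_state field.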
For AMQP message flows I have trace options turned on in the RabbitMQ web console, so I can see the message flow to worker processes. For API queries I use curl or the CLI clients to verify that a particular API is working correctly.
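The API check can also be scripted instead of typed into curl each time. Here is a rough Python sketch, assuming the Keystone v3 API (POST to /v3/auth/tokens, token returned in the X-Subject-Token header). The helper names, the `controller` host name and the ports in the usage note are placeholders for illustration, so substitute the endpoints from your own service catalog.

```python
import json
import urllib.request


def build_token_request(keystone_url, user, password, project, domain="Default"):
    """Build (but do not send) a Keystone v3 password-auth request."""
    body = {
        "auth": {
            "identity": {
                "methods": ["password"],
                "password": {
                    "user": {
                        "name": user,
                        "domain": {"name": domain},
                        "password": password,
                    }
                },
            },
            "scope": {"project": {"name": project, "domain": {"name": domain}}},
        }
    }
    return urllib.request.Request(
        keystone_url.rstrip("/") + "/v3/auth/tokens",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def fetch_token(token_request, timeout=10):
    """Send the token request; Keystone v3 returns the token in a response header."""
    with urllib.request.urlopen(token_request, timeout=timeout) as resp:
        return resp.headers["X-Subject-Token"]


def build_api_request(url, token):
    """Build a GET request against a service API, passing the auth token."""
    return urllib.request.Request(url, headers={"X-Auth-Token": token})
```

Usage would be along the lines of `token = fetch_token(build_token_request("http://controller:5000", "demo", "secret", "demo"))`, followed by `urllib.request.urlopen(build_api_request("http://controller:8774/v2.1/servers", token))`. A 401 on the second call points at token validation, while a timeout or 5xx points at the service itself.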
If a process is dying silently, I run it from the command line with debug turned on in the configuration file, and also check the log files, e.g. for nova-compute: /usr/bin/nova-compute.
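For the log file side, something I find handy is grouping error lines by request ID, since the services tag log lines with a `req-<uuid>` identifier. A small Python sketch, assuming the default log format where the level name appears as a separate word on each line; the function name and the level list are my own choices, not part of any OpenStack tool.

```python
import re

# Matches request IDs of the form req-<uuid> as they appear in service logs.
REQ_ID = re.compile(r"req-[0-9a-f]{8}(?:-[0-9a-f]{4}){3}-[0-9a-f]{12}")

# Assumption: the level name is surrounded by spaces, as in the default format.
BAD_LEVELS = (" ERROR ", " CRITICAL ", " TRACE ")


def failing_requests(lines):
    """Group ERROR/CRITICAL/TRACE lines by the request ID they mention."""
    found = {}
    for line in lines:
        if not any(level in line for level in BAD_LEVELS):
            continue
        match = REQ_ID.search(line)
        key = match.group(0) if match else "no-request-id"
        found.setdefault(key, []).append(line.rstrip("\n"))
    return found
```

Feeding it an open log file, e.g. `failing_requests(open(path))` with `path` pointing at the nova-compute log, gives a dict of request IDs to error lines; the same request ID can then be searched for in the nova-api and nova-scheduler logs.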
Just wondering what other troubleshooting techniques people use to solve problems within an OpenStack infrastructure - if this is being asked in the ...