Ocata snapshot fails on select instances

asked 2018-11-21 13:12:39 -0600

cprestia

updated 2018-12-04 11:00:02 -0600

My OpenStack setup was deployed following the CentOS 7 installation guide. Glance is configured with:

[glance_store]
stores = file,http
default_store = file
filesystem_store_datadir = /var/lib/glance/images/

A snapshot of an instance that has just been created, or that has not changed since creation, completes without issue. However, a snapshot of an instance that has been running and actively used fails. The nova-compute log shows:

2018-11-09 20:35:13.659 74124 DEBUG nova.compute.manager [req-1cdae803-ad19-41e9-a748-5e9347ca6018 17a3f2c0e373455e8ff4a4b883be019d e42ff2c5305b490e877ec2f7a9c12796 - - -] [instance: c67ce87f-c97b-4a4f-acab-b8ead60425a3] Cleaning up image 576243a4-9d04-48e8-b2f1-18b64e9fce16 decorated_function /usr/lib/python2.7/site-packages/nova/compute/manager.py:236

2018-11-09 20:35:13.659 74124 ERROR nova.compute.manager [instance: c67ce87f-c97b-4a4f-acab-b8ead60425a3] ImageNotAuthorized: Not authorized for image 576243a4-9d04-48e8-b2f1-18b64e9fce16.

2018-11-09 20:35:13.806 74124 ERROR nova.compute.manager [req-1cdae803-ad19-41e9-a748-5e9347ca6018 17a3f2c0e373455e8ff4a4b883be019d e42ff2c5305b490e877ec2f7a9c12796 - - -] [instance: c67ce87f-c97b-4a4f-acab-b8ead60425a3] Error while trying to clean up image 576243a4-9d04-48e8-b2f1-18b64e9fce16

2018-11-09 20:35:13.806 74124 ERROR nova.compute.manager [instance: c67ce87f-c97b-4a4f-acab-b8ead60425a3] HTTPUnauthorized: 401 Unauthorized
2018-11-09 20:35:13.806 74124 ERROR nova.compute.manager [instance: c67ce87f-c97b-4a4f-acab-b8ead60425a3] This server could not verify that you are authorized to access the document you requested. Either you supplied the wrong credentials (e.g., bad password), or your browser does not understand how to supply the credentials required.
2018-11-09 20:35:13.806 74124 ERROR nova.compute.manager [instance: c67ce87f-c97b-4a4f-acab-b8ead60425a3]     (HTTP 401)

2018-11-09 20:35:14.059 74124 ERROR oslo_messaging.rpc.server ImageNotAuthorized: Not authorized for image 576243a4-9d04-48e8-b2f1-18b64e9fce16.

Image size should not be the problem, as successful snapshots have been taken of instances from 17GB to 497GB. I believe the issue may be that when an instance has changed enough, a token times out while the snapshot is gathering all the changes, which leads to these errors. I increased the token expiration to 7200, but it still fails. Has anyone seen this error before, or does anyone have an idea how to fix it? I've run out of ideas.

Update: I tried setting transport_url in glance-api.conf and got the same errors as before. One thing I noticed is that it failed after 3 hours instead of the 12 hours it had been taking. I am not sure whether setting transport_url had a hand in this, but it is the first change I have seen. I still feel that it has to do with the changes to the disk, as I am still able to take snapshots of instances that are not in use. I recently lost the ability to snapshot an instance that I had previously snapshotted successfully, after I used it for two weeks. Any help would be appreciated.

Update II: I have developed a workaround for the issue. The first step is to shut down the instance. Then, on the compute server where the instance is located, go to the /var/lib/nova/instances/<instance id> directory and run

qemu-img commit -f qcow2 disk

This clears the delta of all the changes from the ...


Comments

Check the transport_url in the nova and glance configuration files.

Eranachandran ( 2018-11-21 21:32:36 -0600 )

transport_url was set in nova.conf, but not in glance-api.conf. I have added it and am trying another snapshot. In glance-api.conf the value was set to none; do you know what it defaults to? I am trying to understand why the snapshot sometimes works.

cprestia ( 2018-11-26 08:38:32 -0600 )
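
One way to tell whether an option such as transport_url is explicitly set, rather than commented out and left at its default, is to parse the file and test for the key. The sketch below uses Python's standard configparser as a stand-in for the OpenStack config loader, so it is only an approximation of how the services read the file. The function name, the sample text and the rabbit URL are hypothetical and not taken from the poster's configuration.

```python
import configparser


def explicitly_set(conf_text, section, option):
    """Return True if `option` has an uncommented value in `section`."""
    # strict=False tolerates repeated keys; interpolation=None keeps '%'
    # characters in values from being treated as substitutions.
    parser = configparser.ConfigParser(strict=False, interpolation=None)
    parser.read_string(conf_text)
    return parser.has_option(section, option)


# Hypothetical file contents, for illustration only.
sample_commented = """
[DEFAULT]
#transport_url = <None>
"""

sample_set = """
[DEFAULT]
transport_url = rabbit://user:password@controller
"""

print(explicitly_set(sample_commented, "DEFAULT", "transport_url"))
print(explicitly_set(sample_set, "DEFAULT", "transport_url"))
```

A commented-out line is reported as not set, which means the service falls back to whatever default its code defines for that option. To check a real file, read its text and pass it in place of the sample strings.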

Like you, my first thought is an expired token. When does the snapshot start? The compute log should have a corresponding INFO message entitled instance snapshotting. What's its timestamp?

Bernd Bausch ( 2018-11-27 19:41:10 -0600 )

Another possible aspect: You may have set use_user_token.

Bernd Bausch ( 2018-11-27 19:54:30 -0600 )

The timestamp for instance snapshotting is 2018-11-09 08:30:55.209. I also checked use_user_token and it is commented out. The commented-out line shows true, so I assume that is the default and it is currently in effect. If so, I can manually set it to false to see if that helps.

cprestia ( 2018-11-28 08:43:14 -0600 )
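
For reference, the gap between the snapshot start time reported above and the first failure line in the nova-compute log can be computed directly. This is a minimal sketch, assuming both timestamps come from the same compute host and belong to the same snapshot attempt. The 7200 figure is the token expiration mentioned in the question, assumed here to be in seconds.

```python
from datetime import datetime

# Timestamp layout used by the nova-compute log lines quoted in the question.
LOG_FORMAT = "%Y-%m-%d %H:%M:%S.%f"

snapshot_start = datetime.strptime("2018-11-09 08:30:55.209", LOG_FORMAT)
first_failure = datetime.strptime("2018-11-09 20:35:13.659", LOG_FORMAT)

elapsed = first_failure - snapshot_start

# Assumption: the token expiration of 7200 from the question is in seconds.
token_expiration_seconds = 7200

print(elapsed)
print(elapsed.total_seconds() > token_expiration_seconds)
```

The snapshot ran for a little over 12 hours before failing, well beyond a 7200-second token lifetime, which is consistent with the expired-token suspicion but does not by itself prove it.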