Ask Your Question

Object appears when doing `swift list container` but `swift stat container object` returns a 404

asked 2018-08-01 05:31:18 -0600

drugcrazed

When I run `swift list container`, I get a list of objects, but not all of the listed objects actually exist:

$ swift --version
swift 2.4.0

$ swift list container
object-1
object-2

$ swift stat container object-1
Object HEAD failed: http://swift-proxy.my.domain/v1/AUTH_XXXXX/container/object-1 404 Not Found

If I upload an empty file under the same name and then delete it, the entry disappears from the `swift list container` response.
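For reference, that workaround looks something like this (object and container names taken from the listing above):

```shell
# Overwrite the phantom entry with an empty file, then delete it;
# this pushes a fresh container update that clears the stale listing row
touch object-1
swift upload container object-1
swift delete container object-1
swift list container   # the stale entry no longer appears
```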

My guess is that the container server and object server are out of sync somehow, but this object is fairly old, so any inconsistency should have been resolved by now. Since this isn't happening in our production environment I'm not too concerned, but I'd still like to know:

  • How has this happened?
  • How can I get swift to return a correct list?

1 answer


answered 2018-08-01 12:13:19 -0600

notmyname

It's hard to pin down the exact reason for what you're seeing based on the info you've given so far. However, a brief overview of the write path, and of how the container listing gets updated, may shed some light on where the problem could lie.

When an object write request (PUT or DELETE) is received by the proxy server, the proxy sends new requests to the appropriate object servers. These new, backend requests also include a directive for the object server to update a particular container server. Like objects, container information is replicated in the cluster, so if you've got 3x replicas for objects and 3x replicas for containers, each object server receiving a write request also gets a directive to update a distinct replica of the container.

After the object server has successfully flushed the object write to disk, it attempts the container update. If that update fails for any reason, an async_pending is created and a background daemon (the object-updater) processes it later. So the first thing to look at is how many async_pendings are in the system (swift-recon can help get this info), and make sure swift-object-updater is running on all the nodes. It could be that the container update failed initially, and the queued update in async_pendings hasn't been processed yet.
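As a rough sketch, assuming the recon middleware is enabled and your devices are mounted under the default `/srv/node` path, those checks might look like:

```shell
# Cluster-wide async_pending counts, gathered via the recon middleware
swift-recon object --async

# Or count the queued updates directly on a storage node; adjust the
# glob if your devices aren't mounted under /srv/node
find /srv/node/*/async_pending* -type f 2>/dev/null | wc -l

# Confirm the updater daemon is actually running on each object node
ps aux | grep '[s]wift-object-updater'
```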

However, if that isn't the issue, then perhaps the container updates were initially successful and all async_pendings have been processed. In that case, maybe you've updated the container ring, or the container ring isn't the same on every node, and container-replication hasn't yet been able to reconcile the various partial replicas in the system. Check that the rings are the same everywhere (again, swift-recon can help), and check that the replicator daemons are all running.
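A sketch of those checks, assuming rings live in the default `/etc/swift` directory:

```shell
# Compare ring md5sums across the whole cluster in one shot
swift-recon --md5

# Or check a single node by hand and compare results across nodes
md5sum /etc/swift/container.ring.gz

# Confirm the container replicator is running on each container node
ps aux | grep '[s]wift-container-replicator'
```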

Getting into more unlikely possibilities: perhaps the container drives have been filling up and load has been shed to other drives. Replication can't keep up in that case because there's no space on the primary locations, and listing requests may be querying stale replicas, giving the strange results you see. Check drive fullness, add capacity as necessary, and ensure the replicator processes are running. I strongly doubt this is the issue in your cluster, though, since you're able to write objects and get updated listings.
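For drive fullness, something like the following (again assuming recon middleware and the default `/srv/node` mount point):

```shell
# Report disk usage across the cluster via recon
swift-recon --diskusage

# Or check locally on a single node
df -h /srv/node/*
```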

All of the above assumes the issue is simply that the listings are out of date. Swift is designed to remain available even when there are failures in the cluster, and a side effect of that design is occasionally serving stale results like the ones you're seeing. It should reconcile itself, though; the troubleshooting above is to find where that reconciliation is getting stuck.

Good luck!


Comments

Sorry for taking ages to get back to you!

It looks like something happened to our swift nodes this time last year, which meant that the metadata had all gone. Since it's not production, we're just manually fixing everything!

drugcrazed ( 2018-09-28 06:27:56 -0600 )

