Swift returns 404 on GET for some objects after adding new HDDs

asked 2017-12-29 07:28:14 -0600

VELYCHKOVSKY

Hello, I have a SAIO cluster and decided to expand it with new HDDs. Before the expansion I had 30 HDDs; I then added 45 new ones.

Then I noticed that the rebalance process had started and files were slowly moving to the new HDDs. By the way, my containers hold millions of small objects, and there are many PUT requests because the cluster is still being filled.
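As a rough sanity check on how much data this rebalance has to move, the sketch below estimates where partition replicas should end up once it finishes. It assumes all 75 drives carry equal ring weight (the post does not say), a partition power of 16 and 3 replicas; the last two are inferred from the swift-get-nodes output further down (three primary devices, and partition 63316 matching the top 16 bits of the hash). Under those assumptions the 45 new drives should eventually hold 60% of all partition replicas.

```python
# Rough estimate of how partition replicas should be spread after the expansion.
# Assumptions (not stated in the post): every drive has the same ring weight,
# part_power = 16 and replicas = 3.


def replica_share(old_drives: int, new_drives: int,
                  part_power: int = 16, replicas: int = 3):
    total_drives = old_drives + new_drives
    total_replicas = (2 ** part_power) * replicas   # partition replicas in the ring
    per_drive = total_replicas / total_drives       # ideal count per equally weighted drive
    new_fraction = new_drives / total_drives        # share that ends up on the new drives
    return total_replicas, per_drive, new_fraction


if __name__ == "__main__":
    total, per_drive, new_fraction = replica_share(30, 45)
    print(f"partition replicas in ring: {total}")
    print(f"ideal per drive:            {per_drive:.1f}")
    print(f"share on the new drives:    {new_fraction:.0%}")
```

In Swift's ring builder, `min_part_hours` limits how often any single partition can be reassigned, so reaching this distribution normally takes several rebalance rounds rather than one.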

After adding the new HDDs, I noticed that some old objects were missing (404 Not Found), which was disappointing.

I did some debugging on the missing objects and found that the endpoints for the object are wrong. For example:

swift-get-nodes /etc/swift/object.ring.gz AUTH_d26a7f8b5e0f477ab2b206fcef8e7f9d thumbs_com 1/2/3/6/2/12362419_320x180.jpg

Account AUTH_d26a7f8b5e0f477ab2b206fcef8e7f9d
Container thumbs_com
Object 1/2/3/6/2/12362419_320x180.jpg

Partition 63316
Hash f7545a8af6daa9b54eb711e72edc1996

Server:Port Device 21
Server:Port Device 18
Server:Port Device 11
Server:Port Device 4 [Handoff]
Server:Port Device 23 [Handoff]
Server:Port Device 15 [Handoff]

curl -g -I -XHEAD ""
curl -g -I -XHEAD ""
curl -g -I -XHEAD ""
curl -g -I -XHEAD "" # [Handoff]
curl -g -I -XHEAD "" # [Handoff]
curl -g -I -XHEAD "" # [Handoff]

Use your own device location of servers: such as "export DEVICE=/srv/node"
ssh "ls -lah ${DEVICE:-/srv/node}/21/objects/63316/996/f7545a8af6daa9b54eb711e72edc1996"
ssh "ls -lah ${DEVICE:-/srv/node}/18/objects/63316/996/f7545a8af6daa9b54eb711e72edc1996"
ssh "ls -lah ${DEVICE:-/srv/node}/11/objects/63316/996/f7545a8af6daa9b54eb711e72edc1996"
ssh "ls -lah ${DEVICE:-/srv/node}/4/objects/63316/996/f7545a8af6daa9b54eb711e72edc1996" # [Handoff]
ssh "ls -lah ${DEVICE:-/srv/node}/23/objects/63316/996/f7545a8af6daa9b54eb711e72edc1996" # [Handoff]
ssh "ls -lah ${DEVICE:-/srv/node}/15/objects/63316/996/f7545a8af6daa9b54eb711e72edc1996" # [Handoff]
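The partition number and the directory checked with `ls` both follow from the object hash. The sketch below assumes a partition power of 16, which is consistent with this output: the top 16 bits of the hash (0xf754) equal 63316. It reproduces the partition, the suffix directory `996` (the last three hex characters of the hash) and the on-disk path, mirroring the way Swift's ring takes the leading bytes of the MD5 digest. The `hash_path` helper is a simplified, hypothetical stand-in: the real hash also mixes in `swift_hash_path_prefix`/`swift_hash_path_suffix` from the cluster's swift.conf, whose values are not known here, so it will not reproduce `f7545a8a...` as written.

```python
import hashlib
import os


def hash_path(account: str, container: str, obj: str,
              prefix: str = "", suffix: str = "") -> str:
    """Sketch of Swift's object name hash: MD5 over prefix + /account/container/object + suffix.
    prefix/suffix are placeholders for swift_hash_path_prefix/suffix in swift.conf."""
    path = "/" + "/".join([account, container, obj])
    return hashlib.md5((prefix + path + suffix).encode("utf-8")).hexdigest()


def partition_for(hash_hex: str, part_power: int) -> int:
    # Take the first 4 bytes (8 hex chars) of the digest and keep the top part_power bits.
    top32 = int(hash_hex[:8], 16)
    return top32 >> (32 - part_power)


def object_dir(devices_root: str, device: str, hash_hex: str, part_power: int) -> str:
    # On-disk layout: <devices>/<device>/objects/<partition>/<last 3 hex chars>/<hash>
    part = partition_for(hash_hex, part_power)
    return os.path.join(devices_root, device, "objects", str(part), hash_hex[-3:], hash_hex)


if __name__ == "__main__":
    h = "f7545a8af6daa9b54eb711e72edc1996"  # Hash reported by swift-get-nodes
    print(partition_for(h, 16))
    print(object_dir("/srv/node", "4", h, 16))
```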

I checked and found this object in a different location: /4/objects/63316/996/f7545a8af6daa9b54eb711e72edc1996
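In the swift-get-nodes output, devices 21, 18 and 11 are the object's primary locations and 4, 23 and 15 are handoffs, so a copy on device 4 sits on the first handoff rather than on a primary. The sketch below checks each listed device for the hash directory and labels what it finds. The device lists are copied from the output above; running it on one host assumes all devices are mounted under the same root (as on a SAIO-style single machine), and the `DEVICE` environment variable is reused from the swift-get-nodes hint.

```python
import os

# Device lists copied from the swift-get-nodes output for this object.
PRIMARIES = ["21", "18", "11"]
HANDOFFS = ["4", "23", "15"]


def find_copies(devices_root, devices, rel_dir):
    """Return the devices that have the object's hash directory under devices_root."""
    return [d for d in devices if os.path.isdir(os.path.join(devices_root, d, rel_dir))]


def classify(found, primaries, handoffs):
    """Label each device holding a copy as primary, handoff, or not listed."""
    labels = {}
    for dev in found:
        if dev in primaries:
            labels[dev] = "primary"
        elif dev in handoffs:
            labels[dev] = "handoff"
        else:
            labels[dev] = "not listed"
    return labels


if __name__ == "__main__":
    rel = "objects/63316/996/f7545a8af6daa9b54eb711e72edc1996"
    root = os.environ.get("DEVICE", "/srv/node")
    found = find_copies(root, PRIMARIES + HANDOFFS, rel)
    print(classify(found, PRIMARIES, HANDOFFS))
```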

So it looks like the object exists, but the container has the wrong path to it, and I get 404 Not Found.

Can you help me find the root cause of the problem and make my cluster consistent again?

