Swift - replication issue

asked 2014-09-07 09:29:50 -0500 by Anil

updated 2014-09-08 15:15:40 -0500 by smaffulli

Hi guys, I have created a three-node Swift cluster [1 proxy, 2 storage nodes] with a replica count of 3 for a demo and knowledge sharing, but it looks like Swift replication is not working properly: I deleted one data file from one server and it was never recreated. Not sure what is missing here.

172.168.10.200  swift-proxy 
172.168.10.201  swift-node1 
172.168.10.202  swift-node2
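For reference, the object ring for a layout like this would normally be built with swift-ring-builder with a replica count of 3 spread across the two storage nodes. The commands below are only a sketch of what that typically looks like; the part power and weights are placeholder values, while the device names (z1d1, z2d1) and port 6000 match the output further down:

# Illustrative only: build a 3-replica object ring across the two storage nodes
swift-ring-builder object.builder create 10 3 1
swift-ring-builder object.builder add z1-172.168.10.201:6000/z1d1 100
swift-ring-builder object.builder add z2-172.168.10.202:6000/z2d1 100
swift-ring-builder object.builder rebalance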

Below is the file which was deleted from node2 [1 replica out of 3]:

[root@swift-node1 swift]#  ls -ltR /srv/node/ | grep -i 1410089049.97687.data
-rw-------. 1 swift swift 1581 Sep  7 19:24 1410089049.97687.data
-rw-------. 1 swift swift 1581 Sep  7 19:24 1410089049.97687.data


[root@swift-node2 swift]# rm /srv/node/z2d1/objects/249955/5ca/f418da3f4b2db4188405cb38c08655ca/1410089049.97687.data 
rm: remove regular file `/srv/node/z2d1/objects/249955/5ca/f418da3f4b2db4188405cb38c08655ca/1410089049.97687.data'? y
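As a side note, swift-object-info can be pointed at a .data file on disk to confirm which account/container/object it belongs to before touching it; something like the following (path taken from the rm command above):

# Print the account/container/object name, ETag and ring placement for this .data file
swift-object-info /srv/node/z2d1/objects/249955/5ca/f418da3f4b2db4188405cb38c08655ca/1410089049.97687.data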

All daemons are running fine and rsync shows the file on the other node:

[root@swift-node2 swift]# for i in `cat /var/tmp/swift-start`; do service $i status; done
openstack-swift-account (pid  7175) is running...
openstack-swift-account-auditor (pid  7186) is running...
openstack-swift-account-reaper (pid  7197) is running...
openstack-swift-account-replicator (pid  7211) is running...
openstack-swift-container (pid  7226) is running...
openstack-swift-container-auditor (pid  7238) is running...
openstack-swift-container-replicator (pid  7257) is running...
openstack-swift-container-updater (pid  7272) is running...
openstack-swift-object (pid  7284) is running...
openstack-swift-object-auditor (pid  7304) is running...
openstack-swift-object-replicator (pid  7315) is running...
openstack-swift-object-updater (pid  7334) is running...


[root@swift-node2 swift]# rsync 172.168.10.202::object/z2d1/objects/249955/5ca/f418da3f4b2db4188405cb38c08655ca/
drwxr-xr-x        4096 2014/09/07 20:28:27 .

[root@swift-node2 swift]# rsync 172.168.10.201::object/z1d1/objects/249955/5ca/f418da3f4b2db4188405cb38c08655ca/
drwxr-xr-x        4096 2014/09/07 19:24:10 .
-rw-------        1581 2014/09/07 19:24:10 1410089049.97687.data
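To double-check where the three replicas of this object are supposed to live, swift-get-nodes can be run against the object ring; the account, container and object names below are the ones from the swift stat/list output further down:

# Show the partition and the primary nodes/devices (plus handoffs) for the object
swift-get-nodes /etc/swift/object.ring.gz AUTH_admin logs anaconda-ks.cfg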

Logs and swift-recon show no replication issues:

[root@swift-node2 swift]# swift-recon -r
===============================================================================
--> Starting reconnaissance on 2 hosts
===============================================================================
[2014-09-07 22:11:46] Checking on replication
[replication_time] low: 0, high: 0, avg: 0.0, total: 0, Failed: 0.0%, no_result: 0, reported: 2
Oldest completion was 2014-09-07 14:11:25 (20 seconds ago) by 172.168.10.202:6000.
Most recent completion was 2014-09-07 14:11:34 (11 seconds ago) by 172.168.10.201:6000.
===============================================================================

[root@swift-proxy swift]# swift -A http://172.168.10.200:8080/auth/v1.0 -U admin:admin -K admin stat
       Account: AUTH_admin
    Containers: 1
       Objects: 1
         Bytes: 1581
 Accept-Ranges: bytes
   X-Timestamp: 1410089049.73849
    X-Trans-Id: txb22e0e4518544a5fa6e12-00540c51f6
  Content-Type: text/plain; charset=utf-8
[root@swift-proxy swift]# swift -A http://172.168.10.200:8080/auth/v1.0 -U admin:admin -K admin list logs
anaconda-ks.cfg
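Even with one replica removed by hand, the object should still be readable through the proxy, since two copies remain; a quick check could look like this (the output filename is arbitrary):

# Fetch the object through the proxy to confirm the remaining replicas still serve it
swift -A http://172.168.10.200:8080/auth/v1.0 -U admin:admin -K admin \
    download logs anaconda-ks.cfg -o /tmp/anaconda-ks.cfg.check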

rsyncd.conf has the config below:

[root@swift-node2 swift]# grep -v ^# /etc/rsyncd.conf 
uid = swift
gid = swift
log file = /var/log/rsyncd.log
pid file = /var/run/rsyncd.pid
address = 172.168.10.202

[account]
max connections = 20
path = /srv/node/
read only = false
list = yes
lock file = /var/lock/account.lock

[container]
max connections = 20
path = /srv/node/
read only = false
list = yes
lock file = /var/lock/container.lock

[object]
max connections = 20
path = /srv ...
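The [object] section is cut off above; presumably it just mirrors the [account] and [container] sections, along these lines:

[object]
max connections = 20
path = /srv/node/
read only = false
list = yes
lock file = /var/lock/object.lock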

3 answers


answered 2014-09-11 13:22:10 -0500 by torgomatic

When Swift deletes an object, it also deletes its entry from the partition's hashes.pkl file. Replication uses hashes.pkl to avoid doing a bunch of disk IO for an up-to-date partition.

You deleted the file by going around Swift and talking to the filesystem directly, so replication thinks the file is still there.

In an actively-used Swift cluster, other object activity (PUTs, DELETEs) will cause hashes.pkl to get updated, and after that happens, replication will restore the manually-deleted object. However, that won't happen until there's other activity that coincidentally fixes hashes.pkl, so it may take a while. On an inactive demo system, "a while" translates to "forever", hence what you saw.
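If you want to see replication actually repair it during the demo, one way (assuming the paths from the question, and that swift-init is available on these nodes) is to invalidate the stale hashes.pkl for that partition by hand and then let the replicator run:

# On swift-node2, where the .data file was removed behind Swift's back:
# delete the stale hashes.pkl so the partition's suffix hashes get recomputed
# on the next REPLICATE request.
rm /srv/node/z2d1/objects/249955/hashes.pkl

# On swift-node1, trigger (or simply wait for) the next replicator pass,
# which should rsync the missing copy back to node2.
swift-init object-replicator once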


answered 2014-09-08 00:03:04 -0500 by ebyenjoys

Check the account, container, and object logs on the storage nodes, and also take a look at your syslog. I am not sure about the issue you are facing, but you should find a clue if you go through the logs.


answered 2014-09-09 01:47:07 -0500 by Anil

It looks like this feature is not yet available in Swift. I found the bug below, which has already expired:

https://bugs.launchpad.net/swift/+bug...

So, for future users trying the same thing to convince their customers about self-healing with replica copies:

" - > If you change a file on a storage node, then the auditor detects the different etag and it replaces the object. If instead you delete it, then the auditor doesn't recognize the missing file and it doesn't replicate it"

Not sure why there has been no progress on this, but I would be happy if it becomes available in a future release.
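For completeness, the self-healing path that does work out of the box can be demonstrated by corrupting a replica instead of deleting it, as described in the quote above; a rough sketch, reusing the paths from the question and assuming swift-init is available:

# Corrupt one replica instead of deleting it, so its checksum no longer
# matches the stored ETag.
echo garbage >> /srv/node/z2d1/objects/249955/5ca/f418da3f4b2db4188405cb38c08655ca/1410089049.97687.data

# The object-auditor quarantines the corrupted copy on its next pass,
# after which the replicator restores a clean copy from another node.
swift-init object-auditor once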

