
Failover principle of Swift

asked 2012-11-15 16:58:52 -0500

gucluakkaya

Hi,

During my tests I came across the following behaviour:

My environment consists of 1 proxy node and 6 storage nodes (account/container/object). I shut down three of the storage nodes and tried some GET requests for certain objects. I found that the Swift proxy still tried to reach the nodes that were down, and so received connection timeouts. I suspect I did not configure my ring properly for failover. What am I missing in my configuration? You can see my configuration below:

Ring configuration: partition power 18, replica count 3.
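(For reference, rings with these parameters are usually created with swift-ring-builder. The commands below are only a sketch: the builder file names, the min_part_hours value of 1, and the zone/IP/device arguments are assumptions based on the ring output later in this thread.)

swift-ring-builder account.builder create 18 3 1
swift-ring-builder account.builder add z1-10.0.0.208:6002/sdb1 100
swift-ring-builder account.builder add z2-10.0.0.207:6002/sdb1 100
# repeat the add for the remaining four devices, then:
swift-ring-builder account.builder rebalance
# container.builder (port 6001) and object.builder (port 6000) follow the same pattern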

account-server.conf

[DEFAULT]
bind_ip = 0.0.0.0
workers = 8

[pipeline:main]
pipeline = account-server

[app:account-server]
use = egg:swift#account

[account-replicator]
run_pause = 900

[account-auditor]

[account-reaper]

container-server.conf

[DEFAULT]
bind_ip = 0.0.0.0
workers = 8

[pipeline:main]
pipeline = container-server

[app:container-server]
use = egg:swift#container

[container-replicator]
run_pause = 900

[container-updater]

[container-auditor]

object-server.conf

[DEFAULT]
bind_ip = 0.0.0.0
workers = 8

[pipeline:main]
pipeline = object-server

[app:object-server]
use = egg:swift#object

[object-replicator]
run_pause = 900
ring_check_interval = 900

[object-updater]

[object-auditor]
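(As a side note, since the symptom is the proxy timing out against stopped nodes, the proxy-server.conf options that govern this behaviour are also worth checking. conn_timeout, node_timeout, error_suppression_interval and error_suppression_limit are standard Swift proxy settings; the values below are just the usual defaults, shown as a sketch rather than taken from this cluster.)

[app:proxy-server]
use = egg:swift#proxy
# give up connecting to a storage node after this many seconds and try the next one
conn_timeout = 0.5
# give up waiting for a response from a storage node after this many seconds
node_timeout = 10
# a node that returns error_suppression_limit errors within error_suppression_interval
# seconds is temporarily skipped by the proxy
error_suppression_interval = 60
error_suppression_limit = 10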


4 answers


answered 2012-11-15 17:04:34 -0500

gucluakkaya

Account ring (port 6002):

id  zone  ip address   port  name  weight  partitions  balance  meta
0   1     10.0.0.208   6002  sdb1  100.00  131072      0.00
1   2     10.0.0.207   6002  sdb1  100.00  131072      0.00
2   3     10.0.0.206   6002  sdb1  100.00  131072      0.00
3   4     10.0.0.205   6002  sdb1  100.00  131072      0.00
4   5     10.0.0.204   6002  sdb1  100.00  131072      0.00
5   6     10.0.0.132   6002  sdb1  100.00  131072      0.00

Container ring (port 6001):

id  zone  ip address   port  name  weight  partitions  balance  meta
0   1     10.0.0.208   6001  sdb1  100.00  131072      0.00
1   2     10.0.0.207   6001  sdb1  100.00  131072      0.00
2   3     10.0.0.206   6001  sdb1  100.00  131072      0.00
3   4     10.0.0.205   6001  sdb1  100.00  131072      0.00
4   5     10.0.0.204   6001  sdb1  100.00  131072      0.00
5   6     10.0.0.132   6001  sdb1  100.00  131072      0.00

Object ring (port 6000):

id  zone  ip address   port  name  weight  partitions  balance  meta
0   1     10.0.0.208   6000  sdb1  100.00  131072      0.00
1   2     10.0.0.207   6000  sdb1  100.00  131072      0.00
2   3     10.0.0.206   6000  sdb1  100.00  131072      0.00
3   4     10.0.0.205   6000  sdb1  100.00  131072      0.00
4   5     10.0.0.204   6000  sdb1  100.00  131072      0.00
5   6     10.0.0.132   6000  sdb1  100.00  131072      0.00

answered 2012-11-15 17:09:00 -0500

gucluakkaya

Sorry for the previous comment; I accidentally pressed the solved button. Here are my account, container and object rings:

Account ring (port 6002):

id  zone  ip address  port  name  weight  partitions  balance  meta
0   1     ip1         6002  sdb1  100.00  131072      0.00
1   2     ip2         6002  sdb1  100.00  131072      0.00
2   3     ip3         6002  sdb1  100.00  131072      0.00
3   4     ip4         6002  sdb1  100.00  131072      0.00
4   5     ip5         6002  sdb1  100.00  131072      0.00
5   6     ip6         6002  sdb1  100.00  131072      0.00

Container ring (port 6001):

id  zone  ip address  port  name  weight  partitions  balance  meta
0   1     ip1         6001  sdb1  100.00  131072      0.00
1   2     ip2         6001  sdb1  100.00  131072      0.00
2   3     ip3         6001  sdb1  100.00  131072      0.00
3   4     ip4         6001  sdb1  100.00  131072      0.00
4   5     ip5         6001  sdb1  100.00  131072      0.00
5   6     ip6         6001  sdb1  100.00  131072      0.00

Object ring (port 6000):

id  zone  ip address  port  name  weight  partitions  balance  meta
0   1     ip1         6000  sdb1  100.00  131072      0.00
1   2     ip2         6000  sdb1  100.00  131072      0.00
2   3     ip3         6000  sdb1  100.00  131072      0.00
3   4     ip4         6000  sdb1  100.00  131072      0.00
4   5     ip5         6000  sdb1  100.00  131072      0.00
5   6     ip6         6000  sdb1  100.00  131072      0.00
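(A quick way to see which storage nodes a given object maps to, and therefore which nodes the proxy will try, is the swift-get-nodes tool shipped with Swift. The account, container and object names below are placeholders, not values from this cluster.)

# AUTH_test / mycontainer / myobject are illustrative names
swift-get-nodes /etc/swift/object.ring.gz AUTH_test mycontainer myobject

It prints the primary nodes (and handoff nodes) for that object's partition, which can be compared against the nodes that were stopped.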

answered 2012-11-15 21:57:12 -0500

clay-gerrard

If you only stop two nodes, do you still have problems?

It seems like on a three-replica system with 50% of the cluster offline, there are going to be some objects that only exist on the downed nodes.

-clayg
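For a rough sense of scale (assuming each of the 3 replicas is placed in a distinct zone and partitions are spread evenly across the 6 zones), the fraction of partitions whose replicas all fall on the 3 stopped nodes is about

C(3,3) / C(6,3) = 1 / 20 = 5%

so on the order of 1 in 20 objects would be completely unreachable while half the cluster is down.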


answered 2012-11-16 08:28:32 -0500

gucluakkaya

Thank you for your answer. It turned out that our application, which inserts and retrieves containers, had a problem of its own. You are right that with 50% of the cluster offline some objects cannot be retrieved. After more testing I verified that failover works properly: if one node is down, Swift will look for another node.

Sorry for the inconvenience.

