
swift replication times

asked 2016-01-05 10:31:22 -0600

ccontini

Hi,

We are having trouble with replication times in our swift setup. We have a 6-node setup with 2 proxies and 4 object servers, and the ring is built this way:

33554432 partitions, 3.000000 replicas, 1 regions, 4 zones, 139 devices, 0.00 balance
The minimum number of hours before a partition can be reassigned is 1

On one of the object servers we replaced a lot of drives, and that server has been rebooted many times. Looking at the object replication logs, we see entries like this:

Jan 5 11:07:32 cca-c7-swift04 object-replicator: 93621/6597974 (1.42%) partitions replicated in 71020.54s (1.32/sec, 1370h remaining)
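The figures in that log line can be cross-checked with a little arithmetic. The following is only a sketch: the regular expression and the function name `replication_estimate` are my own assumptions for illustration, not part of swift, and it assumes the replicator keeps its current average rate.

```python
import re

# Progress portion of the object-replicator log line quoted above.
LOG_LINE = ("93621/6597974 (1.42%) partitions replicated in 71020.54s "
            "(1.32/sec, 1370h remaining)")


def replication_estimate(line):
    """Return (partitions per second, hours remaining) for a progress line.

    Hypothetical helper: it assumes the line format shown above and a
    constant replication rate for the rest of the pass.
    """
    m = re.search(r"(\d+)/(\d+) \(.*?\) partitions replicated in ([\d.]+)s", line)
    if m is None:
        raise ValueError("not an object-replicator progress line")
    done, total, elapsed = int(m.group(1)), int(m.group(2)), float(m.group(3))
    rate = done / elapsed                          # partitions per second so far
    remaining_hours = (total - done) / rate / 3600
    return rate, remaining_hours


rate, hours = replication_estimate(LOG_LINE)
print(f"{rate:.2f} partitions/sec, roughly {hours / 24:.0f} days remaining")
```

At that rate a single pass over this server's partitions would take on the order of two months.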

Replication is obviously taking a very long time. We are looking at ways to improve that. Here are my questions:

  1. Our min_part_hours is set to 1. From what I have read, this number should be larger than the duration of a full replication pass (which seems to be over 1370h in our case). How does that number affect replication? What would be the right value for us?

  2. We can change the rsync and swift configuration to allow more rsync connections, or raise concurrency in [object-replicator] in object-server.conf. Would that improve the situation?

  3. Is there any other way to speed up the replication process?

We are using swift 2.2.0.

Thanks,


2 answers


answered 2016-01-06 16:28:35 -0600

updated 2016-01-06 16:32:17 -0600

What is your backend network MTU? It should be at least 9000 across your storage network to help speed up replication. Distance, network, and connections all matter for replication as well. Rackspace has extensive documentation on setting up and maintaining a swift ring for object replication. This link may help: http://docs.openstack.org/developer/swift/deployment_guide.html

Also, what distance does your replication traffic cover, and what is the link between regions/zones?


answered 2016-01-08 13:12:26 -0600

ccontini

Our MTU is set to 1500. I understand that enabling bigger frames could speed up the process, but in this case our replication times seem off by several orders of magnitude. The servers are very close to one another and have a dedicated replication network. Does the very large number of partitions alone explain the lack of performance?


Comments

It very well can. If you have the ability, pull a disk out of each group and create a new ring with just 2 disks, 1 region, and 1 zone, and limit the number of partitions. Test with this, then work the values up toward your running swift setup.

bcollins (2016-01-09 07:44:40 -0600)
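To put the partition count from the question in perspective, here is a rough calculation. It is a sketch under stated assumptions: the "about 100 partitions per drive at the largest size the cluster is expected to reach" rule of thumb is the sizing guidance I recall from the swift documentation, and `suggested_part_power` is a hypothetical helper name. The 139 devices are the current count, so a cluster planned to grow would need a larger figure.

```python
import math

# Values reported by the ring in the question.
PARTITIONS = 33554432
REPLICAS = 3
DEVICES = 139

# The partition count is a power of two; the exponent is the ring's part power.
part_power = PARTITIONS.bit_length() - 1

# Average number of partition replicas each device has to hold and replicate.
assigned_per_device = PARTITIONS * REPLICAS / DEVICES


def suggested_part_power(max_devices, parts_per_device=100):
    """Smallest part power giving at least parts_per_device per device.

    Assumption: parts_per_device=100 is a rule of thumb, not a hard limit.
    """
    return math.ceil(math.log2(max_devices * parts_per_device))


print(f"part power: {part_power}")
print(f"partition replicas per device: {assigned_per_device:,.0f}")
print(f"rule-of-thumb part power for {DEVICES} devices: {suggested_part_power(DEVICES)}")
```

Each device in this ring carries several hundred thousand partition replicas, which is far above the rule-of-thumb figure and fits the suspicion that the partition count is a major factor in the slow passes.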


Stats

Asked: 2016-01-05 10:31:22 -0600

Seen: 252 times

Last updated: Jan 08 '16