Replicating handoff partitions

asked 2017-06-04 04:19:28 -0500

Jarek


Because of problems adding new devices to my Swift cluster, I dug into the object replicator code and found a scenario that can probably be optimized. Normally, for a partition that belongs to the node, the replicator sends a REPLICATE request to another node and uses the answer to work out which suffixes need to be synced. For a handoff partition this step is missing, so the full partition is synced. That might be reasonable for small handoff partitions, but for large ones it adds a lot of overhead. Let me explain why:

  • We have devices A, B and C
  • Partition 1 belongs to devices A, B and C
  • We add a new device Z to the cluster, rebalance, and upload the new ring files
  • Partition 1 should now be on devices B, C and Z, and device A becomes a handoff for this partition
  • Device A starts the sync process for partition 1 to device B (this requires reading the directory structure and file info for partition 1 from disk on device B) and ends up copying nothing, because B already has the data
  • The same happens for device C
  • Data is actually transferred only to device Z

If Swift could rely on comparing hashes for devices B and C, we could avoid an I/O-intensive operation on devices that already have a full copy of this partition. My naive calculations suggest that this scenario could run up to two times faster.
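To make the idea concrete, here is a minimal sketch of the comparison I have in mind. It is not the actual replicator code: the function name `suffixes_to_sync`, the suffix names, and the hash values are all made up for illustration, and I am assuming each device can report a per-suffix hash for the partition (as the REPLICATE answer does for normal partitions).

```python
def suffixes_to_sync(local_hashes, remote_hashes):
    """Return the suffixes whose hash is missing or different on the remote.

    Both arguments map suffix name -> hash of that suffix directory.
    (Hypothetical helper, not part of Swift.)
    """
    return sorted(
        suffix
        for suffix, digest in local_hashes.items()
        if remote_hashes.get(suffix) != digest
    )


# Made-up per-suffix hashes for partition 1 as held by handoff device A.
handoff_a = {"0a1": "h1", "3fe": "h2", "b72": "h3"}

# B and C already hold a full copy of the partition; Z is new and empty.
remote = {
    "B": dict(handoff_a),
    "C": dict(handoff_a),
    "Z": {},
}

# Decide per target device which suffixes actually need to be pushed.
plan = {dev: suffixes_to_sync(handoff_a, hashes) for dev, hashes in remote.items()}
print(plan)
# {'B': [], 'C': [], 'Z': ['0a1', '3fe', 'b72']}
```

With a check like this, nothing would be synced to B and C, and only Z would receive the full set of suffixes.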

Am I correct, or have I missed something?
