
Swift 1.10.0: ring rebalance

asked 2013-12-13 13:40:35 -0500

simplidrive

Hi,

I have servers with multiple disks, each of which is an individual RAID 0 mount point. If one disk goes down, I just replace it, after formatting, within 4-5 hours, and it starts filling up again without much impact. I would like to know, in a worst-case scenario, the maximum time I can wait before removing the disk from the ring, and in which scenarios we have to remove the disk from the ring. Also, when I rebalance after removing the disk, will only the partitions residing on the failed disk be redistributed, or will all the partitions be rearranged across all available disks?

Regards,

Vishal


4 answers


answered 2013-12-26 12:13:39 -0500

simplidrive

Yes, this answers my question.

Thanks,

Vishal


answered 2013-12-13 14:40:25 -0500

What you are writing seems contradictory to me. RAID 0 does not have any data redundancy or fault tolerance, so whenever one disk of a RAID 0 array goes down, all data on the whole array is lost; see http://en.wikipedia.org/wiki/RAID#RAID_0. If you are able to swap a failing disk for a new one and reconstruct the data, this must be RAID 5 or 6.

I would expect that a RAID 5 array can keep running with a broken disk as long as no other disk fails. A failure on a second disk before the first one has been replaced and rebuilt will lead to data loss.


answered 2013-12-20 09:23:29 -0500

simplidrive

Hi,

I think I posted the question in the wrong project. My question was about OpenStack Swift.

Rephrasing the question:

I have two queries:

1) We are running storage nodes with 12 disks of 2 TB each. Each disk is an individual RAID 0 array. When a disk fails and the replacement is expected within a few hours, is it better to remove it from the ring and rebalance, or should we replace the disk without removing it from the ring and re-mount it after formatting?

2) If the disk is shown as unmounted in recon, then: a) will Swift continue trying to write to the failed disk, resulting in only two available copies? Am I understanding this correctly? Or b) will it stop trying to write to the failed disk and create the third copy on some other disk? If so, after how much time will Swift stop trying to write to the failed disk?

Regards,

Vishal


answered 2013-12-20 19:36:51 -0500

notmyname

1) In your case, I'd recommend simply leaving the drive in place until it's restored. I would make sure you unmount it, though, so Swift doesn't repeatedly try to write to it (or can fail fast when it does try). That said, this recommendation may not suit your particular needs. You should evaluate both practices and decide based on the data you find. In general, both ops methodologies are well-supported in Swift.
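
To illustrate the fast-fail idea, here is a minimal Python sketch. It is a toy, not Swift's code: `DeviceUnavailable` and `write_object` are hypothetical names, and the check simply asks the operating system whether the path is a mount point before writing. A plain temporary directory stands in for the mount point of an unmounted drive.

```python
import os
import tempfile


class DeviceUnavailable(Exception):
    """Raised when the target path is not a mounted filesystem (hypothetical name)."""


def write_object(device_path, name, data, mount_check=True):
    """Write data under device_path, refusing early if nothing is mounted there.

    Hypothetical helper for illustration only.
    """
    if mount_check and not os.path.ismount(device_path):
        # Fail immediately instead of writing into the bare mount directory.
        raise DeviceUnavailable(device_path)
    target = os.path.join(device_path, name)
    with open(target, "wb") as f:
        f.write(data)
    return target


if __name__ == "__main__":
    # An ordinary directory is not a mount point, so the write is refused.
    with tempfile.TemporaryDirectory() as fake_mount_dir:
        try:
            write_object(fake_mount_dir, "obj", b"payload")
        except DeviceUnavailable as exc:
            print("fast-fail:", exc)
```

With the check in place, a write aimed at an unmounted drive is rejected right away rather than landing on whatever filesystem holds the empty mount directory.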

2) When a disk fails, any data on that disk is replicated to other locations in the cluster (so you have full durability), and all new data that would have otherwise been written to that disk will also go to a handoff location. When you either remove the disk from the ring or replace the disk, the data will be moved back. In all cases, the cluster continues to operate normally.
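
The handoff behaviour described above can be pictured with a small Python toy model. It is only a sketch under stated assumptions: the device names are made up, `choose_write_targets` is a hypothetical function, and Swift's real ring and proxy logic are more involved than this.

```python
def choose_write_targets(primaries, handoffs, unavailable):
    """Return the devices a write would go to in this toy model.

    primaries:   devices assigned to the partition, one per replica
    handoffs:    ordered fallback devices
    unavailable: set of devices that are failed or unmounted
    """
    # Usable handoffs, in order, skipping anything that is already a primary.
    fallback = (d for d in handoffs if d not in unavailable and d not in primaries)
    targets = []
    for dev in primaries:
        if dev not in unavailable:
            targets.append(dev)
        else:
            # Substitute the next usable handoff so the replica count is preserved.
            substitute = next(fallback, None)
            if substitute is not None:
                targets.append(substitute)
    return targets


if __name__ == "__main__":
    primaries = ["sdb", "sdc", "sdd"]   # hypothetical device names
    handoffs = ["sde", "sdf"]
    print(choose_write_targets(primaries, handoffs, unavailable={"sdc"}))
    # prints ['sdb', 'sde', 'sdd']
```

In this model a write still lands on three devices while `sdc` is down. Once `sdc` is available again, it is chosen as a primary as before.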



Stats

Asked: 2013-12-13 13:40:35 -0500

Seen: 84 times

Last updated: Dec 26 '13