
Can somebody explain "self-healing" to me?

asked 2012-08-21 15:51:32 -0500

imiller

Hi,

I am very close to rolling out a small Swift production environment for the purposes of backup and archiving.

Currently I am drafting maintenance and procedure docs and so I am going around in circles trying to work out what happens for any given fault and what actions to take.

I was wondering if anybody could point me in the right direction as to how Swift 'self-heals'. The term is bandied about all over the place, but I am struggling to find examples.

As far as I can work out, Swift will work around faults but no actual healing will take place until the ring is updated.

For example, if an HDD that holds an object to be updated fails (and gets unmounted), then the object will be updated on other nodes/HDDs until the failed HDD comes back, or until it is taken out of the ring and the ring is updated. This isn't self-healing, this is operator healing.

Am I missing something fundamental?

Thanks for your patience,


17 answers


answered 2012-08-31 07:28:51 -0500

imiller

https://github.com/rpedde/swift-training-kick/blob/master/exercises/exercise6-ring-management.txt


answered 2012-08-31 00:06:51 -0500

notmyname

Whoops. I goofed. What I said was incorrect. Handoff nodes are indeed automatically used when a drive goes down, thus ensuring that you have full replication of your data. Handoff nodes are not used when the whole server goes down. The loss of an entire server should be handled with a replacement or a ring update as soon as possible.
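To picture the distinction drawn here, a toy model may help. This is only a sketch of the behaviour as described in this answer, not Swift's replicator code; `Probe`, `replication_targets` and the device names are hypothetical.

```python
from enum import Enum


class Probe(Enum):
    """Hypothetical outcomes of contacting a primary device."""
    OK = "ok"
    UNMOUNTED = "unmounted"      # server answers, but the drive is gone
    UNREACHABLE = "unreachable"  # the whole server is down


def replication_targets(primaries, handoffs, probe):
    """Return the devices a partition would be replicated to.

    Assumption of this sketch: a primary whose drive is unmounted is
    replaced by the next handoff device, while an unreachable server
    gets no substitute at all.
    """
    targets = []
    spare = iter(handoffs)
    for dev in primaries:
        state = probe(dev)
        if state is Probe.OK:
            targets.append(dev)
        elif state is Probe.UNMOUNTED:
            substitute = next(spare, None)
            if substitute is not None:
                targets.append(substitute)
        # Probe.UNREACHABLE: that replica stays missing until the
        # server is replaced or the ring is updated by an operator.
    return targets
```

With three primaries and one unmounted drive the model still returns three targets; with one server unreachable it returns two, which is the case where a replacement or ring update is needed.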


answered 2012-08-30 23:21:10 -0500

imiller

PS:

I like reasons. (I don't care what anybody does as long as there is a reason behind it, preferably one which makes sense; where I work, people get fired for having no reasons.)

"This choice was made because a) most drive failures are transient (eg a new drive can be swapped in relatively quickly) and b) since replicating data out can place a higher burden on other storage nodes, an errant automatic ring update could have cascading failures throughout the cluster."

Makes perfect sense to me. And also cements my idea that nothing really changes without a ring update - operator healing. Which also makes sense to me.

What I am interested in is the effect of recovering from a failure. Looking at things, I think that if I lost all my storage node OS drives to "an attack", then as long as my proxy is alive (or I have lost all but one OS drive containing the ring files), it knows where everything is. I could rebuild the node OS drives with the same IPs as before, and things would start to function as before, pointing at their existing storage drives... I do NOT want a situation where I have hundreds of drives of data which I cannot access. Is there a best practice for SQLite backups in a Swift deployment? If there isn't one around, we need to try and work this out.


answered 2012-08-30 22:57:16 -0500

imiller

Thanks John Dickinson (notmyname) (notyourname?) ...

This is how I read it too... which puts the last 2 sheets of A4 to waste ...

So drive failure does not self heal. It becomes a black spot, where writes are diverted & replicas on that device are reduced by 1.

If the drive comes back, then its replicas catch up by means of eventual replication. If the drive doesn't come back and is never replaced, then the replica count for everything on that drive will always be reduced by 1. If the drive is replaced, then it is assigned the same partitions as before, but Swift sees them as blank and so populates them with the data that they should hold.

Which is how I thought it was. Not self healing in the way a troll would, but more self protecting in the way the starship liberator would.

How does this black hole scale with a lost OS disk? That is an enormous amount of data dependent on the ring files and an IP.

As far as I can see, the ring file references only the IP, so if I replace an OS disk and configure Swift with its old IP, date-based replication of the disks should 'just happen', and there will be no mad rush of data as long as I don't rebuild the ring...

Which leaves the "self healing" question wide open really... So I'm gonna reopen this for a while. I'd like to leave the last pane full of fact and help :)

Thanks again JD for the sanity check post closure.


answered 2012-08-30 22:34:19 -0500

@Samuel Merritt Oh, thanks, this is what happens when I stop looking into the ring code for a long time, I see your commit now. :)

@Isaac

"will the object replicator then set about moving items to the safest locations as part of its normal duty?"

Yes, the logic of the replicator is a very simple one: if I am on the current "handoff" device and I can see the "proper" device, transfer the data to the "proper" one.
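That rule can be sketched in a few lines. This is a toy model rather than Swift's replicator: `Device` and `revert_handoff` are hypothetical names, and removing the handoff copy only once every primary holds the data is an assumption of the sketch.

```python
from dataclasses import dataclass, field


@dataclass
class Device:
    """Toy stand-in for a storage device; not a Swift class."""
    name: str
    mounted: bool = True
    # partition id -> set of object names held for that partition
    partitions: dict = field(default_factory=dict)


def revert_handoff(partition, local, primaries):
    """Push a partition from a handoff device to its "proper" devices.

    Returns True when the local copy was removed, which in this sketch
    only happens after every primary device has accepted the data.
    """
    if any(dev is local for dev in primaries):
        return False  # already on a "proper" device; nothing to move
    objects = local.partitions.get(partition, set())
    synced = 0
    for dev in primaries:
        if dev.mounted:
            dev.partitions.setdefault(partition, set()).update(objects)
            synced += 1
    if synced == len(primaries):
        local.partitions.pop(partition, None)
        return True
    return False
```

Calling `revert_handoff` repeatedly is harmless: while a primary is still missing, the handoff device keeps its copy and simply tries again on the next pass.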


answered 2012-08-30 22:29:48 -0500

notmyname

It's important to note that when a drive fails, the data on that drive is not immediately replicated to handoff nodes. Any new data that would have been added to that drive will instead go to handoff nodes as Sam explained. But the data that was on the drive is now down to two replicas until either the drive is replaced or the drive is removed from the ring.

This choice was made because a) most drive failures are transient (eg a new drive can be swapped in relatively quickly) and b) since replicating data out can place a higher burden on other storage nodes, an errant automatic ring update could have cascading failures throughout the cluster.
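The write-diversion part of this (new data going to handoff nodes while a drive is out) can be pictured with a small sketch. It is illustrative only: `choose_write_targets`, `is_available` and the device names are hypothetical, and the real proxy's node selection involves more than this.

```python
def choose_write_targets(primaries, handoffs, is_available, replicas=3):
    """Pick devices for a new write: primaries first, then handoffs.

    `is_available` is a caller-supplied predicate (an assumption of
    this sketch) that says whether a device can accept the write.
    """
    targets = [dev for dev in primaries if is_available(dev)]
    for dev in handoffs:
        if len(targets) >= replicas:
            break
        if is_available(dev):
            targets.append(dev)
    return targets[:replicas]
```

For example, with primaries `["a", "b", "c"]`, handoffs `["x", "y"]` and drive `"b"` unavailable, the new object lands on `a`, `c` and the first handoff `x`.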


answered 2012-08-30 22:45:10 -0500

Looking at John's answer, I suppose this means that the better solution for Isaac would be to increase the number of nodes anyway. This could be done relatively easily by running multiple object servers on each node; this way he can balance zone assignments any way he likes.


answered 2012-08-30 22:20:49 -0500

imiller

Oh, that's even better news. I feel ashamed that I cannot seem to glean this info myself from the docs; I wasted hours on the wrong docs whilst proofing basic functionality, and now that I am trying to comprehend what is going on, I am as deep as I was at the start.

This is great, and it helps a very great deal with bit rot and, more likely, multi-drive failure. (I have had hard disk 'batch failure' in the past, where we lost 20% of drives over a 9 month period; hopefully a 1 in 50 year event ;) )

Samuel: once this "absolutely necessary" tertiary replication is done, if and when the missing parts of the ring come back, revealing the lost partitions, will the object replicator then set about moving items to the safest locations as part of its normal duty? Looking at it, this appears to be the case...

"Self healing": now I understand it better. From the top-level docs and testing, it appeared that to do anything I had to initiate ring rebuilds. Sure, this helped me see things happen (like scaring a horse!), but I suspect now that if I had just sat back and waited I would have gleaned this info.

Thank you


answered 2012-08-30 22:04:00 -0500

torgomatic

"Data integrity will be degraded only in the case where all devices for specific zone are failed and there are not enough "spare" zones to copy the replicated data to. E.g. if replication level is N and you have less than N zones intact."

Just one minor nitpick: this was true in older versions of Swift. However, in the latest version, replication will prefer to put things in different zones if possible, but if you suffer enough full-zone failures such that your #zones falls below #replicas, replication will start putting copies on other disks in your existing zones. It will prefer disks in different machines, but if absolutely necessary, will store multiple copies on different disks in the same machine.
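The preference order described above (different zones, then different machines, then different disks) can be sketched as a greedy choice. This is not the ring builder's algorithm, just a minimal illustration; `Disk` and `place_replicas` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Disk:
    """Toy description of a disk; not a Swift class."""
    zone: int
    machine: str
    name: str


def _overlap(disk, chosen):
    """Count already-chosen disks sharing this disk's zone and machine."""
    same_zone = sum(c.zone == disk.zone for c in chosen)
    same_machine = sum(
        (c.zone, c.machine) == (disk.zone, disk.machine) for c in chosen
    )
    return (same_zone, same_machine)


def place_replicas(disks, replicas=3):
    """Greedily pick disks, spreading replicas as widely as possible.

    Each step takes the disk that shares the fewest zones, and then the
    fewest machines, with the disks picked so far. Ties are broken by
    input order.
    """
    chosen = []
    candidates = list(disks)
    while len(chosen) < replicas and candidates:
        best = min(candidates, key=lambda d: _overlap(d, chosen))
        chosen.append(best)
        candidates.remove(best)
    return chosen
```

With three or more zones available, each replica ends up in its own zone; with a single zone left, the sketch falls back to separate machines, and only then to separate disks in the same machine.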


answered 2012-08-30 21:54:59 -0500

imiller

Thank you Constantine for such a swift and full answer; it is very much appreciated!



Stats

Asked: 2012-08-21 15:51:32 -0500

Seen: 308 times

Last updated: Aug 31 '12