
Can somebody explain "self-healing" to me?

asked 2012-08-21 15:51:32 -0600

imiller

Hi,

I am very close to rolling out a small Swift production environment for the purposes of backup and archiving.

Currently I am drafting maintenance and procedure docs and so I am going around in circles trying to work out what happens for any given fault and what actions to take.

I was wondering if anybody could point me in the right direction as to how Swift 'self-heals' - the term is bandied about all over the place, but I am struggling to find examples.

As far as I can work out, Swift will work around faults but no actual healing will take place until the ring is updated.

For example: if an HDD which contains an object to be updated fails (and gets unmounted), then the object will be updated on other nodes/HDDs until the failed HDD comes back or is taken out of the ring and the ring is updated. This isn't self-healing, this is operator healing.

Am I missing something fundamental?

Thanks for your patience,


17 answers


answered 2012-08-29 22:06:36 -0600

I think you are missing a definition of what "self-healing" means to you.


answered 2012-08-30 01:38:38 -0600

Let's say you have 5 drives, on 5 nodes.

One drive fails, so either you unmount it or the drive-failure script does.

The drive failed on Friday night but you don't feel like messing with it. In the meantime Swift starts working around the failure by doing things like writing uploads destined for that drive automatically to a handoff node.

Monday rolls around and you finally get the chance to replace the failed drive with a new one. You insert the drive, format it, and mount it. Of course that drive is now empty, but Swift has you covered: it will start replicating all the data that's supposed to be on that drive back onto it without your intervention.

You never had to touch the ring. All you did was replace the physical gear.

Even if an entire zone's worth of servers goes away (whether 1 zone == 1 drive or 1 zone == 100 drives), when you swap the gear for a working chassis and mount the drives, Swift will start replicating the data back.

The only reason you would have to mess with the ring is if you're permanently pulling a node/device offline, or if the replacement needs to go in at a higher weight (say you swap a failed 2 TB drive for a 3 TB drive).
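To make that last point concrete, here is a rough sketch of the only ring change that scenario needs, using Swift's Python RingBuilder API (the swift-ring-builder command does the same from the shell). The builder path, IP, device name and weight below are made up, and the exact device keys vary a little between Swift releases:

    # Sketch (hypothetical values): bump the weight of the device whose failed
    # 2 TB drive was replaced by a 3 TB one. A like-for-like swap needs no
    # ring change at all.
    from swift.common.ring import RingBuilder

    builder = RingBuilder.load('/etc/swift/object.builder')   # assumed builder path

    for dev in builder.devs:
        # removed devices show up as None entries, so skip those
        if dev and dev['ip'] == '10.0.0.5' and dev['device'] == 'sdb':
            builder.set_dev_weight(dev['id'], 3000.0)   # weight roughly = GB of capacity

    builder.rebalance()                          # reassign partitions to match new weights
    builder.save('/etc/swift/object.builder')
    builder.get_ring().save('/etc/swift/object.ring.gz')   # push this ring to every node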


answered 2012-08-30 07:45:49 -0600

imiller

Hi, thanks for the info, Florian; very much appreciated.

This is what I understood, but could you please clarify the following for me?

"swift starts working around the failure by doing things like writing uploads destined to that drive automatically to a handoff node"
- I have read this, and it would imply that when a single drive fails on a node; that node no longer accepts writes; this would imply that a drive failure on a single node, which is designated as a single zone renders the entire zone non writable - whether than zone contains 3 drives or 100. - So, can the 'handoff node' actually potentially be the same node but a different drive ?

- If a drive has failed, Swift does not work to replicate the data from that drive to another drive, except for writes which would have been made to that drive. So the deployment does not 'self-heal' as such; it simply works degraded until the faulty component is replaced or brought back online.

- If an entire node fails and the Swift install drives are toast, once the failure is rectified does the installation simply catch up, or would all those drives then be completely rewritten? That is, since a node is defined by its IP address, as long as you rebuild the Swift install with the same IP, no ring updates are required and only modified/new data will be copied back to that node - is this the case?

I would love to see a document outlining different failures and how they are managed, from a single-server / zone-per-disk scenario up to a 4-server / zone-per-server install, rather than the 'probably the best thing to do...' scenarios in the manual. I would like to know what would have to happen to lose data.

Thanks,

Isaac

PS - I understand bit-rot is self-healed, but it's drive failure / installation failure I want to get a grasp on.


answered 2012-08-30 21:26:35 -0600

"If a physical machine suffers a drive failure, Swift will replicate the data which was present on the failed drive to functioning hardware to ensure every object has the correct number of replicas. If left indefinitely in this state, the system loses only the storage capacity of that drive; IMPORTANTLY: data integrity will be exactly as if the drive was functioning."

Yes, this is true. Swift uses an eventual consistency model: if any device fails, the data will eventually be replicated to other devices. More than that: if a write fails, it will be retried asynchronously and the data will be replicated to where it belongs later. Data integrity will be degraded only in the case where all devices for a specific zone have failed and there are not enough "spare" zones to copy the replicas to, e.g. if the replication level is N and you have fewer than N zones intact.

"1) As soon as that server goes down, Swift starts replicating data from all those disks to live disks."

Correct.

"2) If left down indefinitely, everything would be fine; we would just lose the capacity of those disks."

Partially true, if you still have at least one device in each zone. If a whole zone fails and you have no spare zone, the state will be degraded.

"3) If we enliven those disks on a newly built node with the same IP, then objects replicated away will be deleted and objects which are awaiting updates will be updated."

Correct. Objects will be replicated back from the handoff nodes and deleted on the handoff nodes.

"I realise that on larger deployments this is of little consequence, but for the rest of us paying very high €€ for each amp we consume, we have to try and work out whether 5 servers with 100 disks or 50 servers with 10 disks is better. The latter is significantly higher in cost! (Throughput is not an issue, only resilience.)"

Because Swift essentially replicates data between devices, the first scenario (5 x 100) is indeed possible and should pose no problems.
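For what it's worth, the 5 x 100 layout is usually expressed as one zone per server, so the replicas always land on different machines. A hypothetical sketch with the RingBuilder API (the IPs, port, part power and weights are invented, and newer releases also expect a region key):

    # Hypothetical 5-server cluster: one zone per server, 100 data disks each,
    # 3 replicas, so each object ends up on 3 different servers.
    from swift.common.ring import RingBuilder

    builder = RingBuilder(part_power=18, replicas=3, min_part_hours=1)
    for zone, ip in enumerate(['10.0.0.1', '10.0.0.2', '10.0.0.3',
                               '10.0.0.4', '10.0.0.5'], start=1):
        for disk in range(100):
            builder.add_dev({'region': 1, 'zone': zone, 'ip': ip, 'port': 6000,
                             'device': 'd%d' % disk, 'weight': 3000.0})
    builder.rebalance()
    builder.get_ring().save('/etc/swift/object.ring.gz')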


answered 2012-08-30 18:58:26 -0600

" I have read this, and it would imply that when a single drive fails on a node; that node no longer accepts writes; this would imply that a drive failure on a single node, which is designated as a single zone renders the entire zone non writable - whether than zone contains 3 drives or 100. - So, can the 'handoff node' actually potentially be the same node but a different drive ?"

You're describing Hadoop HDFS here. 1 drive failure means exactly 1 drive failure in Swift Swift works with partitions, partitions are distributed between devices, nodes are a bunch of devices. If one device fails it just means that 1 replica of each partition that was on this device will go to another device, that's it. Node will not fail, Zone will not fail, etc.

"If a drive drive is failed , Swift does not work to replicate the data from that drive to another drive "

Incorrect. Object replicator will replicate data from one drive to other drive.

" That is, since a node is defined by it's IP address so, as long as you rebuild the swift install with the same IP - no ring updates are required and only modified / new data will be copied back to that node - os this the case ?"

Each node is a bunch of devices. If all devices fail. Then replacing them all with some new devices on the same IP will do the trick just fine.
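Here is the sketch mentioned above: you can see the primary and handoff devices for any object with Swift's ring API (essentially what the swift-get-nodes tool prints). It assumes a ring at the usual /etc/swift path, and the account/container/object names are made up:

    # Sketch: where do this object's replicas live, and which devices would
    # act as handoffs if a primary device is unavailable?
    from swift.common.ring import Ring

    ring = Ring('/etc/swift/object.ring.gz')
    part, primaries = ring.get_nodes('AUTH_test', 'backups', 'photo.jpg')

    print('partition', part)
    for node in primaries:                 # one primary device per replica
        print('primary', node['ip'], node['device'], 'zone', node['zone'])

    # get_more_nodes() yields the handoff candidates for that partition, in
    # order. A handoff can be a different disk on the same server if that is
    # how the ring works out -- one failed disk never disables a node or zone.
    for i, node in enumerate(ring.get_more_nodes(part)):
        print('handoff', node['ip'], node['device'], 'zone', node['zone'])
        if i >= 2:                         # just show the first few
            break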


answered 2012-08-30 20:54:17 -0600

imiller

" "If a drive drive is failed , Swift does not work to replicate the data from that drive to another drive "

Incorrect. Object replicator will replicate data from one drive to other drive. "

Aha; now this is what I am missing from the docs. The docs do not mention this process occurring; they simply mention writes bound for the failed disk going elsewhere.

So, if a failed disk is replicated in the background, this means that in a disk-failure situation the Swift deployment will eventually become whole again; that is, ALL data lost on the failed drive will be replicated to different drives. This is (I assume) akin to a lost drive being given zero weight.

So, Constantine, to respond to your original reply, this is self-healing. Is this true:

If a physical machine suffers a drive failure, Swift will replicate the data which was present on the failed drive to functioning hardware to ensure every object has the correct number of replicas. If left indefinitely in this state, the system loses only the storage capacity of that drive; IMPORTANTLY: data integrity will be exactly as if the drive was functioning.

The above would be perfect. Scaling that up though:

"Each node is a bunch of devices. If all devices fail. Then replacing them all with some new devices on the same IP will do the trick just fine."

My question was to do with losing ONLY the OS drive(s) - Say we have 72 X 3TB disks hanging of a single server and we lose that server. All the (data) drives are fine - we can plug those into another motherboard / install a new OS disk /

Is the following true?

1) As soon as that server goes down, Swift starts replicating data from all those disks to live disks.
2) If left down indefinitely, everything would be fine; we would just lose the capacity of those disks.
3) If we enliven those disks on a newly built node with the same IP, then objects replicated away will be deleted and objects which are awaiting updates will be updated.

OR is a lost OS drive a lost node forever?

All I can find to go on as far as procedures go is this: http://docs.openstack.org/developer/swift/admin_guide.html#cluster-telemetry-and-monitoring - Handling Drive Failure & Handling Server Failure - which makes it sound like Swift 'works around' rather than 'repairs / heals'.

I realise that on larger deployments this is of little consequence, but for the rest of us paying very high €€ for each amp we consume, we have to try and work out whether 5 servers with 100 disks or 50 servers with 10 disks is better. The latter is significantly higher in cost! (Throughput is not an issue, only resilience.)


answered 2012-08-30 20:54:41 -0600

imiller

PS - I really appreciate the help :)


answered 2012-08-30 22:57:16 -0600

imiller

Thanks John Dickinson (notmyname) (notyourname?) ...

This is how I read it too... which puts the last 2 sheets of A4 to waste ...

So a drive failure does not self-heal. It becomes a black spot, where writes are diverted and the replica count for objects on that device is reduced by 1.

If the drive comes back, then its replicas catch up by means of eventual replication. If the drive doesn't come back and is never replaced, then the replica count for everything on that drive will always be reduced by 1. If the drive is replaced, then it is assigned the same partitions as before, but Swift sees them as blank and so populates them with the data that they should hold.
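One rough way to watch that catch-up actually happening (a sketch; the path and key names below are the common defaults and may differ on your install or Swift version) is to read the object replicator's recon cache on a storage node -- swift-recon -r reports much the same thing cluster-wide:

    # Sketch: when did the last object replication pass finish, and how long
    # did it take? (recon_cache_path commonly defaults to /var/cache/swift.)
    import json
    import time

    with open('/var/cache/swift/object.recon') as f:
        stats = json.load(f)

    print('last pass took about %s minutes' % stats.get('object_replication_time'))
    last = stats.get('object_replication_last')
    if last:
        print('last pass finished at', time.ctime(last))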

Which is how I thought it was. Not self-healing in the way a troll would, but more self-protecting in the way the starship Liberator would.

How does this black hole scale with a lost OS disk? That is an enormous amount of data dependent on the ring files and an IP.

As far as I can see, the ring file references only the IP, so if I replace an OS disk and configure Swift with its old IP, date-based replication of the disks should 'just happen' and there will be no mad rush of data, as long as I don't rebuild the ring...
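That matches what the ring file actually stores. A quick sketch (assuming the usual /etc/swift/object.ring.gz) to dump it -- each entry is just an IP, port, device name, zone and weight, with nothing tied to a particular OS install:

    # Sketch: list what the object ring knows about each device.
    from swift.common.ring import Ring

    ring = Ring('/etc/swift/object.ring.gz')
    for dev in ring.devs:
        if dev:                      # gaps left by removed devices are None
            print(dev['ip'], dev['port'], dev['device'],
                  'zone', dev['zone'], 'weight', dev['weight'])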

Which leaves the "self healing" question wide open really... So I'm gonna reopen this for a while. I'd like to leave the last pane full of fact and help :)

Thanks again JD for the sanity check post closure.


answered 2012-08-30 21:54:59 -0600

imiller

Thank you Constantine for such a swift and full answer; it is very much appreciated!


answered 2012-08-30 22:04:00 -0600

torgomatic

"Data integrity will be degraded only in the case where all devices for specific zone are failed and there are not enough "spare" zones to copy the replicated data to. E.g. if replication level is N and you have less than N zones intact."

Just one minor nitpick: this was true in older versions of Swift. However, in the latest version, replication will prefer to put things in different zones if possible, but if you suffer enough full-zone failures such that your #zones falls below #replicas, replication will start putting copies on other disks in your existing zones. It will prefer disks in different machines, but if absolutely necessary, will store multiple copies on different disks in the same machine.

