"'If a drive has failed, Swift does not work to replicate the data from that drive to another drive.'

Incorrect. The object replicator will replicate data from one drive to another drive."

Aha, now this is what I am missing from the docs. The docs do not mention this process occurring; they simply mention writes bound for the failed disk going elsewhere.

So, if a failed disk's data is replicated in the background, this means that in a disk failure situation the Swift deployment will eventually become whole again; that is, ALL data lost on the failed drive will be replicated to different drives. This is (I assume) akin to a lost drive being given zero weight.

So, Constantine, to respond to your original reply, this is self-healing. Is this true:

If a physical machine suffers a drive failure, Swift will replicate the data which was present on the failed drive to functioning hardware to ensure every object has the correct number of replicas. If left indefinitely in this state, the system loses only the storage capacity of that drive. IMPORTANTLY: *data integrity will be exactly as if the drive were functioning.*

The above would be perfect. Scaling that up though:

"Each node is a bunch of devices. If all devices fail, then replacing them all with some new devices on the same IP will do the trick just fine."

My question was to do with losing ONLY the OS drive(s). Say we have 72 x 3 TB disks hanging off a single server and we lose that server. All the (data) drives are fine; we can plug those into another motherboard or install a new OS disk.

Is the following true?

1) As soon as that server goes down, Swift starts replicating data from all those disks to live disks.
2) If left down indefinitely, everything would be fine; we would just lose the capacity of those disks.
3) If we enliven those disks on a newly built node with the same IP, then objects replicated away will be deleted and objects which are awaiting an update will be updated.

Or is a lost OS drive a lost node forever?
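To make points 1 to 3 concrete, here is a toy Python model of the behaviour I am asking about. It is only a sketch of my assumption, not Swift's ring or object replicator code: the names `ToyCluster`, `fail_node`, `replicate` and `restore_node` are made up for illustration, and the replica count of 3 is my own assumption.

```python
from collections import defaultdict

REPLICAS = 3  # assumed replica count; not stated anywhere in this thread


class ToyCluster:
    """Toy model only: NOT Swift's ring or object replicator."""

    def __init__(self, nodes):
        self.up = {n: True for n in nodes}   # node -> is it reachable?
        self.primaries = {}                  # object -> its intended nodes
        self.copies = defaultdict(set)       # object -> nodes holding a copy

    def put(self, obj, primaries):
        self.primaries[obj] = list(primaries)
        self.copies[obj] = set(primaries)

    def live_copies(self, obj):
        return {n for n in self.copies[obj] if self.up[n]}

    def fail_node(self, node):
        self.up[node] = False

    def replicate(self):
        """Top up each object to REPLICAS live copies using spare live nodes."""
        for obj in self.primaries:
            live = self.live_copies(obj)
            spares = sorted(n for n, ok in self.up.items() if ok and n not in live)
            while len(live) < REPLICAS and spares:
                handoff = spares.pop(0)
                self.copies[obj].add(handoff)
                live.add(handoff)

    def restore_node(self, node):
        """Node returns with its disks intact: drop the temporary extra copies."""
        self.up[node] = True
        for obj, prim in self.primaries.items():
            if all(self.up[p] for p in prim):
                self.copies[obj] = set(prim)


cluster = ToyCluster(["n1", "n2", "n3", "n4", "n5"])
cluster.put("obj-a", ["n1", "n2", "n3"])
cluster.put("obj-b", ["n1", "n4", "n5"])

cluster.fail_node("n1")      # point 1: the server goes down
cluster.replicate()
after_failure = {o: len(cluster.live_copies(o)) for o in cluster.primaries}

cluster.restore_node("n1")   # point 3: same disks, same IP, new OS
after_restore = {o: sorted(cluster.copies[o]) for o in cluster.primaries}

print(after_failure)
print(after_restore)
```

If Swift really behaves like this model, then every object is back to three live copies while the server is down (point 2), and the extra copies disappear once the node returns. Whether the real replicator does this is exactly what I am asking.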

All I can find to go on as far as procedures go is the "Handling Drive Failure" and "Handling Server Failure" sections at http://docs.openstack.org/developer/swift/admin_guide.html#cluster-telemetry-and-monitoring, which make it sound like Swift 'works around' rather than 'repairs / heals'.

I realise that on larger deployments this is of little consequence, but for the rest of us, paying very high €€ for each amp we consume, we have to try to work out whether 5 servers with 100 disks or 50 servers with 10 disks is better. The latter is significantly higher in cost! (Throughput is not an issue, only reliability.)
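To put rough numbers on the 5 x 100 versus 50 x 10 comparison, here is a small back-of-the-envelope calculation of how much raw capacity goes offline when a single server is lost. The helper name `server_loss_impact` is made up, and the 3 TB disk size is simply borrowed from my 72-disk example above, so treat the output as illustrative only.

```python
DISK_TB = 3  # assumed disk size, borrowed from the 72 x 3 TB example


def server_loss_impact(servers, disks_per_server, disk_tb=DISK_TB):
    """Return (fraction of raw capacity offline, TB offline) for one lost server."""
    total_tb = servers * disks_per_server * disk_tb
    offline_tb = disks_per_server * disk_tb
    return offline_tb / total_tb, offline_tb


for servers, disks in [(5, 100), (50, 10)]:
    fraction, offline_tb = server_loss_impact(servers, disks)
    print(f"{servers} servers x {disks} disks: "
          f"{fraction:.0%} of raw capacity ({offline_tb} TB) offline per lost server")
```

Both layouts hold the same 500 disks, but losing one server takes a fifth of the cluster offline in the first case and a fiftieth in the second. If Swift re-replicates on server failure, that is also roughly how much data the surviving disks would have to absorb.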