Hi, thanks for the info, Florian; very much appreciated.

This is what I understood, but could you please clarify the following for me?

"swift starts working around the failure by doing things like writing uploads destined to that drive automatically to a handoff node"
- I have read this, and it would imply that when a single drive fails on a node, that node no longer accepts writes. That in turn would mean a drive failure on a single node which is designated as a single zone renders the entire zone non-writable, whether that zone contains 3 drives or 100. So, can the 'handoff node' actually be the same node but a different drive?
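To make the question concrete, here is a toy sketch of handoff selection on a hash ring. This is NOT Swift's actual algorithm (the real ring also weighs zones, regions, and device weights); the device list, IPs, and helper names are all hypothetical. The point it illustrates is that a "handoff" is just the next suitable device in ring order for that object, which can be a different drive on the same node:

```python
import hashlib

# Hypothetical devices: (node_ip, drive, zone) -- two drives per node.
DEVICES = [
    ("10.0.0.1", "sdb", 1),
    ("10.0.0.1", "sdc", 1),
    ("10.0.0.2", "sdb", 2),
    ("10.0.0.2", "sdc", 2),
]

def ring_order(obj_name):
    """Deterministic per-object device ordering (stand-in for the ring)."""
    def key(dev):
        return hashlib.md5((obj_name + repr(dev)).encode()).hexdigest()
    return sorted(DEVICES, key=key)

def place(obj_name, replicas=2, failed=()):
    """Pick primary devices; substitute the next healthy device in ring
    order (a 'handoff') for any failed primary."""
    order = ring_order(obj_name)
    primaries = order[:replicas]
    handoffs = iter(order[replicas:])
    chosen = []
    for dev in primaries:
        if dev in failed:
            dev = next(d for d in handoffs if d not in failed)
        chosen.append(dev)
    return chosen

healthy = place("AUTH_test/c/o")
degraded = place("AUTH_test/c/o", failed=(healthy[0],))
print(healthy)
print(degraded)
```

In this simplified model, a single failed drive only diverts writes for objects mapped to that drive; the other drives on the node (and in the zone) keep accepting writes, which is the behaviour I'm asking about.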

- If a drive has failed, Swift does not work to replicate the data from that drive to another drive, except for writes that would have been made to that drive. So the deployment does not 'self-heal' as such; it simply works degraded until the faulty component is replaced or brought back online.

- If an entire node fails and the Swift install drives are toast, once the failure is rectified does the installation simply catch up, or would all those drives then be completely re-overwritten? That is, since a node is identified by its IP address, as long as you rebuild the Swift install with the same IP, no ring updates are required and only modified / new data will be copied back to that node. Is this the case?
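The "catch-up" I have in mind can be sketched as hash-based incremental sync. This is a hedged toy model, not Swift's actual replicator (which works per partition with suffix hashes and rsync/ssync); the function names and data layout here are made up for illustration. It shows why a rebuilt node at the same ring position could be refilled incrementally rather than wholesale rewritten:

```python
import hashlib

def part_hash(objects):
    """Hash a partition's contents so two nodes can compare cheaply."""
    h = hashlib.md5()
    for name in sorted(objects):
        h.update(name.encode())
    return h.hexdigest()

def catch_up(source, rebuilt):
    """Copy only partitions whose hashes differ between a healthy
    replica (source) and the rebuilt node -- an incremental refill,
    not a full re-overwrite."""
    synced = []
    for part, objs in source.items():
        if part_hash(rebuilt.get(part, [])) != part_hash(objs):
            rebuilt[part] = list(objs)
            synced.append(part)
    return synced

source = {0: ["a", "b"], 1: ["c"], 2: ["d", "e"]}
rebuilt = {0: ["a", "b"]}  # partition 0 survived; 1 and 2 were lost
print(catch_up(source, rebuilt))
```

If that mental model is right, only the missing or mismatched partitions would be pushed back after the rebuild; I'd like confirmation that real Swift behaves this way when the IP (and so the ring entry) is unchanged.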

I would love to see a document outlining different failures and how they are managed, from a single-server, zone-per-disk scenario up to a four-server, zone-per-server install, rather than the 'probably the best thing to do...' scenarios in the manual. I would like to know what would have to happen to lose data.



PS - I understand bit-rot is self-healing, but it's drive failure / installation failure I want to get a grasp on.