swift recon issue

asked 2014-08-19 00:48:23 -0500

simplidrive1


Recently we have observed the following on our multi-node production Swift cluster (1.10.0; 2 Swift proxies and 10 storage nodes, 5 zones):

  1. Even a single disk failure on a node with 12 disks causes recon communication to fail for the entire host unless we remove the bad disk from the ring. Is this expected behavior, or should recon still report the other disks and the other parameters from that node?

  2. Yesterday one of the storage nodes hung, and recon communication went blank entirely: recon did not show any results for any host or parameter. When I ran it manually, I got the recon error "timeout". I recovered the server with a hard reset, after which recon started working fine again. This is definitely not expected behavior: even if a node goes down, recon should still report data from the other working nodes.
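
To make the expected behavior concrete, here is a minimal sketch of per-host polling in which one hung or failing node cannot blank out the results from the others. It is an illustration only, not how swift-recon itself is implemented. The function names `fetch_recon` and `poll_hosts` are hypothetical, and the port (6000), the `unmounted` check, and the 5-second timeout are assumptions that would need to match the actual object-server and recon middleware configuration.

```python
import json
from concurrent.futures import ThreadPoolExecutor
from urllib.request import urlopen


def fetch_recon(host, port=6000, check="unmounted", timeout=5):
    """GET http://<host>:<port>/recon/<check> and decode the JSON body.

    Port, check name and timeout are assumed defaults, not cluster facts.
    """
    url = "http://%s:%d/recon/%s" % (host, port, check)
    with urlopen(url, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


def poll_hosts(hosts, fetch=fetch_recon):
    """Query every host independently.

    A failure on one host is recorded as an error entry and does not
    prevent results from the remaining hosts being returned.
    """
    def probe(host):
        try:
            return host, {"ok": True, "data": fetch(host)}
        except Exception as exc:  # timeout, connection refused, bad JSON
            return host, {"ok": False, "error": str(exc)}

    # One worker per host, so a slow host only delays its own entry
    # (bounded by the per-request timeout inside fetch).
    with ThreadPoolExecutor(max_workers=max(1, len(hosts))) as pool:
        return dict(pool.map(probe, hosts))
```

Calling `poll_hosts(["storage01", "storage02"])` (hypothetical host names) would return one entry per host, with the hung node showing up as `{"ok": False, "error": ...}` while the healthy nodes still report their data.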


