manicguitarist's profile - activity

2019-02-14 12:16:43 -0600 received badge  Famous Question (source)
2015-03-13 05:02:59 -0600 answered a question Adding a new datacentre (swift object store)

Here is what I did, and what happened.

For info - we need the "+3 thing" (i.e. 6 replicas in total) because the datacentres are a long way apart (100+ miles) with a lower-capacity link between them, and I want each centre to be self-sufficient.

The new datacentre isn't yet online, so I created the new rings as detailed above - i.e. in one move I upped the replica count from 3 to 6 and also changed the weight of the previously zero-weighted disks in the new centre to match those in the initial datacentre.
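As a rough illustration of why this setup should leave each centre holding a full set of 3 copies, here is a minimal sketch of a weight-proportional split of replicas between regions. It is a simplification and not the ring builder's actual algorithm (which also accounts for zones, devices and dispersion); the function name and the per-disk weight of 100.0 are assumptions made up for the example.

```python
def replicas_per_region(region_weights, replicas):
    """Split the replica count between regions in proportion to total weight.

    Simplified model only: the real ring builder also considers zones,
    individual devices and dispersion when placing partitions.
    """
    total = sum(region_weights.values())
    return {
        region: replicas * weight / total
        for region, weight in region_weights.items()
    }


DISKS_PER_REGION = 3 * 26   # 3 machines with 26 disks each, per the question
DISK_WEIGHT = 100.0         # hypothetical weight; the post does not give one

weights = {
    1: DISKS_PER_REGION * DISK_WEIGHT,  # original datacentre
    2: DISKS_PER_REGION * DISK_WEIGHT,  # new datacentre, same weight per disk
}
shares = replicas_per_region(weights, replicas=6)
print(shares)
```

With equal total weight in both regions, the 6 replicas split evenly, so under this model the original region keeps its 3 copies and the new region gains 3.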

I then pushed these rings out to the new datacentre only. Yes, this would result in some data getting pushed back and forth, but not very much, as the balance is 0.01 or thereabouts.

When running a dispersion report, the 3 servers at the new datacentre reported only 50% health and couldn't find half the copies - which was expected - but the original centre reported that all was ok.
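The 50% figure follows from simple arithmetic: the new rings expect 6 copies of everything, but only the 3 original copies exist. A minimal sketch of that calculation is below; the function name and the partition count of 1024 are assumptions for illustration, not values from the actual cluster.

```python
def copies_found_percent(partitions, replicas_expected, replicas_present):
    """Percentage of expected copies that actually exist.

    Assumes every partition has the same number of copies present.
    """
    expected = partitions * replicas_expected
    found = partitions * min(replicas_present, replicas_expected)
    return 100.0 * found / expected


# New rings expect 6 replicas; only the 3 in the original datacentre exist.
# The partition count is hypothetical and cancels out of the result.
health = copies_found_percent(partitions=1024, replicas_expected=6, replicas_present=3)
print(health)
```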

2 weeks later the new datacentre has filled its replicas and is reporting a health of around 99% (something keeps getting moved between places).

This morning I replaced the rings on the original datacentre as well and after a bit of disk churning, the whole system is now reporting 6 replicas and 100% dispersion ok.

Job done.

I can now start using the new datacentre, 2 months ahead of schedule. Now if only that would mean I could put my feet up for two months...

2015-02-27 10:30:00 -0600 received badge  Famous Question (source)
2015-02-27 10:30:00 -0600 received badge  Notable Question (source)
2015-02-27 10:30:00 -0600 received badge  Popular Question (source)
2015-02-26 08:27:32 -0600 asked a question Adding a new datacentre (swift object store)

Running swift v2.3.0. We currently have 1 region, 3 zones with 3 replicas. We have 3 physical machines each with 26 disks.

We are adding a new datacentre and eventually will go to : 2 regions, 3 zones in each region, giving us 6 replicas - with 6 physical machines each with 26 physical disks.

At the moment we have around 6 TB of data in our store (before replication - 18 TB in total).
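For reference, the raw storage arithmetic can be sketched as follows, assuming the 6 figure is terabytes of unique data and every object is fully replicated. The function name is made up for the example.

```python
def raw_storage_tb(unique_tb, replicas):
    """Raw disk space consumed when every object is stored `replicas` times."""
    return unique_tb * replicas


UNIQUE_TB = 6  # approximate unique data, from the post

current_raw = raw_storage_tb(UNIQUE_TB, 3)    # today: 1 region, 3 replicas
target_raw = raw_storage_tb(UNIQUE_TB, 6)     # planned: 2 regions, 6 replicas
to_copy_across = target_raw - current_raw     # data the second region must receive

print(current_raw, target_raw, to_copy_across)
```

So going to 6 replicas means roughly another 18 TB has to travel to the second region before it holds a full set of copies.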

My question is - what is the best way to add the new system? Is it better to increase the replica count to 6 and add the new disks in the 2nd region all in one go - or will that result in "extended unavailability" for the data (accessing via a proxy node on the first region)? Or should I go through the pain of very slowly increasing the replica count and the weight of the disks in the second region?

The 2nd way will take months - but I can be sure that there won't be that many partitions in the wrong place.

The first way has the advantage that it shouldn't need to move any of the existing data - merely copy it to the second region - and yes, if we access local proxy nodes on 2nd region the data won't readily be available till it has all replicated.

I guess the real question is that if I do it "all at once" - will the existing partitions be moved around or not?

I copied my ring files to a safe location and did the "all in one" add and rebalanced - the system was in balance (0.01 balance) - but before I use these rings I need to be sure that my customers' data won't just go awol if I push them out...

2014-11-11 10:46:37 -0600 received badge  Notable Question (source)
2014-05-07 06:15:36 -0600 asked a question Upgrade swift 1.4.0 to latest

Hi - I've been asked to upgrade our swift installation to the "latest version".

We are currently running 1.4.0 on Centos 6.5

As a non-Linux expert, what is required to upgrade the swift installation?

I'm at ease installing and configuring swift on both Centos & Ubuntu, but have never upgraded anything Linux-wise.


2014-04-11 06:10:01 -0600 received badge  Popular Question (source)
2014-02-04 04:30:09 -0600 asked a question Quarantined items

We have migrated our data to a new system - by adding the new devices and progressively changing the weight till the old devices are all at zero and the new ones at the correct size.

We have 4 TB of data on the new system - but there are a few hundred MB on the old devices (as seen by doing a "df").

Most of this data is "quarantined" - what should I do with it? Is there a way I can examine the files and see where they should be and what they contain? That way I could re-upload the data.

I don't want to decommission the old devices with the data on - without being sure that correct versions of the data are in place.


2013-10-27 22:54:40 -0600 received badge  Popular Question (source)
2013-10-27 22:54:40 -0600 received badge  Famous Question (source)
2013-10-27 22:54:40 -0600 received badge  Notable Question (source)
2013-10-25 17:26:43 -0600 received badge  Student (source)
2013-10-18 04:57:35 -0600 asked a question "Client disconnected on read" error

We are having an intermittent problem with our cloud system. We are reading the data from .Net applications and also from PHP.

Very occasionally the reading application hangs (either a webservice on a web server, PHP code on the webserver or even a console application on my workstation) and eventually returns a Timeout exception - whilst at the same time we get a "Client disconnected on read" error in /var/log/messages (for example: proxy-server Client disconnected on read (txn: tx61887e16cc49422fa44f1695c8be34d6) (client_ip: ..........)).

This is proving hard to track because the error isn't easily repeatable - it is intermittent - but causes huge problems when it does occur.

What would be the cause of this error - and/or is there any way we can reduce the timeout exception length?

We have coded around this issue by making all requests to the cloud in a separate thread, which we abort if it doesn't return within 2 seconds, and then we retry - and the retry usually works straight away. This has caused further problems upstream though, as all those aborted threads take up resources until the http connection times out.
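One possible alternative to aborting threads is to put the timeout on the HTTP request itself, so a stalled connection is closed as soon as the limit is hit, and then retry. Below is a minimal sketch in Python (our real clients are .Net and PHP, so this only illustrates the pattern). The helper names `fetch_object` and `fetch_with_retry` are made up for the example, and it is an assumption that a request-level timeout would avoid the resource build-up described above.

```python
import urllib.request


def fetch_object(url, timeout=2.0):
    """Fetch one object; the timeout bounds the connect and each blocking read."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read()


def fetch_with_retry(fetch, attempts=3):
    """Call fetch() up to `attempts` times, retrying on network errors."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for _ in range(attempts):
        try:
            return fetch()
        except OSError as error:
            # URLError and socket timeouts are both OSError subclasses.
            last_error = error
    raise last_error
```

Usage would look something like `fetch_with_retry(lambda: fetch_object(object_url))`, where `object_url` is a placeholder for the real object URL. Because the timed-out socket is closed by the `with` block, nothing is left waiting for the connection to expire.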

Any suggestions gratefully received. M