
Openstack-Swift performance problem

asked 2014-07-17 10:13:22 -0500

bszabo

updated 2014-07-17 15:27:51 -0500

smaffulli


We are having a problem with our Swift cluster, which runs Swift 1.8.0. The cluster consists of 3 storage nodes plus a proxy node, with 2x replication. Each node has a single 2 TB SATA HDD; the OS runs on an SSD. The traffic is ~300 files per minute, each 1.3 MB (all files are the same size). Each file is uploaded with an X-Delete-After header set to the equivalent of 7 days.

When we started the cluster around 3 months ago we uploaded significantly fewer files (~150 per minute) and everything worked fine. As we put more pressure on the system, at some point the object expirer could no longer expire files as fast as they were being uploaded, and the servers slowly filled up.
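
A quick back-of-the-envelope check of these numbers may help frame the problem. The sketch below is only an estimate: it assumes decimal units (1 TB = 1,000,000 MB), a perfectly steady upload rate, and objects that disappear exactly 7 days after upload. The function names are made up for illustration.

```python
MINUTES_PER_DAY = 24 * 60


def live_objects(uploads_per_minute, ttl_days):
    """Objects alive at steady state when each one expires after ttl_days."""
    return uploads_per_minute * MINUTES_PER_DAY * ttl_days


def footprint_tb(uploads_per_minute, object_mb, ttl_days, replicas):
    """Raw disk space (decimal TB) needed to hold every live replica."""
    total_mb = live_objects(uploads_per_minute, ttl_days) * object_mb * replicas
    return total_mb / 1_000_000


# Three storage nodes with one 2 TB disk each, as described in the question.
RAW_CAPACITY_TB = 3 * 2

for rate in (150, 300):
    need = footprint_tb(rate, object_mb=1.3, ttl_days=7, replicas=2)
    print(f"{rate}/min -> {live_objects(rate, 7):,} live objects, "
          f"{need:.2f} TB needed of {RAW_CAPACITY_TB} TB raw")
```

Under these assumptions, 300 uploads per minute means roughly 3 million live objects and about 7.9 TB across both replicas, which is more than the 6 TB of raw disk. At 150 per minute the footprint is about 3.9 TB. So it may be worth checking capacity as well as expirer throughput.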

After our analysis we found the following:

  • It's not a network issue, the interfaces are not overloaded, we don't have an extreme amount of open connections
  • It's not a CPU issue, loads are fine
  • It doesn't seem to be a RAM issue, we have ~20G free of 64G

The bottleneck seems to be the disk, iostat is quite revealing:

Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
sdc               0.00    57.00    0.00  520.00     0.00  3113.00    11.97   149.18  286.21    0.00  286.21   1.92 100.00

Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
sdc               2.00    44.00    7.00  488.00   924.00  2973.00    15.75   146.27  296.61  778.29  289.70   2.02 100.00

Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
sdc               0.00     3.00   60.00  226.00  5136.00  2659.50    54.51    35.04  168.46   49.13  200.14   3.50 100.00

Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
sdc               0.00     0.00  110.00   91.00  9164.00  2247.50   113.55     2.98   14.51   24.07    2.95   4.98 100.00

The read and write wait times are not always that good :) and can climb into the thousands of milliseconds, which is pretty dreadful.
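
To read these samples more easily, it can help to reduce each one to total IOPS and average request size. The sketch below assumes the column order shown in the header above; `parse_sample` and `summarize` are illustrative names, not part of any tool.

```python
# Column names copied from the iostat header in the post.
HEADER = ("Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz "
          "await r_await w_await svctm %util")

# First and last samples from the post.
SAMPLES = [
    "sdc 0.00 57.00 0.00 520.00 0.00 3113.00 11.97 149.18 286.21 0.00 286.21 1.92 100.00",
    "sdc 0.00 0.00 110.00 91.00 9164.00 2247.50 113.55 2.98 14.51 24.07 2.95 4.98 100.00",
]


def parse_sample(row, header=HEADER):
    """Return (device, {column: value}) for one iostat data row."""
    names = header.split()[1:]          # drop the "Device:" label
    fields = row.split()
    return fields[0], dict(zip(names, map(float, fields[1:])))


def summarize(stats):
    """Reduce one sample to request rate, mean request size and latency."""
    iops = stats["r/s"] + stats["w/s"]
    kb_per_s = stats["rkB/s"] + stats["wkB/s"]
    return {
        "iops": iops,
        "avg_request_kb": kb_per_s / iops if iops else 0.0,
        "await_ms": stats["await"],
        "util_pct": stats["%util"],
    }


for row in SAMPLES:
    device, stats = parse_sample(row)
    print(device, summarize(stats))
```

For the first sample this gives 520 requests per second at roughly 6 kB each, about 3 MB/s in total. That suggests the disk is saturated by the number of small requests rather than by raw throughput.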

We're also seeing many ConnectionTimeout messages from the node side and in the proxy.

Some examples from the storage nodes:

Jul 17 13:28:51 compute005 object-server ERROR container update failed with (saving for async update later): Timeout (3s) (txn: tx70549d8ee9a04f74a60d6115456d070b)
Jul 17 13:34:03 compute005 swift ERROR with Object server re: Trying to DELETE /AUTH_6988e698bc17460bbfc771e66ffdcde1/channel_recordings/fr/gulli/20140628/121524.wav: Timeout (10s) (txn: tx11c34840f5cd42fdad19f1b7e26a6a1e)
Jul 17 12:45:55 compute005 container-replicator ERROR reading HTTP response from {'zone': 7, 'weight': 2000.0 ...
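
To see which daemons time out most often, one option is to tally the timeout errors in the syslog output. This is a minimal sketch that assumes every line follows the layout of the examples above (timestamp, host, daemon name, `ERROR`, and a `Timeout (Ns)` marker). The regular expression and function name are illustrative and may need adjusting for other log formats.

```python
import re
from collections import Counter

# Layout assumed from the sample lines: "Mon DD HH:MM:SS host daemon ERROR ... Timeout (Ns)"
LINE_RE = re.compile(
    r"^\w{3}\s+\d+\s+[\d:]+\s+(?P<host>\S+)\s+(?P<daemon>\S+)\s+ERROR\b.*?"
    r"Timeout \((?P<seconds>\d+(?:\.\d+)?)s\)"
)


def tally_timeouts(lines):
    """Count timeout errors per (daemon, timeout in seconds)."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.search(line)
        if match:  # lines without a Timeout marker are ignored
            counts[(match["daemon"], float(match["seconds"]))] += 1
    return counts


SAMPLE_LINES = [
    "Jul 17 13:28:51 compute005 object-server ERROR container update failed with "
    "(saving for async update later): Timeout (3s) (txn: tx70549d8ee9a04f74a60d6115456d070b)",
    "Jul 17 13:34:03 compute005 swift ERROR with Object server re: Trying to DELETE "
    "/AUTH_6988e698bc17460bbfc771e66ffdcde1/channel_recordings/fr/gulli/20140628/121524.wav: "
    "Timeout (10s) (txn: tx11c34840f5cd42fdad19f1b7e26a6a1e)",
    "Jul 17 12:45:55 compute005 container-replicator ERROR reading HTTP response from "
    "{'zone': 7, 'weight': 2000.0 ...",
]

for (daemon, seconds), count in sorted(tally_timeouts(SAMPLE_LINES).items()):
    print(f"{daemon}: {count} timeout(s) at {seconds:g}s")
```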

1 answer


answered 2014-08-19 12:09:23 -0500

briancline

updated 2014-08-19 12:30:28 -0500

Given your use case, this sounds like a simple case of high activity against low IOPS capacity. Essentially you're asking a relatively small number of drives to do quite a bit of work. It may not seem like a lot, but factor in the writes for all replicas, the account and container database retrieval/maintenance involved in each request, replication runs for account DBs, container DBs, and objects, object-expirer scans, auditor runs, and so on.

The least-hassle solution is probably to add more raw IOPS capacity by adding more drives to the cluster (note that this does not necessarily mean more physical servers, only drives), and to set up monitoring and alerting on these metrics so you can anticipate the problem much sooner in the future.

You might counter this by arguing that you're nowhere near your total disk capacity and don't need more disks, but disk capacity isn't really the issue here: you're hitting a ceiling on the IOPS your existing disks provide. More drives, and therefore more IOPS, is going to be your best bet, unless you want to replace these drives with expensive 10K or even 15K RPM drives (although, assuming your activity continues to increase as it always does, at some point you'll hit the same raw IOPS limitations and have to scale out onto more drives anyway).

EDIT: I missed the tail end of your post the first time I wrote this. I know you mention SSDs are too expensive; however, it sounds like you were ruling them out (rightly so) for storage of all your objects. You might reconsider SSDs if you use them only for account and container database storage (that is, put only those SSDs in your account and container rings, rather than the same disks your object ring uses). Using SSDs in this manner is a very common and battle-tested way of taking some load off the spindle-based disks that store your actual objects, while realizing major performance gains in account- and container-related operations.

Storing only the account and container databases on SSDs requires nowhere near as much space as your objects do, so it may not be as cost-prohibitive as you thought. To find your absolute minimum SSD space requirement for this purpose, you could run du -sh /srv/node/*/{accounts,containers} on each of your storage nodes to see how much space those databases currently use (be aware that this command is a bit of a cache killer).
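
If you would rather collect this from a script, for example to feed a monitoring system, the same measurement can be sketched in Python. This is only an approximation of the du command: it sums apparent file sizes rather than allocated blocks, and it walks every file, so it is just as hard on the page cache. The `/srv/node` root and the `accounts`/`containers` directory names are assumptions about the on-disk layout; adjust them to match what you see under each device.

```python
import glob
import os


def tree_size_bytes(path):
    """Sum the apparent size of every regular file below path."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(path):
        for name in filenames:
            try:
                total += os.lstat(os.path.join(dirpath, name)).st_size
            except OSError:
                pass  # file disappeared while we were walking the tree
    return total


def db_footprint(root="/srv/node", kinds=("accounts", "containers")):
    """Return {kind: bytes} summed across all devices mounted under root."""
    sizes = {kind: 0 for kind in kinds}
    for kind in kinds:
        for path in glob.glob(os.path.join(root, "*", kind)):
            sizes[kind] += tree_size_bytes(path)
    return sizes


if __name__ == "__main__":
    for kind, size in db_footprint().items():
        print(f"{kind}: {size / 1e9:.2f} GB")
```

Run it on each storage node and add up the results to get a lower bound for the SSD space you would need.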



Thank you for the detailed and explanatory answer; we'll consider moving the account and container databases to SSDs. For now we have reduced the object count tenfold (we store bigger objects now, around 13 MB each), and the cluster seems to have worked as expected since. But this is the next step we'll take. Thanks

bszabo (2014-08-21 06:33:43 -0500)
