PUT takes a lot of time

asked 2013-08-02 15:55:28 -0600

249524324-q

When I PUT a large number of 4 KB files, performance is terrible.

My testing configuration:

    [bench]
    concurrency = 200
    object_size = 4096
    num_objects = 200000
    num_containers = 200
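The settings above form an INI-style `[bench]` section, so they can be read with Python's standard `configparser`. The sketch below is only an illustration (it is not part of swift-bench). It derives the total payload and, assuming objects are spread evenly over the containers, the number of objects per container:

```python
import configparser

# The [bench] settings from the test configuration above.
BENCH_CONF = """
[bench]
concurrency = 200
object_size = 4096
num_objects = 200000
num_containers = 200
"""

parser = configparser.ConfigParser()
parser.read_string(BENCH_CONF)
bench = parser["bench"]

object_size = bench.getint("object_size")
num_objects = bench.getint("num_objects")
num_containers = bench.getint("num_containers")

# Total bytes uploaded, and objects per container if evenly distributed (assumption).
total_bytes = object_size * num_objects
print(f"total payload: {total_bytes} bytes ({total_bytes / 2**20:.0f} MiB)")
print(f"objects per container: {num_objects // num_containers}")
```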

I traced the code path of the PUT operation and found some problems. It looks like three steps account for most of the time spent in a PUT.

in function "ObjectController::PUT":

for chunk in iter(lambda: reader(self.network_chunk_size), ''): upload_size += len(chunk) if time.time() > upload_expiration: self.logger.increment('PUT.timeouts') return HTTPRequestTimeout(request=request) etag.update(chunk) while chunk: written = os.write(fd, chunk) chunk = chunk[written:] sleep()

Each `lambda: reader` call takes 600 ms when I PUT 4 KB files, eventlet.sleep() takes 400 ms, and finally fsync() and async_update() take another 400 ms.
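To see where the time goes outside of Swift, the loop above can be reproduced in plain Python 3 with a timer around each phase. This is a minimal sketch under stated assumptions: `timed_object_put` is a hypothetical helper, `reader` stands in for the WSGI input's read method, and the 64 KiB chunk size is an assumed value. eventlet's sleep() and the container update (async_update) are left out because they need a running Swift/eventlet environment. Time charged to `reader` includes any time spent waiting for the body to arrive, not just CPU work.

```python
import hashlib
import io
import os
import tempfile
import time


def timed_object_put(reader, fd, chunk_size=65536):
    """Mimic the object-server PUT loop and time its phases (hypothetical helper).

    reader: a callable like file.read that returns b"" at the end of the body.
    Returns (bytes_written, md5_hex, timings), where timings holds the seconds
    spent in reader(), in the os.write() retry loop, and in os.fsync().
    """
    timings = {"read": 0.0, "write": 0.0, "fsync": 0.0}
    etag = hashlib.md5()
    upload_size = 0
    while True:
        start = time.perf_counter()
        chunk = reader(chunk_size)
        timings["read"] += time.perf_counter() - start
        if not chunk:
            break
        upload_size += len(chunk)
        etag.update(chunk)
        start = time.perf_counter()
        while chunk:  # os.write may write fewer bytes than requested
            written = os.write(fd, chunk)
            chunk = chunk[written:]
        timings["write"] += time.perf_counter() - start
    start = time.perf_counter()
    os.fsync(fd)  # force the data to stable storage, as the object server does
    timings["fsync"] += time.perf_counter() - start
    return upload_size, etag.hexdigest(), timings


if __name__ == "__main__":
    body = os.urandom(4096)  # one 4 KB object, as in the benchmark
    fd, path = tempfile.mkstemp()
    try:
        size, md5, timings = timed_object_put(io.BytesIO(body).read, fd)
    finally:
        os.close(fd)
        os.remove(path)
    print(size, md5, {k: f"{v * 1000:.3f} ms" for k, v in timings.items()})
```

Running this on one of the storage drives gives a rough baseline for the write and fsync cost of a single 4 KB object on that disk, which can be compared with the timings reported above.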

Has anyone else run into this problem? How can I fix it?


12 answers


answered 2013-08-04 14:34:13 -0600

cthier

So, getting back to PUT performance: GET performance has always been much faster than PUT performance, because there is more going on during a PUT (updating containers, fsync, etc.).

To really get to the bottom of this, we will need more information about your configuration. From your email to openstack-dev I gather that you have 1 proxy and 5 storage nodes, each with 4TB drives.

Can you please provide the following information:

  1. What concurrency are you running each service at (posting your configs would be great)?
  2. Are you using a RAID controller with a cache? Are the drives in a RAID configuration?
  3. Are the drive caches on or off?
  4. What Filesystem are you using (XFS, EXT4, etc.) and what params are you using to format?
  5. What is the partition power of your ring?
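For question 3, one way to see how the kernel regards each drive's write cache is to read sysfs. This is a minimal sketch, assuming a Linux kernel new enough (4.7 or later) to expose `/sys/block/<dev>/queue/write_cache`. `write_cache_modes` is a hypothetical helper name. A value of "write back" means the kernel treats the device as having a volatile write cache enabled, and "write through" means it does not. On older kernels, `hdparm -W /dev/sdX` typically reports the write-caching setting of SATA drives.

```python
import os


def write_cache_modes(sys_block="/sys/block"):
    """Return {device: mode} read from each device's queue/write_cache file.

    Assumes a Linux kernel that exposes /sys/block/<dev>/queue/write_cache;
    devices without that attribute are skipped.
    """
    modes = {}
    if not os.path.isdir(sys_block):
        return modes
    for dev in sorted(os.listdir(sys_block)):
        path = os.path.join(sys_block, dev, "queue", "write_cache")
        try:
            with open(path) as f:
                modes[dev] = f.read().strip()  # "write back" or "write through"
        except OSError:
            continue  # attribute missing, e.g. on an older kernel
    return modes


if __name__ == "__main__":
    for dev, mode in write_cache_modes().items():
        print(f"{dev}: {mode}")
```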

You mention in the email that you start getting errors such as 503s when performance starts lagging. Looking at the logs will help you figure out what is going on (look for things like Timeouts, errors, etc.).
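As a first pass over the logs, a short script can count how many lines contain each marker. This is a hedged sketch: the marker strings ("Timeout", "ERROR") are assumptions based on the advice above, not a complete list of Swift log messages, and `count_log_markers` and the script name are hypothetical.

```python
import sys
from collections import Counter


def count_log_markers(lines, markers=("Timeout", "ERROR")):
    """Count how many lines contain each marker (case-sensitive substring match)."""
    counts = Counter({marker: 0 for marker in markers})
    for line in lines:
        for marker in markers:
            if marker in line:
                counts[marker] += 1
    return counts


if __name__ == "__main__":
    # Usage: python count_markers.py LOGFILE [LOGFILE ...]
    for path in sys.argv[1:]:
        with open(path, errors="replace") as log:
            print(path, dict(count_log_markers(log)))
```

Log file names and locations depend on how syslog is configured on each node, so pass whichever proxy and storage logs apply.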

Also, how many objects do you PUT before the performance starts degrading?

Thanks,


Chuck


answered 2013-08-04 16:35:46 -0600

249524324-q

From your email to openstack-dev I gather that you have 1 proxy, 5 storage nodes, each with 4TB drives. A: Yes, 1 proxy and 5 storage nodes, but each storage node has four 4TB drives.

  1. What concurrency are you running each service at (posting your configs would be great)? A: proxy workers = 32, object-server workers = 32, account-server workers = 16, container-server workers = 32.

  2. Are you using a RAID controller with a cache? Are the drives in a RAID configuration? A: No.

  3. Are the drive caches on or off? A: How can I tell whether they are on or off?

  4. What Filesystem are you using (XFS, EXT4, etc.) and what params are you using to format? A: XFS with inode size = 1024, mounted as "type xfs (rw,noatime,nodiratime,nobarrier,logbufs=8)".

  5. What is the partition power of your ring? A: The partition power is 18. Would that affect performance?
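To put partition power 18 in perspective for this cluster (5 storage nodes × 4 drives = 20 drives), the sketch below computes the partition count and the average number of partition replicas each drive holds in a balanced ring. The replica count of 3 is an assumption because the thread does not state it, and `partitions_per_drive` is a hypothetical helper.

```python
def partitions_per_drive(part_power, drives, replicas=3):
    """Return (total partitions, average partition replicas per drive).

    Assumes a balanced ring where every drive has the same weight.
    """
    partitions = 2 ** part_power
    return partitions, partitions * replicas / drives


if __name__ == "__main__":
    # 5 storage nodes x 4 drives each; replicas=3 is an assumed value.
    partitions, per_drive = partitions_per_drive(part_power=18, drives=5 * 4)
    print(partitions, round(per_drive))
```

With these inputs it prints `262144 39322`, that is, 2^18 = 262,144 partitions and roughly 39,300 partition replicas per drive.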

The errors include 'ConnectTimeout' and 'get final status timeout(10s)'.

Also, how many objects do you PUT before the performance starts degrading? A: In fact, it started at the beginning. Throughput kept dropping steadily, from 1900/s down to 100/s.

Some bench logs:

    2013-08-04 02:32:57,281 INFO 3527 PUTS [0 failures], 1763.3/s current:1763.4/s
    2013-08-04 02:33:12,282 INFO 22057 PUTS [0 failures], 1297.4/s current:1235.3/s
    2013-08-04 02:33:27,284 INFO 39651 PUTS [0 failures], 1239.0/s current:1172.7/s
    2013-08-04 02:33:42,291 INFO 48089 PUTS [0 failures], 1023.0/s current:562.3/s
    2013-08-04 02:33:57,292 INFO 64144 PUTS [0 failures], 1034.4/s current:1070.3/s
    2013-08-04 02:34:12,298 INFO 75191 PUTS [0 failures], 976.3/s current:736.2/s

After that, it stayed between 500/s and 900/s for a while. Then it looked like this:

    2013-08-04 05:33:06,339 INFO 5416916 PUTS [0 failures], 501.1/s current:683.1/s
    2013-08-04 05:33:21,339 INFO 5420003 PUTS [0 failures], 500.6/s current:205.8/s
    2013-08-04 05:33:36,350 INFO 5429288 PUTS [0 failures], 500.8/s current:618.6/s
    2013-08-04 05:33:51,381 INFO 5433035 PUTS [0 failures], 500.5/s current:249.3/s

Then:

    2013-08-04 10:41:24,565 INFO 12095430 PUTS [0 failures], 412.7/s current:368.1/s
    2013-08-04 10:41:39,604 INFO 12099253 PUTS [0 failures], 412.6/s current:254.2/s
    2013-08-04 10:41:54,606 INFO 12105207 PUTS [0 failures], 412.6/s current:396.9/s
    2013-08-04 10:42:09,615 INFO 12108471 PUTS [0 failures], 412.5/s current:217.5/s

Finally:

    2013-08-04 13:58:32,020 INFO 15393452 PUTS [0 failures], 374.2/s current:336.8/s
    2013-08-04 13:58:47,160 INFO 15395591 PUTS [0 failures], 374.1/s current:141.3/s
    2013-08-04 13:59:02,172 INFO 15399491 PUTS [2 failures], 374.1/s current:259.8/s
    2013-08-04 13:59 ...
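The bench lines report throughput, not latency. With a fixed number of concurrent requests, though, Little's law (requests in flight = throughput × mean latency) gives a rough implied latency per PUT. This is a sketch that assumes all 200 concurrent requests from the bench config are always in flight. `implied_latency` is a hypothetical helper, and the regular expression only matches the line format shown in these logs.

```python
import re

# Matches lines like "... INFO 3527 PUTS [0 failures], 1763.3/s current:1763.4/s".
BENCH_LINE = re.compile(
    r"(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}),\d+ INFO (?P<puts>\d+) PUTS "
    r"\[(?P<failures>\d+) failures\], (?P<avg>[\d.]+)/s current:(?P<cur>[\d.]+)/s"
)


def implied_latency(log_text, concurrency=200):
    """Yield (timestamp, current rate, implied mean latency in seconds).

    Little's law: latency = requests in flight / throughput, assuming
    `concurrency` requests are in flight at every moment.
    """
    for m in BENCH_LINE.finditer(log_text):
        rate = float(m.group("cur"))
        if rate > 0:  # avoid dividing by zero on an idle interval
            yield m.group("ts"), rate, concurrency / rate


if __name__ == "__main__":
    sample = (
        "2013-08-04 02:32:57,281 INFO 3527 PUTS [0 failures], 1763.3/s current:1763.4/s "
        "2013-08-04 13:58:47,160 INFO 15395591 PUTS [0 failures], 374.1/s current:141.3/s"
    )
    for ts, rate, latency in implied_latency(sample):
        print(f"{ts}  {rate:7.1f}/s  ~{latency * 1000:.0f} ms per PUT")
```

This is only an average. If fewer than 200 requests are actually in flight (for example while the client itself is stalled), the real per-request latency is lower than the estimate.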



Stats


Seen: 171 times

Last updated: Aug 12 '13