
PUT costs a lot of time

asked 2013-08-02 15:55:28 -0600

249524324-q

When I put a large number of 4K files, the performance is terrible.

My testing configuration:

    [bench]
    concurrency = 200
    object_size = 4096
    num_objects = 200000
    num_containers = 200

Now I have traced the code of the PUT operation and found some problems. It looks like three steps cost a lot of time in the PUT path.

In the function ObjectController.PUT (swift/obj/server.py):

    for chunk in iter(lambda: reader(self.network_chunk_size), ''):
        upload_size += len(chunk)
        if time.time() > upload_expiration:
            self.logger.increment('PUT.timeouts')
            return HTTPRequestTimeout(request=request)
        etag.update(chunk)
        while chunk:
            written = os.write(fd, chunk)
            chunk = chunk[written:]
        sleep()

Each lambda: reader call costs about 600 ms when I PUT 4 KB files. eventlet.sleep() also costs about 400 ms, and finally fsync() and async_update() cost about 400 ms.
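
For reference, here is a minimal, self-contained sketch of how such per-call timings can be gathered; the timed() helper and the stand-in reader are hypothetical illustration, not Swift code:

    import time


    def timed(label, func):
        # Hypothetical helper: wrap a zero-argument callable and print how
        # long each call blocks, e.g. around reader() or sleep().
        def wrapper():
            start = time.time()
            result = func()
            print('%s took %.1f ms' % (label, (time.time() - start) * 1000.0))
            return result
        return wrapper


    # Stand-in for reader(self.network_chunk_size) so the sketch runs on its own.
    chunks = iter([b'x' * 4096, b'x' * 4096, b''])
    reader = timed('reader', lambda: next(chunks))
    for chunk in iter(reader, b''):
        pass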

Has anyone else faced this problem? How can I fix this issue?


12 answers


answered 2013-08-12 02:33:02 -0600

249524324-q

Thanks Chuck Thier, that solved my question.


answered 2013-08-06 22:00:14 -0600

cthier

So a couple of thoughts:

The Timeout errors lead me to believe that you are running out of disk IO.

Your object-server worker count is a lot higher than I would run for a server with 4 drives; that means there are 8 workers for each drive. I would cut it back to 4-8 and only increase it if further testing doesn't cause issues. If things are stable at that point, then you can look at raising the worker count incrementally with further testing.
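
For illustration, a hypothetical object-server.conf fragment with a reduced worker count (the value is only an example for a 4-drive box, not a general recommendation):

    [DEFAULT]
    # start with roughly one worker per drive; raise it only after testing
    workers = 4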

I would also look at turning your disk caches off; they are likely masking the issue at the beginning of the run. The method will vary by OS and by whether you are using a RAID controller, but it is pretty easy to google.

The other thing I would check is whether the background processes are running too aggressively. You could try running with replication/auditors/etc. turned off just as a test to see if that has any effect. If so, then it would be worth tuning their concurrency and run delays.
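
As a sketch, the relevant knobs live in the daemon sections of the object server config; the values below are purely illustrative, and option names can differ between Swift releases, so check the sample config for your version:

    [object-replicator]
    concurrency = 1
    run_pause = 120         # seconds between replication passes

    [object-auditor]
    files_per_second = 5    # throttle the auditor so it competes less for disk IO
    bytes_per_second = 5000000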

That should be a good start.


Chuck


answered 2013-08-02 18:52:57 -0600

torgomatic

The reader is what's pulling data off the network. That's expected to take a long time.

The fsync() and async_update() will take a long time because they're doing a bunch of disk IO. fsync() makes sure any data in buffer cache is written out to disk, so if that's taking a while, it's because that call makes a bunch of disk heads move around and wait for spinning platters.

eventlet.sleep() may take a while, but that's because it's yielding to another greenlet. The only way the timing measurement there will accurately measure the eventlet overhead is if you have exactly one request being processed. Otherwise, what looks like wasted time is actually time that is spent on running another greenlet.
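
Here is a toy example (not Swift code) of what that yielding looks like; with several greenthreads, the time one of them appears to spend inside sleep(0) is really the others getting scheduled:

    import time

    import eventlet


    def worker(name):
        for _ in range(3):
            # sleep(0) yields to the hub, letting the other greenthreads run
            eventlet.sleep(0)
            print('%s ran at %.4f' % (name, time.time()))


    threads = [eventlet.spawn(worker, 'greenthread-%d' % i) for i in range(3)]
    for t in threads:
        t.wait()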


answered 2013-08-02 20:14:09 -0600

249524324-q

Thank you for your help. But holding the current request in order to run another greenlet does not seem like a good idea to me: the proxy has to wait longer for the response while the request is held. I don't understand why Swift sleeps the current greenlet.


answered 2013-08-02 20:22:07 -0600

torgomatic

It's basically because disk IO won't trigger a context switch, so when disk is slower than network, you get starvation.

See https://github.com/openstack/swift/co... for details.


answered 2013-08-03 08:06:46 -0600

249524324-q

Thank you for your answer. In fact, GET performance is very good: it reaches a stable 5000 requests/s. But the PUT rate stays at only about 100 requests/s. The object size is 4 KB. What could cause PUT performance to degrade while GET works well?


answered 2013-08-03 22:44:54 -0600

torgomatic

Well, you've got a bunch of tiny objects, so they're probably being read out of buffer cache on GET requests, whereas on PUT, you're guaranteed some actual disk IO.


answered 2013-08-04 04:28:22 -0600

249524324-q

As far as I know, the object will be dropped from the buffer cache immediately after a GET.

There is code to drop the cache in DiskFile.put() (swift/obj/server.py):

    if 'Content-Length' in metadata:
        self.drop_cache(fd, 0, int(metadata['Content-Length']))

So I think the object is not kept in the cache.
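
As far as I understand it, drop_cache boils down to a posix_fadvise(DONTNEED) call on the written range. A rough, self-contained sketch of the idea (Swift used a ctypes wrapper at the time; os.posix_fadvise needs Python 3.3+ on Linux, and the file path here is just an example):

    import os


    def drop_cache(fd, offset, length):
        # Ask the kernel to evict this byte range from the page cache.
        if hasattr(os, 'posix_fadvise'):
            os.posix_fadvise(fd, offset, length, os.POSIX_FADV_DONTNEED)


    fd = os.open('/tmp/example-object', os.O_CREAT | os.O_WRONLY, 0o600)
    os.write(fd, b'x' * 4096)
    os.fsync(fd)              # data must hit disk before DONTNEED can evict it
    drop_cache(fd, 0, 4096)
    os.close(fd)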


answered 2013-08-04 06:54:12 -0600

torgomatic

"As far as I know, the object will be dropped from the buffer cache immediately after a GET."

Nope. There's a threshold (5 MiB by default) for dropping cache on reads; objects under that threshold aren't flushed from cache.

See https://github.com/openstack/swift/bl... for details.
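
In other words (a rough sketch only, not the actual Swift code; the real check lives behind the link above, around the keep_cache_size setting):

    # Illustrative only: objects below the threshold are left in the page
    # cache after a GET, so repeated GETs of small objects never touch disk.
    KEEP_CACHE_SIZE = 5 * 1024 * 1024  # "5 MiB by default"


    def drops_cache_after_get(object_size):
        return object_size >= KEEP_CACHE_SIZE


    print(drops_cache_after_get(4096))          # False: 4 KB objects stay cached
    print(drops_cache_after_get(16 * 2 ** 20))  # True: large reads get evicted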


answered 2013-08-04 13:00:57 -0600

249524324-q

Sorry, I wrote that wrong: "As far as I know, the object will be dropped from the buffer cache immediately after a GET" should have been "As far as I know, the object will be dropped from the buffer cache immediately after a PUT".

So there is no cache after a PUT, and every GET requests a different object, so I think those objects should not be in the cache.
