What is maximum recommended objects per container?

asked 2011-12-15 11:26:42 -0500

rustam-code

While searching for Swift performance tips I came across this post: http://adrianotto.com/2010/09/openstack-os-is-great-for/

In the comments, some users mention that it's better to keep the number of objects per container below 1M. As far as I understand, this is mainly due to an SQLite limitation.

The post is quite old, though, and I was wondering whether this has been addressed in the latest version, 1.4.4.

Can I store an unlimited number of objects (or ~1B) in a single container?


12 answers

answered 2012-06-03 11:42:44 -0500

Hello?

answered 2011-12-15 13:43:17 -0500

notmyname

This is highly dependent on the hardware you deploy on. If you deploy your account and container servers on boxes optimized for IOPS (think RAID 10 of SSDs), then you can easily have a billion objects in a container with no big impact on your object PUT performance. However, if you are running account, container, and object servers all on one box that uses only cheap spinning drives, then you should limit each container to a few million objects to avoid serious object PUT degradation.

Although we've talked about it, no development effort has yet been spent on mitigating this issue in the Swift code; the answer right now is to deploy on optimized hardware (container/account servers for IOPS, object servers for $/GB, proxies for CPU and RAM).

--John

On Dec 15, 2011, at 5:30 AM, Rustam A wrote:

New question #181977 on OpenStack Object Storage (swift): https://answers.launchpad.net/swift/+...

answered 2011-12-15 14:51:54 -0500

rustam-code

Thanks John,

We use cheap hardware with a 3-in-1 setup (account, container, and object servers on the same box). The main argument for this decision was that we don't expect massive I/O; our use is more archive-type storage.

In our use case we can switch to a new container periodically, but it would be nice not to have to worry about the number of objects at all. If there's a ticket for this, I'd vote +1.

-- Rustam
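Switching to a new container periodically can be handled on the client side. A minimal sketch, assuming a hypothetical `ContainerRoller` helper and an illustrative cap (neither is part of Swift): it counts PUTs and moves to the next numbered container once the current one reaches the cap. The counter lives only in memory, so a real client would also need to persist it and record each object's container alongside its other metadata so the object can be fetched later.

```python
class ContainerRoller:
    """Hypothetical client-side helper: hands out container names, moving
    to a fresh container once `max_objects` PUTs have gone to the current one."""

    def __init__(self, base: str, max_objects: int = 1_000_000, start_index: int = 0):
        if max_objects <= 0:
            raise ValueError("max_objects must be positive")
        self.base = base
        self.max_objects = max_objects
        self.index = start_index
        self.count = 0  # objects assigned to the current container so far

    def current(self) -> str:
        # Zero-padded suffix keeps names sortable: archive-000000, archive-000001, ...
        return f"{self.base}-{self.index:06d}"

    def next_container(self) -> str:
        """Return the container the next object should be PUT into."""
        if self.count >= self.max_objects:
            # Current container is full: roll over to the next one.
            self.index += 1
            self.count = 0
        self.count += 1
        return self.current()
```

With `max_objects=3`, seven calls to `next_container()` return `archive-000000` three times, `archive-000001` three times, then `archive-000002`.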

answered 2012-02-24 13:41:56 -0500

So, what are the best practices for storing a large number of objects on commodity hardware (spinning SATA drives)?

Should we create subdirectories based on the key and store them in the same container? Or should we also create separate containers based, for example, on the first characters of the key?

Thanks.
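Sharding by the first characters of the key might look like the sketch below. The `container_for_key` helper and the `objects_` naming scheme are hypothetical. This only spreads load evenly when the leading characters are uniformly distributed (e.g. hex digests or UUIDs); keys that start with dates or common words will pile into a few containers. The full key is still used as the object name, so no subdirectory structure is needed inside each container.

```python
def container_for_key(key: str, base: str = "objects", prefix_len: int = 2) -> str:
    """Hypothetical helper: choose a container from the first `prefix_len`
    characters of the object key, e.g. key 'a3f9c2' -> 'objects_a3'."""
    if prefix_len <= 0:
        raise ValueError("prefix_len must be positive")
    if len(key) < prefix_len:
        raise ValueError(f"key {key!r} is shorter than prefix_len={prefix_len}")
    return f"{base}_{key[:prefix_len]}"
```

With lowercase hex keys, a 2-character prefix gives 16² = 256 containers (about 3.9M objects each for 1B objects), while a 3-character prefix gives 4,096 containers (about 244K each).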

answered 2012-02-24 21:28:26 -0500

rustam-code

Andre, as far as I understand, the bottleneck is not the files or the filesystem. The bottleneck is the database (SQLite) where the metadata for all objects is stored. Because there's one database per container, its performance drops as the number of items in that container grows.

answered 2012-02-24 23:34:41 -0500

If you have millions of objects in a single container on regular SATA drives, things like container listings are super painful. Putting your container servers/databases on SSDs buys you much better performance. If that's not an option, you definitely want to spread your objects across more containers.
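When keys are not uniformly distributed, hashing the object name before picking a container spreads objects more evenly across a fixed number of containers. A minimal sketch; the `shard_container` helper, the `objects_` prefix, and the choice of SHA-256 are illustrative assumptions, not something Swift provides.

```python
import hashlib


def shard_container(object_name: str, num_containers: int, base: str = "objects") -> str:
    """Hypothetical helper: map an object name to one of `num_containers`
    containers by hashing the name.

    SHA-256 is used only for its even spread of values, not for security, so
    names that share a prefix still land in different containers.
    """
    if num_containers <= 0:
        raise ValueError("num_containers must be positive")
    digest = hashlib.sha256(object_name.encode("utf-8")).digest()
    # Interpret the first 8 bytes of the digest as an unsigned integer, reduce mod N.
    index = int.from_bytes(digest[:8], "big") % num_containers
    return f"{base}_{index:06d}"
```

The container count must stay fixed once objects are written: changing `num_containers` remaps most names, so existing objects would have to be looked up under their old container (or moved). Choosing the count up front from the expected total, e.g. 1,000 containers for about 1B objects, keeps each container near 1M objects.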

answered 2012-02-25 09:16:07 -0500

Listing is not a requirement; I just need to GET and PUT objects. The metadata itself is stored in Cassandra.
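For a GET/PUT-only workload, each object is addressed directly by name, so the client never needs a container listing. A minimal sketch using only Python's standard library; the storage URL, token, and helper names are hypothetical placeholders, and the requests are built but not sent. Note that even if the client never lists, Swift still updates the container listing on every object PUT, which is where the per-container limits discussed in this thread come from.

```python
from urllib.parse import quote
from urllib.request import Request


def object_url(storage_url: str, container: str, obj: str) -> str:
    """Build <storage_url>/<container>/<object>, percent-encoding both names.

    '/' is left unescaped in the object name so pseudo-paths such as
    '2012/02/file.jpg' stay readable; it is escaped in the container name.
    """
    return f"{storage_url.rstrip('/')}/{quote(container, safe='')}/{quote(obj, safe='/')}"


def put_request(storage_url: str, token: str, container: str, obj: str, data: bytes) -> Request:
    # X-Auth-Token carries the token returned by the auth service.
    return Request(object_url(storage_url, container, obj), data=data,
                   method="PUT", headers={"X-Auth-Token": token})


def get_request(storage_url: str, token: str, container: str, obj: str) -> Request:
    return Request(object_url(storage_url, container, obj),
                   method="GET", headers={"X-Auth-Token": token})
```

A request is then sent with `urllib.request.urlopen(req)`.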

Still, it seems I should spread the objects manually so as not to worry about performance degradation. In the referenced blog post, the author suggests no more than 1M objects per container and no more than 1M containers per account. So, to be on the safe side, should I also spread the objects across multiple accounts? I will not be using SSDs.
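Using the rule-of-thumb caps quoted from the blog post (1M objects per container, 1M containers per account; guidelines from the post, not limits enforced by Swift), a quick calculation shows whether more than one account is needed at all. The `plan_layout` helper is hypothetical and assumes objects are spread evenly.

```python
def ceil_div(a: int, b: int) -> int:
    """Integer ceiling division, avoiding float rounding for large counts."""
    return -(-a // b)


def plan_layout(total_objects: int,
                max_objects_per_container: int = 1_000_000,
                max_containers_per_account: int = 1_000_000) -> tuple[int, int]:
    """Return (containers, accounts) needed so that, with objects spread evenly,
    no container or account exceeds the given caps."""
    if total_objects < 0:
        raise ValueError("total_objects must be non-negative")
    containers = max(1, ceil_div(total_objects, max_objects_per_container))
    accounts = max(1, ceil_div(containers, max_containers_per_account))
    return containers, accounts
```

For about 1B objects this gives 1,000 containers in a single account; under these caps a second account only becomes necessary beyond 1,000,000,000,000 objects.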

Is this "limitation" something that is being addressed? It sure would be nice not to have to worry about spreading the objects manually. I just need a distributed object store with a flat namespace. Maybe Swift is not for me?

Thanks for the help.

answered 2012-02-25 22:25:14 -0500

rustam-code

Chuck, thanks for your input. We also store metadata in Cassandra and only need to fetch an object by its name. If avoiding listings would help here, I would definitely vote for that.

answered 2012-02-26 01:36:52 -0500

cthier

As I said in my post, avoiding the listings currently isn't possible. But if you are interested in going down that route, we can point you in the right direction.

answered 2012-02-25 14:03:15 -0500

cthier

For Andre's question, the short story is that you should test it and see :)

The longer story is that these limitations are due to Swift keeping these listings. Once the number of objects in a container reaches the millions, the container can only handle on the order of 5-10 updates per second (on regular SATA hardware, with the container servers on the same nodes as the object servers). This will of course vary with your actual hardware and use case. If you don't need to write more than 5-10 objects per second to a single container (or only do so in bursts), then you should be fine for many millions of objects (though I doubt a billion). This is just something you would really have to test for your use case. You can tell when the container servers are overloaded by watching for async pendings stacking up on the object servers.
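These figures can be turned into a rough sizing rule: if one container sustains about 5-10 object PUTs per second on this kind of hardware, spreading writes evenly over N containers raises the ceiling roughly N-fold. A sketch of that arithmetic; the helper names are hypothetical, and the default of 5 PUTs/s is simply the conservative end of the estimate above, not a measured value, so it should be replaced with numbers from your own tests.

```python
import math


def containers_for_rate(target_puts_per_sec: float, per_container_rate: float = 5.0) -> int:
    """Containers needed so that no single container receives more than
    `per_container_rate` object PUTs per second, assuming an even spread."""
    if per_container_rate <= 0:
        raise ValueError("per_container_rate must be positive")
    return max(1, math.ceil(target_puts_per_sec / per_container_rate))


def hours_to_fill(objects: int, puts_per_sec: float) -> float:
    """Hours needed to write `objects` objects at a steady `puts_per_sec`."""
    return objects / puts_per_sec / 3600
```

For example, a sustained 100 PUTs/s needs about 20 containers at 5 PUTs/s each, and filling a single container to 1M objects at 5 PUTs/s takes roughly 55.6 hours.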

Similar limits apply to containers within an account. The more containers you have in an account, the longer it takes to create a new one. I haven't really tested this, but my guess is that with a setup similar to the one above, you would only be able to create 5-10 containers per second for a given account, and like everything else, this should be tested :)

All of that said, even if you adopted a conservative limit of 1 million objects per container and 1 million containers per account, that is a lot of objects :) (1,000,000,000,000).

If this is a private cluster, you could also leverage multiple accounts; there aren't really any limitations around accounts that I can think of right at this moment.

There is one other option I have thought about before for use cases like yours, though it would require a bit more work. You could conceivably add an option to Swift to ignore listings, which would bypass the parts that need to do account/container lookups. Since you are tracking the metadata outside of Swift, not having container/account listings should be fine, and this would remove all of those limitations. This is not a feature on our current roadmap, but if it is something you are interested in implementing, let me know and I can help you get started on the right path.

Stats

Asked: 2011-12-15 11:26:42 -0500

Seen: 250 times

Last updated: Jun 03 '12