Ask Your Question
0

Two paths to the files hashing to the same value

asked 2011-05-06 18:04:31 -0600

rostik-2000 gravatar image

Theoretically, two inputs to the hash function can result in the same output. Suppose that path accountA/containerA/objectA produces the same hash as accountB/containerB/objectB.

Based on the hash, partition is calculated. Since hashes are the same, the partition will be the same. And since datadir directory is created based on hash, these two objects will end up in the same directory.

Then, when an object is uploaded, it is stored in datadir and receives the name which equals to the timestamp. All the other files in that directory with older timestamp are deleted. So if objectA was uploaded before objectB, upon upload of objectB, objectA will be deleted.

Am I wrong somewhere in my reasoning?

edit retag flag offensive close merge delete

11 answers

Sort by ยป oldest newest most voted
0

answered 2011-06-01 22:15:24 -0600

rostik-2000 gravatar image

The probability is indeed too low to cause any problems.

However, imagine that a malicious user generates a number of paths "account/container/object" that will hash to the same value by changing "container/object" suffix. (It is possible to perform this attack for md5, see for example http://www.win.tue.nl/hashclash/ChosenPrefixCollisions/ (http://www.win.tue.nl/hashclash/Chose...) ).

Afterwards, the user uploads different files with the names that will have the same hash to OpenStack. The second file overwrites the first. You may think that such a malicious user harms only himself. BUT! The provider can be sued by that malicious user for data loss (we presume an existence of a legal agreement between a provider and a customer).

Of course, malicious user will need to know HASH_PATH_SUFFIX in order to perform this attack. But it is still possible using insider knowledge (for example by dishonest staff at provider's side disclosing HASH_PATH_SUFFIX value or even initiating the attack).

edit flag offensive delete link more
0

answered 2011-05-06 18:09:12 -0600

notmyname gravatar image

If you find two (account, container, object) sets that hash to the same value (even accounting for the salt we use), then, to swift, they are referencing the same object. You will have simply found two names to the same logical object.

This applies to accounts and account, container pairs, too.

--John

On May 6, 2011, at 1:05 PM, Rostik Slipetskyy wrote:

New question #156307 on OpenStack Object Storage (swift): https://answers.launchpad.net/swift/+...

Theoretically, two inputs to the hash function can result in the same output. Suppose that path accountA/containerA/objectA produces the same hash as accountB/containerB/objectB.

Based on the hash, partition is calculated. Since hashes are the same, the partition will be the same. And since datadir directory is created based on hash, these two objects will end up in the same directory.

Then, when an object is uploaded, it is stored in datadir and receives the name which equals to the timestamp. All the other files in that directory with older timestamp are deleted. So if objectA was uploaded before objectB, upon upload of objectB, objectA will be deleted.

Am I wrong somewhere in my reasoning?


You received this question notification because you are a member of Swift Core, which is an answer contact for OpenStack Object Storage (swift).

edit flag offensive delete link more
0

answered 2011-05-06 18:37:24 -0600

rostik-2000 gravatar image

to swift, they are referencing the same object

But for two users that are uploading their files to swift and incidently happen to get the same hash value, these are two different objects. Isn't it a problem?

edit flag offensive delete link more
0

answered 2011-05-09 11:28:18 -0600

rostik-2000 gravatar image

The probability of hash function giving one output to two different inputs is small, but if swift is ever used as a back-end for gmail or facebook, sooner or later such a situation might happen.

In order to test what will happen then, I have created a dummy implementation of md5 function which always results in the same hash value. Afterwards, I have changed swift.common.utils.py in line 26 to include my dummy version of md5 instead of hashlib.md5.

* TEST *

Actions below test for a situation when hash("accountA/files/file1 + HASH_PATH_SUFFIX") == hash("accountB/files/file2 + HASH_PATH_SUFFIX"):

1) python st -v -A http://10.0.0.2:11000/v1.0 -U accountA:adminA -K passadmina upload files file1 2) python st -v -A http://10.0.0.2:11000/v1.0 -U accountB:adminB -K passadminb upload files file2 3) python st -v -A http://10.0.0.2:11000/v1.0 -U accountA:adminA -K passadmina download files file1

As the result of action 2, file2 will overwrite file 1. As the result of action 3, user from accountA will successfully download the content of the file uploaded by user from accountB.

* POSSIBLE FIX * The path to the file should not depend only on the output of the hash function, but also from some other information that is different for different accounts, for example account name.

edit flag offensive delete link more
0

answered 2011-05-09 19:26:57 -0600

gholt gravatar image

You may wish to read: http://en.wikipedia.org/wiki/Birthday_Paradox (http://en.wikipedia.org/wiki/Birthday...) and the Probability Table contained within. You would have to have 26,000,000,000,000,000 objects before you had even a 0.000001% chance of collision.

The problem with using the account name in the path is that you would have to encode the account name to avoid path naming problems and the account name without encoding is 256 bytes. That can take quite a strain on your inode space.

edit flag offensive delete link more
0

answered 2011-06-01 22:28:13 -0600

notmyname gravatar image

Yes, a collision could happen, but I believe swift is protected in 2 ways:

One is that the hash_path_suffix is just that, a suffix. The attack mentioned in your linked paper deals with generating suffixes for given prefixes. Since the hash_path_suffix is appended to anything that the user can provide to be hashed, unless I am misunderstanding the linked paper, the mentioned attack doesn't apply. In fact, this is exactly why we append the hash_path_suffix instead of prepend it.

Second, the hash_path_suffix is designed to be secret, and we caution deployers to keep it secret in the docs. As you noted, this does not prevent attacks from an insider. However, if an insider has access to the hash_path_suffix, that insider is very likely to not need to compute md5 collisions to cause chaos. The hash_path_suffix should be kept as secret as an /etc/shadow file or the secret key of a key pair.

Using a suffix also prevents attackers from generating many hashes and selecting those that are "close" in an effort to fill up a single storage volume in the swift cluster. Without knowledge of the hash_path_suffix, it is very difficult for an attacker to find "close" hashes.

--John

edit flag offensive delete link more
0

answered 2011-06-01 22:41:30 -0600

gholt gravatar image

This attack vector would require knowing both /<account>/ prefixes and the HASH_PATH_SUFFIX with control only over the middle part with limited data size, which that paper does not describe (it requires being able to change both paths' suffixes). Assuming such an attack vector was feasible it would require the inside knowledge. Knowing that would indicate current or previous access to internal systems. With current access, there are much easier ways to cause problems. With current or previous access and the fact that the name would have to be quite odd looking to work would easily indicate the malicious intent.

All that said, if you have a solution in mind that isn't more harmful to the system than the remote possibility of the threat... It's all a matter of risk management really. :)

edit flag offensive delete link more
0

answered 2011-06-01 23:28:12 -0600

rostik-2000 gravatar image

we caution deployers to keep it secret in the docs (...) The hash_path_suffix should be kept as secret as an /etc/shadow file or the secret key of a key pair.

I just rechecked the Administrator manual ( http://docs.openstack.org/cactus/openstack-object-storage/admin/os-objectstorage-adminguide-cactus.pdf (http://docs.openstack.org/cactus/open...) ), and what I found with respect to swift_hash_path_suffix was:

"# random unique string that can never change (DO NOT LOSE)"

Maybe I have missed other places where the necessity to keep swift_hash_path_suffix secret were mentioned, but it might make sense to indicate it in Admin manual and also at etc/swift.conf-sample from the distribution. Just as a precaution measure.

edit flag offensive delete link more
0

answered 2011-06-02 00:03:02 -0600

notmyname gravatar image

created a bug for the docs

https://bugs.launchpad.net/swift/+bug...

edit flag offensive delete link more
0

answered 2011-06-02 14:42:19 -0600

rostik-2000 gravatar image

This attack vector would require knowing both /<account>/ prefixes and the HASH_PATH_SUFFIX with control only over the middle part with limited data size, which that paper does not describe.

Just a comment. It is still possible to vary the middle part to achieve the same hash value. For example in http://www.win.tue.nl/hashclash/Nostradamus/ (http://www.win.tue.nl/hashclash/Nostr...) there are examples of different PDF files which predict different outcomes of US elections. All of the provided files have the same Md5 hash. If one makes byte-to-byte comparison of two such files, he will note that the prefixes and suffixes are identical.

The above considerations on "keeping hash path value secret" and "risk management" still stay valid. But maybe eventually it might be advisable to refuse from using MD5 even despite its "general availability, good distribution, and adequate speed".

edit flag offensive delete link more

Your Answer

Please start posting anonymously - your entry will be published after you log in or create a new account.

Add Answer

Get to know Ask OpenStack

Resources for moderators

Question Tools

1 follower

Stats

Asked: 2011-05-06 18:04:31 -0600

Seen: 167 times

Last updated: Jun 02 '11