share
Stack OverflowIs MD5 really that bad?
[+83] [17] Col. Shrapnel
[2010-05-04 19:10:45]
[ php security md5 hashing salt ]
[ http://stackoverflow.com/questions/2768248] [DELETED]

I have an MD5 hash dc3c2be8869ec299b5ab2748184c50ab
And a simple bit of code to produce it

$salt="V4&jd3M3^s5eF";
$pass="*hidden*";
echo $hash=md5($salt.$pass);

So, my question is:

What the password is?

I'm looking to get past theoretical reasoning, and come up with a simple example that we can use to demonstrate the problem(s) with MD5.

WELL
Due to dishonest demand in the recent answer, I am going to refresh the conditions.
Setting new hash and new salt.

Everyone is welcome to break the hash and post the actual password.

Also, because I am not bound with bounty, the question remains open forever.
Until it gets actually answered.

So, we can have a proof at last, showing to everyone that MD5 is bad, bad and broken.

(110) Most of us can't break even the most flawed encryption or hashing algorithms. Just because we can't break it, doesn't mean it's a good algorithm. If you are going to conclude that md5 is safe just because nobody will come up with your password, you are sticking your head in the sand. - Jacco
It is practically masochism to try to use a collision for a web site, as it's likely it will be a large binary blob. If you put a hard limit on the password, like 50 chars, then you've killed all collisions. Add a salt and nobody will bother if they steal your DB either. - iconiK
(16) The formulation of this question and the "discussion" below seem excessively argumentative. "Broken" is in this case a term of art from cryptography (i.e. this is also not programming related), and this comes down to complaining that you don't like the way crytographers use their own jargon. - dmckee
(12) Just because some of us MIGHT be able to break it, doesn't mean we will. Collisions can take hours/days to generate. More time that most of us are willing to put in to answer a question. A hacker doesn't have any such time restraints. - Reece45
(4) MD5 can be cracked by a motivated attacker. It's not likely that anybody here on StackOverflow will be that motivated. - Mark Ransom
(5) I don't use MD5 because I like to sleep at night. It all depends on your personality. I get the feeling that MD5 is good enough for you. - Josh Stodola
(1) you can make your hashing even more difficult to break by using a random salt instead of a static one - Mark
(1) @Jacco but everyone want to spare their 5 cents on "MD5 is broken" topic. - Col. Shrapnel
(1) Since the length of the password is known and small, it wouldn't take me long at all to brute-force the answer. - user168715
(1) it isn't 3 letters but it's relatively small. I am not giving out any hints - just like in the real case. Though 14 characters is still hard for bruteforce. - Col. Shrapnel
(1) For 3 chars (0 -> 127, ASCII) I have 2 048 384 hashes. That's 2 million. What is relatively small? - iconiK
(1) there are also 11-char salt - Col. Shrapnel
(1) @Col. Shrapnel, excluding the salt. Whether salted or not, same amount of hashes. I'll try 1 to 5 chars, but that'll take a looong time. - iconiK
(18) MD5 isn't yet broken to the point where we can snap our fingers and find the answer. It is broken to the point where it would take a surprisingly small amount of resources - today, probably still measured in millions or greater - to crack it. It's also not like it's going to get less broken - it's just going to get worse as research is done and as technology improves. You're welcome to use it, and you will probably never be cracked, but that's partially due to lack of interest. Why make your security worse intentionally? - ZorbaTHut
(8) A "-1" for all answers where there is (A) a claim that MD5 is "broken" because it suffers from collision attacks, but neglects to mention (B) the fact that known attacks on MD5 are collision attacks but not preimage attacks (where a message X and/or its hash MD5(X) are given and you don't get to choose them, and your attack is to construct another message Y such that MD5(X)==MD5(Y)). MD5 is perfectly fine for some applications. You have to know what you're using it for. Stating that it is "broken" is just spreading misinformation. - Jason S
(6) md5 is broken to the point where collisions are possible. For instance, 'foo' and 'bar' producing the same hash (though they won't, but that's not the point). A lot of code that employs a 'collision proof' hash does no other checking and then breaks horribly when a collision occurs. Password hashes are just one use for a one way 'collision proof' algorithm. Imagine a police database that matched you to a murder because a md5 hash of your fingerprint data collided with a serial killers, and you came up first? Lets not even get into safety mechanisms. If its broken, don't use it. - Tim Post
(2) I voted to reopen this, but after seeing the posts it's obvious it should have stayed closed. - Chris Lively
(1) @Tim why bother to say "breaks horribly" instead of breaking? - Col. Shrapnel
(1) Again, a collision proof hash is useful for much more than just protecting passwords. I urge you to explore other uses of hashing algorithms. - Tim Post
(1) @Col. Shrapnel, because the question is closed at the moment .. and there's not much space in comments :) As I'm not inclined to vote to reopen it .. well :) - Tim Post
(1) Okay, @Tim lets conclude that for the password hashing it's OK - Col. Shrapnel
@col. Shrapnel - even a simple linked or doubly linked key->value dictionary that used it could break if the programmer said "This hash is collision proof, there's no need to compare strings prior to returning the result ... " - Tim Post
(3) @Col. Shrapnel - Lets conclude that you have a 1 in N chance of a collision, N being the number of users. The chance that N user accidentally enters Y users name by mistake with a colliding password is so far remote that its not even worth considering. But the fact remains, if the possibility exist, however remote, don't use it when better alternates exist. - Tim Post
(61) My guess is that Col Shrapnel has stolen the hash to the password to a very valuable server. He can't figure out how to crack it himself, so he has cleverly worded it as a challenge on here to try and get someone to do it for him. - davr
@Col. Shrapnel - horribly was meant to indicate that people suffer needlessly because someone thought "Well, MD5 is good enough ..." To recap: breaks : Your app crashes. breaks horribly: people are put through undue hardship. - Tim Post
(2) I thought this would be worthwhile to talk about, but @Col. Shrapnel is just tryin to argue so I'm voting to close. He will probably end up writing his own answer that "MD5 is safe because no one can get my password" and accepting it anyway. - Earlz
ahaha @davr nice point :) I will post the pass later. - Col. Shrapnel
@davr: No, he just has a long history of arguing about it with @The Rook in comments to other questions=) - newtover
@Earlz there is nothing to talk about. The question is plain and simple. And yes, it can be closed. It doesn't matter. - Col. Shrapnel
@newtover I argue because nothing in security is subjective, there are however complex technicalities that few people are aware of. This is a very good question and i'm sorry that the trolls closed it. - Rook
@The Rook: I agree that the question is rather interesting, since mostly what we see are rumors. @Jason S shed some light on the issue though. - newtover
@Tim I don't quite understand what do you mean, But I suppose you're talking of trolls. Well, I am not. Sometimes I am not too calm, yes, But I am ask what I am ask and I write exactly what I mean. I just thought that such a topic would be a good proof, either pro or contra, depends on the answer. Nothing else - Col. Shrapnel
One example does not make a proof. - Kevin Panko
0xA3: Thanks for the link. Haven't read the paper, but the abstract claims complexity of 2^123.4 for true preimages, down from a "theoretical" 2^128 brute force complexity of an unbreakable 128-bit hash algorithm. Enough to make it faster but probably not enough to abandon use of MD5. - Jason S
@Kevin Panko: But one example is more compelling evidence than zero examples. - Bill the Lizard
@Kevin why not? Say someone will post a pass. How come it cannot be a proof? - Col. Shrapnel
Yes, that would prove something. I meant that just because nobody posts the password, does not prove nobody could ever figure it out. It would prove that those of us who tried were not able to get it, and that does say something... - Kevin Panko
(1) You need to differentiate between new and existing applications. It seems like a no-brainer to pick a better hash algorithm for new application. But if you have an existing app that uses MD5, is it time to consider a switch? Suppose that you would have to wipe out all existing passwords for example; I don't see anything in the answers that indicate it would be worth the hardship. - Mark Ransom
(6) "I don't need theoretical reasoning" is an absurd statement in this context. Cryptography is all about theoretical reasoning. The whole idea of a good cryptographic algorithm is an algorithm which, even theoretically, isn't thought possible to break with remotely reasonable resources. If you accept anything less than this as a cryptographic algorithm, then your security relies on the hope that nobody is dedicated enough to breaking your encryption that they'll go through the trouble of writing an exploit based on a known theoretical weakness targeted specifically at you. - Tyler McHenry
(1) @Mark Ransom you can update the algo used the next time they login. Thats very easy. - Rook
(1) What is exactly the question? - OscarRyz
@Oscar 1. Is MD5 really that bad? 2. If so, what's the pass behind the asterisks? - Col. Shrapnel
I think Oscar's comment sums up the problem with this question - for such a simple query, it had become exceedingly long and rambling! Pared down... - Shog9
@Shog9: Oscar also points out how this falls on the rhetorical rant side instead of asking for clarification on MD5 problems. - community_owned
(4) This is in essence a restaement of stackoverflow.com/questions/822638/… but with a more argumentative style. - ShuggyCoUk
Worth mentioning that simply googling for md5 cracker gives quite a few websites that can reverse some MD5 hashes instantaneously. And using the function as md5(pass + salt) is prone to rainbow table attacks. Also, there's at least one publicly available MD5 cracker which utilizes modern graphics cards' computational power to brute force MD5 hashes in a relatively short time - hours and days instead of weeks and months. - GeReV
I know a guy who smokes and doesn't have cancer; therefore, smoking is not bad for you. - BlueRaja - Danny Pflughoeft
@Col. Shrapnel +1 I'm glad your trying to get to the bottom of this. - Rook
Damn. Only two days for the bounty? I thought this was closed! I was going to run a brute force script but I guess two days isn't nearly enough time to find a collision. - Lotus Notes
[+162] [2010-05-04 19:27:16] 0xA3

Will that be proof enough (code included):

Peter Selinger: MD5 Collision Demo [1]

Update: Comments claim that this answer is not addressing preimage attacks (which the question didn't ask about at the time of writing this answer). I would like to add that there also have been preimage attacks against MD5 (e.g. confer Finding Preimages in Full MD5 Faster Than Exhaustive Search [2]). They are not feasible yet, but "Attacks never get worse, they only get better".

A nice summary on the current status of MD5 can be found at http://www.jcsecurity.co.uk/?p=215. The article concludes:

In conclusion, MD5 has ‘decent-enough’ preimage resistance but has some serious collision vulnerabilities which may or may not effect the application it’s deployed to. Even though MD5 might be safe under certain scenarios, unless you absolutely have to use it for reasons such as backwards compatibility or interoperability don’t. There is no point and as Schneier always says “Attacks never get worse, they only get better”. There are numerous alternatives available today, SHA256, SHA512 and Whirlpool for example are all sensible choices.

Update 2: There is another problem with MD5 in the context of being used as a password hash function. MD5 is fast. Although this article is already more than 2 years old, it still holds today:

Enough With The Rainbow Tables: What You Need To Know About Secure Password Schemes [3]

As mentioned by ShuggyCoUk [4], this question is in essence a restatement of Does any published research indicate that preimage attacks on MD5 are imminent? [5] I recommend to read on there.

[1] http://www.mscs.dal.ca/~selinger/md5collision/
[2] http://www.springerlink.com/content/d7pm142n58853467/
[3] http://chargen.matasano.com/chargen/2007/9/7/enough-with-the-rainbow-tables-what-you-need-to-know-about-s.html
[4] http://stackoverflow.com/users/12748/shuggycouk
[5] http://stackoverflow.com/questions/822638/does-any-published-research-indicate-that-preimage-attacks-on-md5-are-imminent

(4) +1 this is the only answer so far that addresses the OP's request for an example of cracking MD5. - Bill Karwin
I looked into that method. From what I could tell it requires the original file/string to hash before generating a collision. That's probably intentional as they didn't want to release a tool to the wild for script kiddies. - Reece45
Nice. Interesting read. Let's see, when SHA-512 falls :) - Techpriester
(4) Note to OxA3 : +1 for great example. This is exactly what the poster wanted. Note to OP (Col Shrapnel): Other answers are equally valid, they just didn't give you the answer in the format you wanted. This site (because it's well-designed) does not force users to give boolean responses. If so, no one would ever get the BEST answer to their questions. If you have a problem with that, use another site that functions more like a poll. - NateDSaint
(3) -1, you're not distinguishing between preimage attacks and collision attacks. The Peter Selinger demo demonstrates a collision attack, not a preimage attack. - Jason S
(3) @Jason: Col. Shrapnel does not seem to understand that there are different ways for a hash to be broken. No one has claimed that there is a good (first) preimage attack, only that MD5 is broken, which this page very clearly shows that it is. - BlueRaja - Danny Pflughoeft
@BlueRaja: Col. Shrapnel's question pertains to password security. Most people seem to have completely overlooked the issue of the salt. - Will Vousden
(3) use Digest::MD5 qw(md5_hex); my $salt='#bh35^&Res%'; my @var = (); my @num = (1 .. 255); push @var, chr($_) foreach @num; foreach my $a (@var) { foreach my $b (@var) { foreach my $c (@var) { my $md = md5_hex($salt.$a.$b.$c); print "Found: pass ($a $b $c) md5($md)" if $md eq 'c1e877411f5cb44d10ece283a37e1668'; } } } - Konerak
@Jason S: At the time I wrote my answer the question basically was: "Is MD5 really that bad" which Peter Selinger nicely demonstrates. The second part of the question (to guess the password from the hash) had already been addressed by Justin Ethier at that moment. - 0xA3
@0xA3: your edit fixes my objection. Thanks. - Jason S
+1 for the great example that illustrates the point nicely for non-cryptos like me. I'd still like to see the Col's password revealed, although that would be mostly for entertainment purposes. - Chris Thornton
what's up with the font? it's too small to read! - Phil Gan
Thanks for this - it way enlightening! - NikiC
1
[+124] [2010-05-04 21:19:02] Kevin Panko [ACCEPTED]

Your password is hunter2 [1]

[1] http://bash.org/?244321

(51) + 1 for awesome geek reference - Rubys
(7) Does he keep 1 2 3 4 5 for his luggage? - Donal Fellows
(12) @Donal: amazing! That's the same combination as on my air shield! - Graham Lee
(1) So the combination is one, two, three, four, five. That's the stupidest combination I've ever heard in my life. That's the kinda thing an idiot would have on his air shield. - Christina Mayers
(30) Did OP really accept this as best answer??? Yikes. There were some really good answers below. - webbiedave
(4) @web I was just thinking that. This was amusing, but there were some excellent answers. It got 0xA3 a Populist badge, anyway - Michael Mrozek
(3) so, let me see, you earned 550 rep points by being funny? :)) - Andrei Rinea
I don't understand it... - Midas
(2) @Midas: It's a reference to one of the most legendary bash.org quotes on the Internet - bash.org/?244321 - BoltClock
2
[+38] [2010-05-04 19:15:19] Justin Ethier

To address your points directly:

1) Yes, many collision vulnerabilities have been documented. The bottom line is that MD5 is fundamentally broken. According to Wikipedia [1]:

US-CERT of the U. S. Department of Homeland Security said MD5 "should be considered cryptographically broken and unsuitable for further use," and most U.S. government applications will be required to move to the SHA-2 family of hash functions after 2010.

Bruce Schneier [2] has also weighed in with regard to one such attack on MD5 to forge SSL certificates:

I'm not losing a whole lot of sleep because of these attacks. But -- come on, people -- no one should be using MD5 anymore.

2) Its impossible to determine what is behind the asterisks. The point of "breaking" MD5 is not to determine the original password value - it is to find a collision. If I can find another value that will hash to the same MD5 string, then that is as good as finding your original password. Of course, salting makes MD5 stronger, but it is still "broken".

[1] http://en.wikipedia.org/wiki/MD5
[2] http://www.schneier.com/blog/archives/2008/12/forging_ssl_cer.html

(3) Okay okay, it is broken. Well 1. Find a collision please. 2. show me a scenario of using that collision, to get access to the admin area on my site - Col. Shrapnel
(47) @Col. Shrapnel, I get the feeling you have already made up you mind.. and are just looking for an excuse not to implement something safe. - Jacco
(8) 1. Find it yourself: woodmann.com/collaborative/tools/index.php/SnD_Reverser_Tool 2. It's not our job to break into your site, even if anyone here really cared enough too. If you want to rely on MD5, you can have fun trying to sleep at night knowing your security is fundamentally broken. - peachykeen
(4) I break into your site and steal the hashed version of your passwds - the point of hashing is that those hashes are no use to me, I can't get the passwds from them. The flaw in MD5 is that I can find another word with the same MD5 hash without trying all possible combinations - I can then use that new word to log into your site. In practice it's not a problem - I still have to do a hell of a lot of tests, just not as many as you would think. And the collision I come up with may be 10k long which I can't use as a passwd anyway. - Martin Beckett
(3) @Martin: Quite frankly, the scenario you gave isn't plausible. If the hacker has access to get the hashed version of the passwords, then they probably already have access to UPDATE one of those passwords. Considering the salt is typically stored in the same record as the password itself, it would be trivial to replace the users password (and salt value) with my own. - Chris Lively
@Justin Ethier: So how would you go about generating a collision B for a digest A such that md5(B + salt) = A? This is what Col. Shrapnel is asking. It's all very well having a collision to the MD5 digest, but this is useless from a practical point of view because it has to be salted first. - Will Vousden
@Chris Lively - that depends on the site security! I was just explaining how/why/if MD5 collisions are an issue in this case. Personally I wouldn't worry. - Martin Beckett
@Chris Lively: Unless a bug in the page or some other problem gives you a partial view of the database. It's more likely you'll get a view of the db, especially on something like a PHP database connection call or a db query. If the user sees parts of the call, some of your secure info could leak out. If a comparison with the hashed password, or even the salt, was leaked by poorly caught errors, then checking for a hash collision is very much a valid way of hacking in. - peachykeen
@peachykeen: considering a lot of dev's use either inline sql or some type of ORM that requires access to run it's own sql calls directly (in other words NOT using stored procedures), then whoever is able to do a select on a database will have access to run insert/update statements. So, checking for collision is by definition unnecessary for a tremendous number of applications. - Chris Lively
@peachykeen: further, if they can see the db query and results then they really don't need the passwords anyway. - Chris Lively
(2) @Chris why are you being hardheaded? The fact is that relying on MD5 compromises your security. In fact it is not at all unlikely that an attacker has the ability to view part of your DB without being able to UPDATE, through SQL injection or otherwise. In fact that's probably the most common entry point to hijacking a web site short of knowing the password outright. If you're so confident, post your site URL that relies on MD5 and hope your query parametrization beats my escape schemes (I'm 80% confident it won't). - Dustin Fineout
@Dustin: You and I are saying pretty much the same thing. I'm just going a bit further. MD5 isn't the problem in my mind, ANY time you only encrypt the password and nothing else is the bigger issue. Further, giving full access at an application level to send custom sql to a database server is a huge problem. I just mentioned how to bypass a particular users security when you have update rights. The same method can be used to just get the data you want to begin with. - Chris Lively
@Dustin: I know you think that parameterization is the main issue. But it's not. What happens when they bypass your code entirely? Most security problems aren't caused by some script kiddy who brute forces a password. The real issues come in when they bypass your code and go straight to the database. Whether by installing their own php/.net/java page on your site or by simply accessing the database directly (with the password you stored in your config file). These attacks don't typically come from outside. They are inside jobs. You can't trust ANYONE. - Chris Lively
MD5 isn't encryption, it's a hashing function (it's one-way). That aside, what do you mean by "and nothing else"? I agree that giving an application free reign to issue any query is epic failure, but all I'm talking about is SELECT functionality. Parametrization isn't a problem (I never meant to imply that and no I don't think it's the main issue, it's a solution), but it must be robust to offer any protection. How would you bypass the code or access a config file? Inside attacks aren't the issue here (aren't we discussing MD5?) obviously anyone with physical access has COMPLETE access. - Dustin Fineout
3
[+25] [2010-05-04 19:23:14] user168715

There seems to be two ways in which MD5 can be "broken," not at all equivalent:

  1. It is easy to find two inputs X and Y such that md5(X) == md5(Y).

  2. Given md5(X), it is easy to find a Y such that md5(X) == md5(Y).

It's my understanding that md5 has vulnerability 1, but not 2. Although I see why vulnerability 1 might be undesirable for some applications (e.g. digital signatures), the original poster is using it for password authentication. Why is md5 "broken" for this application?


(4) +1 for the only response here that distinguishes between a preimage attack (your point #2, which is actually a "first preimage attack") and a collision attack (your point #1). - Jason S
(2) "Why is md5 "broken" for this application?" - because hashes do not become "less" broken, only more. It takes less than brute-force to brute-force an MD5 hash, which is clearly not what you want... And why use a broken hash when it is just as easy to use a non-broken one? - BlueRaja - Danny Pflughoeft
(1) The scenario is known as a Birthday Attack. When I'm searching for 1 of n answers, I can find it in substantially less than 1/n fraction of the effort needed to get 1 specific answer. Given a copy of the password file (Chris Lively in another answer thinks this is infeasible, but I wonder if he has never misplaced a backup tape in his life), I may not be able to get (or fake) YOUR credentials, but it's likely I could get several sets of credentials by this means. - Jason
@Jason Isn't this a weakness of any hash algorithm with fixed-size output, no matter how secure? I don't understand why quickly generating md5 collisions helps this kind of attack. - user168715
(1) Yes, that's basic Information Theory - The Pigeonhole principle. The idea with a cryptographically strong hash function is that you have enough pigeonholes to make finding a collision infeasible for the forseable future. That quality no longer applies sufficienty to MD5. Where opinions differ is to whether the attacks are feasible now, or simply feasible soon. But the general concensus, for a couple of years now, is that there is no justification for deploying NEW systems that rely on MD5, and that old systems should review their use and have a plan to migrate away. - Jason
Here is an example of why once #1 is broken the hash is utterly broken. Your password: "password" - MD5 hash "fjk839j3fjkl39023fmnklj" Now I can't determine password (by reversing the hash) but I can determine that using password of "hjfdh4jkdlj2@" (collision attack) will generate the identical MD5 hash "fjk839j3fjkl39023fmnklj" as your password. As far as the security system is concerned your password IS "hjfdh4jkdlj2@" (or any text that generate the same hash), I know your password, and I am you. - theUnhandledException
(2) @theUnhandled: That's #2, not #1. - user168715
4
[+13] [2010-05-04 19:37:47] Rook

md5() is broken due to possibility of a malicious person to intentionally generate a collision. This is more of a problem for Certificate Authorities and the Public Key Infrastructure (PKI) than passwords. By generating a hash collision its possible to forge a SSL certificate.

Further more a collision against md5() is generated by using a specially crafted prefix, if you append a salt to the beginning of the message then an attacker cannot control the prefix and a collision cannot be generated.

However my biggest argument is that it is trivial to use a secure message digest function. You can install mhash or use one of the many sha2 php implementations [1].

[1] http://www.google.com/codesearch?hl=en&lr=&q=lang%3aphp+sha256&sbtn=Search

(7) +1 for it is trivial to use a secure algorithm. - Jacco
I haven't been here long, but this seems to be a theme around here: People are violently determined to expend extra energy to accomplish something with (provably) worse results than the accepted practice. - Jason
(1) @Jason I haven't been here long either and I find that people like to chit-chat about security more than actually proving weather or not a system can hold water. Have you ever written an exploit? - Rook
Your question has nothing to do with the current discussion. Did you think I was baiting you? Or are you baiting me... but the answer is yes. - Jason
@Jason well that makes 2 of us(milw0rm.com/author/677), because its safe to say no one else on SO has. - Rook
I don't have a specific problem with hobbyists (I was one, and I'm sure you were once too), except for two things. It's challenging to remember to answer as precisely as you can when surrounded by so many off-the-cuff remarks, and it's a bigger problem when the situation is mistaken for a democracy rather than a meritocracy (or an autocracy). I expect I'll figure out which one we have here soon enough. Like my day job. - Jason
5
[+10] [2010-05-04 21:44:16] Tim Post

What you have done here is epitomized a train of broken thinking into a question that seemingly challenges a logical fallacy [1].

In fact, your question is as much a statement as it is a fallacy.

If a one way hash collides, even once, don't use it, period, for the purposes that you are describing.

I really, really hope that you don't work for my bank.

[1] http://en.wikipedia.org/wiki/Fallacy

(5) Well I didn't think of it from such a point of view. I apologize for that. Let me explain what do I mean. There are many statements in our life like "wash your hands before eat". I don't want to believe in these statements. I want to understand why. Yes, I've lack of intellect to understand theoretical rezoning. But I suspect that any theory can be proved with simple experiment. And I have never heard of a real example of breaking of salted MD5 hash. While everyone keep talking of it's weakness. I want to distinguish the rumors from the truth. In practical way. - Col. Shrapnel
I think the fallacy boils down basically to this: The strength of a cryptographic hash is based on some mathematical properties of the algorithm. Why wouldn't one use math to disprove the strength of that same algorithm? In fact don't exploits against these systems often come as concrete implementations of the math? (ignoring for a moment the class of errors caused by bugs in the original implementation) - Jason
(1) I remember as several years ago I wanted to deliver content in my web-application based on a REQUEST_URI hash, and decided to use md5 as the hash function. I happened to run into a collision on about 300 uris in my development environment and do not believe in md5 since that day =) - newtover
(1) @newtover, you're one lucky guy... @Tim Post: You mean a hash that has a collision published, right? All fixed-length-output hashes have an infinite number of collisions, it's just a matter of being able to find them. - Longpoke
@Longpoke - Published or 'found' by a program breaking. Part of my point is, md5 had uses far beyond hashing passwords. - Tim Post
newtover: can you possibly publish the colliding URLs? To my knowledge no non-contrived MD5 collisions have been published before. - Joshua
6
[+8] [2010-05-04 20:14:30] Jerry Coffin

Finding a collision means I find two different inputs that result in the same hash.

Finding a preimage means finding an input that results in a single, specified hash. I.e., you hash something, and I know only the hash, and my challenge is to find a second input that results in the same hash as output.

The currently known attacks on MD5 allow generation of collisions, but do not allow the generation of a second preimage.

On the other hand, the fact that MD5 has been broken to the extent necessary to find collisions easily1 gives a fairly strong indication that finding a second preimage is much more likely than with another hash that hasn't been broken to the same degree. It's possible that nobody will ever develop a preimage attack against MD5 -- but then again, it's also possible that somebody already has, and is putting it to use for their own nefarious ends rather than publishing it.

  1. There's a huge difference between the breaks of MD5 and SHA-1: the break against MD5 makes it trivial for almost anybody to create colliding inputs. If you have an old (e.g. Pentium IV) computer lying around mostly unused, it's sufficient to find colliding inputs at a rate of something like one pair per hour or so (and the "or so" mostly translates to "or very likely faster"). The break against SHA-1 is right at the border between theoretical and practical. As close as I can figure it, you'd need to spend something like a million (US) dollars to build a machine that could find one collision per week.

Col. S: comments like that get this question closed - Jason S
(1) @Jason it doesn't really matter as there are no answers anyway. - Col. Shrapnel
7
[+5] [2010-05-04 22:19:40] Bruno Brant

Assuming a password of a maximum of 20 characters and an alphabet of Latin characters and numbers, there are as many as 5e+35 possible strings.

I wrote a very simple program that attempts to generate every possible string and then compares the salted hash against the one you presented.

It will take roughly 16,1e+22 (16+22 zeros) years to calculate all hashes with this program. It's a lot of time, of course, so yeah, it seems to you that your hash is unbreakable. However, now, remember that there are collisions in MD5 hashes, right? So for every two strings that generate the same hashs, you can reduce the total number of calculations in one. If it collides a lot -- then maybe we can find your pass in a feasible time... one week, maybe?

I will try running the application in my home desktop, which has more processing power and maybe multi-thread it, even though I surely won't left it running for a week.

UPDATE: Recalculated the total of years after improving the algorithm: 15e13 years, way below the last estimative. That's with a single thread (actually, two threads, one to do the calculations and another to give status reports every 30 sec...). Tomorrow I might try creating a thread model for this app.


(4) +1 for effort, but given the context of a password file, a better argument would be made if the OP provided you with a list of 10k hashes and said, "give me a collision for one". But in either case, you're just doing brute force cracking here. And no digest, SHA2 or otherwise, is going to resist that for long unless you increase the complexity of the passwords + the salt to create enough possibilities that your attacker can't just try them all. Weakening a hash is the process of reducing the search space far below the total number of combinations. And that's what's happening. - Jason
(2) Read the article linked above: chargen.matasano.com/chargen/2007/9/7/… and especially some of the links out of that article. I think you'll realize that you can shave many orders of magnitude off of your current implementation. Threading will probably only ever shave one or two orders off of the time, but might illustrate how you'd perform the same operation on a bot net. However I will reiterate that this is still brute-forcing it, and the attacks are about getting below O(2^N) runtime. - Jason
@Jason: thanks... I will read the article. - Bruno Brant
@Jason: UP! Very f$%@$ cool article. :) - Bruno Brant
8
[+4] [2010-05-04 22:31:15] hobbs

Using a single iteration of a simple hash, even with salt, is entirely inadequate for password hashing, as they're entirely too easy to brute-force even with a hash that isn't broken. Use a proper hash-strengthening scheme or key derivation function such as bcrypt, scrypt, PBKDF2, or Glibc's iterated SHA-2 family hashes.


(3) Please don't take it as a trolling but I really don't understand why everyone says it's entirely too easy, but noone to show an example? okay, 3 alpha characters in lower case is easy. But what about 8 alphanumeric mixed case? - Col. Shrapnel
That's only a few hundred billion hashes. A modern video card can compute about a hundred million hashes per second. So you're talking a few hours for an exhaustive search of alnum^8. - hobbs
Whoops, that's not a modern video card anymore. A 3-year-old video card can do a hundred million hashes per second. A newer one will do a half-billion hashes per second. - hobbs
I had a time to think it over. Does this key derivation help against bruteforce? If so, it would be great, though I don't understand how it can be - Col. Shrapnel
(1) Of course, that's the point. A good KDF is hard to parallelize in hardware, and makes computing the hash arbitrarily difficult -- meaning you can slow down a brute-force attack by a factor of 1,000 or 1,000,000 or whatever factor you like compared to a naïve hash. - hobbs
@hobbs, my 9800GTX can do 350M/sec MD5s, and this card is years old. Current cards can do around 2B/sec afaik. - Longpoke
9
[+4] [2010-05-04 19:18:22] Alex

The weakness isn't that you can discover the value being hashed. The weakness is that if you know X and MD5(X), it's possible to construct a Y such that MD5(Y) == MD5(X). This means that it's possible to forge a message to match a signature. On top of that, it has a higher rate of collisions than other, stronger algorithms.

In general, you should use SHA-1 or better.

Read this post [1] for more information.

[1] http://mail.jabber.org/pipermail/standards/2007-September/016771.html

(2) You don't even have to know X. If you know MD5(X) it's possible to construct a Y such that MD5(Y) == MD5(X). - Stephen P
(1) I am sorry, I weren't asking for the posts. I were asking for the example. - Col. Shrapnel
@Alex: -1. (And @Stephen P as well) You're stating incorrect information. MD5 is known to have collision attacks (possible to construct two messages X and Y such that MD5(Y)==MD5(X)) but not yet known to suffer from preimage attacks, where a hash and/or a message X has been given, and the attack is to construct a message Y where MD5(X) == MD5(Y). - Jason S
10
[+3] [2010-05-04 19:14:21] Harold1983-

You are using salted hashes, which is good, because it makes it nearly impossible to do a simple reverse-lookup on a hash, even for common words. MD5 has a higher chance of collisions versus SHA1 though. This means that it's more likely that you'll get the same hash for 2 different strings with MD5, although it's still extremely unlikely. However, sha1() is a drop-in replacement for md5(), so why wouldn't you use it?


(1) Because SHA1 is broken as well: schneier.com/blog/archives/2005/02/sha1_broken.html - Jacco
Ok ok. Higher chance of collision. I've heard thet nearly thousand times. So, what's the password? - Col. Shrapnel
(3) SHA-1 is not a drop-in replacement. It has 160 bits output, not 128. Have fun dropping that in without causing problems. - Joey
(1) Or use sha256, since sha1 is also "broken" - Stephen P
(3) @Jacoo, 2^69 operations to find a single collision? THAT'S AN AWFUL LOT OF OPERATIONS. Fine, if you insist, SHA-512. Break that >.> - iconiK
Good point Johannes - sha1 hashes will be 40 character, md5 are 32. - Harold1983-
As for the salt, of course I am using it because of rainbow tables which are equal danger for ANY hashing method - Col. Shrapnel
11
[+3] [2010-05-04 19:28:09] Gabe

You probably don't need to worry about it. The problem with MD5 is that somebody may be able to find another message that hashes to the same value, which only matters in cases where the hash is visible. This is a big problem with message authentication where you publish a message and say that it is valid if it has a specific MD5 hash. If I can find another message that has the same hash, I can pass it off as valid.

In your case, however, it looks like you're hashing passwords. If you keep the hash value private, there's no way to create a collision, so you'll be safe.


..as long as there's no external access to the password datastore. Say, by SQL injections and so on. - BalusC
(4) Unless someone gains access to the database where all of your hashed passwords are stored, but you'll likely have bigger problems at that point. - Justin Johnson
@Gabe: Not true. MD5 is not yet subject to preimage attacks. - Jason S
Assuming that the password hashes are stored in the same database as the data they're protecting, there's very little value in somebody bothering to brute force hash collisions. - Gabe
(1) @Justin Johnson you might have bigger problems, but your users' biggest problem will be the fact that you leaked their password. You could at least try to protect them adequately by using good hashing practices. - hobbs
(2) @hobbs: How does leaking a salted MD5 hash of a password leak a user's password? - Gabe
@Gabe the part where a person with even the most modest resources can brute-force billions of MD5s per second. - hobbs
(2) hobbs: How do you know which of the infinite messages that hash to the same MD5 as my password is actually my password? - Gabe
(1) @Gabe It's overwhelmingly likely to be the shortest one. Even if you have what you think is a long password, it's very unlikely that even one of its MD5 preimages is shorter -- especially among the set of characters that can be typed. And that's even more true of the average user, whose password isn't long, doesn't come from a wide charset, and isn't creative. Or, put practically -- I've used password-cracking tools, and I've never had one return me a string that wasn't the correct password. The easiest guess is "always" correct. - hobbs
(2) @hobbs: Given that there's no feasible preimage attack on MD5 yet, you're better off with a dictionary attack on the web site -- you don't even need access to the password database! If your concern is that MD5 is just too fast, Col. Shrapnel can make it take 1000x longer just by hashing 1000 times in a loop, so why bother with a whole other algorithm? - Gabe
12
[0] [2010-06-10 21:10:08] GalacticJello

I'll come at this a different way. A lot of companies use MD5 hashes to "sign" their files, or assert that a file with a given hash is unique from another file. They base their entire systems on this, especially with respect to file deduplication, or single instance storage.

Now given the fact that you can have different files with the same hash (see these examples [1]), what possible faith can you put into a system that asserts that no two files exist with the same MD5 hash?

Edit to answer comments:

Let's assume a few things, to take it out of the context of the mathmatical realm, and place it into the context of the original question "Why is the use of MD5 bad in practice?"

Say your company is involved in litigation, and the opposing party demands any and all documents relating to "X". You go and buy some software that will crawl all your storage locations and caltalogs the billions of files and emails and attachments, generating an MD5 hash for each. You then exclude all "duplicate" files based on the MD5 hash, and produce the rest of the relevant documents to opposing consul.

Now say the opposing counsel is a bit of an "enthusiastic" litigator, and wants to cast doubt that your company actually met its obligations, specifically in the trustworthiness of using MD5 as a deduplication mechinsim. The opposing pary is going for your company's throat, wanting the judge to impose some hefty sanctions, or even a summary judgement.

So if you were to go in front of a court in a litigation setting, where your company was under penalty of such sanctions, your defense woud be, yes, using MD5 is fine, because:

You need to distinguish among the cases that

  • (a) hash collisions can happen (albeit with extremely small probability),
  • (b) two files can be intentionally constructed to cause a collision (this is a "collision attack", it's possible with MD5),
  • (c) an arbitrary file can be intentionally constructed to cause a collision with another file (preimage attack, not known for MD5)...
  • (d) that a hash collision w/o intentional construction of files is likely to happen (which is not true... you'd need approx 2^64 different files to have a likely collision in a 128-bit hash.)

To which the litigator would likely respond:

  • (a1) Is it possible that two different files can have the same MD5 hash?

    (your answer would have to be yes)

  • (b1) Do you know if there are any examples of two different files that have the same MD5 hash?

    (again, your answer would have to be yes)

At this point, you have lost support in the eyes of most judges. It is now up to your legal team to steer the course back onto the "MD5 is fine" track. I'd rather not be in that position in the first place. At least with SHA-256 or other longer hashes, you can answer "No" to (b1). And thus, the whole point to the question: "Why is the use of MD5 bad in practice?"

[1] http://www.mscs.dal.ca/~selinger/md5collision/

You can always have different files with the same hash: any practical hash is a function with domain space much larger than range space, so there will always be many inputs mapping into the same output. You need to distinguish among the cases that (a) hash collisions can happen (albeit with extremely small probability), (b) two files can be intentionally constructed to cause a collision (this is a "collision attack", it's possible with MD5), (c) an arbitrary file can be intentionally constructed to cause a collision with another file (preimage attack, not known for MD5)... - Jason S
...and (d) that a hash collision w/o intentional construction of files is likely to happen (which is not true... you'd need approx 2^64 different files to have a likely collision in a 128-bit hash.) For further reference see RFC4270 tools.ietf.org/html/rfc4270 - Jason S
Depending on the document format used I'd probably be able to answer the judge that no meaningful document could be produced that way (MD5 collision generator likes to generate files with bad bytes). - Joshua
And then the opposing conusel would retort with these two files: th.informatik.uni-mannheim.de/people/lucks/HashCollisions/… th.informatik.uni-mannheim.de/people/lucks/HashCollisions/… (use this to view: view.samurajdata.se) - GalacticJello
The point is, you don't need to put yourself in that position in the first place, as the original question is asking "Why is it a bad practice?" If you are designing a new system, don't add that risk. It's following a "best practice" that experts have been saying for years: don't use MD5. - GalacticJello
:shrug: -- anyone using hash algorithms should know what their strengths and weaknesses are. nobody should use them for applications where collision attacks put the application at risk. If collision attacks are not a risk (e.g. website password + salt hashing where preimage attacks are the only risk -- and that only if the website database is compromised) it's fine to use MD5 but people need to be aware enough to make that determination on their own. - Jason S
using a hash like md5 for deduplication is fine aslong as you if you have a collision, check the files byte by byte for their difference. this will give you a 100% chance that you don't kill false duplicates. so you check all files and do a bucket sort according to their hashes and then compare all files in the buckets with eachoter, if there are only few files, or do a second hash like SHA and resort them according to the SHA buckets and finaly just check all the files that are not alone in one bucket. - ITroubs
@iTroubs: That's my point. Why use MD5 when you know you will have collisions, and then when you hit a collision, you end up hashing the file using another method? That introduces signifigant slowdowns in processing billions of documents, especially when you have many, many legitimate duplicate files. You have to ensure they really are duplicates, and jump through a ton of hoops. - GalacticJello
13
[0] [2010-06-11 07:06:01] cyphers

The first step is admitting the problem which you are doing by writing this question. That's good.

Now step slowly away from the broken algorithm and use one of the many fine alternatives. If you really want to be forward looking, use one of the new candidate hashes submitted to NIST such as Skein [1].

OTOH your simplest and most widely implemented bet is probably just SHA-2 at 256 bits. That's a wonderful middle ground of size and strength. I'd advise against using SHA-1 at this point as it will be the next to fall. New code should anticipate the next move.

WRT MD5, that has pretty much been answered by others. Safe to say major X.509 certificate implementations still depend on that old beast, but anyone writing new code should use something that isn't known to be broken. The harder question is how to get rid of MD5 from old code, it's sort of like the year 2000 problem except it has no end date so it ends up just being a multi-decade security flaw happening over and over again.

[1] http://www.skein-hash.info/about

14
[0] [2011-01-08 23:19:42] Rob

Collision space aside, MD5 is a fast hashing algorithm - if I want to brute force it (even with salting) then it takes less effort to do than an equivalent in SHA1, SHA-256 or the like.

Your objective is to make it computationally expensive and slow to generate collisions; using a unique salt per value, and using a slower hashing algorithm, both work towards that goal.


15
[0] [2011-01-08 23:22:24] kmarks2

Three words: Fixed prefix attack.


16
[0] [2011-11-01 07:56:16] evan

Here's a decent practical explanation of why letting people know your salt is a horrible idea:

  1. Go to http://www.md5decrypter.com/
  2. Enter: 99e9446e78aac2056d3903e1adb8fbcd and the Recaptcha
  3. Hit Decrypt.
  4. BOOM goes the dynamite!

Results
Md5 Hash: 99e9446e78aac2056d3903e1adb8fbcd
Normal Text: #bh35^&Res%s4mep8ss

Salt: #bh35^&Res% Pass: s4mep8ss


Just needs to be in a rainbow table. Instructions to create your own here: project-rainbowcrack.com/tutorial.htm - evan
Individually yes, but having each in a different system that requires different methods for hacking, no. SQL injection to get your hashes is one thing. Access to your system to get your salt is another. Besides, you only asked for an example - they state on their site that they only hold 15M hashes, not all of them. - evan
I'm looking to get past theoretical reasoning, and come up with a simple example that we can use to demonstrate the problem(s) with MD5. I gave you your example. Don't get angry when I've done exactly what you've asked. - evan
The password was posted by me a half-year ago. So, there is no use to come up with the same password. If you want to start over - here is another hash for you. - Col. Shrapnel
17