Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

I would like to hope Stack Overflow of all companies doesn't store passwords in plaintext, but you never know.


They don't have to store them as plain text for it to be a problem.

If they're not salted it's trivial to crack the hashes, and if they are all uniquely salted, while it's time consuming, they can still gradually crack them.

Given that you could probably sift through the users to find particularly juicy targets (usernames of maintainers of top open source projects with github repos for example?) that could justify the work of a time consuming attack on the hashes.


Of all the sites and people, I expect Jeff Atwood and Joel Spolsky to have followed best practices in storing user passwords.

Jeff wrote this in 2007:

> Do not invent your own "clever" password storage scheme

> Never store passwords as plaintext.

> Add a long, unique random salt to each password you store.

> Use a cryptographically secure hash. I think Thomas hates MD5 so very much it makes him seem a little crazier than he actually is. But he's right. MD5 is vulnerable. Why pick anything remotely vulnerable, when you don't have to? SHA-2 or Bcrypt would be a better choice.

https://blog.codinghorror.com/youre-probably-storing-passwor...

Another topic on passwords: https://blog.codinghorror.com/the-dirty-truth-about-web-pass...


> SHA-2 or Bcrypt would be a better choice.

As the sibling comment points out, SHA-2 is worthless for storing passwords. A GPU can crank through an obscene number of SHA-2 hashes per second. Bcrypt is is intentionally much slower and harder to use a GPU to brute force. The two algorithms are almost totally unrelated and it's concerning they were mentioned together.


To be fair, Jeff's post was written in 2007 ... this predates Khanna's PS3 clusters. CUDA was released the same year. So I'm not sure it would be obvious in 2007 that SHA-2 would be "worthless for storing passwords".


PBKDF2 seems to have been specified in 2000 as part of RFC 2898 [1] and, IIRC, existed before then. Bcrypt is from 1999 [2]. Both of those algorithms were explicitly designed to be slow. My understanding is that PBKDF2 wasn't explicitly designed for password storage, but that it was recognized as being useful for that early on. Bcrypt was explicitly designed for password storage. The article referenced came out 7 years later. IMO: there was evidence that salted SHA-2 was not secure.

[1] - https://tools.ietf.org/html/rfc2898#section-5.2 [2] - https://en.wikipedia.org/wiki/Bcrypt


> SHA-2 is worthless for storing passwords

It really depends on what the person means. Yes, single sha-2 hash is pretty useless. But that's often not what's really happening. For example libcrypt is used in many cases with the default $6$ format which uses thousands of rounds of sha512. That's still "password hashed with sha-2".


If that were the case, his advice ought to have been something like "use libcrypt with SHA-2" or something like that. Just "use SHA-2" was kinda useless in 2007 and hasn't gotten more useful since then.


Sure. But that advice was written 12 years ago. I'd be surprised if they didn't update their password storage since then.


How do you update password storage if you don't store the passwords only hashes?

You could upgrade it for new users, but for old ones? (e.g. I don't change passwords often)


When a user whose password is hashed the old way logs in, after checking the password they just supplied against your stored old hash but before forgetting the plaintext, you can compute the new hash & update your records.

Of course, that only works for active users - it won't upgrade anyone that never logs in. Depending on just how weak the old hash is, you may want to eventually cut off any lingering un-upgraded accounts: just forget their old hash, requiring them to go through your password reset process should they ever come back. If you've left it long enough, those accounts will probably never be used again anyway, so that should be NBD.


I updated a DB from MD5 to BCrypt at my last job. Essentially on the next login I updated the password (since the password isn't hashed on the client end). So all active users got switched over. After a relatively short period I reset the remaining passwords and forced users to send password reset emails to login.


Onion hash: hash the existing hash with the new algorithm.


That throws away entropy, but I suppose so long as very few hash collisions of a password are likely to be real passwords themselves it would be harmless.


You could rehash the password the next time the user logs in.


You ask people to change their passwords, and when they do so, you store them with the new scheme. You can do that by e.g. requiring everyone to change their password on their next login/visit.


You don't even need to ask people to change their passwords; when they login, you have a copy of the original password, so you can just store the new hash over the old.


If they aren't salted, just generate your own rainbow table!


He wrote it in 2007.


He said to use Bcrypt or SHA-2. One of those algorithms was explicitly designed for password storage, the other was not. They aren't really the same class of algorithm. As such, it doesn't really make sense to suggest one as an alternative to the other. Nobody ever suggests uses Bcrypt as the hash in a hashtable - even if it technically could be made to work. GPUs that could crank through trillions of hashes per second might not have existed, but, that doesn't make it good advice. Good advice would have been "use bcrypt or, if you can't, PBKDF2 (or maybe some other library that implements a variant of one of those)"


> MD5 is vulnerable. Why pick anything remotely vulnerable, when you don't have to? SHA-2

For the purpose of hashing passwords, there's no significant difference between MD5 and SHA-2. Both are awful choices.

Edit: Actually, SHA-2 may be worse because CPUs have hardware acceleration and thus the attacker may be able to crack it somewhat faster than MD5.


It’s definitely not worse—MD5 is faster with GPUs—but plain SHA-2 isn’t much better, either.


Also "Some level of production access" could easily include the ability for the attacker to intercept and exfiltrate passwords as users log in.


I haven't checked in a while, what's the cost of cracking a single salted password that is stored properly, e.g. with pbkdf2 or bcrypt or whatever is currently state of the art?


"Cracking" isn't a meaningful operation for these algorithms. You would need to try guessing, if you don't know anything about the password then you need to guess all possible passwords until you get the right one or a collision (the pigeon hole principle applies)

If my password is "password" you can probably guess that almost instantly no matter what scheme is used. If my password is 32 characters of random base64 output then it wouldn't matter if the scheme is MD5($password) you won't guess it.

Schemes like PBKDF2 and bcrypt trade something useful in the middle, if your password is mediocre they make it too hard to bother guessing. But you'd need to define how mediocre it is to get a meaningful estimate.


The cost? That depends on the password. How long does it take to crack 123456? That depends on the parameters.

In a good case, it takes 50ms per hash on a modern system, so that's 20 attempts per second (instead of 20 billion per second for a plain md5 or so). I don't really expect a popular site to do much more than that, so that's the best case. From experience, most sites will be more around the 20 billion mark than the 20 mark, but I expect stackoverflow to be on the good end.

The current state of the art is Argon2, with scrypt second and bcrypt/pbkdf2 tied for third. The first two have memory hardness, and the first one is the standard chosen in the password hashing algorithm competition. The third is still acceptable because most developers still go for a salted single hash like sha256. Somehow they got the salting method, but I'd rather they break all identical passwords at the same time (they weren't that strong anyway if they're shared between more than one person) than that they crack only a tiny percentage because of the cost. Slowness helps more than salting, yet as a pentester I see more of the latter than the former. So I'd rather recommend something available for their platform as a good third choice than them going "meh, effort" and not implementing the recommendation.


> The current state of the art is Argon2

Details for the interested implementer: there's a lot of bad software floating around out there, be careful and do your due diligence.

Use the argon2id function. If the language binding does not expose the argon2id function, but only argon2i and argon2d, then it's outdated, avoid. If the library has not been updated past 2016 (argon2 v1.3), it's vulnerable, avoid. (Some language bindings ship with an embedded library.)

Language bindings to the argon2 library do not document how to pick good parameters because language binding authors do not understand nor care about security, the suggestions in the synopses are laughably undervalued. Compare with the expert recommendations in https://password-hashing.net/argon2-specs.pdf chap. 6.4, 8, 9 and https://tools.ietf.org/html/draft-irtf-cfrg-argon2#section-4 .

Algorithm for picking the correct values on the target server hardware:

    const PASSPHRASE := 6 random words from dictionary
    const SALT := 16 bytes from urandom
    const DURATION := 0.5   ### or greater; this is the maximum amount
                            ### you are willing for your user to wait
    mut T_COST := 1
    mut M_FACTOR := concat(4096, 'M')
    const PARALLELISM := `nproc`
    const TAG_SIZE := 16    ### bytes, or 128 bits

    while {
        const TIMER := benchtime argon2id(
            PASSPHRASE, SALT, T_COST, M_FACTOR,
            PARALLELISM, TAG_SIZE
        )
        if TIMER > DURATION {
            if 1 === T_COST {
                reduce M_FACTOR     ### e.g. divide by a constant
                jump to top of while
            } else {
                jump out of while
            }
        }
        print T_COST, concat(M_FACTOR, 'M'), TIMER
        T_COST := T_COST + 1
    }


I work on an enterprise infosec tool that just demonstrated 48 trillion MD5s per second using AWS GPUs.

Hashed passwords are cracked so easily it is a minor obstacle at this point. It is a question of when not if a hash table is fully cracked.


Well, nobody should be using MD5 (nor should they have been using it 20 years ago with the introduction of bcrypt). In fact, nobody should be using any hash function that was designed for speed (such as the SHA family) because you don't want fast hashing of passwords.

Modern cryptographic hash functions that are tailored for password hashing (such as scrypt or Argon2) are much harder to brute-force and have tunable knobs to allow you to increase the memory or CPU hardness. Obviously you cannot be safe forever but if you have a database dump of Argon2id-hashed passphrases with very strong parameters you aren't going to break it any time soon.


There are plenty of publicly leaked hash tables running MD5 and the like. Just because modern hash functions exist does not mean they are in use. [1]

Also you do not need the hash table of a hardened system to get useful passwords. You need a reused password from a weak one.

[1] https://hashes.org/leaks.php


>There are plenty of publicly leaked hash tables running MD5 and the like.

Not related to stackoverflow though.

>You need a reused password from a weak one.

If the weak password is already public, what is gained by finding out that it's a weak password in a strong DB? You've just described a dictionary attack.


My mention of MD5 was just a benchmark reference.

I can see from the downvotes the very idea that it is in regular use as triggering for some folks—-but md5 and other weak hashing algos are not just in obscure anime forums but in systems everywhere.

“They don’t like to think it be like it is, but it do.”

And it isn’t about md5 hash rate it’s about the ease of cracking in general due to low cost of compute.

If the SO password hash has leaked even in bcrypt its going to be attacked and many strong passwords will be broken. If they are reused elsewhere, important email addresses will be attempted elsewhere.

Don’t reuse passwords.


> Don’t reuse passwords.

Nobody here disagrees with this premise. I just disagree that "low cost of compute" changes the fact that functions like Argon2 can be tuned to become more expensive to crack based on changes in computation cost. If you're worried about someone spinning up something on AWS to crack hashes, bump up the memory and CPU hardness and now they'll have to spend much more money to crack your passwords. In addition, the design of most modern password hashing functions is such that you get poor parallelism on GPUs.


Isn't the whole point of modern password hashes that the don't scale with GPU compute in the same way as MD5?


They′re not saying nobody is using MD5; they’re saying “nobody should be using MD5” (emphasis mine).


And how does it do on hashes that are not known to be useless, eg bcrypt or argon2?


It is running on top of hashcat, and at the above-mentioned compute it was benchmarking 45 million bcrypts per second. At that point it is more about the attack plan than the compute.

https://imgur.com/a/DXQMsM1

edit: here is the demo video: https://www.youtube.com/watch?v=KnD4f8N1_OE


"X bcrypts per second" is completely meaningless. What was the bcrypt setting? With the right setting, it would not be more than 1 bcrypt per century, or with the wrong setting, an almost equivalent rate to md5. It depends.

More meaningful would be the speedup compared to a single CPU core, which is what the developers (should) benchmark against. They should make it as slow as possible, so if their system can do bcrypt with a cost of 15 in 0.1 seconds, they should set either that or cost 16. (Much more than 0.2s might be annoying to users or be a DOS vector.)


> With the right setting, it would not be more than 1 bcrypt per century

You can't really call that a "right" setting when it takes at least as long to log in...


Right was meant as necessary to achieve that effect (sorry, English is not my native language). Obviously this is not a recommendation but just to point out that its configuration ranges from negligible to (on today's computers) forever.


English is my native language, and I think the way you used "right" was fine. At any rate, I understood what you meant.


> It is a question of when not if a hash table is fully cracked.

...? At 48 trillion hashes / second, you could get the entire hash space in as little as 224 quadrillion years.

Of course, if you had any collisions, it would take longer. A lot longer.

This suggests that strong passwords are still just as strong under md5 as under a more modern hash. No? Use of md5 is a problem because people use passwords that are easy to guess, not because you can enumerate the hash space. The one-way-ness is as secure as ever.


No, MD5 has about 18 bit collision strength. Combined with predictable salting practice, this is crackable with a calculator. Welcome to Merkle-Damgard construction allowing any prefix or suffix.

Given random salt (random placed or mixed) or HMAC, you have to use the more complex preimage attack at 123 bits.

This is crackable with a medium sized botnet or a supercomputer.

48 THash is an underestimate. Specialized hardware easily surpasses this, even FPGA does.

SHA1 salted is a tougher customer with 60-64 bit collision resistance meaning you probably cannot crack it with your calculator. However, it is still prone to length extension meaning predictable salting has this much strength.


At 123 bits, you're five bits short of the 128 bits I calculated with. I don't think a medium sized botnet can rise to the level of doing 8 quadrillion years of work within your lifetime.

What problem are you trying to solve? As I understand it, we're discussing enumerating the hash space, such that:

1. You are given a hashed value, such as 2b0f4e60b80da7ef1e84573d764f1bf4 .

2. The value is someone's hashed password. You need to find any string which hashes to this particular fixed value, but you don't know of any such string to start with.

You can do this by brute force, but it will take you a long, long time.

The problem is NOT:

1. You have a string which hashes to a particular value.

2. You want other strings which hash to the same value.

And it also isn't:

1. You have a string which, with an unknown prefix, hashes to a particular known value.

2. You want to identify hashes which represent the same string with other prefixes applied.

I don't see where salting is relevant to the question. It's a defense against the phenomenon that cracking one user's password automatically also cracks everyone else who uses the same password (since, without salting, they all have the same hash), but it isn't a defense against having your password cracked by a targeted attack (since, in a targeted attack, there are no other hashes to be collateral damage). Why did you bring it up? What attack are you thinking of?


I agree with your thrust that your parent poster (AstralStorm) is barking in a different forest from the tree we care about, but for salting it is also a defence against pre-computation to trade space for time.

With an unsalted hash an adversary can do as much work as they want in advance, store output and then trade that in once they have your hashes to get all or most of the same rewards as if they'd done the work after getting your hashes. Rainbow tables are the most famous example, but they're part of a family of similar attacks.

Salt lets you arbitrarily discount this advance work because the attacker must do it for all possible salt values and you get to choose how many there are - the early Unix crypt() salted pessimised password hash discounts it by a factor of 4096, modern schemes often use many orders of magnitude more salt. An attacker who has $4M to attack my password scheme probably doesn't want to spend $4M now to have a $1000 advantage once they get the hashes, and they certainly won't for a 1¢ advantage.


A rainbow table is the exact attack I'm saying is still infeasible. ("Strong passwords are still just as strong.") It only works by assuming the victim uses one of a known set of weak, easy-to-guess passwords. If they don't, their hashed password is very unlikely to be in the rainbow table at all, because there's just too much hash space. The calculation in my original comment gives the approximate number of hashes necessary to fill a complete md5 rainbow table, on the assumption that you get zero collisions in the process of filling it in. (That is, every time you hash something, you get a hash you've never seen before, allowing you to add new information to the table.) That assumption is not at all realistic. By the time you've filled in half of the space, unless you can choose them cleverly to avoid collisions, half of the strings you hash should be wasted effort.

In a single-target attack, I don't really see the concept of "pre-computation to trade space for time". That hurts you by taking a lot of space, but it doesn't gain you any time, because you spent at least the required amount of time, but almost certainly more, doing the pre-computing. If you can buy someone else's pre-computed rainbow table, then sure, that's an advantage for you. But the adversary actually doing the pre-computing is doing it in order to crack many people's passwords all at once ("this table will let you identify _everyone_ whose password is qwe123"), which is the scenario I described earlier.

(At this point I feel I should clarify that "some people use the same passwords" is a real threat and a real reason to avoid md5. I just don't think the comment I responded to, "It is a question of when not if a hash table is fully cracked", was made in good faith or informed by... anything. To fully crack md5 in that way, you'd need an easily-computed function that inverts it. No amount of hashing speed is ever going to get you there.)


> In a single-target attack, I don't really see the concept of "pre-computation to trade space for time"

In a single-target attack the reason you'd do this is because you expect your target to react in a timely fashion to discovery of some other part of your attack by changing passwords.

e.g. maybe you're sure you can break in to get hashes, but you will trigger a reactive IDS. You figure you have some period of time after that trigger before your target is alerted and changes their password.

Time-space tradeoff lets you avoid doing all the work against the clock _after_ the IDS triggers, instead you can do it all _before_ you have the hashes, and only pull the trigger and set off the alarms when you're ready to quickly break the hash, get in and do whatever your actual attack requires.

It's not a _common_ scenario, but it's important to remember it exists in designing general purpose components like password hashes.


You do not do this by brute force.

You can assume the system uses a certain salt pattern, e.g. 4 byte prefix or 8 byte prefix or suffix. This can reduce work from full crack to some 40 bit crack. (Guess salt then presume stupid concat scheme, use collision attack to get matches.) That one is doable on a modern PC on a GPU. It is a targetted attack. The mass variant are salted rainbow tables.

You usually do not even have to recover actual password to use credentials associated with the hash.


Please try to describe the actual attack you're talking about. What do you have, what do you want, how do you get it.


MD5 is one thing as a password can be retrieved from a hash table. But pulling out passwords from a hashed + salted value (e.g. via bcrypt) is many orders of magnitude more infeasible, no?


Salting was originally important as a defense against rainbow tables - which are more or less obsolete with GPUs that can crank through trillions of hashes per second. The real reason that bcrypt is a better way to store a password isn't just because it uses a salt - it's because its designed to be slow and to use a bunch of memory which makes it much harder to brute force.


I would be impressed if SO's user password table is in bcrypt.


What makes you say that? bcrypt's been the defacto best practice for user password "storage" for probably 10 years now. MD5's been known to be inadequate for much longer.

Even if they had a legacy implementation in MD5, gradually migrating from storing MD5 hashes to storing bcrypt hashes is trivial to do.


From what I understand, many systems do not choose to implement strong hashing algos.


Even PHP's hash_function uses scrypt. Yes, some people explicitly decide to hash everything with sha1 but nothing you or I do will ever be able to stop them.


I would be disappointed if such a high-profile and technically savvy site would be using anything less.


10 years ago, almost to the day, they had a password vulnerability involving unsalted hashes. So not plain text, but who knows if they've learned the right lessons?

https://blog.codinghorror.com/i-just-logged-in-as-you-how-it...


Unless I'm badly misreading something, the unsalted hashes in that post were on some other (unspecified) site unrelated to Stack Overflow.


In the first sentence, he links to a previous post where it's made more clear that he's talking about Stack Overflow.

> I found what one could call a security hole in Stackoverflow. I'm curious enough to go digging around for holes, but too ethical to actually do anything with them.


I hate to say this, but did you even read the post you actually linked?

From the second post (the one you linked), where Jeff quotes the hacker:

>>I guess I can tell you, so you don't fall into this trap again. There's a site I help out with that doesn't salt their passwords. They're MD5 encrypted, but if you've got a dictionary password, it's very easy to use a reverse-MD5 site to get the original. I was able to figure out you were a user on the site some time back, and realized I could do this, if only I knew your openid provider...

The "password vulnerability involving unsalted hashes" was on another site. The hacker was only able to gain access to Jeff's Stack Overflow account through a combination of their privileged access to the database for that other site and Jeff's own bad op-sec. The only real security error attributable to Stack Overflow in any way was Jeff's own and had nothing to do with Stack Overflow's infrastructure.




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: