
Every single time this happens, I immediately wonder: "what was the hashing scheme?"

Like many others before it, Scribd disappoints by not addressing this question. Instead we get this:

Even though this information was accessed, the passwords stored by Scribd are encrypted (in technical terms, they are salted and hashed).

How long was the salt? AFAIK, MD5 hashes with an insufficiently long salt can be brute-forced with open-source CUDA setups.

Further, how did they determine the following?

Most of our users were therefore unaffected by this; however, our analysis shows that a small percentage may have had their passwords compromised.



We use scrypt for password hashing. It is a modern, hard-to-crack password hashing algorithm.

We do have database access logs, so it was pretty straightforward to identify which users were affected.


You should add this. Savvy people will be positively surprised to see a company actually caring about doing password authentication right.


That is awesome. You should feel comfortable telling people this; it puts you way ahead of the game.


Thanks for clarifying, good to see you're using a decent hashing algorithm :)

I'm still a little unsure of how you are able to know some users had their password compromised. Is it a simple case of finding successful login attempts from the same IP address as the attack?


Compromised != Hacked. To clarify: no accounts were accessed by the hackers, but a small number of account records had passwords encrypted with an outdated algorithm (basically SHA1 + salt), so we preemptively reset their passwords and sent out emails to all affected users.

This is how we define "compromised" - people who had their passwords hashed with the old algorithm, which is relatively easy to crack.


This seems to imply that many of (all?) the emails/encrypted passwords were leaked, but you don't consider most of them "compromised"...


I'd like to echo this concern -- were all emails/encrypted passwords leaked, but you only consider those protected by outdated hashing schemes to be compromised?

If so, I feel you have an obligation to alert ALL of your users.


Additional question: when did users first alert you to the hack?


For the future, I wonder how useful it would be to run old hashed passwords through a newer system such as scrypt. This way those users who haven't logged in in a while could also benefit from the safer hashed passwords.

    scrypt(hmac_sha1(password, salt), salt, cpumemargs)
In the future, you could even do it again with more cpu and memory requirements for scrypt, upgrading older users' hashes again with another run of scrypt.
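A minimal Python sketch of the wrapping idea above, assuming the legacy scheme is HMAC-SHA1 keyed with the salt (as in the one-liner) and using illustrative scrypt cost parameters. The function names and parameter values are hypothetical, not anything Scribd has described:

```python
import hashlib
import hmac

# Assumed scrypt cost parameters (illustrative, not Scribd's real settings).
SCRYPT_N, SCRYPT_R, SCRYPT_P = 2**14, 8, 1


def legacy_hash(password: bytes, salt: bytes) -> bytes:
    """Stand-in for the old scheme: HMAC-SHA1 keyed with the salt."""
    return hmac.new(salt, password, hashlib.sha1).digest()


def wrap_legacy(old_hash: bytes, salt: bytes) -> bytes:
    """Offline upgrade: run the stored legacy hash through scrypt."""
    return hashlib.scrypt(
        old_hash, salt=salt, n=SCRYPT_N, r=SCRYPT_R, p=SCRYPT_P, dklen=32
    )


def verify_wrapped(password: bytes, salt: bytes, stored: bytes) -> bool:
    """At login, recompute both layers and compare in constant time."""
    candidate = wrap_legacy(legacy_hash(password, salt), salt)
    return hmac.compare_digest(candidate, stored)
```

The upgrade step only needs the old hash, not the plaintext, so it can be run over dormant accounts in a batch job.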


that is a weird definition of compromised.

is it true that all, or greater than 1% of, emails and hashes were dumped?

I find it hard to believe you migrated 99% of passwords to a new scheme. I've never seen over 60%, and that is with a lot of prompting to users (and as a Scribd user I've never been prompted)


The migration can be transparent, since the app has your plaintext password when you log in.

Alternatively, stored passwords can be upgraded by applying the new scheme to the old hashed password, and recording that this is how the password should be checked in the future.

Since not everyone was migrated, I'm assuming they went the first way.
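A rough sketch of the first (transparent) approach, assuming each record stores a scheme tag next to the salt and hash. The record layout, scheme names, and cost parameters are made up for illustration:

```python
import hashlib
import hmac
import os


def sha1_salted(password: bytes, salt: bytes) -> bytes:
    # Hypothetical legacy scheme: SHA-1 over salt + password.
    return hashlib.sha1(salt + password).digest()


def scrypt_hash(password: bytes, salt: bytes) -> bytes:
    # Cost parameters are illustrative assumptions.
    return hashlib.scrypt(password, salt=salt, n=2**14, r=8, p=1, dklen=32)


SCHEMES = {"sha1": sha1_salted, "scrypt": scrypt_hash}
CURRENT = "scrypt"


def login(record: dict, password: bytes) -> bool:
    """Verify against the recorded scheme; rehash with CURRENT on success."""
    expected = SCHEMES[record["scheme"]](password, record["salt"])
    if not hmac.compare_digest(expected, record["hash"]):
        return False
    if record["scheme"] != CURRENT:
        # The plaintext is only available here, during a successful login.
        record["salt"] = os.urandom(16)
        record["hash"] = SCHEMES[CURRENT](password, record["salt"])
        record["scheme"] = CURRENT
    return True
```

Accounts that never log in keep their old scheme tag, which would explain why a remainder of legacy hashes was still around.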


http://www.scribd.com/password/check thank you for this. now I can run a list of emails against this to see who has a Scribd account


I just put in a bunch of fake email addresses and they all returned with "Good news - your password has not been compromised." I think the only confirmation that you'd get of an existing account is if the password was compromised.


They can modify it to simply say whether your account was compromised, regardless of whether you have an account (ie, if no account -> not compromised).
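Roughly, the check would only consult the list of affected addresses and never the user table. A sketch with made-up data and a hypothetical `check_message` helper:

```python
# Hypothetical data: addresses whose legacy hashes were exposed.
COMPROMISED = {"alice@example.com"}
# Hypothetical data: every registered address (never consulted below).
REGISTERED = {"alice@example.com", "bob@example.com"}


def check_message(email: str) -> str:
    """Return the same text for unknown and unaffected addresses."""
    if email.strip().lower() in COMPROMISED:
        return "Your password may have been compromised."
    # Deliberately no lookup in REGISTERED: existence must not change the reply.
    return "Good news - your password has not been compromised."
```

A positive result still confirms that the address has an account, so this narrows the enumeration problem without removing it entirely.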


...Which they ought to do. Offering the ability to enumerate user accounts is unlikely to be the immediate goal of this utility, but it's an effect nonetheless.


30 minutes later and it's fixed. Entering an invalid email also results in a "this email was not compromised" message.


That's what they're doing. "aijaspijasohisaho@asoihdshohdusudhs.com" gets a message saying that that account wasn't compromised.


That's good to hear. As a future suggestion to anyone else who finds themselves in this unfortunate situation - including some technical granularity in your press release can go miles in offering reassurance to your technical audience/users.


Why? Honestly asking: what difference does this have on the end result? Now that you know they are using scrypt, how will that impact your actions?

You could say that this has a bearing on whether you continue to use the service, but if that were the case, wouldn't it be better to suggest that all services provide this information up front?


You will not successfully maintain positive customer relationships by boiling all customer interactions down to questions like "how will that impact your actions [right now]?" Relationships are a string of positive and negative experiences that must be carefully curated.

The decision to remain in a relationship is rarely a singular event (related to a singular experience). You could think of it more as the cumulative result of all relationship experiences. Even the best relationships involve some negative experiences, but the important part is making sure those negative experiences are mitigated as best as possible. Customers will give more leeway to vendors with whom they have a strong NET positive relationship.

There are two important technical points that could have been included to great effect:

1) That they store the encryption scheme with the password record so that they can upgrade their crypto incrementally.

2) That their most recent auth algorithm uses scrypt.

So how do these two points directly impact the mitigation of what is otherwise a negative experience? First up we should look at users who will understand what points 1 & 2 mean. These users will respond positively to these items, because it changes the conversation from "Scribd just got h4x'd" to "Hey, at least they had good crypto in place."

The next tier of users will come along, read these comments, and feel more confident that the community of knowledgeable people around them are feeling OK about this, so they should too.

As to the question of, "wouldn't it be better to suggest that all services provide this information up front?" I would say yes, it would. This action is not mutually exclusive of including technical details in this communication though.


>Now that you know they are using scrypt, how will that impact your actions?

For one, I'm much less annoyed/pissed off at them now that I know they use scrypt. I'm not about to cancel my account and never use them again. And I'm not freaking out about whether my email and password have been added to a botnet cracking script running against every other website out there.

I've gotten so accustomed to hearing of companies using MD5 + salt and thinking that's secure, that it is a pleasant surprise to find one using bcrypt, and downright mindblowing to find one using scrypt. Yes, my expectations are low.

>wouldn't it be better to suggest that all services provide this information up front?

Yes, absolutely.


If I'm understanding kpumuk's comment elsewhere in the thread[1], if you got notified/test positive on their check page[2], then you are at risk if you've reused those credentials, since they were grandfathered hashes with weak protection.

> [...] but a small number of account records had passwords encrypted with an outdated algorithm (basically SHA1 + salt), so we preemptively reset their passwords and sent out emails to all affected users.

> This is how we define "compromised" - people who had their passwords hashed with the old algorithm, which is relatively easy to crack.

I came up positive on the check, which does make sense since I signed up a long time ago and rarely if ever sign in, so they wouldn't have had the opportunity to upgrade my hash after moving to better schemes.

Happily it was a one-time/throwaway password, but it's a bit scary that it's the first list (that I'm aware of) I'm actually on.

[1] https://news.ycombinator.com/item?id=5493536

[2] http://www.scribd.com/password/check


So what do you do past this point? I know you can probably rough out how much time it would take to find hash collisions and ask your users to change their passwords before that amount of time elapses, but past that point, can't you no longer assume that it's the actual user logging in to change their password?


We performed a forced password reset on the users with compromised hashes. The old password will not work on Scribd, and those users will need to go through the password reset flow to regain access.


Ah ok. I was wondering how you verified the users' identity if the password was compromised. That makes sense, thanks!


We have reset passwords for all affected users. Hashes that got leaked are not useful now.


Well, that assumes people aren't reusing those passwords.



Salts make cracking a list of N password hashes take roughly N times as long, but if a password is cracked anyway (because it's common and/or because the hash is not using very many rounds, or because an attacker only cares about one particular account), and the password is reused elsewhere, the fact that it was salted doesn't matter anymore.

GP is right; if owners of the leaked accounts [email, hash] pairs are reusing passwords, the leaked hashes are potentially useful even though scribd has reset them. They're simply not useful for logging in to scribd.


yeah if you simply know what a password is, of course it's compromised, but you're not supposed to easily break a salted hashed password.


Salts only really protect against rainbow tables; if the attacker is willing to use a dictionary or brute force attack against a single password, they're not of much use.
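To make that concrete, here is a toy Python illustration. The wordlist, salts, and the salted SHA-1 construction are stand-ins chosen for the example:

```python
import hashlib

# Tiny stand-in dictionary; real attacks use millions of candidates.
WORDLIST = [b"123456", b"password", b"letmein", b"qwerty"]


def salted_sha1(password: bytes, salt: bytes) -> str:
    return hashlib.sha1(salt + password).hexdigest()


def dictionary_attack(target_hash: str, salt: bytes):
    """Try each candidate with the victim's salt, which sits next to the hash."""
    for word in WORDLIST:
        if salted_sha1(word, salt) == target_hash:
            return word
    return None


# Same password, different salts: the stored hashes differ, so a single
# precomputed table cannot cover both users.
h1 = salted_sha1(b"letmein", b"salt-A")
h2 = salted_sha1(b"letmein", b"salt-B")
```

Each hash has to be attacked separately, but each attack still succeeds as soon as the password is in the wordlist.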


Can I suggest you also share what parameters you use with scrypt? Scrypt is parametric and you can choose weak or strong choices for parameters depending on how long you want to spend validating passwords.
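For reference, scrypt's main buffer needs roughly 128 * N * r bytes, so the parameters translate directly into cost. A small sketch using Python's `hashlib.scrypt`; the helper names and the example values are assumptions, not recommendations from Scribd:

```python
import hashlib


def scrypt_memory_bytes(n: int, r: int) -> int:
    """Approximate size of scrypt's large working buffer: 128 * N * r bytes."""
    return 128 * n * r


def hash_password(password: bytes, salt: bytes, n: int, r: int = 8, p: int = 1) -> bytes:
    # maxmem is set to twice the estimate so larger N values are not rejected.
    limit = 2 * scrypt_memory_bytes(n, r)
    return hashlib.scrypt(
        password, salt=salt, n=n, r=r, p=p, maxmem=limit, dklen=32
    )


weak = hash_password(b"pw", b"0123456789abcdef", n=2**10)    # about 1 MiB
strong = hash_password(b"pw", b"0123456789abcdef", n=2**14)  # about 16 MiB
```

Both calls are valid scrypt, but the first is far cheaper for an attacker to evaluate, which is why the parameter choice is worth publishing alongside the algorithm name.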


Back when Evernote was hacked, I got the idea of creating a draft of the kind of response I would prefer in a case like this.

I also intended to write a simple website script that could generate a statement. Things came up, and the gist has gathered dust for a while.

The gist is available here with some example cases listed that companies can learn from, and people are free to provide feedback or spin it off:

http://pygm.us/EwNHanBP

Post your most important feedback in its comments, so other companies reading the gist see it as well.

Companies definitely need to be prepared for full disclosure in the event of a security breach.


»What hashing scheme do you use?« does not matter for most users.

Most users use weak passwords and a substantial share of these passwords is easy to recover using a dictionary attack. It does not really matter if you use MD5, SHA1, SHA2, HMAC, PBKDF2, bcrypt, scrypt or whatever, nor does it matter if you use no salt, the same salt for all users or a unique salt per user. Even for PBKDF2, bcrypt and scrypt the cost factor will - for practical reasons - usually not be large enough to mitigate dictionary attacks using a few thousand of the most common passwords. Therefore weak passwords are compromised regardless of the hashing scheme used. And because users with weak passwords especially tend to reuse the password for different accounts, many other accounts are compromised, too.

A user caring about security will not reuse passwords for different accounts and this alone reduces the impact of the event by a huge amount. Further a strong password alone makes it very unlikely that attackers will recover the password even if only unsalted MD5 is used for hashing. Therefore - unless the password is stored in plain text - it is highly unlikely that an attacker will be able to access an account protected by a strong password.

I definitely don't want to argue that using unsalted MD5 is okay - it is not - but for the average user the difference between a weak and a strong hashing scheme is not as large as one would naively expect. Strong hashing schemes will especially protect users using infrequent dictionary words or medium length hard passwords because the additional computation power required to perform a dictionary or brute force attack will force the attackers to use smaller dictionaries and shorter passwords.

Finally, storing passwords may benefit from security through obscurity. If the attacker is unable to figure out the hashing scheme used, he will be unable to perform a dictionary or brute force attack. This does not mean everyone should come up with their own hashing scheme - this would do MUCH more harm than good - but, for example, using an unknown random cost factor - 294,897 instead of 300,000 - and keeping it secret, or adding a second secret salt buried deep in the code to the salt stored together with the username and hash, will make it quite a bit harder for the attacker to perform an attack unless they got the information from an insider or were able to steal your code or binaries.


Using an in-code secret key (as opposed to the not-secret-by-necessity salt) is commonly referred to as a pepper. It improves security in the cases when an attacker has access to your database but not to your code or filesystem.

Figuring out the hashing scheme used for a given hash is frequently trivial. All an attacker needs to do is hijack his own hashed password and salt and then run combinations of common hashes with salting patterns until he gets a hit. This is going to be hundreds of combinations to test on the high end, and will generally yield results very easily.
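A sketch of that search in Python, assuming the attacker knows their own password and can read their own salt and hash from the dump. The lists of algorithms and salting patterns are illustrative and far from complete:

```python
import hashlib
from itertools import product

ALGORITHMS = ("md5", "sha1", "sha256")
# Common ways of combining password and salt (an illustrative, incomplete list).
PATTERNS = {
    "salt+password": lambda pw, s: s + pw,
    "password+salt": lambda pw, s: pw + s,
    "password only": lambda pw, s: pw,
}


def identify_scheme(known_password: bytes, salt: bytes, leaked_hex: str):
    """Return the first (algorithm, pattern) pair that reproduces the leaked hash."""
    for algo, (name, combine) in product(ALGORITHMS, PATTERNS.items()):
        digest = hashlib.new(algo, combine(known_password, salt)).hexdigest()
        if digest == leaked_hex:
            return algo, name
    return None
```

With a handful of algorithms and patterns the whole search is a few dozen hash computations.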


> It does not really matter if you use ... a unique salt per user.

I agree with the other points that you make on this aspect, but I do not quite understand how this particular point (quoted above) can be true. If you use a strong unique salt for each user's password, then you are padding the length of the actual password hashed and thereby effectively reducing the possibility of a successful dictionary attack to virtually zero. If this is so, then how could one mount a successful dictionary attack?


It is common to store per-user salts together with the hashes, often even as a single string formed by concatenating the salt and the hash. Therefore getting hold of the hashes usually means getting hold of the salts, too.

But the other case I mentioned - using the same salt buried deep in your code for all users (what is called a pepper as I learned recently) - will do what you describe until the attacker is able to figure out the pepper used by either stealing the code or brute forcing it.

Finally, note that just using a pepper is not a good idea, and even when combined with a salt it needs some careful thought. Just using a pepper will yield equal hashes for equal passwords, while using a unique per-user salt will avoid this. The other problem is that with a pepper you are reusing the same secret for each user. Therefore an attacker has thousands or even millions of samples and may be able to extract information if the scheme is not designed carefully. Combining password, salt and pepper must essentially avoid the same pitfalls as keyed hash functions when combining the key and the message. See for example the design principles behind HMAC [1].

[1] http://en.wikipedia.org/wiki/Hash-based_message_authenticati...
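One way to combine the three inputs without ad hoc concatenation, sketched in Python: key an HMAC with the pepper, then stretch the result with the per-user salt. The pepper value, the iteration count, and the choice of PBKDF2 for stretching are all assumptions made for the example:

```python
import hashlib
import hmac

# Hypothetical application secret; it would live outside the database.
PEPPER = b"example-pepper-not-a-real-secret"


def peppered_hash(password: bytes, salt: bytes) -> bytes:
    """Key an HMAC with the pepper, then stretch with the per-user salt."""
    inner = hmac.new(PEPPER, password, hashlib.sha256).digest()
    return hashlib.pbkdf2_hmac("sha256", inner, salt, 100_000, dklen=32)
```

Because the salt still differs per user, equal passwords do not produce equal stored hashes, and the pepper never appears in the stored record.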


> How long was the salt? AFAIK, MD5 hashes with an insufficiently long salt can be brute-forced with open-source CUDA setups.

The length of the salt has little impact on security once it exceeds 16 bits or so; up to about that length it's still feasible to generate rainbow tables for all salts.

If you're storing plain hashes, it doesn't really matter whether it's MD5, SHA-1 or SHA-256 - the work required for a brute-force attack is largely the same. The next step up would be using a key stretching algorithm like PBKDF2 or bcrypt.


Why do you want to know the hashing scheme? Isn't it better if nobody knows? :)


No



