The definitive guide to form-based website authentication
[+5518] [11] Michiel de Mare
[2008-08-02 19:51:50]
[ forms http security authentication language-agnostic ]
[ https://stackoverflow.com/questions/549/the-definitive-guide-to-form-based-website-authentication ]

Moderator note:

This question is not a good fit for our question and answer format with the topicality rules [1] which currently apply for Stack Overflow. We normally use a "historical lock" for such questions where the content still has value. However, the answers on this question are actively maintained and a historical lock doesn't permit editing of the answers. As such, a "wiki answer" lock has been applied to allow the answers to be edited. You should assume the topicality issues which are normally handled by a historical lock are present (i.e. this question is not a good example of an on-topic question for Stack Overflow).

Form-based authentication for websites

We believe that Stack Overflow should not just be a resource for very specific technical questions, but also for general guidelines on how to solve variations on common problems. "Form based authentication for websites" should be a fine topic for such an experiment.

It should include topics such as:

It should not include things like:

Please help us by:

  1. Suggesting subtopics
  2. Submitting good articles about this subject
  3. Editing the official answer
(54) Why exclude HTTP Basic Authentication? It can work in HTML Forms via Ajax: peej.co.uk/articles/http-auth-with-html-forms.html - system PAUSE
(56) HTTP Basic Auth has the property of being (comparatively) difficult to make a browser forget. It's also horribly insecure if you don't use it with SSL to secure the connection (i.e., HTTPS). - Donal Fellows
(24) I think it'd be worth talking about sessions (including fixation and hijacking) cookies (the secure and http only flags) HTTP based SSO - symcbean
(2) Key stretching for slowing down dictionary attacks if your passwords are compromised - en.wikipedia.org/wiki/Key_strengthening - James
(29) The super-useful HttpOnly cookie flag, which prevents JavaScript-based cookie theft (a subset of XSS attacks), should be mentioned somewhere too. - Alan H.
(5) We should probably have a best-practices tag or something similar for excellent questions and answers like this one. - ptman
(6) I vote on closing because I believe this question in its current state does not fit SO format. One long answer that everyone's editing seems plain wrong. Instead, I would reformat it into small useful chunks like they did with this question (most upvoted question on Programmers). - Dan Abramov
(81) Wow. Lengthy answers, dozens of upvotes for some of them, yet nobody mentions the common mistake of serving login forms over HTTP. I've even argued with people who said "but it submits to https://..." and only got blank stares when I asked if they were sure an attacker didn't rewrite the non-encrypted page the form was served over. - dzuelke
Good point @dzuelke, not to mention that the user has no direct way of checking that its sensitive data is going to be transmitted over a secure connection to a trustworthy server (i mean, checking the server certificate) - idelvall
github.com/FallibleInc/security-guide-for-developers is a good reference - Matt Kocaj
(1) There is a suggestion to move the question to SO Documentation meta.stackoverflow.com/questions/332092/… - Michael Freidgeim
What about using the time that is needed to fill in the form? One could do some research: present a website to bots as bait and observe the time needed to fill in and submit the form (it should be under a second), then observe how long human users need. Simply take the average, or even do a Bayes classification to differentiate the one from the other... - Chris Pillen
(1) @ChrisPillen For a while people could tell if a user was a bot because, when moving to click an enter button, a bot would go straight down then straight across. Humans, of course, move in weird, pseudo-random diagonal lines. So bot-writers responded by having their bots move in weird, pseudo-random diagonal lines. If you can program your site to expect behavior, someone can program their bot to behave that way; that's just an arms race. It's much better to rely on things provably infeasible for computers. - Lord Farquaad
@LordFarquaad I understand that. But that means there are several bot writers out there who do not make the effort. And time is special, because the bot writer needs to build in time loops. That means bot authors need more time to run their bots, which, in some cases, will crush their business model. - Chris Pillen
@Chris Parts VI and VII in the top answer address throttling, so time is impacted regardless. My point is that calculating some number that a human probably won't beat goes against one of the security fundamentals, which is that you shouldn't try to "outsmart" a bot. It's true that if you're clever enough you can beat most bots, but getting involved in an arms race like that usually means you've got a fundamental flaw somewhere else. It's much better to remove that flaw than try to stay a step ahead, because sooner or later a bot is going to outpace you, and they only need to do that once. - Lord Farquaad
While fun reading, the topic is indeed too broad. Security is a cat-and-mouse game. Hackers find new holes all the time that new security will plug, and vice versa. Not yet mentioned in the answers: behavioural checking. E.g. Google will prompt for your password every once in a while, especially if you do unexpected things. So do contact-less bank cards if you pay in a store or town where you have not been seen before. - Roland
[+3962] [2009-01-25 11:27:46] Jens Roland [ACCEPTED]

PART I: How To Log In

We'll assume you already know how to build a login+password HTML form which POSTs the values to a script on the server side for authentication. The sections below will deal with patterns for sound practical auth, and how to avoid the most common security pitfalls.

To HTTPS or not to HTTPS?

Unless the connection is already secure (that is, tunneled through HTTPS using SSL/TLS), your login form values will be sent in cleartext, and anyone eavesdropping on the line between the browser and the web server will be able to read logins as they pass through. This type of wiretapping is done routinely by governments, but in general, we won't address 'owned' wires other than to say this: Just use HTTPS.

In essence, the only practical way to protect against wiretapping/packet sniffing during login is by using HTTPS or another certificate-based encryption scheme (for example, TLS [1]) or a proven & tested challenge-response scheme (for example, the Diffie-Hellman [2]-based SRP). Any other method can be easily circumvented by an eavesdropping attacker.

Of course, if you are willing to get a little bit impractical, you could also employ some form of two-factor authentication scheme (e.g. the Google Authenticator app, a physical 'cold war style' codebook, or an RSA key generator dongle). If applied correctly, this could work even with an unsecured connection, but it's hard to imagine that a dev would be willing to implement two-factor auth but not SSL.

(Do not) Roll-your-own JavaScript encryption/hashing

Given the perceived (though now avoidable [3]) cost and technical difficulty of setting up an SSL certificate on your website, some developers are tempted to roll their own in-browser hashing or encryption schemes in order to avoid passing cleartext logins over an unsecured wire.

While this is a noble thought, it is essentially useless (and can be a security flaw [4]) unless it is combined with one of the above - that is, either securing the line with strong encryption or using a tried-and-tested challenge-response mechanism (if you don't know what that is, just know that it is one of the most difficult to prove, most difficult to design, and most difficult to implement concepts in digital security).

While it is true that hashing the password can be effective against password disclosure, it is vulnerable to replay attacks, Man-In-The-Middle attacks / hijackings (if an attacker can inject a few bytes into your unsecured HTML page before it reaches your browser, they can simply comment out the hashing in the JavaScript), and brute-force attacks (since you are handing the attacker the username, the salt and the hashed password).

CAPTCHAS against humanity

CAPTCHA [5] is meant to thwart one specific category of attack: automated dictionary/brute force trial-and-error with no human operator. There is no doubt that this is a real threat, however, there are ways of dealing with it seamlessly that don't require a CAPTCHA, specifically properly designed server-side login throttling schemes - we'll discuss those later.

Know that CAPTCHA implementations are not created alike; they often aren't human-solvable, most of them are actually ineffective against bots, all of them are ineffective against cheap third-world labor (according to OWASP [6], the current sweatshop rate is $12 per 500 tests), and some implementations may be technically illegal in some countries (see OWASP Authentication Cheat Sheet [7]). If you must use a CAPTCHA, use Google's reCAPTCHA [8], since it is OCR-hard by definition (since it uses already OCR-misclassified book scans) and tries very hard to be user-friendly.

Personally, I tend to find CAPTCHAS annoying, and use them only as a last resort when a user has failed to log in a number of times and throttling delays are maxed out. This will happen rarely enough to be acceptable, and it strengthens the system as a whole.

Storing Passwords / Verifying logins

This may finally be common knowledge after all the highly-publicized hacks and user data leaks we've seen in recent years, but it has to be said: Do not store passwords in cleartext in your database. User databases are routinely hacked, leaked or gleaned through SQL injection, and if you are storing raw, plaintext passwords, that is instant game over for your login security.

So if you can't store the password, how do you check that the login+password combination POSTed from the login form is correct? The answer is hashing using a key derivation function [9]. Whenever a new user is created or a password is changed, you take the password and run it through a KDF, such as Argon2, bcrypt, scrypt or PBKDF2, turning the cleartext password ("correcthorsebatterystaple") into a long, random-looking string, which is a lot safer to store in your database. To verify a login, you run the same hash function on the entered password, this time passing in the stored salt, and compare the resulting hash string to the value stored in your database. Argon2, bcrypt and scrypt already store the salt along with the hash. Check out this article [10] on sec.stackexchange for more detailed information.

The reason a salt is used is that hashing by itself is not sufficient -- you'll want to add a so-called 'salt' to protect the hash against rainbow tables [11]. A salt ensures that two identical passwords are not stored as the same hash value, which prevents an attacker from cracking the whole database in a single pass during a password guessing attack.

A plain cryptographic hash should not be used for password storage because user-selected passwords are not strong enough (i.e. do not usually contain enough entropy), and a password guessing attack could be completed in a relatively short time by an attacker with access to the hashes. This is why KDFs are used - these effectively "stretch the key" [12], which means that every password guess an attacker makes requires multiple repetitions of the hash algorithm, for example 10,000 times, which makes each of the attacker's guesses 10,000 times slower.
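For illustration, here is a minimal sketch of this pattern in Python using the bcrypt package (an assumption on my part - the answer is language-agnostic, and the same pattern applies to Argon2 or PBKDF2 bindings in any language; a work factor of 12 is just a common starting point):

    import bcrypt

    def hash_password(plaintext: str) -> bytes:
        # gensalt() generates a random per-password salt and embeds it,
        # along with the work factor, in the resulting hash string.
        return bcrypt.hashpw(plaintext.encode("utf-8"), bcrypt.gensalt(rounds=12))

    def verify_password(plaintext: str, stored_hash: bytes) -> bool:
        # checkpw re-derives the hash using the salt and work factor embedded
        # in stored_hash and compares the result in constant time.
        return bcrypt.checkpw(plaintext.encode("utf-8"), stored_hash)

    # Store only the value returned by hash_password() in your database.
    stored = hash_password("correcthorsebatterystaple")
    assert verify_password("correcthorsebatterystaple", stored)
    assert not verify_password("Tr0ub4dor&3", stored)

Note how the salt never has to be handled separately here; the library stores it inside the hash string, exactly as described above.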

Session data - "You are logged in as Spiderman69"

Once the server has verified the login and password against your user database and found a match, the system needs a way to remember that the browser has been authenticated. This fact should only ever be stored server side in the session data.

If you are unfamiliar with session data, here's how it works: A single randomly-generated string is stored in an expiring cookie and used to reference a collection of data - the session data - which is stored on the server. If you are using an MVC framework, this is undoubtedly handled already.

If at all possible, make sure the session cookie has the Secure and HttpOnly flags set when sent to the browser. The HttpOnly flag provides some protection against the cookie being read through an XSS attack. The Secure flag ensures that the cookie is only sent back via HTTPS, and therefore protects against network sniffing attacks. The value of the cookie should not be predictable. Where a cookie referencing a non-existent session is presented, its value should be replaced immediately to prevent session fixation [13].
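As a small illustration (a sketch, not tied to any particular framework - in practice your web/MVC framework will set these flags for you), here is how the resulting Set-Cookie header can be built with Python's standard http.cookies module:

    from http.cookies import SimpleCookie
    import secrets

    cookie = SimpleCookie()
    cookie["session_id"] = secrets.token_urlsafe(32)  # long, unpredictable identifier
    cookie["session_id"]["secure"] = True     # only ever transmitted over HTTPS
    cookie["session_id"]["httponly"] = True   # not readable from JavaScript (limits XSS cookie theft)
    cookie["session_id"]["samesite"] = "Lax"  # basic CSRF mitigation

    # Emits something like:
    # Set-Cookie: session_id=...; HttpOnly; SameSite=Lax; Secure
    print(cookie.output())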

Session state can also be maintained on the client side. This is achieved by using techniques like JWT (JSON Web Token).

PART II: How To Remain Logged In - The Infamous "Remember Me" Checkbox

Persistent Login Cookies ("remember me" functionality) are a danger zone; on the one hand, they are just as safe as conventional logins when users understand how to handle them; and on the other hand, they are an enormous security risk in the hands of careless users, who may use them on public computers and forget to log out, and who may not know what browser cookies are or how to delete them.

Personally, I like persistent logins for the websites I visit on a regular basis, but I know how to handle them safely. If you are positive that your users know the same, you can use persistent logins with a clean conscience. If not - well, then you may subscribe to the philosophy that users who are careless with their login credentials brought it upon themselves if they get hacked. It's not like we go to our users' houses and tear off all those facepalm-inducing Post-It notes with passwords they have lined up on the edge of their monitors, either.

Of course, some systems can't afford to have any accounts hacked; for such systems, there is no way you can justify having persistent logins.

If you DO decide to implement persistent login cookies, this is how you do it:

  1. First, take some time to read Paragon Initiative's article [14] on the subject. You'll need to get a bunch of elements right, and the article does a great job of explaining each.

  2. And just to reiterate one of the most common pitfalls, DO NOT STORE THE PERSISTENT LOGIN COOKIE (TOKEN) IN YOUR DATABASE, ONLY A HASH OF IT! The login token is Password Equivalent, so if an attacker got their hands on your database, they could use the tokens to log in to any account, just as if they were cleartext login-password combinations. Therefore, use hashing (according to https://security.stackexchange.com/a/63438/5002 a weak hash will do just fine for this purpose) when storing persistent login tokens.
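To illustrate point 2, here is a minimal sketch in Python (the save_token_hash and find_token_hash database helpers are hypothetical placeholders). Because the token is a long random value rather than a human-chosen password, a fast hash such as SHA-256 is sufficient here, as the linked security.SE answer explains:

    import hashlib
    import secrets

    def issue_persistent_login_token(user_id: int) -> str:
        token = secrets.token_urlsafe(32)               # this value goes into the browser cookie
        token_hash = hashlib.sha256(token.encode()).hexdigest()
        save_token_hash(user_id, token_hash)            # hypothetical DB helper: store only the hash
        return token

    def validate_persistent_login(user_id: int, token_from_cookie: str) -> bool:
        supplied_hash = hashlib.sha256(token_from_cookie.encode()).hexdigest()
        stored_hash = find_token_hash(user_id)          # hypothetical DB lookup, returns None if absent
        return stored_hash is not None and secrets.compare_digest(stored_hash, supplied_hash)

If your database leaks, the attacker only gets hashes of the tokens, which cannot be replayed as cookies.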

PART III: Using Secret Questions

Don't implement 'secret questions'. The 'secret questions' feature is a security anti-pattern. Read the paper from link number 4 from the MUST-READ list. You can ask Sarah Palin about that one, after her Yahoo! email account got hacked during a previous presidential campaign because the answer to her security question was... "Wasilla High School"!

Even with user-specified questions, it is highly likely that most users will choose either:

  • A 'standard' secret question like mother's maiden name or favorite pet

  • A simple piece of trivia that anyone could lift from their blog, LinkedIn profile, or similar

  • Any question that is easier to answer than guessing their password. Which, for any decent password, is every question you can imagine

In conclusion, security questions are inherently insecure in virtually all their forms and variations, and should not be employed in an authentication scheme for any reason.

The true reason why security questions even exist in the wild is that they conveniently save the cost of a few support calls from users who can't access their email to get to a reactivation code. This at the expense of security and Sarah Palin's reputation. Worth it? Probably not.

PART IV: Forgotten Password Functionality

I already mentioned why you should never use security questions for handling forgotten/lost user passwords; it also goes without saying that you should never e-mail users their actual passwords. There are at least two more all-too-common pitfalls to avoid in this field:

  1. Don't reset a forgotten password to an autogenerated strong password - such passwords are notoriously hard to remember, which means the user must either change it or write it down - say, on a bright yellow Post-It on the edge of their monitor. Instead of setting a new password, just let users pick a new one right away - which is what they want to do anyway. (An exception to this might be if the users are universally using a password manager to store/manage passwords that would normally be impossible to remember without writing them down.)

  2. Always hash the lost password code/token in the database. AGAIN, this code is another example of a Password Equivalent, so it MUST be hashed in case an attacker got their hands on your database. When a lost password code is requested, send the plaintext code to the user's email address, then hash it, save the hash in your database -- and throw away the original. Just like a password or a persistent login token.

A final note: always make sure your interface for entering the 'lost password code' is at least as secure as your login form itself, or an attacker will simply use this to gain access instead. Making sure you generate very long 'lost password codes' (for example, 16 case-sensitive alphanumeric characters) is a good start, but consider adding the same throttling scheme that you do for the login form itself.
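For completeness, here is a hedged sketch of the same hash-only pattern applied to lost password codes, with an expiry added (store_reset_hash and load_reset_hash are hypothetical database helpers; the 30-minute lifetime is just an example):

    import hashlib
    import secrets
    from datetime import datetime, timedelta, timezone

    RESET_CODE_TTL = timedelta(minutes=30)

    def issue_reset_code(user_id: int) -> str:
        code = secrets.token_urlsafe(16)                 # long, unguessable code - e-mail this to the user
        code_hash = hashlib.sha256(code.encode()).hexdigest()
        expires_at = datetime.now(timezone.utc) + RESET_CODE_TTL
        store_reset_hash(user_id, code_hash, expires_at) # store only the hash; discard the plaintext code
        return code

    def redeem_reset_code(user_id: int, submitted_code: str) -> bool:
        record = load_reset_hash(user_id)                # hypothetical: returns (code_hash, expires_at) or None
        if record is None or datetime.now(timezone.utc) > record[1]:
            return False
        supplied_hash = hashlib.sha256(submitted_code.encode()).hexdigest()
        return secrets.compare_digest(record[0], supplied_hash)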

PART V: Checking Password Strength

First, you'll want to read this small article for a reality check: The 500 most common passwords [15]

Okay, so maybe the list isn't the canonical list of most common passwords on any system anywhere ever, but it's a good indication of how poorly people will choose their passwords when there is no enforced policy in place. Plus, the list looks frighteningly close to home when you compare it to publicly available analyses of recently stolen passwords.

So: With no minimum password strength requirements, 2% of users use one of the top 20 most common passwords. Meaning: if an attacker gets just 20 attempts, 1 in 50 accounts on your website will be crackable.

Thwarting this requires calculating the entropy of a password and then applying a threshold. The National Institute of Standards and Technology (NIST) Special Publication 800-63 [16] has a set of very good suggestions. That, when combined with a dictionary and keyboard layout analysis (for example, 'qwertyuiop' is a bad password), can reject 99% of all poorly selected passwords [17] at a level of 18 bits of entropy. Simply calculating password strength and showing a visual strength meter [18] to a user is good, but insufficient. Unless it is enforced, a lot of users will most likely ignore it.

And for a refreshing take on user-friendliness of high-entropy passwords, Randall Munroe's Password Strength xkcd [19] is highly recommended.

Utilize Troy Hunt's Have I Been Pwned API [20] to check users' passwords against passwords compromised in public data breaches.
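The Pwned Passwords range endpoint only ever sees the first five characters of the password's SHA-1 hash (k-anonymity), so the password itself never leaves your server. A minimal sketch in Python using the requests library (treat it as an illustration of the API shape, not production code):

    import hashlib
    import requests

    def times_pwned(password: str) -> int:
        # Return how many times a password appears in known breaches (0 = not found).
        sha1 = hashlib.sha1(password.encode("utf-8")).hexdigest().upper()
        prefix, suffix = sha1[:5], sha1[5:]
        # Only the 5-character prefix is sent to the API.
        resp = requests.get(f"https://api.pwnedpasswords.com/range/{prefix}", timeout=5)
        resp.raise_for_status()
        for line in resp.text.splitlines():
            candidate, _, count = line.partition(":")
            if candidate == suffix:
                return int(count)
        return 0

    if times_pwned("password123"):
        print("Reject or warn: this password has appeared in a public breach.")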

PART VI: Much More - Or: Preventing Rapid-Fire Login Attempts

First, have a look at the numbers: Password Recovery Speeds - How long will your password stand up [21]

If you don't have the time to look through the tables in that link, here's the list of them:

  1. It takes virtually no time to crack a weak password, even if you're cracking it with an abacus

  2. It takes virtually no time to crack an alphanumeric 9-character password if it is case insensitive

  3. It takes virtually no time to crack an intricate, symbols-and-letters-and-numbers, upper-and-lowercase password if it is less than 8 characters long (a desktop PC can search the entire keyspace up to 7 characters in a matter of days or even hours)

  4. It would, however, take an inordinate amount of time to crack even a 6-character password, if you were limited to one attempt per second!

So what can we learn from these numbers? Well, lots, but we can focus on the most important part: the fact that preventing large numbers of rapid-fire successive login attempts (ie. the brute force attack) really isn't that difficult. But doing it right isn't as easy as it seems.

Generally speaking, you have three choices that are all effective against brute-force attacks (and dictionary attacks, but since you are already employing a strong password policy, they shouldn't be an issue):

  • Present a CAPTCHA after N failed attempts (annoying as hell and often ineffective -- but I'm repeating myself here)

  • Locking accounts and requiring email verification after N failed attempts (this is a DoS [22] attack waiting to happen)

  • And finally, login throttling: that is, setting a time delay between attempts after N failed attempts (yes, DoS attacks are still possible, but at least they are far less likely and a lot more complicated to pull off).

Best practice #1: A short time delay that increases with the number of failed attempts, like:

  • 1 failed attempt = no delay
  • 2 failed attempts = 2 sec delay
  • 3 failed attempts = 4 sec delay
  • 4 failed attempts = 8 sec delay
  • 5 failed attempts = 16 sec delay
  • etc.

DoS attacking this scheme would be very impractical, since the resulting lockout time is slightly larger than the sum of the previous lockout times.

To clarify: The delay is not a delay before returning the response to the browser. It is more like a timeout or refractory period during which login attempts to a specific account or from a specific IP address will not be accepted or evaluated at all. That is, correct credentials will not result in a successful login, and incorrect credentials will not trigger a delay increase.
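A minimal sketch of such a throttling scheme (assumptions: the schedule from best practice #1, an in-memory dict standing in for whatever persistent store you actually use, and an arbitrary cap on the backoff):

    import time

    # account -> (consecutive_failures, lockout_until); a real system would persist this
    failed_logins = {}

    MAX_DELAY = 16 * 60  # arbitrary upper bound on the backoff, in seconds

    def throttled(account: str) -> bool:
        _, lockout_until = failed_logins.get(account, (0, 0.0))
        # While throttled, the submitted credentials are not evaluated at all.
        return time.time() < lockout_until

    def record_failure(account: str) -> None:
        failures, _ = failed_logins.get(account, (0, 0.0))
        failures += 1
        # 1 failure = no delay, 2 = 2s, 3 = 4s, 4 = 8s, 5 = 16s, ...
        delay = min(2 ** (failures - 1), MAX_DELAY) if failures > 1 else 0
        failed_logins[account] = (failures, time.time() + delay)

    def record_success(account: str) -> None:
        failed_logins.pop(account, None)

The same bookkeeping can be keyed by IP address (or both), depending on your threat model.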

Best practice #2: A medium length time delay that goes into effect after N failed attempts, like:

  • 1-4 failed attempts = no delay
  • 5 failed attempts = 15-30 min delay

DoS attacking this scheme would be quite impractical, but certainly doable. Also, it might be relevant to note that such a long delay can be very annoying for a legitimate user. Forgetful users will dislike you.

Best practice #3: Combining the two approaches - either a fixed, short time delay that goes into effect after N failed attempts, like:

  • 1-4 failed attempts = no delay
  • 5+ failed attempts = 20 sec delay

Or, an increasing delay with a fixed upper bound, like:

  • 1 failed attempt = 5 sec delay
  • 2 failed attempts = 15 sec delay
  • 3+ failed attempts = 45 sec delay

This final scheme was taken from the OWASP best-practices suggestions (link 1 from the MUST-READ list) and should be considered best practice, even if it is admittedly on the restrictive side.

As a rule of thumb, however, I would say: the stronger your password policy is, the less you have to bug users with delays. If you require strong (case-sensitive alphanumerics + required numbers and symbols) 9+ character passwords, you could give the users 2-4 non-delayed password attempts before activating the throttling.

DoS attacking this final login throttling scheme would be very impractical. And as a final touch, always allow persistent (cookie) logins (and/or a CAPTCHA-verified login form) to pass through, so legitimate users won't even be delayed while the attack is in progress. That way, the very impractical DoS attack becomes an extremely impractical attack.

Additionally, it makes sense to do more aggressive throttling on admin accounts, since those are the most attractive entry points.

PART VII: Distributed Brute Force Attacks

Just as an aside, more advanced attackers will try to circumvent login throttling by 'spreading their activities':

  • Distributing the attempts on a botnet to prevent IP address flagging

  • Rather than picking one user and trying the 50,000 most common passwords (which they can't, because of our throttling), they will pick THE most common password and try it against 50,000 users instead. That way, not only do they get around maximum-attempts measures like CAPTCHAs and login throttling, their chance of success increases as well, since the number 1 most common password is far more likely than number 49,995

  • Spacing the login requests for each user account, say, 30 seconds apart, to sneak under the radar

Here, the best practice would be logging the number of failed logins, system-wide, and using a running average of your site's bad-login frequency as the basis for an upper limit that you then impose on all users.

Too abstract? Let me rephrase:

Say your site has had an average of 120 bad logins per day over the past 3 months. Using that (running average), your system might set the global limit to 3 times that -- ie. 360 failed attempts over a 24 hour period. Then, if the total number of failed attempts across all accounts exceeds that number within one day (or even better, monitor the rate of acceleration and trigger on a calculated threshold), it activates system-wide login throttling - meaning short delays for ALL users (still, with the exception of cookie logins and/or backup CAPTCHA logins).
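As a sketch of that calculation (the figures and the multiplier of 3 are just the example values from the paragraph above):

    def global_throttle_active(failed_logins_last_24h: int,
                               daily_average: float,
                               multiplier: float = 3.0) -> bool:
        # While this returns True, apply a short delay to all login attempts,
        # except persistent-cookie logins and CAPTCHA-verified logins.
        return failed_logins_last_24h > multiplier * daily_average

    # With the example figures above: average 120/day, so the limit is 360
    assert not global_throttle_active(200, 120.0)
    assert global_throttle_active(400, 120.0)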

I also posted a question with more details and a really good discussion of how to avoid tricky pitfalls [23] in fending off distributed brute force attacks.

PART VIII: Two-Factor Authentication and Authentication Providers

Credentials can be compromised, whether by exploits, passwords being written down and lost, laptops with keys being stolen, or users entering logins into phishing sites. Logins can be further protected with two-factor authentication, which uses out-of-band factors such as single-use codes received from a phone call, SMS message, app, or dongle. Several providers offer two-factor authentication services.
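As one common example of the app-based variant, time-based one-time passwords (TOTP, RFC 6238) can be verified with nothing more than a shared per-user secret. A minimal sketch using the Python pyotp library (the user name and issuer shown are placeholders):

    import pyotp

    # Enrolment: generate a per-user secret and show it to the user as an
    # otpauth:// URI (typically rendered as a QR code for their authenticator app).
    secret = pyotp.random_base32()
    uri = pyotp.TOTP(secret).provisioning_uri(name="alice@example.com",
                                              issuer_name="ExampleSite")

    # Verification: the user submits the 6-digit code from their app.
    def second_factor_ok(user_secret: str, submitted_code: str) -> bool:
        totp = pyotp.TOTP(user_secret)
        # valid_window=1 tolerates one 30-second step of clock drift
        return totp.verify(submitted_code, valid_window=1)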

Authentication can be completely delegated to a single-sign-on service, where another provider handles collecting credentials. This pushes the problem to a trusted third party. Google and Twitter both provide standards-based SSO services, while Facebook provides a similar proprietary solution.

MUST-READ LINKS About Web Authentication

  1. OWASP Guide To Authentication [24] / OWASP Authentication Cheat Sheet [25]
  2. Dos and Don’ts of Client Authentication on the Web (very readable MIT research paper) [26]
  3. Wikipedia: HTTP cookie [27]
  4. Personal knowledge questions for fallback authentication: Security questions in the era of Facebook (very readable Berkeley research paper) [28]
[1] https://en.wikipedia.org/wiki/Transport_Layer_Security
[2] https://en.wikipedia.org/wiki/Diffie%E2%80%93Hellman_key_exchange
[3] https://letsencrypt.org/
[4] https://stackoverflow.com/questions/1380168/does-it-make-security-sense-to-hash-password-on-client-end
[5] https://en.wikipedia.org/wiki/CAPTCHA
[6] https://en.wikipedia.org/wiki/OWASP
[7] https://www.owasp.org/index.php/Authentication_Cheat_Sheet
[8] https://en.wikipedia.org/wiki/ReCAPTCHA
[9] https://en.wikipedia.org/wiki/Key_derivation_function
[10] https://security.stackexchange.com/a/31846/8340
[11] https://en.wikipedia.org/wiki/Rainbow_table
[12] https://en.wikipedia.org/wiki/Key_stretching
[13] https://owasp.org/www-community/attacks/Session_fixation
[14] https://paragonie.com/blog/2015/04/secure-authentication-php-with-long-term-persistence
[15] http://www.whatsmypass.com/?p=415
[16] https://en.wikipedia.org/wiki/Password_strength#NIST_Special_Publication_800-63
[17] https://cubicspot.blogspot.com/2012/01/how-to-calculate-password-strength-part.html
[18] https://blogs.dropbox.com/tech/2012/04/zxcvbn-realistic-password-strength-estimation/
[19] https://xkcd.com/936/
[20] https://haveibeenpwned.com/API/
[21] https://www.lockdown.co.uk/?pg=combi&s=articles
[22] https://en.wikipedia.org/wiki/Denial-of-service_attack
[23] https://stackoverflow.com/questions/479233/what-is-the-best-distributed-brute-force-countermeasure
[24] https://www.owasp.org/index.php/Authentication_Cheat_Sheet
[25] https://www.owasp.org/index.php/Authentication_Cheat_Sheet
[26] https://pdos.csail.mit.edu/papers/webauth:sec10.pdf
[27] https://en.wikipedia.org/wiki/HTTP_cookie#Drawbacks_of_cookies
[28] https://cups.cs.cmu.edu/soups/2008/proceedings/p13Rabkin.pdf

(71) Well, I don't really agree with the Captcha part, yes Captchas are annoying and they can be broken (except recaptcha but this is barely solvable by humans!) but this is exactly like saying don't use a spam filter because it has less than 0.1% false negatives .. this very site uses Captchas, they are not perfect but they cut a considerable amount of spam and there's simply no good alternative to them - Waleed Eissa
(257) @Jeff: I'm sorry to hear that you have issues with my reply. I didn't know there was a debate on Meta about this answer, I would have gladly edited it myself if you'd asked me to. And deleting my posts just deleted 1200 reputation from my account, which hurts :( - Jens Roland
Great answer, thanks. Can you explain why the 'Improved' Best Practices for persistent login is flawed? At first blush it does seem like it addresses the DOS scenario for invalidating sessions with Miller's approach, what am I missing? - jfager
(1) Don't forget to look at your crypto algorithm. You wouldn't want to be hit by a side channel attack such as timing. - Colin Bowern
(14) "After sending the authentication tokens, the system needs a way to remember that you have been authenticated - this fact should only ever be stored serverside in the session data. A cookie can be used to reference the session data." Not quite. You can (and should, for stateless servers!) use a cryptographically signed cookie. That's impossible to forge, doesn't tie up server resources, and doesn't need sticky sessions or other shenanigans. - Martin Probst
(1) Are there any good articles discussing either (1) theory behind and/or (2) implementation of challenge-response mechanisms? - eykanal
(5) For persistent login. How is the Username+token better than just a random session-id that the server has mapped to the username and loginstatus? I have never even heard of the first method before. - aero
(3) This is the longest and most interesting answer I have ever seen on Stack Overflow. But it really is more suitable for a blog post! - Chinmoy
How is a second cookie for persistent login better than one cookie/session with a longer expiration time? - s4y
While Tyler Atkin's password strength library looks excellent, it is licensed under the GPL which is kind of weird for a JavaScript library. Providing a link to an alternative implementation that can be used without restriction in a commercial web app would be good. - Bayard Randel
(12) "a desktop PC can search the FULL KEYSPACE up to 7 characters in less than 90 days" A machine with a recent GPU can search the full 7 char keyspace in less than 1 day. A top of the line GPU can manage 1 billion hashes per second. golubev.com/hashgpu.htm This leads to some conclusions about password storage which aren't directly addressed. - Frank Farmer
(4) I totally agree with you on security questions. They are very bad. I just wish I could explain this to my bank. - Tim Matthews
(2) @aero: By including the username in the persistent login cookie token, you avoid a scenario where an attacker has 10 million chances to guess a correct token (on a site with 10 million users). Of course, if the token string is long enough, it shouldn't be a big issue, but I would still recommend it. - Jens Roland
@Sidnicious: Session cookies are transient by definition, ie. they automatically expire when the user closes the browser window. - Jens Roland
@Colin Bowern: Are you aware of any successful timing attacks on remote web sites? From the literature I've read on successful timing attacks, they require sub-millisecond variance, which makes them impractical for remote attacking web sites (where common response times are in the 20-50 millisecond range) - Jens Roland
@fjager: sure - I added a simple note to the post explaining why it doesn't add significant security to the system. - Jens Roland
@Jens Why? If the purpose of the cookie is to store information about the login session, why wouldn't it last as long as the login is going to last? It seems like have separate session/remember-me cookies just adds complexity. - s4y
(3) @Sidnicious: You are not talking about a session cookie then. Session cookies by definition have no expires directive. If you add an expires directive to a session cookie, it becomes a persistent cookie. It sounds like you want to make your "session" cookie persistent and eliminate the true session-limited cookie. I guess that's possible, but you lose all the useful things you can do with session cookies, i.e. keeping transient lightweight state objects in memory (you don't want to keep all session data alive for months, trust me). - Jens Roland
@Jens Roland: About the rememberMe, in the cookie, I should save the same token that is stored in the DB? If not, I should save in the cookie the value that originated the Token? - jonathancardoso
(1) @Jonathan: The token is just a random generated value. Save the user ID and token in the cookie, then save a hash of the token in the DB. that way, you can validate the login cookie by re-hashing the token and checking the user record for a match - Jens Roland
@Mike: Good points. If a user tries to log in (ie. posts login credentials to the server) they will not be checked and an error is returned. A server-synchronized countdown clock should be displayed next to the login form to inform the user of the throttling. The form itself or the submit button should be disabled until the countdown is done. - Jens Roland
@Mike: Account-based 'locking' should IMO only happen if the user breaks your site's TOS while logged in, if account fees are unpaid, or if the user is investigated by the authorities. Until the user has logged in correctly, you (generally) can't be sure that the failed login attempts aren't a malicious third party trying to lockout the real user. - Jens Roland
@Mike: IP-based 'locking' however, can be advisable in case of obvious DoS attacks. That doesn't have anything to do with login form security in itself though, and is usually handled directly on the network traffic, so that request flooding from a single IP automatically triggers an IP block for a number of hours. - Jens Roland
@Jens, thanks for the clarifications. I was just thinking that I have accounts with two different banks. One of them locks your account for 2 hours with 3 failed logins. The other locks your account indefinitely after 5'ish failed logins and forces you to call their tech support to reenable it. Neither (as far as I could tell) lock your IP address. - Mike
@jfager: When the user re-logs-in, nothing is deleted, since the safety deletion only occurs if the site is passed an old cookie containing a valid series identifier and an invalid token. Since there is no such cookie (because the attacker deleted it), the system simply initiates a new series and doesn't even warn the user. (remember: a user can have many laptops = many series identifiers, and the 'improved' system can't tell the difference) - Jens Roland
@jfager: Absolutely correct - I didn't mean to portray it as a drawback, my only point is that the 'improvement' doesn't actually deliver on its promise. It can make implementors and site owners believe their auth system protects against cookie stealing, and it doesn't -- and fake security is worse than no security. - Jens Roland
(5) There are a few things here which were either briefly mentioned or not at all. One being the use of hash algorithms specific to passwords (e.g. bcrypt). This means that the time to compute the hash increases to 0.1 seconds (at least) but that makes a big difference when searching an entire keyspace. The second thing is that security must be everywhere: sanatizing user inputs, using prepared statements for sql, making sure that resources arent url predictable, then there is the whole XSS set of issues... - chacham15
(3) Agreed with @chacham15. Lockout and throttling policy is great, but using a password digest like bcrypt or scrypt instead of a general purpose hashing function enforces this at the lowest level possible. There is no way around having to put in the full amount of work to generate a password digest, so no developer can simply forget to implement or accidentally disable the throttling mechanism. - Stephen Touset
@namelessjon: Correction: Even if all users had high-entropy passwords, we'd still want to hash them in case of database theft - Jens Roland
@Ferdy: Thanks :) a fixed 1 or 2 second delay will still work, but increasing it incrementally to 10 or 20 seconds will multiply the resilience against brute force attacks. - Jens Roland
@JensRoland Thanks for confirming the simplified model works. Albeit it less effective, I would prefer avoiding the pain that is reliably tracking login attempts. Better yet, I like part VIII of your answer best: outsourcing the issue alltogether. It's not just security that is a pain in building your own user management system, something as seemingly simple as reliably sending out emails is a pain as well. - Fer
NEVER use recaptcha, if you want to retain any of your customers, ever. Seriously, if you have to use a captcha then use one that humans have a chance of solving first time, they might not be unbreakable but they're a good compromise because most will stop a variety of bots without annoying your customers too much. - jonhobbs
(6) @MikeMike: "..and loop through them in php" -- why not just select the row in SQL? SELECT * FROM LoginTokens WHERE UserID=[userid from cookie] AND HashedToken=[hash(token from cookie)] should work just fine (remember to use prepared statements / stored procedures for the SQL though) - Jens Roland
Does anyone know where I can find a vetted PHP/MySQL implementation of Charles Miller's almost-10-year-old "Persistent Login Cookie" solution? Please post it here: stackoverflow.com/questions/15647261/… - ProgrammerGirl
@JensRoland: How would you select the row in SQL when the Token is hashed with bcrypt? Please answer here: stackoverflow.com/questions/15685951/… - ProgrammerGirl
@Jesse: I would agree with you if the 'improved' system actually did that, but if an attacker deletes the cookie, the site has no clue that an attack has taken place - only that a new chain has been added to the collection of valid chains. If and only if the user monitors his own number of valid chains stored in the database, never clears his own cookies, and keeps track of 100% of the devices he uses to authenticate on the site, will he be able to deduce that an attack MAY have taken place. The scenario where the system automatically detects the intrusion does not apply to cookie deletion. - Jens Roland
(1) @Jesse: You are correct, but the improvement seems highly speculative to me, since essentially every attack vector which would allow a third party access to the cookie would also enable them to delete it. Hoping that they won't is nothing more than security through obscurity. Even if there is added protection against the weakest attackers, I would still argue that giving the site owners / legitimate users a false sense of security by advertising an 'intrusion detection' mechanism which doesn't actually protect them, is downright irresponsible and should be discouraged. - Jens Roland
Just a thought. What about bitcoin-like hashing for credentials on the client? Iterative hashing of username, password & a random number (session specific) that takes couple of seconds on the client (legitimate users won't mind slight delay) and it would be a nightmare for attackers. Valid credentials and proof of work would be required for successful authentication. - Patrick
(1) "hashing the password client-side": more than just saying that this is "useless", it should be made very clear that this is a security flaw. This makes the hash password equivalent, and means that the server, by storing the password hash, is effectively storing the password (and none of us do that, right?) Here's a question on this exact point. - jameshfisher
@JensRoland Great post, thank you. If my site doesn't have any state it wants to keep in the session cookie for example, then do my session cookie and persistent cookie basically each only contain distinct large random strings (and username for the persistent)? - Newtang
(1) @Newtang: The persistent cookie should always be a large random string (and optionally a username for convenience). The session cookie is usually set by your web/MVC framework (ASP.NET_SessionId, PHPSESSID, ci_session, JSESSIONID) and should contain just the session identifier (another large random string). If you want to keep additional state about the session, that state should normally be stored server-side, with the session identifier used as a primary key. - Jens Roland
@shannon: Absolutely - as long your user data is protected by strong encryption, you could theoretically upload it to Pastebin and it wouldn't be a problem -- and yet I don't think anyone would feel comfortable with such a setup. Some frameworks actually provide such an encrypted-cookie solution as an alternative to a session database, and it is a viable solution, no doubt. The problems begin when people implement that solution with weak or no encryption (and they do), or exceed the data limits on cookie data. - Jens Roland
@JensRoland: I guess my thought was, if we assume the encryption is weak, then we have to worry that even a simple session token is also not safe, and we risk session hijacking, replays, and any number of other related attacks. - shannon
Would not a "secret question" be beneficial when used in conjunction with typical password reset functionality? A typical password reset process ensures only the possessor of the email account can reset their password. But if the user has had their email account compromised, then the addition of a security question could foil an attempt to reset their password and gain access to a site. - Courtney Miles
I saw one Captcha that was done in Canvas HTML 5 which was moving three balls around so that they were in the smallest to largest (circles rather than balls) and it was actually really nice. I have not come across that page in a while. - Doug Hauf
@Andrew: That's not actually the case. Multiple SIDs (chains) are permitted because it's the way multiple logged in devices (e.g. the home PC, work PC, and iPad) are supported. Under a single-chain (single-device) scheme, each time a user switches between devices, the 'theft' warning is raised, eventually rendering it worthless because the user sees it all the time. - Jens Roland
@JensRoland If you want to allow remember-me for multiple devices, then you have to modify the solution. I think one way would be instead of SID, store IP (hashed) in the cookie and in the DB. The DB would have a pool of IPs, some of them marked as Valid and others marked as Banned. Upon manual login, User's IP is stored in the cookie and in the DB and marked as Valid. Upon every auto or manual login, the incoming user's IP is checked against the DB (token too). If IP is found and Valid - green light. If IP is not found - then green light, BUT incoming user's IP is marked as Valid and - Andrew
and the old IP (from the cookie) is marked as Banned in the DB. If owner logs in using old (banned) IP, red light - theft is assumed. Everything gets cleared. This is not 100% proof, as until the real owner comes back, thief will be using the site and also during that time Owners IP might change, but the chances are minimized and multiple devices are allowed. Optionally, the banned IP's could be deleted every month / 3 months, etc, as ISP might assign 'old' IP again. - Andrew
Just a note: SSL is NOT the only solution for "safe login". You can actually make a sniffer-safe login process using a hashed password. Basically, when generating the login page you send a salt to the browser (which will be saved on server-side too), and when submitting the form data, you'll send the hashed password and salt (look @ crypto-js), server then hashes the password + generated salt on the session and compares both. If the password on the server is hashed, just hash the password on the browser, then concat the salt and hash again. - WoLfulus
@WoLfulus: First of all, as an MITM attacker all I have to do is replace the salt (or the JS hashing function) you're sending to the browser over an unencrypted connection and your entire scheme is void. Secondly, even if I can't manipulate the server response ahead of time, simple sniffing will give me the salt, the hash function and the hashed password+salt, which opens you up to simple dictionary and brute force attacks. Never trust client side hashing for security. - Jens Roland
@JensRoland The salt is stored on the server side for validation too, if you change it on the client you'll fail the validation on the server, resulting in an invalid password. I agree with the bruteforce. - WoLfulus
@WoLfulus: Even if the login doesn't go through, I now own your password, despite your effort to mask it with hashing. - Jens Roland
(1) @jameshfisher " more than just saying that this is "useless", it should be made very clear that this is a security flaw. This makes the hash password equivalent, and means that the server, by storing the password hash, is effectively storing the password" It's only a security flaw if you don't also use a separate salted hash on the server side. If you simple hash on the client side, then salted hash on the server side, the password itself is never in transit, and cracking the database still doesn't mean being able to authenticate. - Parthian Shot
@ParthianShot has a point, but client side hashing is still useless -- and adding needless complexity to a security system just creates another moving part that can confuse the maintainer and potentially introduce security flaws (such as leading an inexperienced dev to drop the server side salt+rehashing) - Jens Roland
(1) @JensRoland "leading an inexperienced dev to drop the server side salt+rehashing" Reminds me of "a common mistake that people make when trying to design something completely foolproof is to underestimate the ingenuity of complete fools". You kind of need to assume that you have a competent dev team. However, salting the hash client side has one huge advantage; a person who decrypts traffic after-the-fact can still only use what they get as an auth token on your site, whether or not the end user is using the same password across multiple sites. Although defense in depth does add complexity. - Parthian Shot
(1) I'm not a fan of Charlie Miller's strategy to be honest. The system I favor just employs two random values: One is stored in the database as an index, the other is stored in the client's cookie and its hash is stored in the database. We then compare the hashes in constant-time. paragonie.com/blog/2015/04/… - Explained in detail here. - Scott Arciszewski
I'm thinking about the Session data: "This fact should only ever be stored server side in the session data". And about a clustered environment (like in a cloud)? How the user session should be stored? - kavain
@kavain: In such environments, it's common to keep sessions in a database (hosted on a separate server). This decouples sessions from the specific web server handling the request (stateless HTTP FTW) and has a minimum security level equal to your database security, which is usually a good thing - Jens Roland
Interesting that SRP is mentioned in the answer yet so many of the comments discuss obsolete hashing. JavaScript SRP libraries like thinbus-srp are fast and effective with the only overhead being the fetch of a salt and challenge few the server to perform a zero-knowledge password-proof to the server. - simbo1905
@JensRoland Thanks for the great answer! Can you please elaborate on this - "The delay is not a delay before returning the response to the browser. It is more like a timeout ... during which login attempts to a specific account ... will not be accepted or evaluated at all." - I understand you're basically saying that using something like sleep(30) before the "invalid login details" response is sent won't do, since the bot/hacker can continue submitting login attempts with separate requests executed in parallel; but how do you suggest enforcing a timeout then? Disabling the form's login button? - Kosta Kontos
@KostaKontos you'll need to trigger an asychronous timeout which runs on the server side, that's all. How you display that to the user is only a UI problem, since the security measure isn't on the front end at all. - Jens Roland
Good piece. However, password strength is almost totally irrelevant - uniqueness between sites is all that matters. Either the database has been captured in which case the actual user data is lost. Or the attacker is attempting a brute force web attack - which you can easily mitigate. Unless a user's password is extremely weak (eg monkey123) - so long as a password is unique across sites super strong passwords are largely pointless and will contribute to users forgetting them. If you specifically tell users to use a unique password and they don't, that's their responsibility. - niico
@niico password strength is not irrelevant if an attacker wishes to target a specific user (on some systems it may suffice for an attacker to gain access to any random account, but in many cases, such as on social sites, attackers want to crack specific user accounts). The difficulty of cracking a single user password depends on the number of permitted login attempts, but also largely on the password strength. If people are allowed to use low-entropy passwords, they will choose 'password', 'secret', 'qwerty', and 'letmein' (and their dog's name). You don't need many attempts to guess those. - Jens Roland
Regarding 'Don't reset a forgotten password to an autogenerated strong password - such passwords are notoriously hard to remember, which means the user must ... just let users pick a new one right away - which is what they want to do anyway.' I would argue that if you use a 3rd party password manager this (remembering autogenerated pws) is no longer a problem. I would also posit that letting a user pick a new one (if they're relying on human memory to retain the password) they're going to pick an easily-cracked or repeated password 9 times out of 10 anyway. Now you're back to a worse problem. - Jeff Mergler
"When a lost password code is requested, send the plaintext code to the user's email address" you are not supposed to store the plain password in database - DiaJos
@Webman: I think you're confused. That sentence continues "then hash it, save the hash in your database -- and throw away the original". It explicitly says to not store the plaintext code in the DB. As for captchas, yes many implementations are ineffective against half-decent pattern recognition software these days. - Jens Roland
okay I have understood that if you can send the plaintext code to user that would mean that you store this as plain in database because AFAIK it's impossible to reverse the hashing process but I can be wrong, okay for captcha I understand what you mean now - DiaJos
The point is that you'd generate a new random one each time. If there is a previous reset code hash still in the DB (these should expire automatically after a limited time, say, 30 minutes) you simply overwrite it. Another option is to create a payload and wrap it in something like a JWT which the server can validate upon receipt, but that's only really worth it if you're using JWTs elsewhere so you have that logic in place already. - Jens Roland
Why isn't CSRF protection being explained here? - Temp O'rary
(1) @TempO'rary this answer was originally written in 2009, and CSRF wasn't as widespread or as well understood at the time; CORS didn't arrive in browsers until ~2013 (and CSRF was added to the OP's question in 2016); truth is I didn't include it because it felt a bit exotic. But you're right, it should have been added since then. - Jens Roland
[+435] [2008-08-02 20:40:45] Michiel de Mare

Definitive Article

Sending credentials

The only practical way to send credentials 100% securely is by using SSL [1]. Using JavaScript to hash the password is not safe. Common pitfalls for client-side password hashing:

  • If the connection between the client and server is unencrypted, everything you do is vulnerable to man-in-the-middle attacks [2]. An attacker could replace the incoming JavaScript to break the hashing or send all credentials to their server, they could listen to client responses and impersonate the users perfectly, etc. etc. SSL with trusted Certificate Authorities is designed to prevent MitM attacks.
  • The hashed password received by the server is less secure [3] if you don't do additional, redundant work on the server.

There's another secure method called SRP, but it's patented (although it is freely licensed [4]) and there are few good implementations available.

Storing passwords

Don't ever store passwords as plaintext in the database. Not even if you don't care about the security of your own site. Assume that some of your users will reuse the password of their online bank account. So, store the hashed password, and throw away the original. And make sure the password doesn't show up in access logs or application logs. OWASP recommends the use of Argon2 [5] as your first choice for new applications. If this is not available, PBKDF2 or scrypt should be used instead. And finally if none of the above are available, use bcrypt.

Hashes by themselves are also insecure. For instance, identical passwords mean identical hashes--this makes hash lookup tables an effective way of cracking lots of passwords at once. Instead, store the salted hash. A salt is a string appended to the password prior to hashing - use a different (random) salt per user. The salt is a public value, so you can store them with the hash in the database. See here [6] for more on this.
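As an illustration of the Argon2 recommendation above, here is a minimal sketch using the Python argon2-cffi package (salting and parameter encoding are handled by the library; the defaults shown are a reasonable starting point, not a tuned configuration):

    from argon2 import PasswordHasher
    from argon2.exceptions import VerifyMismatchError

    ph = PasswordHasher()  # Argon2id with library defaults; tune memory/time cost for your hardware

    # At registration / password change: store only this string.
    stored_hash = ph.hash("correcthorsebatterystaple")  # random salt generated and embedded automatically

    # At login:
    def login_ok(stored_hash: str, submitted_password: str) -> bool:
        try:
            ph.verify(stored_hash, submitted_password)
            return True
        except VerifyMismatchError:
            return False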

This means that you can't send the user their forgotten passwords (because you only have the hash). Don't reset the user's password unless you have authenticated the user (users must prove that they are able to read emails sent to the stored (and validated) email address.)

Security questions

Security questions are insecure - avoid using them. Why? Anything a security question does, a password does better. Read PART III: Using Secret Questions in @Jens Roland's answer [7] here in this wiki.

Session cookies

After the user logs in, the server sends the user a session cookie. The server can retrieve the username or id from the cookie, but nobody else can generate such a cookie, either because it contains nothing but a long random session identifier that only maps to user data on the server, or because it is cryptographically signed with a secret that only the server knows.
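As a sketch of the signed-cookie variant (an HMAC over the payload with a secret only the server knows; real frameworks add expiry and key rotation on top of this, so treat it as an illustration only):

    import hashlib
    import hmac
    import secrets

    SERVER_SECRET = secrets.token_bytes(32)  # in practice, a fixed secret loaded from configuration

    def sign_cookie_value(user_id: str) -> str:
        signature = hmac.new(SERVER_SECRET, user_id.encode(), hashlib.sha256).hexdigest()
        return f"{user_id}.{signature}"

    def read_cookie_value(cookie_value: str) -> str | None:
        user_id, _, signature = cookie_value.rpartition(".")
        expected = hmac.new(SERVER_SECRET, user_id.encode(), hashlib.sha256).hexdigest()
        return user_id if hmac.compare_digest(expected, signature) else None

A client can read the user id but cannot alter it, or mint a cookie for another user, without the server secret.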

Cookies can be hijacked [8]: they are only as secure as the rest of the client's machine and other communications. They can be read from disk, sniffed in network traffic, lifted by a cross-site scripting attack, phished from a poisoned DNS so the client sends their cookies to the wrong servers. Don't send persistent cookies. Cookies should expire at the end of the client session (browser close or leaving your domain).

If you want to autologin your users, you can set a persistent cookie, but it should be distinct from a full-session cookie. You can set an additional flag that the user has auto-logged in, and needs to log in for real for sensitive operations. This is popular with shopping sites that want to provide you with a seamless, personalized shopping experience but still protect your financial details. For example, when you return to visit Amazon, they show you a page that looks like you're logged in, but when you go to place an order (or change your shipping address, credit card etc.), they ask you to confirm your password.

Financial websites such as banks and credit card providers, on the other hand, deal only in sensitive data and should not allow auto-login or a low-security mode.

List of external resources

[1] http://en.wikipedia.org/wiki/SSL
[2] https://stackoverflow.com/questions/14907581/ssl-and-man-in-the-middle-misunderstanding
[3] https://security.stackexchange.com/questions/45254/owasp-recommendation-on-client-side-password-hashing
[4] http://srp.stanford.edu/license.txt
[5] https://www.owasp.org/index.php/Password_Storage_Cheat_Sheet#Impose_infeasible_verification_on_attacker
[6] http://www.codeproject.com/Articles/704865/Salted-Password-Hashing-Doing-it-Right
[7] http://srp.stanford.edu/license.txt
[8] http://en.wikipedia.org/wiki/Session_hijacking
[9] http://pdos.csail.mit.edu/papers/webauth:sec10.pdf
[10] http://news.ycombinator.com/item?id=205572
[11] http://www.codinghorror.com/blog/archives/000953.html
[12] http://news.ycombinator.com/item?id=55660
[13] http://en.wikipedia.org/wiki/Password_cracking
[14] http://www.securityfocus.com/blogs/262

(1) Given the recent MITM vulnerability surrounding signed SSL certificates (blog.startcom.org/?p=145), a combination of SSL and some kind of challenge-response authentication (there are alternatives to SRP) is probably a better solution. - Kevin Loney
A lot of this stuff is situational. I tend not to use session cookies at all. Cookies getting hijacked is almost always the server's fault. Man-in-the-middle / packet sniffing aren't that common. - Shawn
BCrypt Nuget package : nuget.org/List/Packages/BCrypt - Fabian Vilers
(1) Note 1 about this answer: it is a draft, to be edited as a wiki. If you can edit this, you're welcome to. - Peter Mortensen
SRP is specific to situations involving several parties, if I understand correctly - DiaJos
2
[+170] [2011-08-08 15:32:34] Charlie

First, a strong caveat that this answer is not the best fit for this exact question. It should definitely not be the top answer!

I will go ahead and mention Mozilla’s proposed BrowserID [1] (or perhaps more precisely, the Verified Email Protocol [2]) in the spirit of finding an upgrade path to better approaches to authentication in the future.

I’ll summarize it this way:

  1. Mozilla is a nonprofit with values [3] that align well with finding good solutions to this problem.
  2. The reality today is that most websites use form-based authentication.
  3. Form-based authentication has a big drawback, which is an increased risk of phishing [4]. Users are asked to enter sensitive information into an area controlled by a remote entity, rather than an area controlled by their User Agent (browser).
  4. Since browsers are implicitly trusted (the whole idea of a User Agent is to act on behalf of the User), they can help improve this situation.
  5. The primary force holding back progress here is deployment deadlock [5]. Solutions must be decomposed into steps which provide some incremental benefit on their own.
  6. The simplest decentralized method for expressing an identity that is built into the internet infrastructure is the domain name.
  7. As a second level of expressing identity, each domain manages its own set of accounts.
  8. The form “account@domain” is concise and supported by a wide range of protocols and URI schemes. Such an identifier is, of course, most universally recognized as an email address.
  9. Email providers are already the de-facto primary identity providers online. Current password reset flows usually let you take control of an account if you can prove that you control that account’s associated email address.
  10. The Verified Email Protocol was proposed to provide a secure method, based on public key cryptography, for streamlining the process of proving to domain B that you have an account on domain A.
  11. For browsers that don’t support the Verified Email Protocol (currently all of them), Mozilla provides a shim which implements the protocol in client-side JavaScript code.
  12. For email services that don’t support the Verified Email Protocol, the protocol allows third parties to act as a trusted intermediary, asserting that they’ve verified a user’s ownership of an account. It is not desirable to have a large number of such third parties; this capability is intended only to allow an upgrade path, and it is much preferred that email services provide these assertions themselves.
  13. Mozilla offers their own service to act like such a trusted third party. Service Providers (that is, Relying Parties) implementing the Verified Email Protocol may choose to trust Mozilla's assertions or not. Mozilla’s service verifies users’ account ownership using the conventional means of sending an email with a confirmation link.
  14. Service Providers may, of course, offer this protocol as an option in addition to any other method(s) of authentication they might wish to offer.
  15. A big user interface benefit being sought here is the “identity selector”. When a user visits a site and chooses to authenticate, their browser shows them a selection of email addresses (“personal”, “work”, “political activism”, etc.) they may use to identify themselves to the site.
  16. Another big user interface benefit being sought as part of this effort is helping the browser know more about the user’s session [6] – who they’re signed in as currently, primarily – so it may display that in the browser chrome.
  17. Because of the distributed nature of this system, it avoids lock-in to major sites like Facebook, Twitter, Google, etc. Any individual can own their own domain and therefore act as their own identity provider.

This is not strictly “form-based authentication for websites”. But it is an effort to transition from the current norm of form-based authentication to something more secure: browser-supported authentication.

[1] https://browserid.org/
[2] https://wiki.mozilla.org/Identity/Verified_Email_Protocol/Latest
[3] http://www.mozilla.org/about/manifesto.en.html
[4] http://en.wikipedia.org/wiki/Phishing
[5] http://www.w3.org/2011/identity-ws/papers/idbrowser2011_submission_10.pdf
[6] https://wiki.mozilla.org/Identity/Verified_Email_Protocol/Latest-Session

(3) BrowserID link is dead - Spoody
The project seems to have been mothballed.... see en.wikipedia.org/wiki/Mozilla_Persona - Jeff Olson
3
[+153] [2012-05-22 12:11:20] Pieter888

I just thought I'd share this solution that I found to be working just fine.

I call it the Dummy Field (though I didn't invent this, so don't credit me). Others know this technique as a honeypot.

In short: you just have to insert this into your <form> and check that it is empty when validating:

<input type="text" name="email" style="display:none" />

The trick is to fool a bot into thinking it has to insert data into a required field, which is why I named the input "email". If you already have a field called email that you're using, you should try naming the dummy field something else, like "company", "phone" or "emailaddress". Just pick something you know you don't need and that sounds like something people would normally find logical to fill in on a web form. Now hide the input field using CSS or JavaScript/jQuery - whichever fits you best - just don't set the input type to hidden, or else the bot won't fall for it.

When you are validating the form (either client or server side) check if your dummy field has been filled to determine if it was sent by a human or a bot.
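
A minimal server-side sketch of that check, in Python (the form-parsing and save_guestbook_entry parts are hypothetical placeholders for whatever your stack provides):

    def is_probably_bot(form: dict) -> bool:
        # A human never sees the hidden dummy field, so any value in it signals a bot.
        return bool(form.get('email', '').strip())

    def handle_submission(form: dict) -> None:
        if is_probably_bot(form):
            return                       # silently drop (or log) the submission
        save_guestbook_entry(form)       # hypothetical: your normal processing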

Example:

In case of a human: The user will not see the dummy field (in my case named "email") and will not attempt to fill it. So the value of the dummy field should still be empty when the form has been sent.

In case of a bot: The bot will see a text field named email (or whatever it is you called it) and will logically attempt to fill it with appropriate data. It doesn't care that you styled the input with some fancy CSS; web developers do that all the time. Whatever the value in the dummy field is, we don't care, as long as it is longer than 0 characters.

I used this method on a guestbook in combination with CAPTCHA [1], and I haven't seen a single spam post since. I had used a CAPTCHA-only solution before, but eventually, it resulted in about five spam posts every hour. Adding the dummy field in the form has stopped (at least until now) all the spam from appearing.

I believe this can also be used just fine with a login/authentication form.

Warning: Of course this method is not 100% foolproof. Bots can be programmed to ignore input fields with the style display:none applied to them. You also have to think about people who use some form of auto-completion (as most browsers have built in!) to auto-fill all form fields for them. They might just as well pick up the dummy field.

You can also vary this up a little by leaving the dummy field visible but outside the boundaries of the screen, but this is totally up to you.

Be creative!

[1] http://en.wikipedia.org/wiki/CAPTCHA

(37) This is a useful anti-spam trick, but I would suggest using a field name other than 'email', or you may find that browser auto-fill's fill it in, inadvertently blocking genuine users of your site. - Nico Burns
(1) I already noted that in the Warning section, but yeah that's true. - Pieter888
(2) Aww, I came up with this on my own one day, and as you mentioned, once I applied it to my contact forms across multiple websites it has absolutely blocked spam for over a year. Love it, but I was dreading the day when I saw it mentioned on a high visibility website. ;) - Dustin Graham
You can turn off autocomplete for individual form fields. What you should be careful of is humans using user agents that do not recognise style sheets. - bluesmoon
(8) I also have several more of these using visibility:hidden and also position:absolute;top:-9000px you can also do text-indent and also z-index on a few of these elements and place them in compressed CSS files with awkward names - since bots can detect display:none and they now check for a range of combinations - I actually use these methods and they're old tricks of the trade. +1 - TheBlackBenzKid
(2) Thought I would mention the potential accessibility concern here. You should include a message (which can also be hidden) that the field should be left blank. - Michael Mior
(21) What happens when a user with a vision impairment is using a screenreader to navigate the form? - soycharliente
(9) This technique has a name: the honeypot en.wikipedia.org/wiki/Honeypot_(computing) - pixeline
(27) No need for inline styling. Just add a class to the field (maybe use a weird word that could never mean anything to a bot), and hide it via the site's CSS file. Like: <input type="text" name="email" class="cucaracha"> and in your CSS: .cucaracha { display:none; }. - Ricardo Zea
Careful of usability, as users who want to use auto-populating fields by plug-ins or browser settings may populate these fields automagically. Then you're hurting your users above preventing hackers. - Erik Philips
Instead of z-indexing the input, how about leaving it as a completely normal-looking <input type="text" name="email" /> and then z-index something over the top of it? - rybo111
(1) FYI, it is called the honeypot technique. Very useful against bots, presumably not so much against spamming sweat farms where real humans interact with your forms. - pixeline
@NicoBurns, that is a good point, but I think emptying the field by js can help. - dav
I don't know if this is used out there, but for those who are worried that this technique won't work now that it is published, just add a second field that must be filled in (most likely by a JavaScript, such as copying an existing field on submission if the field is empty). Then, you check for one field that must be empty, and one filled with your "JavaScript Key". Without a human examining the code, this is virtually impossible for a bot to hack. - Sablefoste
It would take only a brief analysis of how your site does logins to be able to duplicate authentic login attempts. - Phil
Bots normally just curl forms. Hiding the field via javascript could be an option. You could also clear any of the browser's auto-complete data this way. - ThomasHaz
4
[+88] [2011-08-08 16:29:10] lifeisstillgood

I do not think the above answer is "wrong", but there are large areas of authentication that are not touched upon (or rather, the emphasis is on "how to implement cookie sessions", not on "what options are available and what are the trade-offs").

My suggested edits/answers are

  • The problem lies more in account setup than in password checking.
  • The use of two-factor authentication is much more secure than more clever means of password encryption
  • Do NOT try to implement your own login form or database storage of passwords, unless the data being stored is valueless at account creation and self-generated (that is, web 2.0 style like Facebook, Flickr [1], etc.)

    1. Digest Authentication is a standards-based approach supported in all major browsers and servers, that will not send a password even over a secure channel.

This avoids any need to have "sessions" or cookies as the browser itself will re-encrypt the communication each time. It is the most "lightweight" development approach.

However, I do not recommend this, except for public, low-value services. This is an issue with some of the other answers above - do not try to re-implement server-side authentication mechanisms - this problem has been solved and is supported by most major browsers. Do not use cookies. Do not store anything in your own hand-rolled database. Just ask, per request, whether the request is authenticated. Everything else should be supported by configuration and third-party trusted software.

So ...

First, we are confusing the initial creation of an account (with a password) with the subsequent re-checking of the password. If I am Flickr and you are creating your account for the first time, the new user has access to zero value (blank web space). I truly do not care if the person creating the account is lying about their name. If I am creating an account on the hospital intranet/extranet, the value lies in all the medical records, and so I do care about the identity (*) of the account creator.

This is the very, very hard part. The only decent solution is a web of trust. For example, you join the hospital as a doctor. You create a web page hosted somewhere with your photo, your passport number, and a public key, and sign them all with your private key. You then visit the hospital and the system administrator looks at your passport, sees if the photo matches you, and then signs the web page/photo hash with the hospital's private key. From now on we can securely exchange keys and tokens. As can anyone who trusts the hospital (that is the secret sauce, BTW). The system administrator can also give you an RSA [2] dongle or other two-factor authentication.

But this is a lot of hassle, and not very web 2.0. However, it is the only secure way to create new accounts that have access to valuable information that is not self-created.

  1. Kerberos and SPNEGO - single sign-on mechanisms with a trusted third party - basically, the user verifies against a trusted third party. (NB: this is in no way the not-to-be-trusted OAuth [3].)

  2. SRP [4] - sort of clever password authentication without a trusted third party. But here we are getting into the realms of "it's safer to use two-factor authentication, even if that's costlier"

  3. SSL [5] client side - give the clients a public key certificate (supported in all major browsers - but this raises questions over client machine security).

In the end, it's a trade-off: what is the cost of a security breach vs. the cost of implementing more secure approaches? One day, we may see a proper PKI [6] widely accepted, and so no more hand-rolled authentication forms and databases. One day...

[1] http://en.wikipedia.org/wiki/Flickr
[2] http://en.wikipedia.org/wiki/RSA_%28security_firm%29
[3] http://en.wikipedia.org/wiki/OAuth
[4] http://en.wikipedia.org/wiki/Secure_Remote_Password_protocol
[5] http://en.wikipedia.org/wiki/SSL
[6] http://en.wikipedia.org/wiki/Public-key_infrastructure

(30) Hard to tell which answer you are talking about in 'I do not think the above answer is "wrong"' - Davorak
5
[+60] [2011-08-08 23:07:08] josh

When hashing, don't use fast hash algorithms such as MD5 (many hardware implementations exist). Use something like SHA-512. For passwords, slower hashes are better.

The faster you can create hashes, the faster any brute-force checker can work. Slower hashes will therefore slow down brute forcing. A slow hash algorithm will make brute forcing impractical for longer passwords (8+ characters).
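
A rough illustration of the difference, using only Python's standard library (exact numbers will vary by machine, and the iteration count is just an example):

    import hashlib, timeit

    fast = timeit.timeit(lambda: hashlib.md5(b'hunter2').digest(), number=10)
    slow = timeit.timeit(
        lambda: hashlib.pbkdf2_hmac('sha256', b'hunter2', b'salt', 600_000),
        number=10)
    print(f"10 MD5 hashes:    {fast:.4f}s")
    print(f"10 PBKDF2 hashes: {slow:.4f}s  (each guess costs roughly {slow / fast:,.0f}x more)")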


(6) SHA-512 is also fast, so you need thousands of iterations. - Seun Osewa
(7) More like something like bcrypt which is designed to hash slowly. - Fabian Nicollier
(6) As mentioned in another answer, "OWASP recommends the use of Argon2 as your first choice for new applications. If this is not available, PBKDF2 or scrypt should be used instead. And finally if none of the above are available, use bcrypt." Neither MD5 nor any of the SHA hashing functions should ever be used for hashing passwords. This answer is bad advice. - Mike
6
[+57] [2012-11-24 21:15:52] pixeline

My favourite rule with regard to authentication systems: use passphrases, not passwords. Easy to remember, hard to crack. More info: Coding Horror: Passwords vs. Pass Phrases [1]
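
A toy sketch of the idea (the word list here is just a stand-in; a real generator would draw from a large list such as the EFF diceware lists):

    import secrets

    WORDS = ['correct', 'horse', 'battery', 'staple', 'orbit', 'velvet',
             'puzzle', 'meadow', 'copper', 'lantern', 'sketch', 'harbor']

    def passphrase(n_words: int = 4) -> str:
        return ' '.join(secrets.choice(WORDS) for _ in range(n_words))

    print(passphrase())   # e.g. "velvet harbor battery sketch"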

[1] http://www.codinghorror.com/blog/2005/07/passwords-vs-pass-phrases.html

even better - hard to remember, hard to guess: Password managers ... linking to article from 2005 where that likely meant an excel spreadsheet :) - felickz
7
[+33] [2015-07-18 01:18:52] Iain Duncan

I'd like to add one suggestion I've used, based on defense in depth. You don't need to have the same authentication & authorization system for admins as for regular users. You can have a separate login form, on a separate URL, executing separate code for requests that will grant high privileges. This one can make choices that would be a total pain for regular users. One approach I've used is to scramble the login URL for admin access and email the admin the new URL. It stops any brute-force attack right away, as the new URL can be arbitrarily difficult (a very long random string), but your admin user's only inconvenience is following a link in their email. The attacker no longer knows where to even POST to.
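
A sketch of the scrambled-URL part (the persistence and mailing helpers are hypothetical; wire them up to whatever your application uses):

    import secrets

    def new_admin_login_path() -> str:
        # 32 random URL-safe bytes -> an effectively unguessable path segment
        return f"/admin-login-{secrets.token_urlsafe(32)}"

    path = new_admin_login_path()
    store_current_admin_path(path)    # hypothetical persistence step
    email_admin(f"Your admin login has moved to https://example.com{path}")  # hypothetical mailer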


A simple link in an email isn't actually secure, since email is not secure. - David Spector
It is as secure as any other token based password reset system that is not two-factor though. Which is almost all of them. - Iain Duncan
8
[+22] [2015-08-16 17:31:35] user9869932

I don't know whether it was best to answer this as an answer or as a comment. I opted for the first option.

Regarding the point PART IV: Forgotten Password Functionality in the first answer, I would make a point about timing attacks.

In "remember your password" forms, an attacker could potentially check a full list of emails and detect which ones are registered with the system (see the link below).

Regarding the Forgotten Password form, I would add that it is a good idea to equalize the times taken by successful and unsuccessful queries, with some delay function.

https://crypto.stanford.edu/~dabo/papers/webtiming.pdf
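
One way to blunt that side channel, sketched in Python (find_user_by_email and send_reset_email are hypothetical placeholders): do roughly the same amount of work and return the same response whether or not the email is registered.

    import time

    MIN_RESPONSE_SECONDS = 1.0

    def forgot_password(email: str) -> str:
        start = time.monotonic()
        user = find_user_by_email(email)     # hypothetical lookup
        if user is not None:
            send_reset_email(user)           # hypothetical mailer
        # Pad every request to the same minimum duration.
        elapsed = time.monotonic() - start
        if elapsed < MIN_RESPONSE_SECONDS:
            time.sleep(MIN_RESPONSE_SECONDS - elapsed)
        # Identical message either way, so the response leaks nothing about registration.
        return "If that address is registered, a reset link has been sent."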


9
[+20] [2015-04-29 01:06:29] Mike Robinson

I would like to add one very important comment:

  • "In a corporate, intranet setting," most if not all of the foregoing might not apply!

Many corporations deploy "internal use only" websites which are, effectively, "corporate applications" that happen to have been implemented through URLs. These URLs can (supposedly ...) only be resolved within "the company's internal network." (Which network magically includes all VPN-connected 'road warriors.')

When a user is dutifully-connected to the aforesaid network, their identity ("authentication") is [already ...] "conclusively known," as is their permission ("authorization") to do certain things ... such as ... "to access this website."

This "authentication + authorization" service can be provided by several different technologies, such as LDAP (e.g., Microsoft Active Directory) or Kerberos.

From your point-of-view, you simply know this: that anyone who legitimately winds-up at your website must be accompanied by [an environment-variable magically containing ...] a "token." (i.e. The absence of such a token must be immediate grounds for 404 Not Found.)

The token's value makes no sense to you, but, should the need arise, "appropriate means exist" by which your website can "[authoritatively] ask someone who knows (LDAP... etc.)" about any and every(!) question that you may have. In other words, you do not avail yourself of any "home-grown logic." Instead, you inquire of The Authority and implicitly trust its verdict.
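
As a concrete (if simplified) sketch: a tiny WSGI middleware that trusts an identity established upstream (by the web server doing Kerberos/SPNEGO, LDAP, etc.) and rejects anything arriving without it. REMOTE_USER is a common convention; the exact variable depends on your environment.

    def require_sso(app):
        """Wrap a WSGI app so requests without an upstream-provided identity get a 404."""
        def middleware(environ, start_response):
            if not environ.get('REMOTE_USER'):
                start_response('404 Not Found', [('Content-Type', 'text/plain')])
                return [b'Not Found']        # absence of the token -> 404, as described above
            return app(environ, start_response)
        return middleware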

Uh huh ... it's quite a mental-switch from the "wild-and-wooly Internet."


(9) Did you fall into the punctuation well as a child? :) I've read it three times and I am still lost as to what point you are trying to make. But if you are saying "Sometimes you do not need form-based authentication" then you are right. But considering we are discussing when we do need it, I don't see why this is very important to note? - Hugo Delsing
(2) My point is that the world outside a corporation is entirely different from the world inside. If you are building an app that is accessible to the "wooly wide web," and for general consumption by the public, then you have no choice but to roll your own authentication and authorization methods. But, inside a corporation, where the only way to get there is to be there or to use VPN, then it is very likely that the application will not have – must not have – "its own" methods for doing these things. The app must use these methods instead, to provide consistent, centralized management. - Mike Robinson
(2) Even intranets require a minimum amount of security in the building. Sales has confidential profit and loss numbers, while engineering has confidential intellectual property. Many companies restrict data across departmental or divisional lines. - Sablefoste
10
[+10] [2016-08-10 13:27:44] jwilleke

Use OpenID Connect [1] or User-Managed Access [2].

Nothing is more efficient than not doing it (implementing authentication yourself) at all.

[1] http://openid.net/connect/
[2] https://kantarainitiative.org/confluence/display/uma/Home

11