21.2. Hashing Passwords

Simple authentication allows users to log in to a site with a username and password. To implement authentication, we need to be able to verify that a user’s password is correct. However, you should NEVER store passwords in a database. Seriously. Don’t do it. Ever.

Storing passwords in a database is incredibly insecure. A hacker may break into the database, gaining access to every user account. Or a deviant employee with access to the database may do the same.

Fortunately, it is possible for us to implement simple authentication without storing passwords, by using a technique called password hashing. Password hashing is a particular type of encryption that we will explore throughout the rest of this section.

21.2.1. Two-Way Encryption

When you think of encryption, you might think of WWII or the Cold War, when the militaries of various countries exchanged secret messages that were unreadable by their enemies. This type of encryption is two-way encryption. A message is encoded with a key before being sent, and when it is received, it is decoded by another key.

A two-way encryption flow. A key is used to encrypt sensitive data. After being transmitted, another key decrypts it.

Two-way encryption (Created by Munkhzaya Ganbold, licensed via CC BY-SA)

21.2.2. One-Way Encryption

One-way encryption, also known as hashing, encodes a message in a way that makes it nearly impossible to decode. An algorithm implementing a one-way encryption algorithm is a cryptographic hash function (we’ll usually leave off “cryptographic”). For a specific message, the output of a hash function is a hash.

The hashing process is essentially just the top half of the diagram above. While it might not be clear yet, hashing allows us to securely store passwords and still be able to validate a user’s submitted password. Precisely how this is done will be explored in a moment. The diagram below shows the general flow of how hashing works to secure the site.

Diagram showing how encoded passwords are compared to a submitted password to authenticate a user.

Diagram showing how encoded passwords are compared to a submitted password to authenticate a user.

It can be difficult to write a good hash function, so thankfully you will never have to. Over the years, some very smart people have created some very good hash functions. However, it is important to understand how the properties of a hash function allow for secure password storage. For our purposes, it is only important that we consider a hash function to have these properties:

  1. Deterministic: If we encode a message with the function at different points in time then we always get the same result.
  2. Hard to reverse: It is infeasible to calculate the input value that yields a given hash.
  3. Hash values are ALMOST unique: If a and b are two different messages, then it is extremely unlikely that they have the same hash value. By “extremely unlikely” we mean that this is something like a one-in-a-trillion likelihood (for example, there are only about 8 billion people on earth).
  4. Similar messages have VERY different hash values: In other words, if we change a message only slightly, the resulting hash value is very different. A function that returned AlXL3M_ws for the message "LaunchCode" and AlXL3M_wt for the message "LaunchCodf" would not be a suitable hash function.

21.2.3. Password Verification With Hashes

Our application will select a particular hash function. Let’s call it h. Then, for a message x, the hash value will be the result of calling h with the argument x. Invoked, this looks like h(x). Rather than store passwords in a database, we will store their hash values.

Consider a fictional user that wants to sign up for our site, Jamie. Jamie likes Taylor Swift, so their desired username is tswizzle_fan and their desired password is lover1989 (not a great password choice, by the way, but we’ve seen worse).

When Jamie registers for an account on our site, we will call our hash function with their password:

string hash = h("lover1989"); // returns "bd62d7e13ef465fa705f30de198ba0cb"

Then we store Jamie’s username along with her hashed password in our user table.

username passwordHash
tswizzle_fan bd62d7e13ef465fa705f30de198ba0cb

This is very secure. Even if somebody breaks into the database and finds Jamie’s info, they will not be able to log in. Since we used a hash function, the hacker will have a very hard time turning the hash into a password (see property 2).

We can still authenticate Jamie, however. When they come to our site to log in, they will submit a username and password. Let’s call the password value submittedPassword. Some basic logic will allow us to determine, with an extremely high rate of probability, whether or not the pair is valid.

To check Jamie’s username/password pair, we could do something like this:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
// fetches Jamie's user object from the database
User user = GetUserByUsername("tswizzle_fan");

// gets the hash value stored on their object
string passwordHash = user.PasswordHash;

// hashes the submitted password
string submittedHash = h(submittedPassword);

if (passwordHash == submittedHash)
{
   // the hashes are the same, the passwords can be assumed to be the same
}
else
{
   // the hashes are different, so the passwords are definitely different
}

The conditional compares the values of the hash stored in the database with the hash generated from the submitted password. By property 1, we know that if the hash values are different, then there is no way the passwords are the same. By property 3, we can safely assume that the passwords are the same.

Note

When using Identity, the library handles hashing passwords for newly registered users and comparing hashes when logging in a user. The example above is just an example and meant to illustrate what is going on under the hood of Identity.

21.2.3.1. Hashing Isn’t Perfect

Using hash functions to process passwords is not a cure-all. One vulnerability is the possibility for collisions. A collision occurs when two different messages have the same hash value. By property 3, this is supposed to be rare. However, if a collision is found for a given hash function, then it may be possible to create an algorithm to generate collisions. In other words, given a specific hash value, the algorithm could generate a string with the same hash value.

The once-popular MD5 and SHA1 hash algorithms quickly become obsolete (for cryptographic purposes, at least) once collisions were found.

Most hashing algorithms become more vulnerable as global computing power increases. If a hacker breaks into a database, they will obtain the hashes of all of its users’ passwords. Since only a small handful of hash functions are commonly used, they might simply try millions of strings with each of the more popular hash functions and wait until they find a match.

Note

The widespread use of brute force attacks is why it is always a bad idea to use a password that:

  • is ranked as one of the most commonly used passwords
  • utilizes publicly accessible information about you, such as birth date or address
  • uses common words from the dictionary

When trying to crack a password hash using brute force, these are the first items a hacker will attempt to use.

21.2.4. Which Hash Function Does Identity Use?

Identity contains its own password hasher called IdentityV3. When a new user is added to the database, their inputted password is passed to IdentityV3. IdentityV3 is not its own hashing algorithm, but it uses PBKDF2 for hashing.

IdentityV3 is named after the third version of Identity and is the default password hasher. IdentityV2 corresponds to the second version of Identity and may still be in use in some code bases. As a result, developers can still specify that IdentityV2 needs to be used in the configuration options.

21.2.5. Check Your Understanding

Question

True/False: One-way encryption does not involve decryption.

  1. True
  2. False

Question

Which of the following best describes hashing?

  1. Hashing is the process of encrypting plaintext so that it is very difficult to obtain the original message.
  2. Hashing is more secure than two-way encryption but less useful because it does not decode an encoded message.
  3. Hashing cannot confirm that two passwords are the same because the original values are never saved.
  4. Hashing smashing.