I re-gave my NDC talk — From SQL to RavenDB: A .NET Team’s Journey — at /dev/summer

# From SQL to RavenDB: A .NET Team’s Journey

I gave a talk at NDC!

# Crypto primer

My next blog post will be on the details of HTTPS, but before we get to that, here’s a quick cryptography primer.

### Alice, Bob, Eve, and Mallory

In cryptography discussions, 4 characters are commonly used: *Alice*, *Bob*, *Eve*, and *Mallory*.

Alice and Bob want to send messages to each other.

However, Alice and Bob are not alone in the world.

There is also Eve, who is capable of eavesdropping messages whilst they are in transport. Because Alice and Bob don’t want Eve to be able to read their messages, they use cryptography to encrypt all messages before they are sent, and to decrypt them when they are received. If Eve eavesdrops a message whilst it is in transport, it will be encrypted, and because Eve doesn’t know how to decrypt it, she will be unable to read it.

And there is also Mallory, who is capable of sending messages with a faked from address. Because Alice and Bob don’t want Mallory to be able to impersonate them, they use cryptography to sign all messages before they are sent, and to verify them when they are received. If Mallory sends a message with a faked from address, then because Mallory doesn’t know how to sign it, it won’t pass verification, so Alice and Bob will know it’s been faked.

### Shared secret

In order to communicate with each other in this way, Alice and Bob need to know how to encrypt, decrypt, sign, and verify messages.

Because Alice and Bob are good software engineers, who believe in separating concerns, they are going to achieve this by using two separate algorithms: an encryption/decryption algorithm and a signature/verification algorithm.

Alice and Bob also believe in code reuse, so they’re not going to write their own algorithms from scratch. Instead, they’re going to choose algorithms that already exist and have been thoroughly peer reviewed.

However, as we’ve seen, Eve mustn’t know how to decrypt messages and Mallory mustn’t know how to sign messages.

Because there are only a small number of suitable algorithms to choose from, it’s possible for Eve and/or Mallory to guess which algorithms Alice and Bob have chosen, and therefore the secrecy can’t be in the algorithm itself.

To achieve the required secrecy, the algorithm takes an input, a *key*, which is a *shared secret* between Alice and Bob, and Eve and Mallory must not know it.

One way for Alice and Bob to establish this shared secret is to meet up in person, far away from Eve and Mallory, and pick a random *pre-shared key* that only Alice and Bob know.

### Symmetric encryption

Alice and Bob also need to choose both an encryption/decryption algorithm as well as a signature/verification algorithm.

To keep things simple, to start with we’re going to use *symmetric encryption*, so called because the secrecy is achieved using a shared secret, which both Alice and Bob know.

There are many symmetric signature/verification algorithms; one example is called *HMAC*.

For symmetric encryption/decryption algorithms, there are two general categories: *stream ciphers* and *block ciphers*.

Stream ciphers, for example *RC4*, typically use a *cryptographically secure pseudorandom number generator* to generate a sequence of random numbers that is as long as the message being sent. To encrypt/decrypt a message, it is then simply XORed with this sequence.

In contrast, block ciphers can only encrypt/decrypt fixed-length blocks. For example, *AES* uses 16 byte blocks.

That said, it is also possible to make block ciphers capable of encrypting/decrypting variable length messages by using one of the many *modes of operation*, for example *CBC*.

Finally, some modes of operation, for example *GCM*, combine both encryption/decryption as well as signature/verification, solving both problems in one algorithm.

### Non-repudiation

In the example above, Alice and Bob were signing and verifying their messages in order to detect fake messages from Mallory. Now imagine that Bob receives a message from Alice that has a valid signature, and he wants to legally prove that Alice did in fact sign it.

The difficulty for the legal proof is that because both Alice and Bob know the shared secret, they are both capable of signing the message, and so there is reasonable doubt as to who signed it — for example Bob could have signed it instead of Alice.

The fix is to remove the shared secret using *asymmetric encryption*, which is so named because it uses a *private key* and a *public key*. These two keys are mathematically connected to each other, but deducing one from the other is designed to be infeasible.

To use asymmetric encryption, Alice creates a public/private key pair. She stores the private key securely so that only she has access to it, and she distributes the public key to everyone else: Bob, Eve, and Mallory.

In asymmetric signature/verification algorithms, for example *DSA*, to sign a message you need to know the private key, whereas to verify the signature you need to know the public key.

Since only Alice knows her private key, and Bob does not, asymmetric signature/verification algorithms remove the reasonable doubt and prove that Alice signed the message. This property is called *non-repudiation*.

### Key exchange

As we’ve seen, the symmetric encryption algorithms above need shared secrets, essentially keys, and so far we’ve been assuming that Alice and Bob met up in person and picked a random pre-shared key that only they know. Whilst great for friends, this is clearly untenable for HTTPS in the real world. I visit a lot of HTTPS websites, and I don’t want to spend my life on aeroplanes just to meet their sysadmins to establish pre-shared keys.

One option would be to outsource all this travel to a *trusted third party*, who would establish the pre-shared keys on my behalf. The problem with this is that every pair of individuals needs a different pre-shared key, so it scales as O(n^2), and also, because the trusted third party knows the pre-shared key, it could conspire with Eve to help her read the messages without detection.

We need another solution to perform *key exchange* and arrive at a shared secret.

One such solution is *Diffie-Hellman key exchange*. However, it is only designed to protect against Eve, and it is not secure against Mallory.

The solution that’s commonly used instead, involves an asymmetric encryption/decryption algorithm, for example *RSA*.

With these algorithms, to encrypt a message you need to know the public key, whereas to decrypt a message you need to know the private key.

Just like block ciphers they can only encrypt/decrypt fixed length messages, for example 245 bytes long. Although you could technically use the block cipher modes of operation to encrypt/decrypt longer messages, this isn’t done in practice because asymmetric encryption algorithms are considerably slower than symmetric encryption algorithms.

In practice, symmetric encryption is used for the message, which can be very long, for example if it’s a video, whereas asymmetric encryption is only used for the *session key*, which is tiny:

- Alice:
- picks a random session key,
- symmetrically encrypts the message using that session key,
- and asymmetrically encrypts that session key using Bob’s public key.

- Bob:
- Using his private key, can asymmetrically decrypt the session key,
- and use that to symmetrically decrypt the message.

Now when Alice wants to communicate with Bob, she only needs to know Bob’s public key. There is still the problem of distributing a mapping between identities and public keys, but this can be left to a trusted third party. It scales as O(n), and there is no way for a trusted third party to covertly help Eve — although it could distribute an incorrect mapping, this has a high risk of being caught.

### Further reading

Wikipedia is great for this kind of stuff; here are some links to get you started.

Alice and Bob (but not Eve and Mallory), have a shared secret, which is the key to a symmetric encryption algorithm. Perhaps, they’re using a stream cipher, e.g. RC4 and its cryptographically secure pseudorandom number generator, or perhaps they’re using a block cipher, e.g. AES, with perhaps CBC or GCM as its mode of operation.

They need to perform key exchange; maybe they:

- met up in person and picked a random pre-shared key
- used Diffie-Hellman key exchange
- used the asymmetric encryption algorithm RSA with its public keys and private keys and a trusted third party to securely send a session key to each other

For signing/verification perhaps they’re using HMAC, or better yet, to get non-repudiation, perhaps they’re using DSA.