Privacy 2019: Fixing a 16 year-old privacy problem in TLS with ESNI


This is the second post in a series covering privacy, anonymity and security on the internet in recent times, with a focus on real issues affecting people in the real world. Censorship and pervasive state-sponsored surveillance are a daily reality for hundreds of millions of people around the world.

We pick up where our previous post left off and discuss a proposed solution to the problems with SNI described there.

Fixing a 16 year-old privacy problem in TLS with ESNI

SNI – What Is It Good For?

In our previous post, we discussed Server Name Indication (SNI), an extension to SSL/TLS that dates back to 2003 and has since become mandatory to implement (as of TLS 1.3).

So what problem does SNI actually solve and why is it necessary?

SNI allows the server handling SSL/TLS termination to know which host you’re connecting to. This is important because the server needs to hand you a certificate whose subject Common Name (CN) or Subject Alternative Name (SAN) is valid for the hostname being accessed.

This becomes a problem when you have more than one host on one IP address. This is very common today and is also crucial to help save addresses in the dwindling IPv4 address space.

Today, with cloud providers such as Google and Amazon, and edge network providers such as Cloudflare or Akamai, a single IP can serve as a load-balancer or front-facing server for an arbitrarily large number of hosts. All of the services mentioned offer some sort of load-balancing and TLS termination service. All in all, SNI is an important and necessary feature of the internet as it is today.

As you probably know from our last post, SNI has unfortunately also been used and abused for censorship and monitoring since its inception, and the best-known way of side-stepping that, domain fronting, has been closed off by most cloud providers.
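To make the leak concrete, here is a minimal sketch that generates a ClientHello in memory with Python's standard ssl module and reads the hostname straight out of the raw bytes, just as an on-path observer could. The function names client_hello_bytes and extract_sni are ours, and the parser assumes the whole ClientHello fits in a single TLS record, which holds for this example but is not guaranteed in general.

```python
import ssl
import struct


def client_hello_bytes(hostname):
    """Start a TLS handshake in memory and return the bytes the client would send first."""
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT)
    incoming, outgoing = ssl.MemoryBIO(), ssl.MemoryBIO()
    tls = ctx.wrap_bio(incoming, outgoing, server_hostname=hostname)
    try:
        tls.do_handshake()
    except ssl.SSLWantReadError:
        # Expected: the client has written its ClientHello and now waits for the server.
        pass
    return outgoing.read()


def extract_sni(record):
    """Return the plaintext server name from a ClientHello record, or None if absent."""
    pos = 5 + 4                      # TLS record header + handshake header
    pos += 2 + 32                    # legacy_version + client random
    pos += 1 + record[pos]           # session id
    (suites_len,) = struct.unpack_from("!H", record, pos)
    pos += 2 + suites_len            # cipher suites
    pos += 1 + record[pos]           # compression methods
    (ext_total,) = struct.unpack_from("!H", record, pos)
    pos += 2
    end = pos + ext_total
    while pos + 4 <= end:
        ext_type, ext_len = struct.unpack_from("!HH", record, pos)
        pos += 4
        if ext_type == 0:            # server_name extension
            # Layout: list length (2), name type (1), name length (2), name
            (name_len,) = struct.unpack_from("!H", record, pos + 3)
            return record[pos + 5:pos + 5 + name_len].decode("ascii")
        pos += ext_len
    return None


hello = client_hello_bytes("sentinelone.com")
print(extract_sni(hello))
```

No key material is needed to recover the name, which is exactly the problem ESNI sets out to fix.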

image of tls and sni

This all makes SNI a necessity despite the major info-leak it causes. Luckily though, a fix is on the way and is being shaped in the IETF draft we’ll be discussing for the rest of this post: Encrypted Server Name Indication for TLS 1.3.

So How Does ESNI Fix SNI?

The following explanation is based on the latest working draft [4f3ce56]. The latest published draft is currently #3; the main differences between these versions so far revolve around the DNS lookup and resolution of ESNIKeys. Because draft #2 has already been implemented and deployed by Cloudflare and Firefox Nightly, we will point out the differences where they matter.

To encrypt our SNI we need to first obtain an encryption key, which is done by querying an ESNI-type record via DNS. In previous draft versions (up to draft #3), this was done by querying a TXT record called _esni. For example, to query ESNI information for sentinelone.com one would query the TXT record _esni.sentinelone.com.

The current draft asks IANA to update the RR Registry with a new record type, ESNI. It’s not a huge difference, but it is important to note since there are deployed implementations (Cloudflare and Firefox) that use the TXT record version.
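As a rough illustration of the TXT-based lookup used by the deployed implementations, the sketch below builds the query name and a raw DNS question for it using only the standard library. The helper names esni_query_name and build_dns_query and the fixed query ID are ours, chosen for illustration. The sketch only constructs the packet; actually sending it, ideally over an encrypted transport so the lookup itself does not leak the name, is left out.

```python
import struct

TXT = 16  # DNS record type number for TXT


def esni_query_name(domain):
    """Name queried for ESNI keys in the TXT-based drafts."""
    return "_esni." + domain.rstrip(".")


def build_dns_query(name, qtype=TXT, query_id=0x1234):
    """Build a minimal DNS query packet with a single question."""
    # Header: id, flags (recursion desired), 1 question, 0 answer/authority/additional
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    # QNAME: each label is prefixed by its length, terminated by a zero byte
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii") for label in name.split(".")
    ) + b"\x00"
    question = qname + struct.pack("!HH", qtype, 1)  # class IN
    return header + question


name = esni_query_name("sentinelone.com")
packet = build_dns_query(name)
print(name, len(packet), "bytes")
```

Under the newer draft, only the record type and the queried name would change; the overall shape of the lookup stays the same.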

image of Query ESNI Via DNS Over TLS

The data retrieved by DNS is a struct called ESNIKeys.

The most important information in ESNIKeys is a list of named (EC)DHE groups and their matching public key shares. These are used to derive a symmetric key, and that key is used to encrypt and authenticate the SNI data with an AEAD cipher.

image of Creating Encrypted SNI

So that the server can also decrypt and verify the data, the client sends its own key share entry along with the encrypted SNI. Using this, the server is able to derive the same symmetric key using (EC)DHE.
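The key agreement described above can be sketched with nothing but the standard library. This is a toy under loud assumptions: it uses finite-field Diffie-Hellman over a deliberately tiny prime instead of a real named group, plain HKDF (RFC 5869) with a made-up info label instead of the draft's exact key schedule, and it stops before the AEAD step because Python's standard library does not ship an AEAD cipher. All names here are ours.

```python
import hashlib
import hmac
import secrets

P = 2**127 - 1  # toy prime, far too small for real use
G = 3           # toy generator


def dh_keypair():
    """Return (private, public) for the toy group."""
    private = secrets.randbelow(P - 3) + 2
    return private, pow(G, private, P)


def dh_shared(private, peer_public):
    """Compute the shared secret as fixed-length bytes."""
    return pow(peer_public, private, P).to_bytes(16, "big")


def hkdf_extract(salt, ikm):
    return hmac.new(salt, ikm, hashlib.sha256).digest()


def hkdf_expand(prk, info, length):
    okm, block, counter = b"", b"", 1
    while len(okm) < length:
        block = hmac.new(prk, block + info + bytes([counter]), hashlib.sha256).digest()
        okm += block
        counter += 1
    return okm[:length]


def derive_sni_key(shared_secret):
    """Turn the DH output into a 128-bit symmetric key (label is illustrative only)."""
    prk = hkdf_extract(b"\x00" * 32, shared_secret)
    return hkdf_expand(prk, b"toy esni key", 16)


# The server's public share is what ESNIKeys would publish via DNS.
server_private, server_public = dh_keypair()
# The client's ephemeral public share travels alongside the encrypted SNI.
client_private, client_public = dh_keypair()

client_key = derive_sni_key(dh_shared(client_private, server_public))
server_key = derive_sni_key(dh_shared(server_private, client_public))
print(client_key == server_key)
```

Both sides end up with the same key without it ever crossing the wire; a real implementation would then feed that key to the AEAD cipher that protects the server name.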

The encrypted SNI also contains a random nonce. The server’s reply to the ClientHello (the ServerHello flight) includes an encrypted list of the extensions it accepted. If ESNI was accepted (esni_accept), that list includes the nonce originally sent in the ClientHello, and the client must verify that it is indeed the same nonce.
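On the client side, the nonce check amounts to remembering what was sent and comparing it with what came back. A minimal sketch follows; the 16-byte nonce length and the function names are assumptions made for illustration, not taken from this post.

```python
import hmac
import secrets

NONCE_LEN = 16  # assumed length, for illustration only


def new_esni_nonce():
    """Generate the random nonce that goes inside the encrypted SNI."""
    return secrets.token_bytes(NONCE_LEN)


def server_accepted_esni(sent_nonce, echoed_nonce):
    """True only if the server echoed back exactly the nonce we sent."""
    if echoed_nonce is None:
        return False
    # Constant-time comparison avoids leaking how many bytes matched.
    return hmac.compare_digest(sent_nonce, echoed_nonce)


nonce = new_esni_nonce()
print(server_accepted_esni(nonce, nonce))             # matching echo
print(server_accepted_esni(nonce, new_esni_nonce()))  # mismatching echo
```

A client that sees a missing or mismatching nonce should treat the handshake as failed rather than carry on.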

The server can also request the client to switch ESNI keys in this phase. This is useful to prevent an ESNI key-rotation from causing a denial-of-service.

This, in a nutshell, is how ESNI works in the latest draft. Some details obviously had to be left out to make this summary readable, but if you’re interested in learning more, the full draft is available on GitHub.

image of TLS with ESNI

DNS Considerations

Evil DNS

A compromised DNS server undermines the anonymity that ESNI can offer if an attacker is able to swap the ESNI keys. Attackers who gain control of the ESNI record can substitute their own keys and use them to learn which website the user attempted to access, since they will be able to decrypt the encrypted_server_name sent in the ClientHello.

If an attacker controls a major DNS resolver, such as an ISP’s, and can perform this action on a large number of domains, this might be effective. In this case, though, the attacker most likely also controls the A/AAAA records and has already won the battle, since the attacker can send users to an IP address that will mark them as having accessed some domain.

Another potential issue with this attack is that it may cause a denial of service against those sites: assuming a correct implementation of the protocol, a failed decryption of the ESNI should result in a connection abort with “decrypt_error” (this is marked as MUST in the draft).

tl;dr — ESNI is not TOFU. You’re trusting keys given to you by DNS.

Replay Attacks

Protection against replay attacks is partial.

The TLS ClientHello contains a 32-byte random value. The server sends its own random value in its response, and both values feed into the derivation of the session keys.

This does provide protection against a full TLS session replay: the server will respond with a different random value, which causes different session keys to be derived, so the rest of the recorded session is useless.

This, however, doesn’t prevent an attacker from repeating the same ClientHello and seeing if there is a response. That makes it possible to presence-check whether a hostname is still being served by a host. An example of this is shown in the image below, which shows a ClientHello with a valid ESNI being replayed. All of these connections received a ServerHello, meaning the SNI is still being served by the host.

This means that if you recorded a ClientHello to a hidden service, you can presence-check whether it is still being served, at least until a public key rotation occurs in ESNIKeys.
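The two properties discussed in this section can be simulated with a toy model. Everything in it is a simplification we made up for illustration: the key derivation is a single hash rather than the TLS key schedule, the server is a small class, the hostname sits in the clear inside the recorded message (on the wire it would be encrypted and the attacker would simply replay the bytes), and the alert returned for an unknown name is an assumption.

```python
import hashlib
import secrets


def derive_session_key(shared_secret, client_random, server_random):
    """Toy stand-in for the TLS key schedule: both randoms are mixed in."""
    return hashlib.sha256(shared_secret + client_random + server_random).digest()


class ToyServer:
    def __init__(self, hostnames):
        self.hostnames = set(hostnames)
        self.key_generation = 1  # bumped whenever the ESNI keys rotate

    def rotate_keys(self):
        self.key_generation += 1

    def handle(self, client_hello):
        """Decide how to answer; the decision depends only on the ClientHello."""
        if client_hello["key_generation"] != self.key_generation:
            return {"alert": "decrypt_error"}
        if client_hello["hostname"] not in self.hostnames:
            return {"alert": "unrecognized_name"}  # assumed behaviour
        return {"server_random": secrets.token_bytes(32)}


server = ToyServer({"hidden.example"})
recorded_hello = {
    "client_random": secrets.token_bytes(32),
    "key_generation": 1,
    "hostname": "hidden.example",
}
secret = secrets.token_bytes(32)  # known to the original client and server only

original = server.handle(recorded_hello)
replayed = server.handle(recorded_hello)

key_original = derive_session_key(
    secret, recorded_hello["client_random"], original["server_random"])
key_replayed = derive_session_key(
    secret, recorded_hello["client_random"], replayed["server_random"])

print("replay answered:", "server_random" in replayed)
print("same session keys:", key_original == key_replayed)

server.rotate_keys()
print("after rotation:", server.handle(recorded_hello))
```

The replay still gets an answer, which is the presence check, but the fresh server random means the recorded session keys are of no use to the attacker.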

A failure caused by key rotation can be told apart from the service being gone, because a decryption failure makes the server send the TLS alert “decrypt_error”.

image of tls alert

There is currently a suggestion on GitHub to add a timestamp inside the ESNI, limiting how long it remains valid.

Conclusion

As we wrap up the coverage of SNI that we began in our previous post, we hope you now have a good understanding of what SNI actually is, why we need it and why it needs a formal fix. We saw the rise and demise of domain fronting, which attempted both to sidestep the SNI issue and to use it as leverage for cloaking at-risk users (and some malicious activities).

Since this post covers a protocol that is still in flux, upcoming changes may alter some aspects of the protocol mentioned here. Once the final document is released, we will review it, and if there are major changes we will update this post as necessary.

In Our Next Episode…

We will be discussing one of the core aspects of trust on the internet: certificates. We will look at them through the lens of multiple technologies such as HSTS, HPKP and Certificate Transparency, how they attempt to solve the rogue CA problem, and what downsides and strings come attached. Stay tuned!

