The zk in zkTLS

Madhavan Malolan

Aug 30, 2024

Dan recently wrote a great piece about Web Proofs as a non-technical primer. Recommend! But, the name that caught on in social media is "zkTLS". Smart folks are pissed because that name is not correct - and rightly so.

From Dan's blog:

Web Proofs use TLS Notary and privacy-preserving ZKPs to demonstrate authenticity of data from any server.

But, what does privacy preserving mean? What's ZK about it?

The top of mind answers would be -

If you want to prove your bank balance, you can prove that it is greater than $10,000 without needing to reveal the exact bank balance.
zkTLS usually extracts data from web pages. That means the page might contain more information than you want to reveal. For example, your bank balance webpage might also show your full name, account number, physical address and so on. To protect the privacy of the user, zkTLS products use zk proofs to extract only the parts of the page that actually contain the required information. This is also referred to as selective reveal.

However, these zk proofs and privacy preservation features are just nice to haves. Not must haves.

The zk in zkTLS is not for privacy preservation, it is for user security and data integrity. To understand this, we need to understand the tech a little deeper.

Technical Background

When opening any website on your web browser, the browser performs what's called a TLS Handshake. A TLS Handshake is a ceremony that happens between your browser and the website you are trying to open to exchange what's called a shared key. For this Handshake, both the browser and the website must choose their own respective private, secret key.

This secret key should be kept secret by both the browser and the website. They should not reveal it to anyone, including each other. If a hacker learns this secret key, they can decrypt and read all the requests and responses sent over that connection, whether they were recorded earlier or are still to come. The secret key is used only during the Handshake, to come up with a shared key that is known only to the website and the browser, and to no hacker. The secret key is destroyed immediately after the Handshake is complete. It's never used thereafter. It is the shared key that is used to encrypt and decrypt the requests your browser makes and the responses the website sends. Once the Handshake is complete, all the data between your browser and the website is secure. No hacker can read the data that's being exchanged. That's why you can be confident in putting in your credit card details on the internet.
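To make the ceremony concrete, here is a minimal sketch of a Diffie-Hellman style exchange in Python. It is a toy under loud assumptions: the group parameters `P`, `Q` and `G` are tiny numbers picked for illustration, the function names are made up, and a real TLS Handshake uses much larger groups or elliptic curves plus certificates and a proper key schedule. It only shows how two private secrets turn into one shared key without either secret being sent.

```python
import hashlib
import secrets

# Toy group parameters (assumption: far too small for real use).
P = 2039  # a safe prime, P = 2*Q + 1
Q = 1019  # prime order of the subgroup generated by G
G = 4     # generator of that subgroup


def new_secret() -> int:
    """Pick a fresh private secret for one Handshake, in [1, Q-1]."""
    return secrets.randbelow(Q - 1) + 1


def public_value(secret: int) -> int:
    """The value that is safe to send over the wire."""
    return pow(G, secret, P)


def shared_key(my_secret: int, their_public: int) -> bytes:
    """Combine my private secret with the other side's public value."""
    point = pow(their_public, my_secret, P)
    return hashlib.sha256(point.to_bytes(2, "big")).digest()


browser_secret = new_secret()
website_secret = new_secret()

# Only the public values cross the network.
browser_key = shared_key(browser_secret, public_value(website_secret))
website_key = shared_key(website_secret, public_value(browser_secret))

# Both sides now hold the same shared key; the secrets can be thrown away.
del browser_secret, website_secret
```

Both `browser_key` and `website_key` end up identical, which is exactly the symmetric key the rest of this post keeps coming back to.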

Unfortunately for us in crypto, the TLS Handshake was designed to exchange a symmetric key. That means you get security but no provability. That is, if a response came from a website, you cannot prove to a friend that this is indeed the response that came from the website.

One way to prove to the friend that certain data came from the server is to say "hey look, this is the encrypted data that came from the server, and look, I have the key to decrypt it - you can try decrypting it yourself". There is some merit to that statement. It is impossible (under certain constraints) to generate a new key to decrypt a particular encrypted response. If you try to create a new key, you'll likely get gibberish when you actually decrypt using this new key. So if you have a key to decrypt the data, it is safe to say it was the key that was derived from the TLS Handshake - so, it would seem, your friend can be sure that the data you are showing them is indeed what was in the response sent by the server.

But we still have one problem. The key derived from the TLS Handshake is a symmetric key. Meaning, you (your browser) and the website use the same key for both encryption and decryption of the requests and responses. So, when you show your friend that you have the key to decrypt an encrypted blob, it doesn't tell your friend anything at all. For all they know, you could have encrypted the blob yourself and are just decrypting it in front of them. You could have encrypted whatever you want, irrespective of the data that was sent from the website. There is no guarantee that the blob being decrypted was sent by a website. All the decryption melodrama for nothing.
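The forgery problem is easy to see in code. The sketch below uses a toy authenticated cipher built from SHA-256 and HMAC, which is an assumption for illustration only and not what TLS actually uses (the helper names `seal` and `open_blob` are made up too). The point is that anyone holding the shared key can produce a blob that opens just as cleanly as one from the website.

```python
import hashlib
import hmac


def _keystream(key: bytes, length: int) -> bytes:
    """Toy keystream: SHA-256 over key + counter (illustration only)."""
    out = b""
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + counter.to_bytes(4, "big")).digest()
        counter += 1
    return out[:length]


def seal(key: bytes, plaintext: bytes) -> bytes:
    """Encrypt and tag a message with the shared key."""
    stream = _keystream(key, len(plaintext))
    ciphertext = bytes(p ^ k for p, k in zip(plaintext, stream))
    tag = hmac.new(key, ciphertext, hashlib.sha256).digest()
    return tag + ciphertext


def open_blob(key: bytes, blob: bytes) -> bytes:
    """Check the tag and decrypt; raises ValueError for a wrong key."""
    tag, ciphertext = blob[:32], blob[32:]
    expected = hmac.new(key, ciphertext, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):
        raise ValueError("blob was not sealed with this key")
    stream = _keystream(key, len(ciphertext))
    return bytes(c ^ k for c, k in zip(ciphertext, stream))


# Stand-in for the key both sides hold after the Handshake.
shared = hashlib.sha256(b"stand-in for the Handshake output").digest()

from_website = seal(shared, b"balance: $12")          # what the website sent
forged_by_you = seal(shared, b"balance: $1,000,000")  # what you made up

# Both blobs open without complaint, so decrypting proves nothing.
print(open_blob(shared, from_website))
print(open_blob(shared, forged_by_you))
```

A wrong key is rejected, which is the "some merit" part of the argument. But the forged blob is indistinguishable from the genuine one, because the same key seals and opens.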

So, how can we actually prove to a friend about what data was received from the website? zkTLS.

Three types of zkTLS

There are really three ways of doing zkTLS (at the cost of using a misnomer).

1. TEE Based

Trusted execution environments (TEEs) can be used to make HTTPS requests. TEEs are tamper proof CPUs. Not even the owner of the machine can alter the computation. So you can get guarantees about the request that was made and the response that was received. All these guarantees are provided by the TEE. So, you basically give your encrypted username and password to a service provider that owns a TEE machine. The service provider cannot decrypt it. It can be decrypted only by the service provider's TEE. The TEE would decrypt your username and password, log in on your behalf and report the data that was seen in the response. I use username-password as an abstraction, but in practice (e.g. Teleport) we see auth tokens being sent to the TEE instead of the username and password itself. Auth tokens have the benefit that even if the TEE is compromised, the auth tokens can be deactivated.

TEE based zkTLS are efficient implementations and add almost no computation or networking overhead, as long as you are willing to accept the security risk associated with the TEE itself. The design of such a system was first published in a research paper called Town Crier in 2016, and a notable team building using this primitive is Clique.

So, when proving to a friend that a certain response came from a website you can also provide the signature from the TEE that the correct TLS Handshake was conducted and the claimed requests and responses are indeed what was exchanged. If your friend trusts that you did not go the distance to hack a TEE in order to generate this signature, they will believe your claim.

2. MPC Based

MPC stands for multi party computation.

As we saw, the TLS Handshake is a way to generate a key between the website and the browser. At the end, both the browser and the website have the symmetric key. The MPC Based zkTLS changes this dynamic. What if the browser never has the key?

During the TLS Handshake, the website does its part of the ceremony without any changes. But on the browser's end, there's a modification. Instead of the browser responding to the ceremony immediately, it consults a network of nodes. This network of nodes participates in another ceremony (the MPC) to establish a multiparty secret key, instead of the browser picking its own random secret key. This multiparty secret key is then used to do the TLS Handshake on behalf of the browser. So, when the shared key is created after the Handshake, no single node or browser knows the shared key. In order to encrypt and decrypt requests and responses, all the nodes and the browser must cooperate. To encrypt a request, the browser would use its part of the shared key to encrypt the data and send it to Node 1. Node 1 would add its encryption using its part of the shared key and send it to Node 2, and so on. When all the nodes have done their part, the request can be sent to the website. Upon receipt of the response, the browser can't decrypt it by itself either. It must do its part of the decryption and send it to Node 1, then Node 2 ... you know the drill.
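The "pass it around" idea can be sketched with the simplest possible key splitting: XOR shares of a one-time pad. This is a toy assumption to show the shape of the cooperation, not how DECO or TLS Notary actually run TLS inside an MPC. The names `xor` and `pass_around` are made up for this example.

```python
import secrets


def xor(data: bytes, share: bytes) -> bytes:
    """Apply one party's share of the key to the data."""
    return bytes(d ^ s for d, s in zip(data, share))


def pass_around(data: bytes, shares: list[bytes]) -> bytes:
    """Each party applies its own share in turn, then hands the data on."""
    for share in shares:
        data = xor(data, share)
    return data


request = b"GET /balance"

# One share each for the browser, Node 1 and Node 2. Nobody holds the full
# key, which is just the XOR of all the shares.
shares = [secrets.token_bytes(len(request)) for _ in range(3)]

encrypted = pass_around(request, shares)    # everyone cooperates to encrypt
decrypted = pass_around(encrypted, shares)  # and again to decrypt

# If Node 2 refuses to take part, the browser and Node 1 are stuck with
# data that is still masked by Node 2's share.
without_node_2 = pass_around(encrypted, shares[:2])
```

Here `decrypted` equals `request`, while `without_node_2` only becomes readable once Node 2 applies its share too.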

So, now when proving to a friend that a certain response came from a website, you can also claim "Here is the response that came from the server, and here is an attestation from all my MPC nodes that we decrypted the data from the website together - and this is what the decryption is". Your friend will believe you if they believe that it's unlikely enough for you to have gone to each of the MPC nodes and bribed them.

This was first published in a research paper called DECO in 2020. It was soon realized that MPC is very expensive to implement in practice. So it was replaced by a 2PC solution, involving the browser and one node instead of multiple (using garbled circuits, a particular form of MPC). A notable implementation is TLS Notary. This model has good security guarantees but a lot of networking overhead to set up the MPC.

3. Proxy Model

HTTPS proxies are commonplace on the internet. Their job is to forward https traffic, without being able to read what traffic they are forwarding.

The proxy model of zkTLS uses the proxy feature of browsers. When opening a website, instead of sending the request directly to the website, the browser sends it to the website via the HTTPS proxy. By default, the website will also respond to the browser via this proxy. So the proxy will see all the encrypted requests and responses that are exchanged between the browser and the website.

The proxy then provides an attestation to the said encrypted requests and responses, along with the information of whether each was a request or a response. That is, an attestation of whether the encrypted data was sent by the browser or by the website. The browser then creates a zk proof of the decryption of the response. The zk proof is the equivalent of saying "I know a shared key that decrypts this encrypted data, and here is the decryption. But I will not tell you the shared key itself." This relies on the fact that it is impractical to generate a new key that decrypts the data to anything other than gibberish. That's why knowing that you were able to decrypt the data is good enough; the key itself need not be revealed. And it should not be: if you also reveal the key, the other person will be able to decrypt all the other messages you had sent earlier, including the username & password.
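To give a feel for "I know a key, but I will not tell you the key", here is a sketch of a classic zero-knowledge proof of knowledge: a Schnorr proof made non-interactive with a hash (Fiat-Shamir). Loud assumptions: this proves knowledge of a discrete log, not knowledge of a TLS key that decrypts a ciphertext. Real proxy-based systems prove the decryption itself inside a zk circuit. The tiny group `P`, `Q`, `G` and all function names are made up for illustration.

```python
import hashlib
import secrets

# Toy group (assumption: far too small to be secure).
P, Q, G = 2039, 1019, 4  # G generates a subgroup of prime order Q mod P


def challenge(public: int, commitment: int, statement: bytes) -> int:
    """Derive the challenge by hashing everything the verifier can see."""
    data = f"{G}|{public}|{commitment}|".encode() + statement
    return int.from_bytes(hashlib.sha256(data).digest(), "big") % Q


def prove(secret: int, statement: bytes) -> tuple[int, int]:
    """Prove knowledge of `secret` behind G**secret without revealing it."""
    nonce = secrets.randbelow(Q)
    commitment = pow(G, nonce, P)
    c = challenge(pow(G, secret, P), commitment, statement)
    response = (nonce + c * secret) % Q
    return commitment, response


def verify(public: int, statement: bytes, proof: tuple[int, int]) -> bool:
    """Check the proof using only public values."""
    commitment, response = proof
    c = challenge(public, commitment, statement)
    return pow(G, response, P) == (commitment * pow(public, c, P)) % P


secret = secrets.randbelow(Q - 1) + 1  # plays the role of the key you hold
public = pow(G, secret, P)             # what the friend gets to see
statement = b"the response says: balance > 10000"

proof = prove(secret, statement)
print(verify(public, statement, proof))
```

The friend runs `verify` with only `public`, `statement` and `proof`. The secret never leaves the prover, yet a proof with a tampered response fails the check.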

So, when proving to a friend that a certain response came from a website you can say "Here's the encrypted data and here's a proof from the proxy that it was sent by the website, and here's the zkproof that I have the shared key that decrypts it, and here is that decryption". Again, your friend will believe you if they believe it is unlikely for you to have bribed the proxy to incorrectly say that certain encrypted data came from a website, when it actually came from you. This is efficient in both compute and networking as long as you are willing to accept the risk of a very hard to pull off attack which requires physical access to the proxy's machine.

This model was published in a research paper called "Proxying is Enough" in 2024. Reclaim Protocol is an implementation of this model. The implementation dates from 2022, so it predates the paper itself.

OK - so what's zk about all this?

In all of these examples, we don't really talk about privacy. We're more interested in integrity of the proof. Meaning if you are telling your friend that the response came from the website, how can they trust you? Hiding parts of the response for privacy reasons is only the next step. But all is moot, if there is no integrity guarantee.

In all of these models, different cryptographic primitives are used. But all of them are used to protect the shared key without compromising the integrity of the claim about the information.

In the TEE model, the decryption is done inside the TEE; the TLS shared key is generated inside the TEE and is never accessible to anyone.
In the MPC or 2PC model, the decryption is done by the node(s) and the browser in cooperation; neither the nodes nor the browser ever have access to the TLS shared key, so they cannot share it in the first place.
In the Proxy model, you have access to the decryption key; you needn't disclose the key, but at the same time you cannot lie about it.

A zkproof may be augmented on top of all of these to hide parts of the page that you don't want to reveal to the friend. But as we saw, it's a nice to have, not a must have. The must have is the integrity. And that is why there is debate about calling it zkTLS.

TEE models don't use zk proofs, they use TEE guarantees
MPC/2PC models don't use zk proofs, they use MPC to do the Handshake
Proxying model does use zk proofs, but they use it after the TLS Handshake is done

The proxying model is the only one that uses zk for integrity. But none of the models use zk at the TLS Handshake level, which is what it would take to warrant the name zkTLS.