Investigation into Messaging Layer Security (MLS)

This article investigates Messaging Layer Security (MLS), the IETF standard protocol (RFC 9420) for Signal-style end-to-end encryption. Through a practical analysis of OpenMLS (the Rust reference implementation) and a demonstration CLI chat application (mls-chat), it reveals the substantial gap between protocol specification and production deployment.

Introduction

MLS (Messaging Layer Security) is a relatively recent project to turn the end-to-end encryption (E2EE) ideas of the Signal protocol into an Internet standard. It is driven by a consortium that includes Facebook (WhatsApp), Wire, Mozilla, Google, Twitter and others.

Since its publication in 2023 as RFC 9420, multiple organizations have declared a commitment to migrate their technology to MLS, including:

Interestingly, Signal’s protocol was already open source and used in multiple applications (e.g. Signal, WhatsApp). MLS was developed and published as a separate protocol because of a specific shortcoming in Signal’s architecture: adding or removing group members in Signal has a communication complexity of O(N) in the number of members already in the group. In other words, it does not scale well to very large groups. MLS addresses this with its “ratchet tree” innovation (see below).

Main protocol concepts

Like Signal, MLS defines a client-server protocol where the server has zero visibility into the content of messages (end-to-end encryption). The server’s role is twofold:

durable message distribution and delivery on a group channel: this is the more important of the two. The protocol is optimized for the case where the server guarantees that messages sent by one client are delivered in order, without loss, to the other clients in the same group. This is because each client updates and verifies its copy of the MLS group state by applying MLS messages in a specific sequence.

The protocol does not become cryptographically weaker if these guarantees don’t hold (e.g. data won’t leak out of groups), but clients must then maintain their own message reordering buffer, and they cannot catch up on group conversations after being temporarily offline.
user directory and user key distribution for invites (optional): when a user creates an MLS group and wishes to invite another user to it, they must obtain a key package for that other user. This could be copied manually (e.g. person-to-person on a USB key), but the common solution is for the server to provide a directory service.

Using the directory service, clients can look up another user’s key packages by an identifier, often a phone number, username or e-mail address. Clients must also register their own key packages with the directory.

To prevent identity fraud, the directory is responsible for user authentication, i.e. ensuring that the lookup key (phone number, username, etc.) is actually controlled by the client that registers under it.

RFC 9420 calls a server that provides both features an “MLS Delivery Service” (DS).
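When the in-order delivery guarantee does not hold, the client-side reordering buffer mentioned above can stay fairly small. The following is a minimal sketch, assuming a hypothetical DS that stamps every message on a group channel with a consecutive sequence number (this numbering is an assumption of the sketch, not something RFC 9420 defines); `ReorderBuffer` and its methods are illustrative names, not part of OpenMLS.

```rust
use std::collections::BTreeMap;

/// Hypothetical client-side reordering buffer for one group channel.
/// Messages are released strictly in sequence-number order; a gap stalls delivery.
pub struct ReorderBuffer {
    next_seq: u64,
    pending: BTreeMap<u64, Vec<u8>>,
}

impl ReorderBuffer {
    pub fn new(first_seq: u64) -> Self {
        ReorderBuffer { next_seq: first_seq, pending: BTreeMap::new() }
    }

    /// Accepts a message that may arrive out of order and returns every
    /// message that has become deliverable, in order.
    pub fn push(&mut self, seq: u64, payload: Vec<u8>) -> Vec<Vec<u8>> {
        if seq < self.next_seq {
            return Vec::new(); // already delivered: drop the duplicate
        }
        // Keep the first copy if the same sequence number arrives twice.
        self.pending.entry(seq).or_insert(payload);
        let mut ready = Vec::new();
        while let Some(msg) = self.pending.remove(&self.next_seq) {
            ready.push(msg);
            self.next_seq += 1;
        }
        ready
    }

    /// Number of messages waiting for an earlier gap to be filled.
    pub fn buffered(&self) -> usize {
        self.pending.len()
    }
}
```

Holding back later messages until the gap is filled matters because each Commit moves the group to a new epoch: a client that skipped one would be unable to process the messages that follow it.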

Adding/removing group members

Group membership changes are handled as follows:

clients are responsible for refilling key packages in the DS directory over time; this maintains the invariant that each known user has at least a few key packages available server-side, for invites by other users.

(Note: this is needed because key packages must only be used once; each invitation “consumes” a key package.)
if client A wants to invite client B to a group, it first requests a key package for client B from the DS. Using that key package, it crafts two messages:
- a combination of “Add + Commit”, which informs all the pre-existing members of the group that client B is invited. This is broadcast to the group.
- a “Welcome” message, sent to client B directly, which informs client B that the group exists, that they are invited to it, and contains an initial set of keys to exchange messages with the group.
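The one-time-use rule for key packages is the part of the directory that is easiest to get subtly wrong. Below is a minimal in-memory sketch of a per-user key package pool, assuming key packages are handled as opaque serialized bytes; `KeyPackageDirectory`, `publish`, `consume` and `refill_needed` are hypothetical names, and a real DS would back this with durable, transactional storage.

```rust
use std::collections::{HashMap, VecDeque};

/// Opaque serialized key package (in practice, an encoded MLS KeyPackage).
pub type KeyPackageBytes = Vec<u8>;

/// Hypothetical in-memory key package directory.
#[derive(Default)]
pub struct KeyPackageDirectory {
    pools: HashMap<String, VecDeque<KeyPackageBytes>>,
}

impl KeyPackageDirectory {
    /// Called by the owning (authenticated) client to top up its pool.
    pub fn publish(&mut self, user_id: &str, packages: Vec<KeyPackageBytes>) {
        self.pools.entry(user_id.to_string()).or_default().extend(packages);
    }

    /// Called by an inviter. The package is removed before it is returned,
    /// so it is "consumed" and can never be handed to a second inviter.
    pub fn consume(&mut self, user_id: &str) -> Option<KeyPackageBytes> {
        self.pools.get_mut(user_id)?.pop_front()
    }

    /// How many packages the owner should upload to get back to `target`.
    pub fn refill_needed(&self, user_id: &str, target: usize) -> usize {
        let available = self.pools.get(user_id).map_or(0, |p| p.len());
        target.saturating_sub(available)
    }
}
```

The key design choice is that `consume` removes the package before returning it; in a real server the equivalent would be a single atomic delete-and-return in the database, so two concurrent inviters can never receive the same package. A client can use something like `refill_needed` to decide how many fresh key packages to upload.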

There is also a variant of this protocol that allows a client to create an “open invite” for a user who is not yet known to the directory. In that case, client A prepares a secret “GroupInfo record” containing an invitation token for the group, bound to an external identifier that client A chooses beforehand (e.g. a phone number). The directory can distribute this record, for example through an agreement between the server and the clients that the record is only revealed to a future client that authenticates itself with that same identifier. When the future client finally registers, it can use the invitation token to publish a special “Commit” message (an “external Commit” in RFC 9420 terms) that adds itself to the group and “consumes” the record.
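The identifier-bound open invite described above could be modeled on the directory side roughly as follows. This is a sketch of the scheme as described in this post, not of any standard API: `OpenInvites`, `PendingInvite` and `claim` are hypothetical, and the code assumes the caller has already authenticated the claiming client for the identifier it passes in.

```rust
use std::collections::HashMap;

/// Opaque "GroupInfo record" prepared by the inviter (hypothetical format).
#[derive(Clone, Debug, PartialEq)]
pub struct PendingInvite {
    pub group_id: String,
    pub group_info: Vec<u8>,
}

/// Hypothetical directory-side store of open invites, keyed by the
/// external identifier (e.g. a phone number) chosen by the inviter.
#[derive(Default)]
pub struct OpenInvites {
    by_identifier: HashMap<String, Vec<PendingInvite>>,
}

impl OpenInvites {
    /// Stores an invite for a user who has not registered yet.
    pub fn create(&mut self, identifier: &str, invite: PendingInvite) {
        self.by_identifier.entry(identifier.to_string()).or_default().push(invite);
    }

    /// Must only be called with an identifier the directory has already
    /// authenticated for this client. Records are removed when claimed,
    /// so each one can be used at most once.
    pub fn claim(&mut self, authenticated_identifier: &str) -> Vec<PendingInvite> {
        self.by_identifier.remove(authenticated_identifier).unwrap_or_default()
    }
}
```

Removing the record on claim gives the “consumed” semantics; everything cryptographic (building the GroupInfo, producing the external Commit) stays with the clients.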

Ratchet trees

The MLS ratchet tree is a binary tree of public keys, maintained on each client for each group it is a member of. Its branches are updated when members are added to or removed from the group. This is the main MLS innovation over Signal’s “sender keys”, which require distributing new keys to every member individually when the membership changes: after a removal, each remaining member must send a fresh sender key to each of the other members, so the total cost scales quadratically with the group’s size.
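Clients typically store the ratchet tree as a flat array rather than as linked nodes. The sketch below follows the index arithmetic of RFC 9420, Appendix C (leaf i at even index 2i, parent nodes at odd indices) and assumes a full tree whose leaf count is a power of two; it only computes node positions and holds no keys. The function names mirror the RFC's pseudocode, but the Rust code itself is illustrative.

```rust
// Array-based ratchet tree indexing after RFC 9420, Appendix C.
// Leaf i lives at node index 2*i; parent nodes sit at odd indices.
// Assumes a full tree: `n_leaves` is a power of two.

/// 0 for leaves; for parents, the number of trailing 1 bits.
fn level(x: u32) -> u32 {
    x.trailing_ones()
}

/// With a power-of-two leaf count, the root is at index n_leaves - 1.
fn root(n_leaves: u32) -> u32 {
    n_leaves - 1
}

/// Children of a parent node (must not be called on a leaf).
fn left(x: u32) -> u32 {
    x ^ (1u32 << (level(x) - 1))
}

fn right(x: u32) -> u32 {
    x ^ (3u32 << (level(x) - 1))
}

/// Parent of any non-root node.
fn parent(x: u32) -> u32 {
    let k = level(x);
    let b = (x >> (k + 1)) & 1;
    (x | (1u32 << k)) ^ (b << (k + 1))
}

/// The other child of this node's parent.
fn sibling(x: u32) -> u32 {
    let p = parent(x);
    if x < p { right(p) } else { left(p) }
}

/// Nodes from a node up to the root (excluding the node itself).
fn direct_path(mut x: u32, n_leaves: u32) -> Vec<u32> {
    let r = root(n_leaves);
    let mut path = Vec::new();
    while x != r {
        x = parent(x);
        path.push(x);
    }
    path
}

/// Sibling of the leaf and of each non-root node on its direct path.
fn copath(leaf_index: u32, n_leaves: u32) -> Vec<u32> {
    let r = root(n_leaves);
    let mut x = 2 * leaf_index;
    let mut out = Vec::new();
    while x != r {
        out.push(sibling(x));
        x = parent(x);
    }
    out
}
```

In an 8-leaf tree, leaf 0 (node 0) has direct path [1, 3, 7] and copath [2, 5, 11]. A committing member encrypts each new path secret to (the resolution of) the corresponding copath node, one per level, which is where the logarithmic cost comes from.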

This structure allows the group to evolve securely: members can be added or removed, and can rotate their keys, without exposing past or future messages. It enforces forward secrecy (old messages stay safe even if someone’s key is later compromised), post-compromise security (fresh updates heal the group after a breach), and exclusion (removed members can’t decrypt anything new).

The tree shape also brings scalability: rekeying costs grow logarithmically with the group size, not linearly, so the protocol stays efficient even as the group becomes large (a 1000-person group needs only ~10 encryption operations instead of 999).
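The arithmetic behind the “~10 instead of 999” figure can be checked in a few lines. This is a back-of-the-envelope sketch under simplifying assumptions: the MLS count assumes a full tree without blank nodes (one encryption per copath level), and the sender-keys count assumes each new key is sent pairwise to every other member; the function names are hypothetical.

```rust
// Back-of-the-envelope rekey costs under simplified models.

/// MLS: one path-secret encryption per copath level in a full tree with
/// no blank nodes, i.e. ceil(log2(n)) for a group of n members.
fn mls_path_encryptions(group_size: u64) -> u64 {
    if group_size <= 1 {
        return 0;
    }
    // The bit length of (n - 1) equals ceil(log2(n)) for n >= 2.
    u64::from(u64::BITS - (group_size - 1).leading_zeros())
}

/// Sender keys: one member's new key sent pairwise to the n - 1 others.
fn sender_key_messages(group_size: u64) -> u64 {
    group_size.saturating_sub(1)
}

/// Sender keys after a removal: each of the n remaining members
/// re-sends its own fresh key to the n - 1 others.
fn sender_key_full_rekey(group_size: u64) -> u64 {
    group_size * sender_key_messages(group_size)
}
```

For a 1000-member group, `mls_path_encryptions` returns 10 and `sender_key_messages` returns 999, while `sender_key_full_rekey` returns 999,000. Blank nodes left behind by removals can raise the MLS count, so treat it as a best case.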

Client and server responsibilities

On the surface, MLS looks and feels like a “simple” protocol.

Server side:

the server is blind to the content of messages: its main role is to fan out (encrypted) messages from each client to the other group members. There should be message channels that clients can post to and subscribe to (e.g. HTTP POST and/or WebSocket).
the directory part of the server maps user identifiers to a pool of key packages.
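The fan-out role can be illustrated with an in-memory hub. This is a minimal sketch, assuming one in-process channel per subscriber as a stand-in for HTTP or WebSocket connections; `FanoutHub`, `subscribe` and `post` are hypothetical names. Note that the server only ever handles opaque ciphertext bytes.

```rust
use std::collections::HashMap;
use std::sync::mpsc::{channel, Receiver, Sender};

/// Hypothetical fan-out hub: copies opaque ciphertext to every
/// subscriber of a group channel, without ever being able to read it.
#[derive(Default)]
pub struct FanoutHub {
    groups: HashMap<String, Vec<(String, Sender<Vec<u8>>)>>,
}

impl FanoutHub {
    /// Registers `client_id` on `group_id` and returns its receive queue.
    pub fn subscribe(&mut self, group_id: &str, client_id: &str) -> Receiver<Vec<u8>> {
        let (tx, rx) = channel();
        self.groups
            .entry(group_id.to_string())
            .or_default()
            .push((client_id.to_string(), tx));
        rx
    }

    /// Queues `ciphertext` for every subscriber except the sender and
    /// returns how many copies were queued. Disconnected subscribers are pruned.
    pub fn post(&mut self, group_id: &str, sender_id: &str, ciphertext: &[u8]) -> usize {
        let Some(subs) = self.groups.get_mut(group_id) else { return 0 };
        let mut delivered = 0;
        subs.retain(|(client_id, tx)| {
            if client_id == sender_id {
                return true; // keep the sender subscribed, but don't echo
            }
            match tx.send(ciphertext.to_vec()) {
                Ok(()) => {
                    delivered += 1;
                    true
                }
                Err(_) => false, // receiver dropped: unsubscribe it
            }
        });
        delivered
    }
}
```

Not echoing a message back to its sender is a design choice of this sketch; a real DS may instead echo it as a delivery confirmation, for example so that the sender knows its Commit was accepted before applying it locally.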

Client side:

the client creates key packages and “tops up” the server’s directory with them.
clients create groups (i.e. create an MLS ratchet tree), and inform the server of the creation of a new group channel.
clients subscribe to the server’s group channels for the MLS groups they are a member of.
clients update the MLS tree for incoming messages according to the protocol rules.
clients also manage invitations for new members: for the user who triggers the invitation (crafting Add/Commit/Welcome messages) and for other existing members of the group (processing incoming Add/Commit messages).
clients also process incoming Welcome messages to discover their membership in new groups.
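The client-side responsibilities listed above largely reduce to dispatching on the kind of incoming message and checking it against the group's current epoch. The sketch below models only that bookkeeping, with hypothetical `Incoming`/`Outcome` types and already-decrypted plaintext; in a real client the parsing, decryption and tree updates would be delegated to OpenMLS, whose actual types look different.

```rust
use std::collections::HashMap;

/// Hypothetical, heavily simplified view of what a client receives.
pub enum Incoming {
    Welcome { group_id: String, epoch: u64 },
    Commit { group_id: String, epoch: u64 },
    Application { group_id: String, epoch: u64, plaintext: String },
}

#[derive(Debug, PartialEq)]
pub enum Outcome {
    Joined,
    EpochAdvanced(u64),
    Chat(String),
    Rejected(&'static str),
}

/// Per-client bookkeeping: the current epoch of every group joined.
#[derive(Default)]
pub struct Client {
    epochs: HashMap<String, u64>,
}

impl Client {
    pub fn handle(&mut self, msg: Incoming) -> Outcome {
        match msg {
            // A Welcome makes the client a member at the group's current epoch.
            Incoming::Welcome { group_id, epoch } => {
                self.epochs.insert(group_id, epoch);
                Outcome::Joined
            }
            // A Commit is only valid for the epoch the client is in; applying it
            // moves the group to the next epoch.
            Incoming::Commit { group_id, epoch } => match self.epochs.get_mut(&group_id) {
                Some(current) if *current == epoch => {
                    *current += 1;
                    Outcome::EpochAdvanced(*current)
                }
                Some(_) => Outcome::Rejected("commit for a different epoch"),
                None => Outcome::Rejected("not a member"),
            },
            Incoming::Application { group_id, epoch, plaintext } => match self.epochs.get(&group_id) {
                Some(current) if *current == epoch => Outcome::Chat(plaintext),
                Some(_) => Outcome::Rejected("message for a different epoch"),
                None => Outcome::Rejected("not a member"),
            },
        }
    }
}
```

Rejecting application messages from other epochs is a simplification: implementations can keep secrets for a few past epochs so that late messages remain decryptable.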

So far, so good. However, several areas hide considerable complexity under the hood.

Server side:

the server should ensure the message queues are durable. For example, messages should not be lost if the server fails temporarily, and temporarily offline clients should see what they missed while offline. Typically, this is done with a combination of:
- local transactional storage on the server,
- cross-server data replication (for failovers),
- a server-side message buffer with a “cursor” and acknowledgement system, so clients can request to “catch up” on the message queue after a disconnect.
to prevent DoS attacks, servers should enforce message rate limiting per group.
the server must ensure that key packages are reliably deleted when they are consumed, and never used more than once.
if the server offers a directory using personal identifiers (e.g. phone numbers or email addresses), it must protect this metadata as a sensitive asset.
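The cursor-and-acknowledgement buffer from the list above can be sketched as an append-only log per group. This sketch uses hypothetical names (`GroupLog`, `register`, `fetch_unacked`, `ack`) and keeps everything in memory; durability and replication, which the list also calls for, are exactly what it leaves out.

```rust
use std::collections::HashMap;

/// Hypothetical append-only message log for one group, with per-client cursors.
#[derive(Default)]
pub struct GroupLog {
    messages: Vec<Vec<u8>>,
    /// For each client: index of the first message it has NOT acknowledged.
    cursors: HashMap<String, usize>,
}

impl GroupLog {
    /// Appends a message and returns its position in the log.
    pub fn append(&mut self, ciphertext: Vec<u8>) -> usize {
        self.messages.push(ciphertext);
        self.messages.len() - 1
    }

    /// A client that joins now only sees messages from this point on.
    pub fn register(&mut self, client_id: &str) {
        self.cursors.insert(client_id.to_string(), self.messages.len());
    }

    /// Everything after the client's cursor, used to catch up after a disconnect.
    /// Unknown clients get nothing.
    pub fn fetch_unacked(&self, client_id: &str) -> &[Vec<u8>] {
        let Some(&start) = self.cursors.get(client_id) else { return &[] };
        &self.messages[start.min(self.messages.len())..]
    }

    /// Moves the client's cursor past `position`; cursors never move backwards.
    pub fn ack(&mut self, client_id: &str, position: usize) {
        let next = (position + 1).min(self.messages.len());
        if let Some(cursor) = self.cursors.get_mut(client_id) {
            *cursor = (*cursor).max(next);
        }
    }
}
```

Because cursors only move forward, a stale or duplicated acknowledgement cannot rewind a client's position, and a reconnecting client simply asks for everything after its cursor again.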

Client side:

clients must regularly (typically, using a periodic background task) top-up their key packages on the directory, to ensure they remain available for invites by other clients.
clients must persist the state of the MLS group (mainly, the ratchet tree) so they can recover their membership across restarts of the client program.
if a directory is in use, clients must authenticate themselves to the directory using a pre-agreed standard (e.g. OAuth or TOTP) before publishing their key packages.
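OpenMLS handles group state persistence through its storage provider, but any state an application persists on its own (or a hand-rolled storage backend) faces the same crash-safety problem. A minimal sketch, assuming the state has already been serialized to bytes; `save_state` and `load_state` are hypothetical helpers using only the Rust standard library.

```rust
use std::fs;
use std::io::{self, Write};
use std::path::Path;

/// Persists a serialized state blob without ever leaving a truncated file:
/// write to a temporary file, flush it to disk, then rename it over the old one.
pub fn save_state(path: &Path, state: &[u8]) -> io::Result<()> {
    let tmp = path.with_extension("tmp");
    let mut file = fs::File::create(&tmp)?;
    file.write_all(state)?;
    file.sync_all()?; // bytes must be on disk before the rename
    drop(file); // close before renaming (required on some platforms)
    fs::rename(&tmp, path)
}

/// Loads the state on startup; `Ok(None)` means "no saved state yet".
pub fn load_state(path: &Path) -> io::Result<Option<Vec<u8>>> {
    match fs::read(path) {
        Ok(bytes) => Ok(Some(bytes)),
        Err(e) if e.kind() == io::ErrorKind::NotFound => Ok(None),
        Err(e) => Err(e),
    }
}
```

On POSIX filesystems, a rename within the same filesystem replaces the target atomically, so a restarted client sees either the old state or the new one. Full durability would also require syncing the parent directory, which this sketch omits.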

Together, these components and features form the proverbial “20% of the surface area that takes 80% of the effort to complete”.

OpenMLS - a pre-defined Rust library

OpenMLS is a Rust project that implements the MLS protocol. It was originally published by Phoenix R&D and Cryspen and, at the time of this writing, is considered close to a “reference implementation.”

The building blocks that are ready for use include (again, at the time of this writing):

packages (Rust “crates”) for maintaining the MLS group data structure client-side. This includes generating messages, processing incoming messages, maintaining client keys, etc. The cryptographic primitives are pluggable (you can provide your own), but default implementations are included.
packages for persisting the client state. This is architected around a “storage provider” interface: you can define your own, or use the example SQLite/SQLX storage provider that ships with the project.

In addition, the project’s repository contains stubs and starting blocks to build your own server, but this part is not really ready for use.

To summarize, here is how far you can rely on OpenMLS:

Feature / area	Included in OpenMLS?	Can override / define your own?
Server: reliable fanout	no	required!
Server: key package directory	no	required!
Server: authentication	no	required!
Client: authentication	no	required!
Client: key package pool management	no	required!
Client: base crypto	yes	yes - optional
Client: MLS message creation	yes
Client: MLS update from incoming messages	yes
Client: send/receive encrypted messages	yes
Client: MLS state persistence	yes	yes - optional

mls-chat: an example client/server chat based on OpenMLS

I have implemented an example application to demonstrate how to use OpenMLS: mls-chat.

This project provides a simple client/server group chat CLI application. It supports:

end-to-end encrypted chats (fully delegating that logic to OpenMLS).
a basic delivery service, including a key package pool and refills.

Note that it supports neither authentication nor offline clients. This is because the goal of the project is only to demonstrate OpenMLS, and OpenMLS does not currently aim to include these parts.


Raphael Poss is an entrepreneur who occasionally publishes field notes on systems, leadership, and the messy edge between technology and people.
