Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

MSC4256: RFC 9420 MLS mode Matrix #4256

Open
wants to merge 5 commits into
base: main
Choose a base branch
from

Conversation

dklimpel
Copy link
Contributor

@dklimpel dklimpel commented Jan 24, 2025

@ara4n ara4n added the proposal A matrix spec change proposal label Jan 24, 2025
@ara4n ara4n changed the title MSC: RFC 9420 MLS mode Matrix MSC4256: RFC 9420 MLS mode Matrix Jan 24, 2025
@tulir tulir added e2e client-server Client-Server API kind:core MSC which is critical to the protocol's success labels Jan 24, 2025
@turt2live turt2live added the implementation-needs-checking The MSC has an implementation, but the SCT has not yet checked it. label Jan 25, 2025

![Invite flow][invite-flow]

## Potential issues
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about display name and avatar url of members? Are they stored in the MLS group context too? And if so, how to prevent malicious users from changing them.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you, there will be a section on this here.

7. If the `powers` has entries not in `servers`, reject
8. If the `powers` start not with exactly the same entries in the same order as the subset of
entries in the previous `powers` above the `origin` , reject
9. If the `can_propose` has entries not in `servers`, reject
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would we need to enforce that entries from powers also appear in can_propose to enable the process described in "Commit handling with multiple servers" below?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Technically that is optional, since events can either be sent as a proposed commit or directly. Ideally every server can propose though, although the first power server really doesn't need to.

Currently redactions are not supported in the outer protocol layer. This has the benefit of making
redactions invisible to the server, but also prevents any server side aggregation. This also means
content can’t be deleted. As we require all events to be encrypted, this might not be a major
problem, but policy makers might disagree. Redactions could be supported in various ways and we
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Redactions are not only relevant in the context of data privacy but also for moderation. Not being able to delete malicious or illegal content seems like a significant problem. 😕

Copy link
Member

@uhoreg uhoreg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've only read through part of this, but here are some (mostly nitpicky) comments so far. Generally, there's inconsistency between using "mls" and "MLS". It would be nice to use "MLS" everywhere.

encrypted events with only an `algorithm` and `ciphertext` field. It contains the current MLS
commit as a MLS private message.

The MLS commit contains a CBOR encoded field in its unencrypted authenticated data of the MLS
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you give some rationale here for why you're using CBOR?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, is this CBOR-encoded data the only thing in the authenticated data, or are there other things that are in there alongside it?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

JSON would have some overhead when putting it directly into the bytes here, so CBOR seemed like the more natural fit, but it isn't really a requirement. We can change that.

Currently there are no other fields in the authenticated data. There is a way to structure extensions using authenticated data, that we aren't using here yet, which could be more appropriate? But so far we decided to go with the simplest approach.

4. Considering the events auth events:
1. If there is not exactly one `m.room.create` event (with implicitly an empty state key), reject
2. If the event is not the `m.room.create` event (we never get here) or an `m.mls.commit` event
with epoch == 0 (we do get here), then if there is no `m.mls.commit` event in the auth events,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see anywhere in this MSC that mentions where the epoch comes from. I think it may be helpful to mention something about that somewhere, since this MSC will be read by people not completely familiar with MLS. (It comes from the ciphertext, which despite being called "ciphertext", contains some cleartext bits, including the epoch. Which means that servers will be expected to be able to parse the MLS ciphertext, at least enough to read the epoch, group ID, and anything else that's needed.)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, exactly what they said and we should put that somewhere

6. If the `powers` has entries not in `servers`, reject
7. If the `can_propose` has entries not in `servers`, reject
8. Otherwise, allow
11. If the epoch of the `ciphertext` is not the epoch of the current `m.mls.commit` event + 1,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't the epoch of the ciphertext be equal to the epoch of the current m.mls.commit event (which I assume means the m.mls.commit event that is referenced in the event's auth_events)? Otherwise you're encrypting with an epoch that hasn't been created yet?


State resolution might result in some clients losing access to messages as they threw their private
state away immediately to provide PCS and forward secrecy. In that case the clients dropped by the
state event will have to be sent another welcome to rejoin them into the room.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a way for clients to ask to be re-added? How do others know that the client needs to be re-added?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A client can be re-added by reinviting them. Usually there is no way to know this for certain, but clients can likely guess a client got dropped from the room by looking at the results of state resolution. But that is unreliably to do over federation, as you can't always know what fork a room ended up on on that server.

Having conflicting commits will lead to some undecryptable messages. For short durations that can be
mitigated by keeping the old state around for seconds to minutes. This is the common case when
people invite new users at the same time or do otherwise racing modifications on different servers.
For long durations no automatic recovery is possible and manual communication is necessary. We
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you explain what this "manual communication" would entail? Are users going to have to talk to each other out-of-band and poke at the cryptographic state in their clients?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, the users must exchange information out of band. This can be by email, face to face or another intact or private room or similar.


As no member events are used in this room version, the direct /state endpoint can not be used to
change membership in a room

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How are user's new devices/deleted devices handled? Who is in charge of adding/removing them from the group? How do people leave the group?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Users can add their own devices as part of the verification process during login or other users might propose adding them, when they notice the new device and can verify it.

For leaving users can propose a self removal (if that extension is enabled) or ask someone else to remove them from the group via normal communication in the room.


### Room state

Currently all state is stored in one GroupContext extension. This has obvious size limitations.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just to be clear, I believe that this means that having the state in the GroupContext means that the entire state will be encoded in every single commit.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, but that's a known issue and can be resolved with MLS extensions. Either a custom extension that only sends a diff, or by using mechanisms from the extensions draft.

However if there is no state change, no proposal for that needs to be sent, so it also isn't included in the commit.


There are several options for how a client can send a new commit into a room.

* Users on the first server listed in the "powers" list can send commit state events directly. They
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What happens if multiple people on that first server send a commit state event at the same time? Does he server just select one to accept and return an error to the others? If so, what error code does it use?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's the regular MLS conflict case. The server selects one based on some criteria (e.g. the one that came first) and sends a rejection message to sender of the other commit.
That needs of course a message. But I don't think we defined the error messages/codes for any of the failure cases yet.


![Invite flow][invite-flow]

## Potential issues
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that the possible delay in applying a commit due to "powers" servers being offline is a significant potential issue that should be listed here. For example, if the first server listed in "powers" is offline, then when you invite someone, it will take 5 minutes before they can actually be in the room. (Could they end up being halfway in the room, if the inviter closes their client after they try to invite, but before they see the commit event?)

This, combined with encrypted state, may also cause problems for MatrixRTC, which IIRC uses various state events. If it takes over 5 minutes to set up a call, or for someone to join a call, that could be problematic.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This MSC expects that the servers in the powers list are online.

people invite new users at the same time or do otherwise racing modifications on different servers.
For long durations no automatic recovery is possible and manual communication is necessary. We
expect those cases to be rare even in larger federations. Long term server outages usually don’t
include membership modifications at the same time as usually one server is completely offline. Even
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Membership modifications aren't the only reasons for sending commits, though. Group members are expected to periodically send commits (or Updates which are committed by others), which will probably be more frequent than membership changes.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's no requirement to regularly send commits.
However, it's recommended for security. But that's a membership modification as well because updating the own leaf nodes changes the leaf node and such the membership.
But maybe it should be made clear here that any change of leaf nodes in the tree is meant.

* Servers not listed as the first entry in the "powers" list should send a "m.mls.pending_commit"
as a normal message. This will then be applied according to the sorting rules of that server, or
according to the rules in the following section, and a commit state event will be sent by one of the
homeservers listed in the "powers" list. Only users on servers in the "can_propose" list can send
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This nearly tripped me the first time I read it, so a note for future readers: in the current existing room versions, if a server wants to send an event to a room, it needs to use a user ID of a room member that is on that server to send the event. However, in rooms covered by this MSC, there is no sender field, so servers can send events on their own, without needing to invoke a user ID.

user to the MLS group. The key packages should be validated and should be cross-signed as configured
in the additional authenticated data of the current MLS commit.
After this the commit in the room is updated. Once this commit is accepted by the delivery service,
the inviting user should send out the welcome message as a to_device message and send an invite
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see any definition of how the Welcome message is sent as a to-device event (e.g. what event type it uses, what properties the event has). Can you add that?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

They are just normal encrypted events:

{
  "type": "m.room.encrypted",
  "content": {
    "algorithm": "m.mls.v1....",
    "ciphertext": "<encrypted_payload_unpadded_base_64>"
  }
}

user to the MLS group. The key packages should be validated and should be cross-signed as configured
in the additional authenticated data of the current MLS commit.
After this the commit in the room is updated. Once this commit is accepted by the delivery service,
the inviting user should send out the welcome message as a to_device message and send an invite
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What would be the impact if the invite was not sent, for example due to a buggy or malicious client, or due to a network error? It seems like the MLS membership list and Network membership list would get out of sync, which could be problematic?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Relatedly, is there any risk of a buggy client sending an invite without having sent a welcome message?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you get a welcome message but no invite, you can just ignore it. It is like getting a megolm key for a room you are not in. If a user sends out an invite without a welcome, then you need to talk to them and fix the bug or put them on your ignore list.

The invited user will then receive the invite and welcome message. If the user wants to join the
room, the client can send a request to the /join endpoint.

As no member events are used in this room version, the direct /state endpoint can not be used to
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given that member events are not used, how does network membership (invites, joins, etc) propagate between the homeservers that are participating in the room?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Only the local server needs to know which of its users are in the room. Invites and joins are handled using the existing federation APIs for invites and joins, but other servers never get told about those users, they only know which servers are participating.

"powers" list.

Servers have a 5 minute conflict window to apply `m.mls.proposed_commit` they receive. The
highest power server can apply the commit in the first 3 minutes after the `origin_server_ts` of
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this not subject to manipulation by a server wishing to put itself in charge of applying commits, by falsifying origin_server_ts?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In that case the higher power server could still send a conflicting event, which would win during state.

Clients can know their `m.mls.proposed_commit` failed to be applied if they haven’t received a
`m.mls.commit` event matching their `m.mls.proposed_commit` after "len(powers)*5" minutes.

State resolution might result in some clients losing access to messages as they threw their private
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My understanding of this section is that a netsplit of any non-trivial duration can lead to different commits being applied on each side of the split; when the split is eventually resolved, this will result in one side "losing" and any clients on the losing side will be effectively excluded from the room, since they will have discarded the necessary MLS state.

If that's correct, it feels akin to the "state reset" problems that current Matrix can suffer from, but much worse.

This seems very dangerous to me. IMHO it would be better not to allow MLS commits outside the current highest power server at all.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, that is expected. If you want to prevent such conflicts, you should only put a single server into the power list, which achieves the same result as only allowing commits from the highest power server. The option to add more servers in the powers list only exists for the cases where people expect servers to go offline eventually (like in smaller communities, where a friend decides to quit Matrix).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
client-server Client-Server API e2e implementation-needs-checking The MSC has an implementation, but the SCT has not yet checked it. kind:core MSC which is critical to the protocol's success proposal A matrix spec change proposal
Projects
None yet
Development

Successfully merging this pull request may close these issues.

9 participants