Security Issues in Matrix's Olm Library

Kye · on Aug 14, 2024

>> "3 of the 16 clients surveyed use the new vodozemac library. 10 still use libolm, and 3 don’t appear to implement end-to-end encryption at all."

This soesn't bode well for it as an option for secure messaging. Wouldn't be a problem if I didn't see people suggesting it as one.

Arathorn · on Aug 14, 2024

the author literally picked random projects from github tagged as matrix, without considering their prevalence or whether they are actually maintained etc.

if you actually look at % of impacted clients, it’s tiny.

meanwhile, it is very unclear that any sidechannel attack on a libolm based client is practical over the network (which is why we didn’t fix this years ago). After all, the limited primitives are commented on in the readme and https://github.com/matrix-org/olm/issues/3 since day 1.

mjg59 · on Aug 14, 2024

It being unclear that they're exploitable doesn't justify the fact that the reference implementation had known weaknesses that were only documented in the issue tracker and a readme buried several directories deep and not referenced from elsewhere! It also doesn't justify failing to proactively inform any of the clients making use of olm that these issues had been identified, and now anyone who does feel that this is inside their code's threat model has to rush to port to the new implementation after public disclosure.

Edit: but also if you're going to argue that it's unclear that these issues are remotely exploitable, it'd be helpful to discuss why - are there external constraints that ratelimit them such that the number of packets required is infeasible in any length of time? Is all the affected key material ephemeral and rotated before it's realistic to get a large enough sample? Just vibes?

some_furry · on Aug 14, 2024

> the author literally picked random projects from github tagged as matrix, without considering their prevalence or whether they are actually maintained etc.

I was very clear in my methodology: I grabbed everything tagged with that GitHub topic, and filtered out projects that were archived or marked as "old".

> meanwhile, it is very unclear that any sidechannel attack on a libolm based client is practical over the network (which is why we didn’t fix this years ago).

This is not an attitude that inspires confidence.

It's one thing to accidentally ship cryptography code with side-channels. It's another entirely to knowingly do so, and not fix it sooner.

What the fuck.

selfisekai · on Aug 15, 2024

> if you actually look at % of impacted clients, it’s tiny.

it's pretty much any client that has E2EE and is not Element. in my earlier quick look at Alpine Linux repos this included: Fluffychat, Nheko, gomuks, NeoChat, Chatty, weechat-matrix. then i already know that still didn't catch at least Cinny, which also is in Alpine, but includes libolm as wasm.

it's literally all "Featured clients" listed on Matrix.org except Element and Element X.

i'll put it another way. if anything other than New Vector is "tiny" and doesn't matter, is Matrix a "rich ecosystem"?

regalialong · on Aug 14, 2024

I usually keep myself out of HN discussions but some clients's issues are listed at https://github.com/NixOS/nixpkgs/pull/334638 from Nix's side, which includes Cinny, Fluffychat, nheko, which are the 3 non-Element "Featured clients" listed on Matrix.org.

pluto_modadic · on Aug 14, 2024

projects that import the SDK /written by and (un)maintained by the matrix org/

e.g.: https://github.com/matrix-org/matrix-python-sdk (dependents doesn't work on this python project I guess)

or matrix-org/olm for JS

or matrix-org/matrix-js-sdk

or matrix-nio/matrix-nio (based on an old standard written by matrix, that matrix stayed compatible with rather than deprecating)

the inability to address the third issue (a security issue, no less) in 7 years is an impressive world record in incompetent messenger design

MuteXR · on Aug 14, 2024

You SHIPPED CODE THAT YOU KNEW HAD A SIDE CHANNEL????? WHAT?

201984 · on Aug 14, 2024

In general, can timing attacks on a vulnerable implementation be prevented if sonething else guarantees the operation takes constant time?

For example, if I add a sleep after a non-constant-time operation to ensure it always takes 1s, would that prevent timing attacks? At least with this you can use a faster algorithm, and let the OS do useful work with the rest of the time.

thadt · on Aug 14, 2024

Theoretically? Sure. By holding your response to a fixed deadline, you're still using a variable time operation under the hood, but turning it into a fixed time operation, being careful not to leak that timing information to an external observer. So, no problem.

Practically? It's a whole lot harder. Having done exactly this in a non-cryptography related situation (sleeping a timer while trying to get time bins as close as possible to a fixed size for binning events without special hardware), you run into all manner of interesting sources of variability. Using hardware is often far simpler if it's available.

Back in a cryptography context, it is exponentially harder yet because you now have to worry about what other side channels are exposing your timing information. For example, if you're "letting the OS do useful work with the rest of the time" - what is it doing? If it's off doing anything else that I can observe, then I can time that instead. And we're back to square one.

Eliminating side channel leaks is entirely non-trivial.

201984 · on Aug 14, 2024

> If it's off doing anything else that I can observe, then I can time that instead. And we're back to square one.

Wouldn't these kinds of leaks always be present though? The scheduler can swap your process out at any time, regardless of whether it explicitly calls sleep() or not, so even with a constant time algorithm you'll have some variability.

some_furry · on Aug 14, 2024

The problem isn't variability in total.

The problem is variability based on a secret input.

some_furry · on Aug 14, 2024

https://security.stackexchange.com/a/96493

ghostpepper · on Aug 14, 2024

Unless I misunderstood (which is entirely possible, not a crypto expert) that article is answering a different question: whether adding random delays can thwart timing attacks. The GP was asking about implementing a constant-time delay.

some_furry · on Aug 14, 2024

Ok, let's assume you know how long an operation is supposed to take, but want to stop it from leaking if it returns faster.

To accomplish this, you would need a high-resolution timer that runs before the operation, after the operation, and then you can subtract the two and wait. And maybe that might work.

But you're not running in kernel space.

The resolution of timing information you'd get from this tactic would be noisy, and may reintroduce the signal you were trying to hide. Especially if you're running on multiple threads or physical processors, and aren't careful enough to constrain the operations and timings to the same core.

All it will do is slow your software down and maybe not actually help.

And that's the best case scenario!

Better to just use constant-time algorithms.

201984 · on Aug 14, 2024

Yeah, that's pretty much what I was asking.

  start = gettime()
  variable_time_crypto_operation()
  end = gettime()
  sleep(1sec - (end - start)) // Use a loop if necessary here

If the operation takes a couple ms and the sleep was 1sec, then how much information would realistically leak here? Sure the sleep might not be perfect, but I'd imagine the actual timing difference would get lost just in how much longer the sleep is.

theon144 · on Aug 15, 2024

Somewhat tangential, but there are much better options if you're looking for opportunities for optimization. It's literally trying to improve efficiency by skimping on safety features, like trying to save on vehicle weight by removing unnecessary seatbelts or crumple zones. Eliminating side channels concincingly is very difficult, you're just better off taking the tiny performance hit and virtually* eliminating that vector instead of trying to come up with a novel low-density seatbelt.

(I say virtually, because even constant time crypto isn't bulletproof - GoFetch, a recent Apple M-series CPU vulnerability inadvertently broke the "constant" part because of a quirk of the prefetcher. Side channels are hard, no need to make it harder.)

ghostpepper · on Aug 16, 2024

is a constant-time algorithm just an algorithm that completes the entire calculation before returning even if it knows the correct answer sooner?

some_furry · on Aug 16, 2024

"constant-time" isn't a literally technically correct descriptor, it's a short-hand

What matters is that the variance of execution time, which memory addresses are accessed, etc. is independent of any secret inputs.

ghostpepper · on Aug 16, 2024

so theoretically using timers and adding sleeps could solve the problem but practically there are better ways to achieve the goal of no timing information leakage?

some_furry · on Aug 17, 2024

Yes. I wrote this a few years ago as an introduction to how to write constant-time algorithms

https://soatok.blog/2020/08/27/soatoks-guide-to-side-channel...

sixthDot · on Aug 14, 2024

Might be slightly OT, but let's go. I think that this highly relates to how translations units are organized in a software project. At some point I think that mutual dependency is not a problem as long as the units form a unique "package".

pluto_modadic · on Aug 14, 2024

oh. ohhh damn. matrix... well then.