Well it is useful in practice, there are some pretty useful products based on eB...

lucianbr · on July 22, 2024

Bad wording on my part, and I still don't know how to word it better. I'm sure this thing is useful, I don't think everyone who contributed code was just clueless.

However, the claim "in the future, computers will not crash due to bad software updates, even those updates that involve kernel code" must be false. There is no way it is true. Whatever Cilium is, I cannot believe it generally prevents kernel crashes.

umanwizard · on July 22, 2024

Correct, you will never be able to write any possible arbitrary code and have it run in eBPF. It necessarily constrains the class of programs you can write. But the constrained set is still quite useful and probably includes the crowdstrike agent.

Also, although this isn't the case now, it's possible to imagine that the verifier could be relaxed to allow a Turing-complete subset of C that supports infinite loops while still rejecting sources of UB/crashes like dereferencing an invalid pointer. I suspect from reading this post that that is the future Mr. Gregg has in mind.

> Whatever Cilium is, I cannot believe it generally prevents kernel crashes.

It doesn't magically prevent all kernel crashes from unrelated code. But what we can say is that Cilium itself can't crash the kernel unless there are bugs in the eBPF verifier.

lucianbr · on July 22, 2024

If the verifier allowed a Turing-complete language, it would have to solve the halting problem, which is impossible.

umanwizard · on July 22, 2024

My point is that the verifier could be relaxed to accept programs that never halt, thus not needing to solve the halting problem. You could then have the kernel just kill it after running over a certain maximum amount of time.
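A minimal sketch of that "run it, but kill it past a budget" idea, in Python and with a made-up three-instruction machine. Nothing here is a real eBPF or kernel API; `run`, `OutOfFuel`, and the `add`/`jmp`/`halt` ops are all hypothetical names chosen for illustration. The point is only that the host never has to decide up front whether the program halts:

```python
# Toy illustration only: a tiny instruction set plus a step budget ("fuel").
# None of these names correspond to real eBPF or kernel interfaces.


class OutOfFuel(Exception):
    """Raised when a program uses up its step budget."""


def run(program, fuel):
    """Execute a list of (op, arg) pairs, allowing at most `fuel` steps.

    Hypothetical ops:
      ("add", n)     -> acc += n
      ("jmp", t)     -> continue at instruction index t
      ("halt", None) -> stop and return acc
    """
    acc = 0
    pc = 0
    while pc < len(program):
        if fuel <= 0:
            # The host gives up on the program; the host itself keeps running.
            raise OutOfFuel(f"killed at pc={pc}")
        fuel -= 1
        op, arg = program[pc]
        if op == "add":
            acc += arg
            pc += 1
        elif op == "jmp":
            if not 0 <= arg < len(program):
                raise ValueError(f"jump target {arg} out of range")
            pc = arg
        elif op == "halt":
            return acc
        else:
            raise ValueError(f"unknown op {op!r}")
    return acc


terminating = [("add", 2), ("add", 3), ("halt", None)]
infinite_loop = [("add", 1), ("jmp", 0)]

if __name__ == "__main__":
    print(run(terminating, fuel=100))
    try:
        run(infinite_loop, fuel=100)
    except OutOfFuel as exc:
        print("stopped:", exc)
```

Choosing the budget is the hard part in practice, and this sketch says nothing about what state the killed program leaves behind.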

lucianbr · on July 22, 2024

Why do you think the kernel crashes when crowdstrike attempts to reference some unavailable address (or whatever it does) instead of just denying that operation and continuing on? That would be the solution under this "just kill the long-running program" philosophy, with no need for eBPF or anything complicated. But it doesn't work that way in practice.

This is just such a naive view. "We can prevent programs from crashing by just taking care to stop them when they do bad things". Well, sure, that's why you have a kernel and userland. But it turns out, some things need to run in the kernel. Or "just deny permission". Then it turns out some programs need to run as admin. And so on.

There is a generality in the halting problem, and saying "we'll just kill long-running programs" misses the point entirely.

Likely what will happen is that you will kill useful long-running programs, then an exception mechanism will be invented so some programs will not be killed, because they need to run longer, then one of those programs will go into an infinite loop despite all your mechanisms preventing it. Just like the crowdstrike driver managed to bring down the OS despite all the work that is supposed to prevent the entire computer crashing if a single program tries something stupid.

umanwizard · on July 22, 2024

> Why do you think the kernel crashes when crowdstrike attempts to reference some unavailable address (or whatever it does) instead of just denying that operation and continuing on?

Linux and Windows are both effectively monolithic in this respect: the crowdstrike agent isn't running in a sandbox and has complete unfettered access to the entire kernel address space. There is no separate "the kernel" to detect when the agent does something wrong; once a kernel module is loaded, IT IS the kernel.

Lots of people have indeed realized this is undesirable and that there should be a sandboxed way to run kernel code such that bugs in it can't cause arbitrarily bad undefined behavior. Thus they invented eBPF. That's precisely what eBPF is.
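To make "sandboxed" concrete, here is a rough Python sketch of the difference. It assumes a hypothetical `SandboxedMemory` that the guest code must go through, and a hypothetical `run_guest` host wrapper; neither is a real eBPF interface. eBPF gets its guarantee mostly by checking accesses ahead of time in the verifier; the sketch uses runtime checks only because they are shorter to show.

```python
# Toy illustration only: the guest can touch memory solely through this
# object, so a bad access becomes a contained fault, not host corruption.


class SandboxFault(Exception):
    """Raised when guest code attempts an out-of-bounds access."""


class SandboxedMemory:
    def __init__(self, size):
        self._buf = bytearray(size)

    def _check(self, addr):
        if not 0 <= addr < len(self._buf):
            raise SandboxFault(f"invalid address {addr}")

    def load(self, addr):
        self._check(addr)
        return self._buf[addr]

    def store(self, addr, value):
        self._check(addr)
        self._buf[addr] = value & 0xFF


def run_guest(guest, mem):
    """Host side: run the guest, contain any fault, and carry on."""
    try:
        return ("ok", guest(mem))
    except SandboxFault as exc:
        return ("faulted", str(exc))


def well_behaved(mem):
    mem.store(0, 42)
    return mem.load(0)


def buggy(mem):
    # Hypothetical bug: reads far past the end of its region.
    return mem.load(10_000)


if __name__ == "__main__":
    mem = SandboxedMemory(16)
    print(run_guest(well_behaved, mem))
    print(run_guest(buggy, mem))
```

A loaded kernel module has no equivalent of `run_guest` around it, which is the point of the paragraph above.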

I don't know whether it's literally true that someday you will be able to write all possibly useful kernel-mode code in eBPF. But the spirit of the claim is true: there's a huge amount of useful software that could be written in eBPF today on Linux instead of as kernel modules, and this includes crowdstrike. Thus Windows supporting eBPF, and crowdstrike choosing to use it, would have solved this problem. That set of software will increase as the eBPF verifier is enhanced to accept a wider variety of programs.

Just like you can write pretty much any useful program in JavaScript today -- a sandboxed language.

You're also correct that due to the halting problem, we'll either have to accept that eBPF will never be Turing complete, OR accept that some eBPF programs will never halt and deal with the issues in other ways. Just like Chrome's JavaScript engine has to do. I don't really view this as a fundamentally unsolvable issue with the nature of eBPF.

tptacek · on July 22, 2024

The claim isn't that eBPF generally prevents kernel crashes. It's that it prevents crashes in the subset of programs it's designed for, in particular for instrumentation, which Crowdstrike is (in this author's conception) an instance of.

lucianbr · on July 22, 2024

I have quoted the claim verbatim from the article. It is obviously the claim of the article.

tptacek · on July 22, 2024

It's referring to Windows security software. If you have a lot of context with eBPF, which Gregg obviously does, the notion that eBPF will subsume the entire kernel doesn't even need to be said: you can't express arbitrary programs in eBPF. eBPF is safe because the verifier rejects the vast majority of valid programs.
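A toy way to see "safe because it rejects valid programs", in Python with an invented instruction set (`set`, `dec`, `jnz`, `jmp`, `halt`). The hypothetical `verify` below accepts only programs whose jumps go strictly forward, which trivially guarantees termination. It is not how the real verifier works, which is far more sophisticated; it only illustrates the trade-off. The `countdown` program clearly terminates, yet this check still refuses it:

```python
# Toy illustration only: a deliberately over-conservative "verifier".
# Instruction names and semantics are invented for this example.

JUMPS = {"jmp", "jnz"}


def verify(program):
    """Accept only programs in which every jump target is strictly forward.

    Forward-only jumps mean the program counter always increases, so any
    accepted program must terminate. Many terminating programs are rejected.
    """
    for pc, (op, arg) in enumerate(program):
        if op in JUMPS and not (pc < arg <= len(program)):
            return False
    return True


def run(program):
    """Reference interpreter; returns the number of steps executed."""
    counter = 0
    pc = 0
    steps = 0
    while pc < len(program):
        op, arg = program[pc]
        steps += 1
        if op == "set":
            counter = arg
            pc += 1
        elif op == "dec":
            counter -= 1
            pc += 1
        elif op == "jnz":
            pc = arg if counter != 0 else pc + 1
        elif op == "jmp":
            pc = arg
        elif op == "halt":
            break
        else:
            raise ValueError(f"unknown op {op!r}")
    return steps


# Straight-line code with a forward skip: accepted.
skip = [("set", 1), ("jmp", 3), ("dec", None), ("halt", None)]

# A bounded countdown loop: terminates, but uses a backward jump.
countdown = [("set", 3), ("dec", None), ("jnz", 1), ("halt", None)]

if __name__ == "__main__":
    print("skip accepted:", verify(skip))
    print("countdown accepted:", verify(countdown))
    print("countdown still halts, steps:", run(countdown))
```

Widening what `verify` accepts without giving up the guarantee is the hard engineering problem the thread is circling.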