From the article: > If the verifier finds any unsafe code, the program is reject...

lolinder · on July 22, 2024

I'm not able to comment on what this code is doing, but as for the theory:

The halting problem is only unsolvable in the general case. You cannot prove that any arbitrary piece of code will stop, but you can prove that specific types of code will stop and reject anything that you're unable to prove. The trivial case is "no jumps"—if your code executes strictly linearly and is itself finite then you know it will terminate. More advanced cases can also be proven, like a loop over a very specific bound, as long as you can place constraints on how the code can be structured.

As an example, take a look at Dafny, which places a lot of restrictions on loops [0], only allowing the subset that it can effectively analyze.

[0] https://ece.uwaterloo.ca/~agurfink/stqam/rise4fun-Dafny/#h25

jkrejcha · on July 22, 2024

Adding on (and it's not terribly relevant to eBPF), it's also worth noting that there are trivial programs you can prove DON'T halt.

A trivial example[1]:

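    #include <stdbool.h>

    int foo(void) { return 0; }  /* placeholder; never actually called */

    int main(void) {
        while (true) {}          /* never exits */
        int x = foo();           /* unreachable */
        return x;
    }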

This program trivially runs forever[2], and indeed many static code analyzers will point out that everything after the `while (true) {}` line is unreachable.

I feel like the halting problem is incredibly widely misunderstood to be about "ANY program" when it really talks about "ALL programs".

[1]: In C++, this is undefined behavior technically, but C and most other programming languages define the behavior of this (or equivalent) function.

[2]: Fun relevant xkcd: https://xkcd.com/1266/

fwip · on July 22, 2024

EDIT: I am incorrect, please ignore. (Original text below, for posterity).

Nit: In many languages, doesn't this depend on what foo() does? e.g:

  foo() {
    exit(0);
  }

loeg · on July 22, 2024

No? The foo() invocation is never reached because the while loop never terminates.

fwip · on July 22, 2024

Apologies; I misread the function call as being inside the loop.

Retr0id · on July 22, 2024

The halting problem cannot be solved in the general case, but in many cases you can prove that a program halts. eBPF only allows verifiably-halting programs to run.

dathinab · on July 22, 2024

the halting problem is only undecidable for _arbitrary_ programs

but there are always sets of programs for which it is clearly possible to guarantee their termination

e.g. the program `return 1+1;` is guaranteed to halt

e.g. a program like `while condition(&mut state) { ... }`, where `condition()` is guaranteed to halt but is otherwise unknown, is not guaranteed to halt, but if you turn it into `for _ in 0..1000 { if !condition(&mut state) { break; } ... }` then it is guaranteed to halt after at most 1000 iterations

or in other words eBPF only accepts programs which it can prove will halt in at most maxins "instructions" (though it's more strict than my example, i.e. you would need to unroll the for-loop to make it pass validation)

the thing with programs which are provably halting is that they tend to also not be very convenient to write and/or quite limited in what you can do with them, i.e. they are not suitable as general purpose programming languages at all
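
For anyone who prefers C, the bounded-loop rewrite described above might look roughly like this. It is only a sketch: `struct state`, `condition()`, `run_bounded()` and `MAX_ITERATIONS` are made-up names for illustration, and the toy `condition()` just counts down so the example has something to run.

```c
#include <stdbool.h>

#define MAX_ITERATIONS 1000  /* hypothetical compile-time bound */

/* Hypothetical state, used only for illustration. */
struct state {
    unsigned remaining;
};

/* Always returns, but how often it returns true depends on runtime data. */
bool condition(struct state *s) {
    if (s->remaining == 0)
        return false;
    s->remaining--;
    return true;
}

/* Bounded form of `while (condition(s)) { ... }`: the loop runs at most
 * MAX_ITERATIONS times regardless of what condition() returns.
 * Returns the number of iterations actually executed. */
int run_bounded(struct state *s) {
    int iterations = 0;
    for (int i = 0; i < MAX_ITERATIONS; i++) {
        if (!condition(s))
            break;
        iterations++;  /* the loop body would go here */
    }
    return iterations;
}
```

The trade-off is visible in the return value: a state that would have needed 5000 iterations is simply cut off at 1000.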

efee22 · on July 22, 2024

Infinite loops are not possible and would get rejected by the verifier since it cannot solve the halting problem. Here is a good overview on the options available: https://ebpf-docs.dylanreimerink.nl/linux/concepts/loops/

pkhuong · on July 22, 2024

The basic logic flags any loop ("back-edge").
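
As a rough illustration of the back-edge rule, here is a C sketch over a toy instruction set invented for this example (it is not real eBPF bytecode, and `has_no_back_edges` is a hypothetical name). It accepts a program only if every jump lands strictly forward and in range and the final instruction is an exit, which is enough to guarantee that the program counter only ever increases. The actual Linux verifier does considerably more than this.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy instruction set, invented for this sketch (not real eBPF). */
enum op { OP_NOP, OP_JMP, OP_EXIT };

struct insn {
    enum op op;
    int off;  /* for OP_JMP: target index = current index + 1 + off */
};

/* Conservative check: accept only if every jump goes strictly forward,
 * stays inside the program, and the last instruction is OP_EXIT.
 * Any accepted program terminates within `len` steps. */
bool has_no_back_edges(const struct insn *prog, size_t len) {
    assert(prog != NULL);
    if (len == 0 || prog[len - 1].op != OP_EXIT)
        return false;
    for (size_t i = 0; i < len; i++) {
        if (prog[i].op != OP_JMP)
            continue;
        long target = (long)i + 1 + prog[i].off;
        if (target <= (long)i || target >= (long)len)
            return false;  /* back-edge, self-loop, or out of range */
    }
    return true;
}
```

Note that this rejects every loop, including ones that obviously terminate, which is exactly the "reject what you can't prove" trade-off discussed elsewhere in the thread.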

rezonant · on July 22, 2024

This. Others have said it less concisely, but a program without loops or arbitrary jumps is guaranteed to halt if we assume the external functions it calls into will halt.

umanwizard · on July 22, 2024

eBPF is not Turing complete. Writing it is very annoying compared to writing normal C code for exactly this reason.

aksdlf · on July 22, 2024

I'm glad to hear that Meta and Google code is "rigorous". I'd prefer INRIA, universities that fund theorem provers, industries where correctness matters like aerospace or semiconductors.

chc4 · on July 22, 2024

Windows doesn't use the Linux eBPF verifier, they have their own implementation named PREVAIL[0] that is based on an abstract interpretation model that has formal small step semantics. The actual implementation isn't formally proven, however.

0: https://github.com/vbpf/ebpf-verifier

SoftTalker · on July 22, 2024

Also that lines of code is a proxy for rigor, something new I learned today. /s

sunnyps · on July 22, 2024

I think they mean that the code base is small enough to be audited thoroughly. Maybe they should reword it to be clearer.

lmm · on July 22, 2024

> I think they mean that the code base is small enough to be audited thoroughly.

They wouldn't say it was "over 20,000 lines" in that case. And 20,000 lines of C is far too big to audit.

atrus · on July 22, 2024

The halting problem is about all programs at once: there isn't a single algorithm that is valid for every program. You can still check for some kinds of infinite loops though!

roywiggins · on July 22, 2024

More specifically, you can accept a set of programs that you are certain do halt, and reject all others, at the expense of rejecting some that will halt. As long as that set is large enough to be practical, the result can be useful. If you, e.g., forbid code paths that jump "backwards", you can't really loop at all. Or you can require loops to be bounded by constants.

dtx1 · on July 22, 2024

I have no insight into this particular project but you could work around the halting problem by only allowing loops you can prove will not go infinite. That would of course imply rejecting loops that won't go infinite but can't be proven not to.

skywhopper · on July 22, 2024

If the verifier can't determine that the loop will halt, the program is disallowed. Also, if the program gets passed and then runs too long anyway, it's force-halted. So... I guess that solves the halting problem.

neaanopri · on July 22, 2024

It's more accurate to say that in principle, there could be programs that would halt, but that the verifier will deny.

lucianbr · on July 22, 2024

So this "solves" the halting problem by creating a new class "might-not-halt-but-not-sure" and lumping it with "does-not-halt". I find it hard to believe the new class is small enough for this to be useful, in the sense that it will avoid all kernel crashes.

I rather expect useful or needed code would be rejected due to "not-sure-it-halts", and then people will use some kind of exception or not use the verifier at all, and then we are back to square one.

tptacek · on July 22, 2024

Lots of useful code is rejected due to "not-sure-it-halts". That's the premise.

umanwizard · on July 22, 2024

Well it is useful in practice, there are some pretty useful products based on eBPF on Linux, most notably Cilium (and, shameless plug for the one I’m working on: Parca, an eBPF-based CPU profiler).

lucianbr · on July 22, 2024

Bad wording on my part, and I still don't know how to word it better. I'm sure this thing is useful, I don't think everyone who contributed code was just clueless.

However, the claim "in the future, computers will not crash due to bad software updates, even those updates that involve kernel code" must be false. There is no way it is true. Whatever Cilium is, I cannot believe it generally prevents kernel crashes.

umanwizard · on July 22, 2024

Correct, you will never be able to write any possible arbitrary code and have it run in eBPF. It necessarily constrains the class of programs you can write. But the constrained set is still quite useful and probably includes the crowdstrike agent.

Also, although this isn't the case now, it's possible to imagine that the verifier could be relaxed to allow a Turing-complete subset of C that supports infinite loops while still rejecting sources of UB/crashes like dereferencing an invalid pointer. I suspect from reading this post that that is the future Mr. Gregg has in mind.

> Whatever Cilium is, I cannot believe it generally prevents kernel crashes.

It doesn't magically prevent all kernel crashes from unrelated code. But what we can say is that Cilium itself can't crash the kernel unless there are bugs in the eBPF verifier.
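
To make the invalid-pointer point a bit more concrete, here is a plain-C sketch of the check-before-access pattern, similar in spirit to the `data`/`data_end` comparisons commonly seen in XDP programs. `struct toy_header` and `read_version()` are hypothetical names for illustration; this is ordinary userspace C, not an eBPF program.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical 4-byte header layout, for illustration only. */
struct toy_header {
    uint8_t version;
    uint8_t flags;
    uint16_t length;
};

/* Reads the version field only after proving the whole header lies
 * inside [data, data_end). Returns 0 on success, -1 otherwise. */
int read_version(const uint8_t *data, const uint8_t *data_end, uint8_t *out) {
    if (data == NULL || data_end == NULL || out == NULL)
        return -1;
    if (data_end < data)
        return -1;  /* malformed range */
    if ((size_t)(data_end - data) < sizeof(struct toy_header))
        return -1;  /* buffer too short: refuse instead of reading past it */

    struct toy_header h;
    memcpy(&h, data, sizeof h);  /* safe: bounds were checked above */
    *out = h.version;
    return 0;
}
```

In ordinary C nothing forces you to write the bounds check; the idea behind a verifier is that a program missing it would be refused before it ever runs.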

lucianbr · on July 22, 2024

If the verifier allowed a Turing-complete language, it would solve the halting problem, which is impossible.

umanwizard · on July 22, 2024

My point is that the verifier could be relaxed to accept programs that never halt, thus not needing to solve the halting problem. You could then have the kernel just kill it after running over a certain maximum amount of time.
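
A sketch of what I mean, in C, using a toy interpreter with a step budget. The instruction set, `run_with_budget()` and the result codes are all invented for this example; this is not how eBPF works today, and a real kernel would more plausibly use a time limit than a step counter.

```c
#include <stddef.h>

/* Toy instruction set, invented for this sketch (not real eBPF). */
enum op { OP_NOP, OP_JMP, OP_EXIT };

struct insn {
    enum op op;
    int off;  /* for OP_JMP: target index = current index + 1 + off */
};

enum run_result { RUN_EXITED, RUN_KILLED, RUN_FAULT };

/* Executes at most `budget` instructions. A program that has not
 * reached OP_EXIT by then is stopped and reported as RUN_KILLED. */
enum run_result run_with_budget(const struct insn *prog, size_t len,
                                unsigned long budget) {
    size_t pc = 0;
    while (budget-- > 0) {
        if (pc >= len)
            return RUN_FAULT;  /* ran off the end of the program */
        switch (prog[pc].op) {
        case OP_EXIT:
            return RUN_EXITED;
        case OP_JMP: {
            long target = (long)pc + 1 + prog[pc].off;
            if (target < 0)
                return RUN_FAULT;  /* jump before the start */
            pc = (size_t)target;
            break;
        }
        default:
            pc++;
            break;
        }
    }
    return RUN_KILLED;  /* budget exhausted */
}
```

Here nothing is proven up front: an infinite loop such as a single `OP_JMP` with `off = -1` is accepted and then killed when the budget runs out.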

lucianbr · on July 22, 2024

Why do you think the kernel crashes when crowdstrike attempts to reference some unavailable address (or whatever it does) instead of just denying that operation and continuing on? That would be the solution using this philosophy "just kill long running program". And no need for eBPF or anything complicated. But it doesn't work that way in practice.

This is just such a naive view. "We can prevent programs from crashing by just taking care to stop them when they do bad things". Well, sure, that's why you have a kernel and userland. But it turns out, some things need to run in the kernel. Or "just deny permission". Then it turns out some programs need to run as admin. And so on.

There is a generality in the halting problem, and saying "we'll just kill long running programs" just misses the point entirely.

Likely what will happen is that you will kill useful long-running programs, then an exception mechanism will be invented so some programs will not be killed, because they need to run longer, then one of those programs will go into an infinite loop despite all your mechanisms preventing it. Just like the crowdstrike driver managed to bring down the OS despite all the work that is supposed to prevent the entire computer crashing if a single program tries something stupid.

umanwizard · on July 22, 2024

> Why do you think the kernel crashes when crowdstrike attempts to reference some unavailable address (or whatever it does) instead of just denying that operation and continuing on?

Linux and windows are completely monolithic kernels; the crowdstrike agent isn't running in a sandbox and has complete unfettered access to the entire kernel address space. There is no separate "the kernel" to detect when the agent does something wrong; once a kernel module is loaded, IT IS the kernel.

Lots of people have indeed realized this is undesirable and that there should be a sandboxed way to run kernel code such that bugs in it can't cause arbitrarily bad undefined behavior. Thus they invented eBPF. That's precisely what eBPF is.

I don't know whether it's literally true that someday you will be able to write all possibly useful kernel-mode code in eBPF. But the spirit of the claim is true: there's a huge amount of useful software that could be written in eBPF today on Linux instead of as kernel modules, and this includes crowdstrike. Thus Windows supporting eBPF, and crowdstrike choosing to use it, would have solved this problem. That set of software will increase as the eBPF verifier is enhanced to accept a wider variety of programs.

Just like you can write pretty much any useful program in JavaScript today -- a sandboxed language.

You're also correct that due to the halting problem, we'll either have to accept that eBPF will never be Turing complete, OR accept that some eBPF programs will never halt and deal with the issues in other ways. Just like Chrome's JavaScript engine has to do. I don't really view this as a fundamentally unsolvable issue with the nature of eBPF.

tptacek · on July 22, 2024

The claim isn't that eBPF generally prevents kernel crashes. It's that it prevents crashes in the subset of programs it's designed for, in particular for instrumentation, which Crowdstrike is (in this author's conception) an instance of.

lucianbr · on July 22, 2024

I have quoted the claim verbatim from the article. It is obviously the claim of the article.

tptacek · on July 22, 2024

It's referring to Windows security software. If you have a lot of context with eBPF, which Gregg obviously does, the notion that eBPF will subsume the entire kernel doesn't even need to be said: you can't express arbitrary programs in eBPF. eBPF is safe because the verifier rejects the vast majority of valid programs.

red_admiral · on July 22, 2024

eBPF is not Turing-complete, I suppose.

javierhonduco · on July 22, 2024

It is not; programs that are accepted are proven to terminate. Larger and more complex programs are accepted by BPF as of now, which might give the impression that it's now Turing complete, but that is definitely not the case.

lizxrice · on July 22, 2024

In this talk we demo Conway's Game of Life implemented in eBPF: https://www.youtube.com/watch?v=tClsqnZMN6I

lizxrice · on July 22, 2024

I should clarify that individual eBPF programs have to terminate, but more complex problems can be solved with multiple eBPF programs, and can be "scheduled" indefinitely using BPF timers
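
To illustrate the shape of that idea in plain C (this is not the code from the talk, and `life_step`, `ROWS` and `COLS` are names made up for this sketch): a single Game of Life generation on a fixed-size grid only needs loops with constant bounds, so each step on its own always terminates. The open-ended part, running generation after generation, is left to whatever re-invokes the step.

```c
#include <stdint.h>
#include <string.h>

#define ROWS 8
#define COLS 8

/* One generation of Conway's Game of Life on a fixed-size grid whose
 * edges wrap around. Cells are expected to hold 0 (dead) or 1 (alive).
 * Every loop bound is a compile-time constant, so a single call
 * always terminates. */
void life_step(uint8_t grid[ROWS][COLS]) {
    uint8_t next[ROWS][COLS];
    for (int r = 0; r < ROWS; r++) {
        for (int c = 0; c < COLS; c++) {
            int neighbours = 0;
            for (int dr = -1; dr <= 1; dr++) {
                for (int dc = -1; dc <= 1; dc++) {
                    if (dr == 0 && dc == 0)
                        continue;  /* skip the cell itself */
                    int rr = (r + dr + ROWS) % ROWS;
                    int cc = (c + dc + COLS) % COLS;
                    neighbours += grid[rr][cc];
                }
            }
            /* Birth on exactly 3 neighbours, survival on 2 or 3. */
            next[r][c] = (neighbours == 3) || (grid[r][c] && neighbours == 2);
        }
    }
    memcpy(grid, next, sizeof next);
}
```

A caller that invokes `life_step()` once per tick plays the role of the scheduler: no individual call can run forever, but the simulation as a whole can continue for as long as something keeps calling it.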

ahepp · on July 22, 2024

If you’re wrong about the loop, you’ll still hit BPF_MAXINSNS, so it’s fine to use heuristics that could produce a false negative, right?

hiddencost · on July 22, 2024

Unterminated loops might be a better phrasing.