Rex Kernel Extensions: a safe and usable kernel extension framework that allows loading and executing Rust kernel extension programs in the place of eBPF
"Safe kernel extensions have gained significant traction, evolving from simple packet filters to large, complex programs that customize storage, networking, and scheduling. Existing kernel extension mechanisms like eBPF rely on in-kernel verifiers to ensure safety of kernel extensions by static verification using symbolic execution. We identify significant usability issues—safe extensions being rejected by the verifier—due to the language-verifier gap, a mismatch between developers’ expectation of program safety provided by a contract with the programming language, and the verifier’s expectation.
We present Rex, a new kernel extension framework that closes the language-verifier gap and improves the usability of kernel extensions in terms of programming experience and maintainability. Rex builds upon language-based safety to provide safety properties desired by kernel extensions, along with a lightweight extralingual runtime for properties that are unsuitable for static analysis, including safe exception handling, stack safety, and termination. With Rex, kernel extensions are written in safe Rust and interact with the kernel via a safe interface provided by Rex’s kernel crate. No separate static verification is needed. Rex addresses usability issues of eBPF kernel extensions without compromising performance."
"Extensions allow applications to expand the capabilities of database management systems (DBMSs) with custom logic. However, the extensibility environment for some DBMSs is fraught with perils, causing developers to resort to unorthodox methods to achieve their goals. This paper studies and evaluates the design of DBMS extensibility. First, we provide a comprehensive taxonomy of the types of DBMS extensibility. We then examine the extensibility of six DBMSs: PostgreSQL, MySQL, MariaDB, SQLite, Redis, and DuckDB. We present an automated extension analysis toolkit that collects static and dynamic information on how an extension integrates into the DBMS. Our evaluation of over 400 PostgreSQL extensions shows that 16.8% of them are incompatible with at least one other extension and can cause system failures. These results also show the correlation between these failures and factors related to extension complexity and implementation."
William J. Bowman. The 2025 ACM SIGPLAN Workshop on Partial Evaluation and Program Manipulation (PEPM) – Invited Talk.
Abstract:
"The is-ought gap is a problem in moral philosophy observing that ethical judgments ("ought") cannot be grounded purely in truth judgments ("is"): that an ought cannot be derived from an is.
This gap renders the following argument invalid: "It is true that type safe languages prevent bugs and that bugs cause harm, therefore you ought to write in type safe languages".
To validate ethical claims, we must bridge the gap between is and ought with some ethical axiom, such as "I believe one ought not cause harm".
But what do ethics have to do with manipulating programs?
A lot!
Ethics are central to correctness!
For example, suppose a type-inference algorithm infers that an expression's type is Bool, and the expression is in fact a Bool; the program type checks.
Is the program correct: does it behave as it ought?
We cannot answer this without some ethical axioms: what does the programmer believe ought to be?
I believe one ought to design and implement languages ethically.
We must give the programmer the ability to express their ethics, their values and beliefs about a program, in addition to mere computational content, and build tools that respect the distinction between is and ought.
This paper is a guide to ethical language design and implementation possibilities."
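The talk's type-checking example can be made concrete. The snippet below is hypothetical (not from the talk): a function that satisfies the "is" judgment, its inferred type is bool and a bool is returned, while violating what, by its name, it ought to do.

```cpp
// Type-checks: the result type is bool and a bool is returned, so the
// "is" judgment holds. Whether the program is *correct* depends on what
// it ought to do: by its name it ought to report evenness, but it
// actually reports oddness.
bool is_even(int n) { return n % 2 == 1; }
```

No type system can flag this on its own; the mismatch lives entirely in the gap between the checked "is" and the programmer's intended "ought".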
> Interleaving/unrolling and vectorization are two popular means of optimizing applications. While the first creates multiple copies of the loop body, the second operates on multiple data elements in parallel using the SIMD units available in the CPU. In theory, interleaving and vectorization are orthogonal optimizations, one relying on instruction-level parallelism/superscalarity and the other on data-level parallelism within a single instruction. Modern CPU architectures provide both parallelism mechanisms at once, yet combining vectorization and interleaving is complex: the two optimizations influence each other through instruction selection and the intricacies of the underlying hardware, so the programmer often has to rely on the compiler's auto-vectorizer.
> Based on a large evaluation of 642 loops from the literature, this paper demonstrates that significant gains (up to 20%) can be obtained by adapting the LLVM auto-vectorizer to better exploit interleaving and vectorization for a given AArch64 architecture. The proposed approach is flexible and can easily be applied at both the loop and application level. Experiments on 5 mini-apps from the HPC realm show similar improvements and demonstrate the co-design potential of the presented approach.
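As a concrete handle on the two factors the abstract discusses, Clang already exposes loop pragmas that let a programmer pin the vectorization width and interleave count instead of leaving both to the auto-vectorizer's cost model. A minimal sketch (the function and the factors 4 and 2 are illustrative, not tuned recommendations):

```cpp
// Request explicit vectorization and interleaving factors from Clang's
// loop-transformation pragmas rather than relying on the cost model:
// width 4 exploits data-level parallelism (SIMD), interleave count 2
// adds instruction-level parallelism by duplicating the vector body.
void axpy(float* __restrict y, const float* __restrict x, float a, int n) {
    #pragma clang loop vectorize_width(4) interleave_count(2)
    for (int i = 0; i < n; ++i)
        y[i] = a * x[i] + y[i];
}
```

Compilers other than Clang ignore the pragma, so the loop stays portable; only the optimization hint is Clang-specific.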
In this paper, we quantify the overhead of running a state machine replication system for cloud systems written in a language with GC. To this end, we (1) design from scratch a canonical cloud system, a distributed, consensus-based, linearizable key-value store, (2) implement it in C++, Java, Rust, and Go, and (3) evaluate the implementations under update-heavy and read-heavy workloads on AWS under different resource constraints while trying to hit the maximum throughput with a fixed low tail latency. Our results show that even with ample memory, GC has a non-trivial cost, and with limited memory, languages with manual memory management can achieve an order of magnitude higher throughput than languages with GC on the same hardware. Our key observation is that if a cloud system is expected to grow to a large volume of users, building it in a language with manual memory management, and thereby paying a higher development cost than with a GC language, may result in significant cloud cost savings in the long run.
"We present the design of CompFuzzCI, a framework for incorporating compiler fuzzing into the continuous integration (CI) workflow of the compiler for Dafny, an open-source programming language that is increasingly used in and contributed to by industry. CompFuzzCI explores the idea of running a brief fuzzing campaign as part of the CI workflow of each pull request to a compiler project. Making this effective involved devising solutions for various challenges, including how to deduplicate bugs, how to bisect the project’s revision history to find the commit responsible for a regression (challenging when project interfaces change over time), and how to ensure that fuzz testing complements existing regression testing efforts. We explain how we have engaged with the Dafny development team at Amazon to approach these and other problems in the design of CompFuzzCI, and the lessons learned in the process. As a by-product of our work with CompFuzzCI, we found and reported three previously-unknown bugs in the Dafny compiler. We also present a controlled experiment simulating the use of CompFuzzCI over time on a range of Dafny commits, to assess its ability to find historic bugs. CompFuzzCI prioritises support for the Dafny compiler and the fuzz-d fuzzer but has a generalisable design: with modest modification to its internal interfaces, it could be adapted to work with other fuzzers, and the lessons learned from our experience will be relevant to teams considering including fuzzing in the CI of other industrial software projects."
https://discourse.llvm.org/t/announcing-the-lifetime-safety-...
Lifetime Analysis: Current Status
> For those not already familiar, we’re working on a new lifetime analysis in Clang to catch issues like use-after-scope or returning pointers to stack memory. The analysis is alias-based and draws inspiration from Rust’s borrow checker (specifically, [Polonius](https://smallcultfollowing.com/babysteps/blog/2018/04/27/an-...)). More details in the RFC: https://discourse.llvm.org/t/rfc-intra-procedural-lifetime-a...
> The initial implementation targets intra-procedural analysis for C++ raw pointers. This keeps the surface area small while we iterate. Over time, we aim to enable this analysis by default in Clang, with both “permissive” and “strict” modes to balance noise and coverage.
Key Components
- Conceptual Model: Introduces the fundamental concepts of Loan, Origin, and Path to model memory borrows and the lifetime of pointers.
- Fact Generation: A frontend pass traverses the Clang CFG to generate a representation of lifetime-relevant events, such as pointer assignments, taking an address, and variables going out of scope.
- Testing: llvm-lit tests validate the analysis by checking the generated facts.
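The Loan/Origin/Path model above can be made concrete with the kind of bug the analysis is meant to flag. This is a hypothetical snippet, not one of the project's llvm-lit tests; the comments map it onto the model's vocabulary informally.

```cpp
// Use-after-scope pattern: taking &x creates a Loan to the Path 'x',
// the Origin of 'p' comes to hold that loan, and the loan expires when
// 'x' goes out of scope at the closing brace.
int* use_after_scope() {
    int* p = nullptr;
    {
        int x = 42;
        p = &x;   // fact: p's origin acquires a loan to x
    }             // fact: x leaves scope; the loan expires
    return p;     // any later use of *p is a use-after-scope
}
```

The program compiles cleanly today; the point of the analysis is to report the expired loan at the `return` (or at any subsequent dereference) without running the code.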
Example:
[LifetimeSafety] Introduce intra-procedural analysis in Clang
Commit: https://github.com/llvm/llvm-project/commit/3076794e924f
PR: https://github.com/llvm/llvm-project/pull/142313
Test source code: https://github.com/llvm/llvm-project/blob/3076794e924f30ae21...