> We present a calculational approach to the design of type checkers, showing how they can be derived from behavioural specifications using equational reasoning. We focus on languages whose semantics can be expressed as a fold, and show how the calculations can be simplified using fold fusion. This approach enables the compositional derivation of correct-by-construction type checkers based on solving and composing fusion preconditions. We introduce our approach using a simple expression language, to which we then add support for exception handling and checked exceptions.
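To make the setting concrete (this is my own minimal sketch, not code from the paper): the kind of language the abstract describes has its semantics written as a fold over the expression syntax, and the type checker one hopes to derive by fold fusion is itself a compositional fold. All names and details below are illustrative assumptions.

```haskell
-- A minimal sketch (my own, not from the paper) of the setting: a small
-- expression language whose semantics is a fold, next to the compositional
-- type checker one would aim to derive from it by fold fusion.
data Expr = Val Int | BoolLit Bool | Add Expr Expr | If Expr Expr Expr

data Value = VInt Int | VBool Bool deriving (Eq, Show)
data Ty    = TInt | TBool          deriving (Eq, Show)

-- The generic fold over Expr.
foldExpr :: (Int -> r) -> (Bool -> r) -> (r -> r -> r) -> (r -> r -> r -> r)
         -> Expr -> r
foldExpr val bool add iff = go
  where
    go (Val n)     = val n
    go (BoolLit b) = bool b
    go (Add a b)   = add (go a) (go b)
    go (If c t e)  = iff (go c) (go t) (go e)

-- Semantics as a fold; ill-typed programs simply have no value.
eval :: Expr -> Maybe Value
eval = foldExpr (Just . VInt) (Just . VBool) addV ifV
  where
    addV a b = do
      VInt x <- a
      VInt y <- b
      Just (VInt (x + y))
    ifV c t e = do
      VBool b <- c
      if b then t else e

-- The compositional checker: also a fold, intended to agree with eval in the
-- sense that a well-typed expression evaluates to a value of that type.
check :: Expr -> Maybe Ty
check = foldExpr (const (Just TInt)) (const (Just TBool)) addT ifT
  where
    addT a b = do
      TInt <- a
      TInt <- b
      Just TInt
    ifT c t e = do
      TBool <- c
      tt <- t
      te <- e
      if tt == te then Just tt else Nothing
```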
> This week I gave a lecture series at the School on Logical Frameworks and Proof Systems Interoperability. I spoke about programming language techniques for proof assistants. The lecture slides and the reference implementations of a minimalist type theory are available at:
As far as compiling continuations goes, sequent calculus (specifically as a compiler IR) is an interesting research direction. See Grokking the Sequent Calculus (Functional Pearl) from ICFP 2024: https://dl.acm.org/doi/abs/10.1145/3674639
"This becomes clearer with an example: When we want to evaluate the expression (2 + 3) ∗ 5,
we first have to focus on the subexpression 2 + 3 and evaluate it to its result 5. The remainder
of the program, which will run after we have finished the evaluation, can be represented with
the evaluation context □ ∗ 5. We cannot bind an evaluation context like □ ∗ 5 to a variable in
the lambda calculus, but in the λμ~μ-calculus we can bind such evaluation contexts to covariables.
Furthermore, the μ-operator gives direct access to the evaluation context in which the expression
is currently evaluated.
Having such direct access to the evaluation context is not always necessary for a programmer who wants to write an application, but it is often important for compiler implementors who write optimizations to make programs run faster. One solution that compiler writers use to represent evaluation contexts in the lambda calculus is called continuation-passing style. In continuation-passing style, an evaluation context like □ ∗ 5 is represented as a function λ x . x ∗ 5. This solution works, but the resulting types which are used to type a program in this style are arguably hard to understand. Being able to easily inspect these types can be very valuable, especially for intermediate representations, where terms tend to look complex. The promise of the λμμ̃-calculus is to provide the expressive power of programs in continuation-passing style without having to deal with the type-acrobatics that are usually associated with it."
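As a small illustration of that last point (my own sketch, not the paper's code): in continuation-passing style the context □ ∗ 5 really does become the ordinary function λx. x ∗ 5, and its type is just an opaque arrow.

```haskell
-- A minimal sketch, not from the paper: evaluating (2 + 3) * 5 directly
-- versus in continuation-passing style, where the evaluation context
-- [] * 5 is represented by the ordinary function \x -> x * 5.
evalDirect :: Int
evalDirect = (2 + 3) * 5

-- In CPS, every operation takes "the rest of the program" as an argument.
addK :: Int -> Int -> (Int -> r) -> r
addK x y k = k (x + y)

mulK :: Int -> Int -> (Int -> r) -> r
mulK x y k = k (x * y)

-- The continuation \v -> mulK v 5 k plays the role of the context [] * 5;
-- its type (Int -> r) says nothing about the multiplication it hides.
evalCPS :: (Int -> r) -> r
evalCPS k = addK 2 3 (\v -> mulK v 5 k)

-- ghci> evalCPS id
-- 25
```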
"In some sense the Sequent Calculus (SC) is quite similar to a CPS based IR. To sum up the benefits of SC over CPS in two concepts, I would say: Symmetry and non-opaque continuations.
Symmetry: A lot of pairs of dual concepts are modelled very naturally and symmetrically in SC: call-by-value and call-by-name, data types and codata types, exception based error handling and Result-type error handling, for example.
Non-Opaque continuations: Instead of continuations in CPS which are just a special kind of function whose internals cannot be inspected (easily), SC introduces consumers whose internal structure can be inspected easily, for example when writing optimization passes."
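To illustrate what an inspectable consumer buys you (this is my own toy encoding, not the representation used in the papers): once the context is a first-order data structure rather than a function, an optimisation pass can pattern-match on it directly.

```haskell
-- A toy sketch of "non-opaque continuations" (my own encoding, not the
-- papers' IR): consumers are data, so passes can pattern-match on them.
data Producer
  = Lit Int
  | Plus Producer Producer
  deriving Show

data Consumer
  = MulThen Int Consumer   -- the context [] * k, followed by more consumer
  | Ret                    -- top-level covariable: just hand back the value
  deriving Show

-- Run a producer against the consumer waiting for its value.
run :: Producer -> Consumer -> Int
run p c = feed (eval p) c
  where
    eval (Lit n)    = n
    eval (Plus a b) = eval a + eval b
    feed v (MulThen k rest) = feed (v * k) rest
    feed v Ret              = v

-- Because consumers have visible structure, a rewrite such as
-- "multiplying by 1 is the identity" is a one-line pattern match,
-- which is awkward to even state for an opaque (Int -> r) continuation.
simplify :: Consumer -> Consumer
simplify (MulThen 1 rest) = simplify rest
simplify (MulThen k rest) = MulThen k (simplify rest)
simplify Ret              = Ret

-- ghci> run (Plus (Lit 2) (Lit 3)) (MulThen 5 Ret)
-- 25
```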
There's an OOPSLA paper from this year (referred to from the link starting that thread), with two of the same authors, that goes into more detail about using it as a compiler IR: https://dl.acm.org/doi/10.1145/3720507
> Programs written in C/C++ often include inline assembly: a snippet of architecture-specific assembly code used to access low-level functionalities that are impossible or expensive to simulate in the source language. Although inline assembly is widely used, its semantics has not yet been formally studied.
> In this paper, we overcome this deficiency by investigating the effect of inline assembly on the consistency semantics of C/C++ programs. We propose the first memory model of the C++ Programming Language with support for inline assembly for Intel’s x86 including non-temporal stores and store fences. We argue that previous provably correct compiler optimizations and correct compiler mappings should remain correct under such an extended model and we prove that this requirement is met by our proposed model.
"This pearl grew out of the authors’ frustration with textbook presentations of binary search. Given that binary search is one of the standard algorithms every programmer and computer science student should know, the subject is inadequately covered at best. Many textbooks mention binary search only in passing or they treat only special cases, such as computing the square root or searching an ordered table. A negative mindset prevails: the search possibly fails; in the divide&conquer step one half of the search space is excluded from consideration because searching this half will definitely fail. The correctness argument requires that the data is ordered, suggesting that monotonicity in some sense drives the search. One notable exception is Anne Kaldewaij’s textbook (Reference Kaldewaij1990): when discussing “function inversion” (given n , find an argument i such that fi⪯n≺f(1+i) ) he emphasises repeatedly that the correctness of the search does not require that f is an ascending function.
The gist of this pearl is to approach search problems with a positive mind-set: the search always succeeds; in the divide&conquer step the search continues with the half that guarantees success. The correctness argument relies on a suitable functional invariant, not on monotonicity. The “Beat Your Neighbours!” problem, a concrete make-up for local maximum search, shows that the extra generality is actually needed."
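A concrete rendering of that idea (my own sketch based on the abstract, not the pearl's code): start from bounds where success is guaranteed, and at each step keep the half in which the invariant still guarantees success; monotonicity of f never enters the argument.

```haskell
-- A minimal sketch of the "positive mindset" search (my own code, following
-- the abstract): given lo < hi with f lo <= n < f hi, the invariant
-- guarantees an index i with f i <= n < f (i + 1), and each step keeps the
-- half in which that invariant still holds.  f need not be monotonic.
invert :: (Int -> Int) -> Int -> Int -> Int -> Int
invert f n lo hi
  | lo + 1 == hi = lo                    -- invariant now reads f lo <= n < f (lo + 1)
  | f mid <= n   = invert f n mid hi     -- success guaranteed in [mid, hi]
  | otherwise    = invert f n lo mid     -- success guaranteed in [lo, mid]
  where
    mid = (lo + hi) `div` 2

-- e.g. integer square root: for n >= 0 we have 0*0 <= n < (n+1)*(n+1), so
-- ghci> invert (\i -> i * i) 5 0 6
-- 2
```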
It's very brief (so if you only have time to read one thing, you won't waste a single minute: it's to the point) and, perhaps surprisingly given its publication date, highly relevant: in fact, right after reading Section IV.C (AMD K5) you should be able to jump straight to the microarchitectural diagrams of, say, AMD Zen 2 and understand them:
http://web.archive.org/web/20231213180041/https://en.wikichi...
...and any other contemporary CPU.
The Smith & Sohi paper is going to give you great intuition for what's essentially implemented by all modern CPUs: restricted dataflow (RDF) architecture.
For context, all superscalar dynamically scheduled (out-of-order) CPUs implement RDF, introduced by Yale Patt's research group's work on High Performance Substrate (HPS). This work pretty much defined the modern concept of a restricted dataflow CPU: breaking complex instructions into smaller micro-operations and dynamically scheduling these to execute out-of-order (or, to be more specific, in a restricted dataflow order) on multiple execution units.
(If you're curious, the "restricted" part comes from the finite instruction window delimiting the operations to be scheduled, which stems from the finite size of all the physical resources in real hardware, like the ROB/reorder buffer. The "dataflow" comes from having to respect data dependencies, like `A = 1` and `B = 2` having to execute before `C = A + B`, or `MOV A, 1` and `MOV B, 2` having to execute before `ADD C, A, B`; but note that you can freely reorder the aforementioned moves among themselves as long as you execute them before the add: a schedule/execution order of `MOV B, 2; MOV A, 1; ADD C, A, B` is just as valid).
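If it helps to see the scheduling rule spelled out (a toy sketch of my own, not how any real scheduler is implemented): within the window, a micro-op may issue once everything it reads has been produced, which is exactly what lets the two MOVs above reorder freely while the ADD waits for both.

```haskell
-- A toy model (my own sketch) of dataflow-order issue inside a finite
-- window: an op may fire once every register it reads has been written.
import           Data.List (partition)
import qualified Data.Set  as Set

data MicroOp = MicroOp { dst :: String, srcs :: [String], text :: String }

-- Repeatedly issue every ready op; the result groups ops by issue step.
schedule :: Set.Set String -> [MicroOp] -> [[String]]
schedule _     []  = []
schedule ready ops
  | null fire = error "cyclic or unsatisfiable dependencies"
  | otherwise = map text fire : schedule ready' rest
  where
    (fire, rest) = partition (all (`Set.member` ready) . srcs) ops
    ready'       = foldr (Set.insert . dst) ready fire

window :: [MicroOp]
window =
  [ MicroOp "A" []         "MOV A, 1"
  , MicroOp "B" []         "MOV B, 2"
  , MicroOp "C" ["A", "B"] "ADD C, A, B"
  ]

-- ghci> schedule Set.empty window
-- [["MOV A, 1","MOV B, 2"],["ADD C, A, B"]]
```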
For the historical background, see "HPS papers: A retrospective", https://www.researchgate.net/publication/308371952_HPS_paper...
(Minor pet peeve warning:) It also sets the record straight w.r.t. RISC being orthogonal to the historical development of modern superscalar out-of-order CPUs: in particular, it's worth noting that the aforementioned micro-operations have absolutely _nothing_ to do with RISC! Another great resource on that is Fabian's video "CPU uArch: Microcode", https://www.youtube.com/watch?v=JpQ6QVgtyGE (it's also worth noting that micro-operations and microcode are _very_ different concepts; that's also very well covered by the video).
Another good, succinct description of the historical background is the 2024 IEEE CS Eckert-Mauchly Award (Wen-mei Hwu was a PhD student in the aforementioned Yale Patt's group): "Hwu was one of the original architects of the High-Performance Substrate (HPS) model that pioneered superscalar microarchitecture, introducing the concepts of dynamic scheduling, branch prediction, speculative execution, a post-decode cache, and in-order retirement." - https://www.acm.org/articles/bulletins/2024/june/eckert-mauc...
On a side note, the load-store architecture introduced by the CDC 6600 (designed by Seymour Cray in the 1960s) is sometimes mistaken for RISC (which came more than a decade later, arguably introduced in the IBM 801 designed by John Cocke in the late 1970s/early 1980s), https://en.wikipedia.org/wiki/Load%E2%80%93store_architectur...
One could say the load-store architecture does have an influence on compiler backend implementation, after a fashion (thinking of instruction scheduling, where a complex operation combining, say, a LOAD with an arithmetic operation OP is broken down in the scheduling models into separate LOAD and OP operations/effects).
These are absolutely fantastic -- I followed all of these a few years back and can vouch for their quality and for their being up to date, or at least decades ahead of the general computer architecture textbooks: the lectures and readings cover contemporary work, including ISCA and MICRO papers that have only recently found their way into production CPUs (e.g., the hashed perceptron predictor, which is one of the branch predictor units in AMD Zen 2, http://web.archive.org/web/20231213180041/https://en.wikichi...).
There are more topic-specific texts that are very good, e.g., A Primer on Memory Consistency and Cache Coherence, Second Edition; Synthesis Lectures on Computer Architecture 15:1 (2020); Vijay Nagarajan, Daniel J. Sorin, Mark D. Hill, David A. Wood; open access (freely available) at https://doi.org/10.2200/S00962ED2V01Y201910CAC049. But when you're at this point, you're likely going to be good at finding the relevant resources yourself, so I'm going to leave it to you to explore further :-)