This is a basic concept in accounting. The general ledger is an immutable log of...

inopinatus · on Jan 24, 2025

I’ve always concurred with the Helland/Kleppman observation mentioned viz. that the transaction log of a typical RDBMS is the canonical form and all the rows & tables merely projections.

It’s curious that over those projections, we then build event stores for CQRS/ES systems, ledgers etc, with their own projections mediated by application code.

But look underneath too. The journaled filesystem on which the database resides also has a log representation, and under that, a modern SSD is using an adaptive log structure to balance block writes.

It’s been a long time since we wrote an application event stream linearly straight to media, and although I appreciate the separate concerns that each of these layers addresses, I’d probably struggle to justify them all from first principles to even a slightly more Socratic version of myself.

formerly_proven · 2025-01-25T15:12:20 1737817940

This is similar to the observation that memory semantics in most any powerful machine since the late 60s are implemented using messaging, and then applications go ahead and build messaging out of memory semantics. Or the more general observation that every layer of information exchange tends towards implementing packet switching if there's sufficient budget (power/performance/cost) to support doing so.

globular-toast · on Jan 24, 2025

> It’s curious that over those projections, we then build event stores for CQRS/ES systems, ledgers etc, with their own projections mediated by application code.

The database only supports CRUD. So while the CDC stream is the truth, it's very low level. We build higher-level event types (as in event sourcing) for the same reason we build any higher-level abstraction: it gives us a language in which to talk about business rules. Kleppmann makes this point in his book and it was something of an aha moment for me.

withinboredom · on Jan 24, 2025

I'm sorry; but have you ever actually used a database before? A database supports FAR more than "only CRUD". Some really simple examples are CTEs, SELECT ... INTO (or INSERT ... SELECT for some dialects), triggers, views, etc.

nyrikki · on Jan 24, 2025

CTE's are extensions to try an regain some of what was lost when adopting a model loosely based on Codd's declarative relational algebra, specifically the lack of transitive closure, the rest mostly fit into the CRUD world.

It is a bit circular, CRUD's elements create, read, update, and delete were chosen to represent the core features of a persistence layer.

slt2021 · on Jan 24, 2025

what you mention is a high level projection over transaction log and is subject to transaction isolation levels

https://www.postgresql.org/docs/current/transaction-iso.html

globular-toast · 2025-01-25T16:52:26 1737823946

Have you completely missed the context of this thread? We're talking about the transaction log aka CDC. This consists purely of CRUD operations because that's all the database understands. Maybe read the books cited?

namtab00 · 2025-01-25T11:57:38 1737806258

we're a very short step away from a "hardware" database

ianburrell · on Jan 24, 2025

The table data in database is the canonical form. You can delete the transaction logs, and temporarily lose some reliability. It is very common to delete the transaction logs when not needed. When databases are backed up, they either dump the logical data or take snapshot of the data. Then can take stream of transaction logs for syncing or backup until the next checkpoint.

I'm pretty sure journalled filesystem recycle the journal. There are log-structured filesystem but they aren't used much beyond low-level flash.

inopinatus · on Jan 24, 2025

Sorry, this is mistaking the operational for the fundamental.

If a transaction log is replayed, then an identical set of relations will be obtained. Ergo, the log is the prime form of the database.

It’s that simple.

fweimer · 2025-01-25T11:22:34 1737804154

At a very abstract level, maybe. But it's common not to log changes that can trivially be rolled back, like insertions into a table that was created or truncated within the transaction. Of course, such optimizations are incompatible with log-based replication. So the statement should probably be, “in a system with log-based replication, the log is authoritative, and the tables are just an optimization”. This framing also avoids ambiguities because a transaction log may not be fully serialized, and might not fully determine table contents.

magicalhippo · on Jan 24, 2025

At work we need to distribute daily changes to a dataset, so we have a series of daily deltas. If a new client is brought up, they need to apply all the deltas to get the current dataset.

This is time consuming, so we optimized it by creating "base versions" every month. So a client only needs to download the latest base version and the apply the deltas since then...

Animats · on Jan 24, 2025

Which is what accountants call "closing the books". Once all ledgers have been reconciled, old ledgers can be archived and you go forward from the last closing.

Forensic accounting, incidentally, is when something went badly wrong and outside accountants have to go back through the old ledgers, and maybe old invoices and payments and reconstruct the books. FTX had to do that after the bankruptcy to find out where the money went and where it was supposed to go.

isignal · on Jan 24, 2025

The transaction log maintained from time 0 would be equivalent but too expensive to store compared to the tables.

10000truths · on Jan 24, 2025

If you relax your constraint to "retain logs for the past N days", you can accumulate the logs from T=0 to T=(today - N) into tables and still benefit from having snapshots from that cutoff onwards.

inopinatus · on Jan 24, 2025

On the contrary, I’ve known plenty of sites that keep their logs.

Often written to tape, for obvious reasons.

jiggawatts · 2025-01-25T07:08:51 1737788931

Conversely, given a database, you can't (in general) reconstruct the specific transaction log that resulted in it. You can reconstruct some log, but it's not uniquely defined and is missing a lot of potentially relevant information.

LeanOnSheena · 2025-01-26T18:46:31 1737917191

You're correct on all points. Some additional refining points regarding accounting concepts:

- General legers are formed by way of transactions recorded as journal entries. Journal entries are where two or more accounts from the general ledger are debited & credited such that total debits equals total credits. For example, a sale will involve a journal entry which debits cash or accounts receivable, and credits revenue.

- The concept of the debits always needing to equal credits is the most important and fundamental control in accounting. It's is the core idea around which all of double entry bookkeeping is built.

- temporally ordered Journal entries are what form a log from which a general ledger can be derived. That log of journal entries is append-only and immutable. If you make an mistake with a journal entry, you typically don't delete it, you just make another adjusting (i.e. correcting) entry.

Having a traditional background in accounting as a CPA, as a programmer I have written systems that are built around a log of temporally ordered transactions that can be used to construct state across time. To my colleagues that didn't have that background they found it interesting but very strange as an idea (led to a lot of really interesting discussions!). It was totally strange to me that they found it odd because it was the most comfortable & natural way for me to think about many problems.

funcDropShadow · 2025-01-25T08:16:27 1737792987

Could you recommend some resource to understand this view of accounting better?

fellowniusmonk · on Jan 24, 2025

This is why EG-Walker is so important, diamond types adoption and a solid TS port can't come soon enough for distributed systems.

calvinmorrison · on Jan 25, 2025

You over estimate ERP and accounting systems.

baq · 2025-01-25T08:19:54 1737793194

That was basic accounting when computers were people with pencils and paper.

calvinmorrison · 2025-01-25T14:15:03 1737814503

Same applies on accounting systems. It's just usually backed by a SQL database you can fudge (and often need to fix)