Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

This is a basic concept in accounting. The general ledger is an immutable log of transactions. Other accounting documents are constructed from the general ledger, and can, if necessary, be rebuilt from it. This is the accepted way to do money-related things.

Synchronization is called "reconcilation" in accounting terminology.

The computer concept is that we have a current state, and changes to it come in. The database with the current state is authoritative. This is not suitable for handling money.

The real question is, do you really care what happened last month? Last year? If yes, a log-based approach is appropriate.



I’ve always concurred with the Helland/Kleppman observation mentioned viz. that the transaction log of a typical RDBMS is the canonical form and all the rows & tables merely projections.

It’s curious that over those projections, we then build event stores for CQRS/ES systems, ledgers etc, with their own projections mediated by application code.

But look underneath too. The journaled filesystem on which the database resides also has a log representation, and under that, a modern SSD is using an adaptive log structure to balance block writes.

It’s been a long time since we wrote an application event stream linearly straight to media, and although I appreciate the separate concerns that each of these layers addresses, I’d probably struggle to justify them all from first principles to even a slightly more Socratic version of myself.


This is similar to the observation that memory semantics in most any powerful machine since the late 60s are implemented using messaging, and then applications go ahead and build messaging out of memory semantics. Or the more general observation that every layer of information exchange tends towards implementing packet switching if there's sufficient budget (power/performance/cost) to support doing so.


> It’s curious that over those projections, we then build event stores for CQRS/ES systems, ledgers etc, with their own projections mediated by application code.

The database only supports CRUD. So while the CDC stream is the truth, it's very low level. We build higher-level event types (as in event sourcing) for the same reason we build any higher-level abstraction: it gives us a language in which to talk about business rules. Kleppmann makes this point in his book and it was something of an aha moment for me.


I'm sorry; but have you ever actually used a database before? A database supports FAR more than "only CRUD". Some really simple examples are CTEs, SELECT ... INTO (or INSERT ... SELECT for some dialects), triggers, views, etc.


CTE's are extensions to try an regain some of what was lost when adopting a model loosely based on Codd's declarative relational algebra, specifically the lack of transitive closure, the rest mostly fit into the CRUD world.

It is a bit circular, CRUD's elements create, read, update, and delete were chosen to represent the core features of a persistence layer.


what you mention is a high level projection over transaction log and is subject to transaction isolation levels

https://www.postgresql.org/docs/current/transaction-iso.html


Have you completely missed the context of this thread? We're talking about the transaction log aka CDC. This consists purely of CRUD operations because that's all the database understands. Maybe read the books cited?


we're a very short step away from a "hardware" database


The table data in database is the canonical form. You can delete the transaction logs, and temporarily lose some reliability. It is very common to delete the transaction logs when not needed. When databases are backed up, they either dump the logical data or take snapshot of the data. Then can take stream of transaction logs for syncing or backup until the next checkpoint.

I'm pretty sure journalled filesystem recycle the journal. There are log-structured filesystem but they aren't used much beyond low-level flash.


Sorry, this is mistaking the operational for the fundamental.

If a transaction log is replayed, then an identical set of relations will be obtained. Ergo, the log is the prime form of the database.

It’s that simple.


At a very abstract level, maybe. But it's common not to log changes that can trivially be rolled back, like insertions into a table that was created or truncated within the transaction. Of course, such optimizations are incompatible with log-based replication. So the statement should probably be, “in a system with log-based replication, the log is authoritative, and the tables are just an optimization”. This framing also avoids ambiguities because a transaction log may not be fully serialized, and might not fully determine table contents.


At work we need to distribute daily changes to a dataset, so we have a series of daily deltas. If a new client is brought up, they need to apply all the deltas to get the current dataset.

This is time consuming, so we optimized it by creating "base versions" every month. So a client only needs to download the latest base version and the apply the deltas since then...


Which is what accountants call "closing the books". Once all ledgers have been reconciled, old ledgers can be archived and you go forward from the last closing.

Forensic accounting, incidentally, is when something went badly wrong and outside accountants have to go back through the old ledgers, and maybe old invoices and payments and reconstruct the books. FTX had to do that after the bankruptcy to find out where the money went and where it was supposed to go.


The transaction log maintained from time 0 would be equivalent but too expensive to store compared to the tables.


If you relax your constraint to "retain logs for the past N days", you can accumulate the logs from T=0 to T=(today - N) into tables and still benefit from having snapshots from that cutoff onwards.


On the contrary, I’ve known plenty of sites that keep their logs.

Often written to tape, for obvious reasons.


Conversely, given a database, you can't (in general) reconstruct the specific transaction log that resulted in it. You can reconstruct some log, but it's not uniquely defined and is missing a lot of potentially relevant information.


You're correct on all points. Some additional refining points regarding accounting concepts:

- General legers are formed by way of transactions recorded as journal entries. Journal entries are where two or more accounts from the general ledger are debited & credited such that total debits equals total credits. For example, a sale will involve a journal entry which debits cash or accounts receivable, and credits revenue.

- The concept of the debits always needing to equal credits is the most important and fundamental control in accounting. It's is the core idea around which all of double entry bookkeeping is built.

- temporally ordered Journal entries are what form a log from which a general ledger can be derived. That log of journal entries is append-only and immutable. If you make an mistake with a journal entry, you typically don't delete it, you just make another adjusting (i.e. correcting) entry.

Having a traditional background in accounting as a CPA, as a programmer I have written systems that are built around a log of temporally ordered transactions that can be used to construct state across time. To my colleagues that didn't have that background they found it interesting but very strange as an idea (led to a lot of really interesting discussions!). It was totally strange to me that they found it odd because it was the most comfortable & natural way for me to think about many problems.


Could you recommend some resource to understand this view of accounting better?


This is why EG-Walker is so important, diamond types adoption and a solid TS port can't come soon enough for distributed systems.


You over estimate ERP and accounting systems.


That was basic accounting when computers were people with pencils and paper.


Same applies on accounting systems. It's just usually backed by a SQL database you can fudge (and often need to fix)




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: