How's D1 meant to be used, since it has a very small maximum size (100 MB, I believe)? Should I create one database per user, for example? Genuine question.
The maximum size will increase to at least 1 GB in the near future, enabled by our recent rewrite of the underlying storage layer. Perhaps we can push it even further; we'll see.
But yes, I think the next step is then some sort of sharding. Sharding by user would be an obvious approach for many apps. I think we should build a framework to help manage this, so apps would only need to provide some callbacks e.g. to compute shard key for a particular query.
Alternatively, apps that want full control will be able to use Durable Objects directly. We'll soon (in a few months, probably?) enable the new storage engine for all Durable Objects which means every object will have a private SQLite database.
(I'm the tech lead for Workers in general, and currently focused on this project in particular.)
do you think you will have any API-driven access to the data held in these services? I.e., I don't necessarily want to have to use a Worker to mediate between the Cloudflare platform and HTTP access to the data within Cloudflare systems.
Much like R2 has an S3-compatible API as well as a way to retrieve files outside of Workers (IIRC), will this be true of all forms of storage on the platform?
Sometimes it'd be nice to prime the cache in KV, for example, without having to invoke a Worker.
> do you think you will have any API-driven access to the data held in these services? I.e., I don't necessarily want to have to use a Worker to mediate between the Cloudflare platform and HTTP access to the data within Cloudflare systems.
For D1, yes — a native HTTP API is on the roadmap. KV has an HTTP API as well, so you can write directly.
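For example, a minimal sketch of a direct KV write using Python's requests library; the account, namespace, and token values are placeholders, and the endpoint shape is Cloudflare's v4 REST API:

    import requests  # third-party HTTP client

    ACCOUNT_ID = "your-account-id"         # placeholder
    NAMESPACE_ID = "your-kv-namespace-id"  # placeholder
    API_TOKEN = "your-api-token"           # placeholder

    def kv_put(key: str, value: bytes) -> None:
        # Write a KV pair directly over HTTP, no Worker in the path.
        url = (f"https://api.cloudflare.com/client/v4/accounts/{ACCOUNT_ID}"
               f"/storage/kv/namespaces/{NAMESPACE_ID}/values/{key}")
        resp = requests.put(url, data=value,
                            headers={"Authorization": f"Bearer {API_TOKEN}"})
        resp.raise_for_status()

    kv_put("greeting", b"written without invoking a Worker")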
Not really sure how you distinguish between runtime and something else. It seems to me that you could provision a "shard" as simply as executing in a worker:
What I mean is that workers cannot create databases. You have to create each database using wrangler (or the UI) and then edit the configuration file to add a binding. The binding becomes a variable in the JavaScript environment, which means you have to change the source code every time a new database is created.
Therefore, creating a database per user as part of the user signup flow is not possible.
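For readers unfamiliar with the setup: a D1 binding lives in the Worker's wrangler.toml and looks roughly like this, so provisioning a new database means editing config and redeploying:

    [[d1_databases]]
    binding = "DB"                # becomes env.DB inside the Worker
    database_name = "customer-db"
    database_id = "..."           # ID printed by `wrangler d1 create`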
Ah, I see. I thought it would be possible to dynamically set a secret which would effectively be a database uri. But upon further investigation, it does indeed need a redeploy.
I suppose you could make something to write the bindings to the toml and trigger a redeploy, but that's definitely not pretty.
Thanks for sharing this; that makes a lot of sense to me. The focus on running everything on the edge has been confusing to me for a while due to the increased db latency. It makes so much sense to dynamically move processing to the best location in the network instead of the binary choice between "close to db" and "close to users".
For our product (SQL your cloud & SaaS resources as a service), we want to consider D1, but for security we would want a database per tenant. Would that model work for Cloudflare? 1000 databases for 1000 tenants. Or should we use something else with shared tenancy?
Not quite sure I understand "SQL your cloud" (using their AWS account?), but from what I can guess given that you're considering D1: I'd wager you want something like Neon, but self-hosted. I am building a similar product where I give each customer a full-blown Postgres (without individual extensions), and that is the solution I landed on with the most flexibility and security after testing out SQLite VM per user (via Fly.io), SQLite on the edge (Worker + S3, before D1 existed), and Postgres cluster per user (via Stackgres).
The nice part is the decoupling. User not using their DB for 5 minutes? Time to idle them and use that compute for another customer.
And it's also really nice that storage becomes way way way easier. Storage is the biggest hurdle when your database count scales with your user count so you can instead build out the compute system once, fix a few issues along the way and from then on just babysit a single big storage cluster.
And if that becomes too big at some point, replicate it, shard the tenants by region or whatever, and you "only" have to manage storage a dozen times or so. It's obviously always annoying, but at least the setup is similar enough that you can park one FTE per storage cluster and serve enough customers to make it worth it.
Edge does make a lot of sense, but only if you're actually talking about physical computers that need to exist at physical locations to do no-network/embedded-systems/RTOS work without the cloud, paired with a cloud metrics or aggregation layer (or other apps) that explicitly does not need to be on a nearby computer to help the users, and often serves as a control plane for a set of edge resources. Think Tesla wanting to do BI on every car vs. having the car actually work for the user when they decide to drive through the network-disconnected wilderness.
Server side rendering or “single optimization” as the “edge” is kind of an abuse of terms.
I extensively used SQLite in a telemetry system for an electric race car. The car has an onboard computer, first a Raspberry Pi, then a dual-core Arm processor. Onboard code logs ~4000 messages a second into three SQLite databases. After a drive session a script merges the three databases into a single SQLite session log. The session log is decoded on a different computer to ~400 columns of time-series data, again stored in SQLite. This data is then finally converted into a proprietary format for visual analysis.
Prior to SQLite I tried CSV files and raw binary formats. None of them could match the flexibility, ease of use, and throughput of SQLite. My setup has been in use for several weeks now and has processed numerous rows of data. As the SQLite project outlines on its website:
> SQLite does not compete with client/server databases. SQLite competes with fopen().
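The merge step described above maps naturally onto SQLite's ATTACH; a minimal sketch in Python, assuming each per-bus database has an identically shaped messages table (all file and table names here are made up):

    import sqlite3

    dst = sqlite3.connect("session.db")
    dst.execute("CREATE TABLE IF NOT EXISTS messages"
                "(ts REAL, bus TEXT, msg_id INTEGER, payload BLOB)")
    for path in ("can0.db", "can1.db", "uart.db"):
        dst.execute("ATTACH DATABASE ? AS src", (path,))
        dst.execute("INSERT INTO messages SELECT * FROM src.messages")
        dst.commit()                        # must commit before DETACH
        dst.execute("DETACH DATABASE src")
    dst.close()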
Others have asked, but I'd also be very interested in knowing more if you can share more details.
I had to spend a non-trivial amount of time convincing others to use SQLite instead of dumb files in similar situations; stories like yours are pure gold.
There are a lot of SQLite praises in this thread, on the internet, and on SQLite's website, so here are a couple of the reasons that drove me to choose it. My experience with traditional formats like CSV or raw binary is that they have a somewhat lower overhead to "get working" but are quite difficult to scale and adapt to changing needs, a phase most projects are bound to go through.
SQLite's reputation and adoption make me comfortable assuming its operational correctness, so I can focus on the data itself, not reader/writer logic for it. Even though a SQLite database's schema may change, I'm glad knowing that the database file can still be easily opened and read many years into the future, something that custom formats may not guarantee.
On CSV: Syntax and library differences aside, SQLite almost works like a drop-in replacement for CSV, since a single-table database can essentially be treated as a glorified spreadsheet/CSV if you wish. My post-processing data pipeline is built with pandas, so it is virtually the same for me to read a SQLite database vs. a CSV file.
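To illustrate the "virtually the same" point (file and table names here are hypothetical):

    import sqlite3
    import pandas as pd

    con = sqlite3.connect("session.db")
    df = pd.read_sql_query("SELECT * FROM messages", con)  # SQLite route
    # df = pd.read_csv("session.csv")                      # CSV route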
I've had a huge amount of success using SQLite to collect and analyze logs. The obvious approach would have been to dump enormous CSV or JSON files, but doing it in SQLite gave me (a) much better reliability, and (b) the ability to come up with post-hoc analysis approaches without changing the main process.
> Onboard code logs ~4000 messages a second into three SQLite databases. After a drive session a script merges the three databases into a single SQLite session log.
A bit unrelated, but curious as to why you wrote to three separate databases only later to merge them.
Could be three different processes interacting with different systems. Say one process listening to a CAN bus, another to local Ethernet[1] and a third to some I2C/SPI sensors. If they don't interact otherwise, which they probably wouldn't given they're just logging telemetry, it's more flexible to just have them as separate processes. The I2C/SPI might be a Python script, the others in C/C++ or Rust say, whatever is easier.
Precisely so! There are two CAN buses and one UART port to log from, and the Python code spawns subprocesses for each of them. They each create or open a SQLite database named after their bus/port and get to work.
This setup is easy to put together but might not be optimal. I don't have a lot of experience with SQLite performance tuning, and I wonder whether it would be faster to have worker threads pass everything through IPC to a writer thread, which batches rows and writes to a centralized database.
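A rough sketch of that writer-thread idea, reusing the hypothetical messages table from earlier: worker threads enqueue rows, and one thread drains the queue in batches so each commit covers many rows.

    import queue
    import sqlite3
    import threading

    q = queue.Queue()  # workers call q.put((ts, bus, msg_id, payload))

    def writer(db_path, batch_size=500):
        con = sqlite3.connect(db_path)
        con.execute("PRAGMA journal_mode=WAL")
        con.execute("CREATE TABLE IF NOT EXISTS messages"
                    "(ts REAL, bus TEXT, msg_id INTEGER, payload BLOB)")
        while True:
            rows = [q.get()]                     # block for the first row
            while len(rows) < batch_size:
                try:
                    rows.append(q.get_nowait())  # drain whatever else is waiting
                except queue.Empty:
                    break
            con.executemany("INSERT INTO messages VALUES (?, ?, ?, ?)", rows)
            con.commit()                         # one commit per batch, not per row

    threading.Thread(target=writer, args=("session.db",), daemon=True).start()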
Good question. As a non-racer who enjoys watching racing, I'd imagine at least some accelerometers/gyros and load cells if they've got them, and possibly Pitot tubes or similar for measuring aerodynamics when testing, to get a detailed understanding of the forces on the car and how it behaves. Motor currents are probably up there too, to catch current spikes.
> detailed understanding of the forces on the car and how it behaves
But at 4000 samples per second? You'd think twenty would be plenty.
NyQuist: 4000 samples per second means we are looking to get information about an up to 1800 Hz signal. A6 on the piano keyboard (excluding harmonics).
Sorry, I mean Nyquist. I was confused by NyQuil, the night time flu medicine.
A sound wave moves 85 mm in that time at Mach 1, around 340 m/s. That doesn't mean we need to sample a sound that has no components above 20 Hz, using 4000 samples per second.
The propagation speed can be misleading.
All that matters is whether there are oscillations in the suspension that go up to 1800 Hz, not how fast the car is going forward. If there are, those would have to be harmonics, which you'd think would be beyond the frequency response of the suspension. Suspensions are heavy, bulky components and are heavily damped (which is a kind of low-pass filtering).
If the suspension moves anywhere near 17 mm per sample (250 km/h vertically), that car is in serious trouble.
> All that matters is whether there are oscillations in the suspension that go up to 1800 Hz
Surely you care about more than just oscillations, for example how exactly the suspension compresses during braking. If you've seen slow-mo shots of high-speed race cars riding a curb, it can be quite violent with lots of movement.
In this article[1] about Formula 1, they state they sample vibration data at 200 kHz. This is then filtered to a lower rate for logging, but getting, say, 5 kHz out of 200 kHz raw sensor data doesn't seem unreasonable to me.
They also mention they collect about 30 MB per lap of sensor data from more than 250 sensors, and laps are typically around 1.5-2 minutes long. If one assumes a 2-minute lap and 250 sensors, that's 1 kB/s per sensor on average.
> how exactly the suspension compresses during braking
The entire graph of suspension vs. time should be well below that frequency range. It doesn't matter whether we are looking for frequency-domain features or time-domain features.
(Like someone said upthread, it's probably 4000 events per second as an aggregate from numerous sensors, multiplexed into one SQLite database. Or maybe even a total across the three databases.)
I found this page[1] which mentions dynoing shocks at 3 inches per second, as that's what one could see in the pits. If you want some spatio-temporal resolution for that, then you'll want a fairly decent sample rate, no?
Maybe not quite 4 kSps is needed, but for vibration and such it'd make sense to me.
3 inches per second is very slow. I think shocks compress quite a bit faster when you hit a speedbump in a parking lot.
The sample rate you need to reconstruct the motion of shocks is entirely determined by Nyquist, just like sampling audio or any other signal.
You need some 2.2x the highest frequency you want to capture, and make sure you filter out anything above that (if it exists).
If there is nothing above 20 Hz in the suspension's movement, or nothing you're interested in, then you need a 44 Hz sample rate. (More if you implement oversampling, but not 10 times more let alone 100.)
All that and more! Sensors like shock potentiometers, strain gauges, temperatures from all over the car, IMU readings ([ag][xyz]), and electrical measurements like HV/LV voltage, current, power, energy, as well as controls like motor commands and speeds, controller states, etc.
Lots of folks put work into the car, and they all have metrics of interest to tell how their systems perform. An ideal telemetry system for us captures all this information and makes it available both in real time and for later analysis.
I’m going to take a guess that the rationale is file locking. If you have three separate processes using three separate database files, you can write all the data you want, as fast as you can, without worrying about file locking. SQLite can handle three processes writing at the same time, but it can be faster to do it split into separate files and then merge after the fact.
One of SQLite's weaknesses is it's not client-server, so it's ill-advised to have multiple applications try to use a single sqlite server. Concurrency is another issue.
One of SQLite's strengths is that it's not client-server, so there's no single sqlite server. It is simply not possible to have multiple applications try to use a single sqlite server, because you cannot use what does not exist. No sane person would ever say "it's ill-advised to have multiple ambassadors try to speak to the King of the USA", and likewise no sane person should ever say "it's ill-advised to have multiple applications try to use a single sqlite server".
It is however possible to have multiple applications access the same sqlite database on the same filesystem (i.e. locally, not via network access, not via remounts or symlinks or other forms of synonyms) and it will sort out the locking for you. Depending on what you need to do, there's I think three possible strategies you can use so it can probably handle your concurrency needs faster than Postgres via the network can. For instance, this particular task is write-heavy, and it would benefit from a different setting than a read-heavy application.
There are certainly occasions when Postgres or MySQL or MS SQL or Oracle is a better choice than SQLite, and sometimes concurrency contributes to that decision. But it's unlikely to be the case that you had an SQLite solution that worked really well with one process reading and writing from the database, and now all of a sudden you need a second or third application to use the database concurrently with the first, and you find you need to switch to a client-server model. And even if you sometimes need to take specific steps (like adopting a different locking model), you also have to take specific steps to run Postgres (e.g. using a pool to speed up connections and limit the number of concurrent connections, because each connection is a separate process). It's a matter of knowing your tools.
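For what it's worth, the write-heavy case usually comes down to turning on WAL and setting a busy timeout; a minimal sketch (settings are illustrative, not a recommendation):

    import sqlite3

    con = sqlite3.connect("app.db", timeout=5.0)  # wait up to 5 s on a locked db
    con.execute("PRAGMA journal_mode=WAL")        # readers stop blocking the writer
    con.execute("PRAGMA synchronous=NORMAL")      # common WAL pairing; trades a little durability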
> None of them could match the flexibility, ease of use, and throughput of SQLite.
It sounds unbelievable that you couldn't append a CSV or binary record to a file via an open file descriptor more efficiently than inserting into SQLite.
(It's clear that to have the data already in SQLite form saves post-race steps.)
The onboard code processes binary data: CAN bus messages and the like, along with their non-binary metadata. The first version of the code used CSV because it is so trivial to read and write, but CSV as a text format required me to either hex- or base64-encode the bytes, which incurred a non-trivial waste of time and space. For a custom binary record I would need to write custom (de)serializers, which I wasn't willing to do given frequent changes in the types of data I needed to record during development.
SQLite is the only solution for me so far that provides a structured data store for mixed data types, is power-loss safe (important for an embedded system), and is a high-quality and portable industry standard. Let alone the SQL language itself, which has proven hugely useful for first-pass post-processing given the amount of data I ingest.
Perhaps there are other technologies that would work just as well or better: maybe Protocol Buffers and the like? Yet SQLite currently ticks all my boxes and I'm quite happy with it.
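The "no hex/base64 detour" point is just that SQLite has a first-class BLOB type; a tiny sketch, with a schema invented for illustration:

    import sqlite3
    import time

    con = sqlite3.connect("telemetry.db")
    con.execute("CREATE TABLE IF NOT EXISTS can_log"
                "(ts REAL, can_id INTEGER, payload BLOB)")
    con.execute("INSERT INTO can_log VALUES (?, ?, ?)",
                (time.time(), 0x123, b"\x01\x02\x03\x04"))  # raw bytes, stored as-is
    con.commit()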
It's an AND statement, not an OR statement. You can append to a CSV file efficiently, but flexibility is context-dependent and ease of use is very subjective, and we can assume the GP knows their own use case.
Sadly my team is pretty light on public tech blogs, but there are plenty of projects like this one! Copying from my comment above:
> The system is for a Formula SAE (FSAE) Electric style race car. FSAE is a collegiate competition, and I'm part of a US Pacific Northwest team. Our car uses a 550 V battery and is 4WD. Most of our electronics and firmware/software are in-house besides motors and inverters. I lead the telemetry project and work with 1-2 other members on the system.
> The system consists of an on-vehicle computer (Raspberry Pi or NXP i.MX) and a ground station (Rockchip RK3588S). We use Python heavily and the real-time dashboard part is done with InfluxDB and Grafana. Both computers and any user devices are on a Wi-Fi network made with a long-range WISP access point.
> Our team is unfortunately very light on public tech blogs, and how I wish I could change that... There is plenty of information on both FSAE electric cars and vehicle telemetry on the internet, and it was a source of inspiration as I designed my own solution.
Interesting! I'm currently working on a system that writes time-series data to raw binary files, but we're considering switching to a different file format for the same reasons. Have you considered any other formats, such as HDF5?
The one potential requirement I'll caution you on is resilience against write failures, in case you're collecting time-series data and can't afford to lose a "session" or spend time messing with recovery options. HDF5 is not made for that. Binary and SQLite are better in that respect. SQLite wins on usability against binary and HDF5.
I'm not familiar with HDF5 but agree on resiliency. We are a collegiate racing team and our car's power rails aren't stable and redundant at all times, so power loss failure is something I've kept in mind from day one on my telemetry project. SQLite is generally equipped to handle power losses and write thread crashes:
> An SQLite database is highly resistant to corruption. If an application crash, or an operating-system crash, or even a power failure occurs in the middle of a transaction...
Quite often after a run the entire car is turned off and, on next power up, the databases are left as .db and .db-journal files. The code has no problem processing or even continuing on logging with DBs in this state.
HDF5 is more for storing and exchanging numerical simulation data. It doesn't have to be resilient to write failures because worst-case scenario you rerun the simulation or try again to copy the data into the file.
SQLite commits are atomic transactions: changes are staged via a rollback journal or WAL so that the commit takes effect at a single well-defined moment. That's why interrupting it in the middle of a write doesn't result in partial data or a corrupted index.
I'd love to know more about this electric race car, the telemetry system, the technology behind it, etc. Do you have a blog (or can you recommend a place) where you go deeper on this subject matter?
The system is for a Formula SAE (FSAE) Electric style race car. FSAE is a collegiate competition, and I'm part of a US Pacific Northwest team. Our car uses a 550 V battery and is 4WD. Most of our electronics and firmware/software are in-house besides motors and inverters. I lead the telemetry project and work with 1-2 other members on the system.
The system consists of an on-vehicle computer (Raspberry Pi or NXP i.MX) and a ground station (Rockchip RK3588S). We use Python heavily and the real-time dashboard part is done with InfluxDB and Grafana. Both computers and any user devices are on a Wi-Fi network made with a long-range WISP access point.
Our team is unfortunately very light on public tech blogs, and how I wish I could change that... There is plenty of information on both FSAE electric cars and vehicle telemetry on the internet, and it was a source of inspiration as I designed my own solution.
Awesome. If you don't mind me asking, what sort of dashboarding/visualization software do you use? Highchart-style homebrewed web apps? Some closed-source auto software? What's that world look like? Always been curious about it from a front-end perspective.
Real-time data is fed into an InfluxDB instance and shown on Grafana hosted on the ground station. This data is downsampled to cope with the volume of raw traffic. We also tried some solutions used by other teams, like KX [0], but decided to go with a FOSS stack. A post-processing pipeline converts the original data to the (unfortunately proprietary) format used by AEMdata [1], a data analysis tool from our ECU vendor that is used somewhat commonly in the racing scene.
For a tenanted SaaS app, SQLite at the edge is a really compelling architecture. You have a single DB per customer/company/group with all their users working against that. It can operate at the edge, closest to where the majority of the customer's users are.
SQLite could scale to even quite large customers with this.
Another really compelling architecture is a DB per user, with partial/selective sync between the nodes. If you then couple this with a "local first" design, the "edge" just becomes another local deployment that the user's db can sync against. Collaborative apps, where users have their own documents but can also share/fork them, would align well with this.
I believe starting with "local first" and using the edge for sync and "online only" modes is going to become the default for a significant number of apps moving forward. SQLite with CRDT-based syncing and merge-conflict resolution is the way to do this. There are a couple of exciting projects working on this:
In broad strokes, having 100k SQLite dbs is a bit like having 100k git repositories. It's doable, and it's mostly a matter of having tooling to manage it.
It actually makes a few things simpler, say, progressively rolling out schema upgrades. You need to write schema migrations anyway; this lets you upgrade one customer at a time.
Yea, tooling and stuff is pretty much what I doubt scales. I'm sure a machine can handle it. It will just be very hard to manage because of the lack of tooling.
> Yea, but does it scale? Yes, for 100 users maybe. But you have 100 000 small sqlite db:s
If it's a case of 1 node per customer (node being vm, lambda, cloudflare worker, whatever) then there should be no limit.
Maybe you're thinking of a more "traditional" approach with a server process managing 1 sqlite db per customer, which might make sense in order to keep cloud costs down. Even in that case it should be trivially easy to distribute the load.
So store SQLite on S3. Start a new lambda (or similar) per user, download the db from S3, write to it, and upload it again. And make sure there can never be more than one function running per user (to avoid two lambdas editing the same database). Seems a bit fragile, but I might be wrong.
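Roughly, that flow would look like this (a sketch using boto3; the bucket, key, and table names are made up, and the "only one function per user" locking is the hard part that's hand-waved here):

    import sqlite3
    import tempfile
    import boto3  # AWS SDK for Python

    s3 = boto3.client("s3")
    BUCKET = "per-user-dbs"  # hypothetical bucket

    def bump_counter(user_id):
        key = f"{user_id}.db"
        with tempfile.NamedTemporaryFile(suffix=".db") as tmp:
            s3.download_file(BUCKET, key, tmp.name)  # pull the user's database
            con = sqlite3.connect(tmp.name)
            con.execute("UPDATE counters SET n = n + 1")
            con.commit()
            con.close()
            s3.upload_file(tmp.name, BUCKET, key)    # push it back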
> It’s borderline impossible to compare it against networked database management systems like MySQL or Postgres, because SQLite is a library that operates on a local file — it bypasses all the costs incurred by the network, layers of serialization and deserialization, authentication, authorization, and more.
Postgres can run locally, communicating via a Unix socket. You should try benchmarking this before stating that it's so much slower than SQLite.
Even with a Unix socket there will be at least one additional copy across the kernel-userspace boundary compared to SQLite (and likely at least one copy inside the kernel too, from one socket buffer to another). Yes, a Unix socket has lower overhead than a TCP one, but it's still not free:
kernel (filesystem) -> app with SQLite library
kernel (filesystem) -> Postgres -> kernel -> app with Postgres client
If Postgres reads data from its own cache (not from disk), the chain will be one step shorter: Postgres -> kernel -> client, but the same is true for SQLite's own cache.
If mmap is used (makes sense for small data sets cached in RAM), then reading data with SQLite can be zero-copy [1].
Benchmark results depend heavily on the use case: neither is better for all workloads, but at least for some of them SQLite is faster.
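(For reference, the mmap mode mentioned above is a per-connection pragma; a one-liner sketch:)

    import sqlite3

    con = sqlite3.connect("app.db")
    con.execute("PRAGMA mmap_size = 268435456")  # memory-map up to 256 MiB of the file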
No matter what, PostgreSQL can't (trivially) run in-memory only, and even Unix sockets have a performance cost compared to a library running in the same address space.
I've done the SQLite benchmark a few times. You can insert somewhere in the range of 10-20k rows per second if you are using NVMe (i.e. ~50 µs per row). Requires some pragmas (e.g. WAL). This is 100% serialized throughput.
No clue what Postgres would manage, but I suspect it would be about an order of magnitude higher latency in the happy case.
> No clue what Postgres would manage, but I suspect it would be about an order of magnitude higher latency in the happy case.
Unless you’re talking 1 versus 10 microseconds (or less), I don’t think Postgres will have an order of magnitude higher latency. And if we are talking this range, why would it matter for a web app where the client’s latency is almost certainly >1 millisecond?
I'm pretty sure SQLite is capable of significantly higher throughput than 10-20k rows per second, depending on the workload. Inserts can be much faster if they are batched in large transactions and prepared statements are used to avoid SQL parsing for each row. This only works, of course, if your workload can be batched.
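A quick way to see the batching effect for yourself (throughput numbers will vary wildly with hardware and pragmas):

    import sqlite3
    import time

    con = sqlite3.connect("bench.db")
    con.execute("PRAGMA journal_mode=WAL")
    con.execute("CREATE TABLE IF NOT EXISTS t (a INTEGER, b TEXT)")

    rows = [(i, "x" * 16) for i in range(100_000)]
    start = time.perf_counter()
    with con:  # one transaction for the whole batch
        con.executemany("INSERT INTO t VALUES (?, ?)", rows)  # prepared once, reused
    print(f"{len(rows) / (time.perf_counter() - start):,.0f} rows/s")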
I don’t disagree with this. I disagree with whether these microseconds of difference matter when you’re serving a web page with milliseconds of latency.
Someone should benchmark that. Unix sockets in my mind should be faster as in theory you have less overhead. But the flow using TCP over loopback might have received many more optimizations over the last decades.
$ pgbench -h localhost -p 5432 -b select-only -T 60 -c 10 -j 2 bench
Password:
pgbench (15.3 (Ubuntu 15.3-1.pgdg22.04+1))
starting vacuum...end.
transaction type: <builtin: select only>
scaling factor: 1
query mode: simple
number of clients: 10
number of threads: 2
maximum number of tries: 1
duration: 60 s
number of transactions actually processed: 10370147
number of failed transactions: 0 (0.000%)
latency average = 0.058 ms
initial connection time = 54.881 ms
tps = 172993.008029 (without initial connection time)
$ pgbench -h /var/run/postgresql -p 5432 -b select-only -T 60 -c 10 -j 2 bench
pgbench (15.3 (Ubuntu 15.3-1.pgdg22.04+1))
starting vacuum...end.
transaction type: <builtin: select only>
scaling factor: 1
query mode: simple
number of clients: 10
number of threads: 2
maximum number of tries: 1
duration: 60 s
number of transactions actually processed: 16890415
number of failed transactions: 0 (0.000%)
latency average = 0.036 ms
initial connection time = 7.186 ms
tps = 281540.260418 (without initial connection time)
YMMV depending on your workload, but Unix sockets should always be significantly faster.
I understood "networked database" to mean a client-server DB architecture. So that Postgres for example is running locally, but communicating with a local server.
It's so insane that he links to a whole other post to clarify what he means by "the edge", and he still struggles to state it in the first 5 paragraphs of it.
Well, the post "about the edge" also links to some announcement post where they "tried" to talk about the edge in more detail. Yet it's not until something like paragraph 67 of that third post that they mention anything about "the edge".
No foreign keys by default. Using them means that every connecting session must toggle it on. Forget one and it will be free to violate referential integrity.
One writer. Any session that issues BEGIN TRANSACTION and then hangs halts all DML.
WAL mode confusion. WAL cannot safely be used on network filesystems, and it breaks ACID on ATTACHed databases, among other problems.
Date and time types don't really exist. There are functions to assemble your own, but it does require some thought. The ODBC driver for SQLite does have options to "emulate" this.
Length specifications on a column are ignored. CHAR(2) will allow the insertion of a blob. I think that check constraints could be used to enforce this.
Type affinity means that any data type can be inserted into columns declared as any other data type. Rigid type enforcement can be done, but it is not the default.
I believe there is a recent (optional) strict mode that will enforce types. I haven't used it so I don't know how full-featured it is. Your other points are of course correct
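For reference, it looks like this (STRICT needs SQLite 3.37+, and the CHECK constraint covers the CHAR(2) point from above):

    import sqlite3

    con = sqlite3.connect(":memory:")
    con.execute("""CREATE TABLE codes (
        code TEXT CHECK (length(code) = 2)  -- enforce what CHAR(2) merely suggests
    ) STRICT""")
    con.execute("INSERT INTO codes VALUES ('ok')")
    # con.execute("INSERT INTO codes VALUES (x'0102')")  # STRICT rejects a BLOB in a TEXT column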
Unfortunately, strict mode is incompatible with name-based type mapping: if you have a column that you want to be an RFC 3339 date-time (e.g. "2023-06-06T23:25:27+11:00"), for example, the only spelling permitted for the column type is “TEXT”, whereas without strict mode you can name it “DATETIME” or similar and popular libraries in various languages will automatically map that to the language’s datetime type (according to local conventions, not anything universally defined—e.g. https://docs.rs/sqlx/latest/sqlx/sqlite/types/index.html shows the built-in mappings the Rust sqlx library has decided to support). This destroys the ergonomics of strict mode for me, rendering strict mode quite undesirable, since it’s basically fighting with a more useful sort of strictness.
These are all fair points, though they mostly can be addressed at the application layer. One could, for example, use a wrapper that opens the sqlite library and immediately turns foreign keys on. One does this once per language in your project and never thinks about it again. Similarly, type constraints can be enforced at the application layer.
I agree that it would be nice if some of these things were built in but at least sqlite is reliable when configured and used in particular ways.
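That wrapper really is tiny; a minimal sketch in Python:

    import sqlite3

    def connect(path):
        # One chokepoint for connection setup, so no call site can forget it.
        con = sqlite3.connect(path)
        con.execute("PRAGMA foreign_keys = ON")  # off by default, per-connection
        return con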
> I want to hear "What's not great using SQLite compared to PostgreSQL" instead.
The context of this thread's article is about "edge computing" which means resource-constrained endpoints like IoT, mobile devices, cloud "workers" on CDN, etc.
Those scenarios will inherently favor a lightweight database like SQLite over a full-blown heavyweight RDBMS like PostgreSQL/MySQL. RDBMS engines have extra code and CPU/RAM requirements for handling multi-user concurrency, locking, etc. You don't need all that machinery for edge computing.
Another example of the above tradeoffs is smartphones like iPhone and Android. Their default persistence API frameworks in iOS/Android use embedded SQLite as the local backing database. It doesn't make sense for a battery-powered device to waste CPU and RAM on a multi-user PostgreSQL/MySQL engine when there's only a single user of a smartphone.
Personally I'm not yet convinced that the normal SQL representation of what a 'date' object is matches real-world use cases that well, or at least it doesn't cover all of them.
As a programmer, what I find I want is a 'moment' that retains the input specification. Possibly in a sanitized binary format that's not the literal text, but also isn't a single numeric value. E.g. a timezone specifier/city (not just one per TZ!), plus a human timespec (5 PM Tuesday, the whatever-th day of month in year), AND a pair of an evaluation-rules version and an 'effective UTC||TAI time' for comparisons. The evaluation version needn't be a full version; it might even just be a couple of bits at the top or bottom of a long-range 64- or 128-bit time number in some units. Something that can be used to determine whether a value still needs to be updated with the latest rules set, rather than using the cached representation.
> Offhand, Date types are really strings in SQLite.
It’s more that there’s no such thing as a date type, but that date and time functions can work with text or numbers, in a few different interpretations.
If you load the ODBC driver version of SQLite, there is an option to transparently convert the ODBC date time expressions into Oracle style dates.
It's an option for those who want easier handling.
"When the DSN Option "JDConv" (Julian Day conversion) is enabled
the SQLite 3 driver translates floating point column data
interpreted as Julian Day to/from SQL_DATE, SQL_TIME, and
SQL_TIMESTAMP data types (supported since May 2013)."
Good point / ideas about SQL date objects... but you wrote about date in response to a comment about datatypes. Is your point that datatypes more generally may be less useful than parent implied?
SQLite is not a good choice if you want to access a DB over the network from different hosts, or even from many processes on the same host (the latter may work well in some cases, but not all). Large write (insert/update) volume is also not the best load for SQLite. And of course, where replication is needed, I would prefer PostgreSQL or MySQL with mature replication support to things like SQLite+litestream.
If the best engineers from 20+ years ago were told that the future of computing would be running weird packed-up assembly-javascript in colos close to users, I wonder if they would just quit immediately and move to a farm
The “edge” for data storage doesn’t really make sense.
With SQLite, the data is stored very local to where it’s being accessed.
Either move it local to the user (SQLite with WASM/OPFS does this — this is mentioned at the end of the article, but that’s not the edge), or centralize it — for SQLite this means centralizing the execution of the code accessing the data too, since it’s designed for high granularity, low latency access, which, again, isn’t the edge.
SQLite is so good and convenient that you might still use it for some cases at the edge (e.g. some read-only or precomputed, predistributed chunk of data you want to access in a flexible way) though it would never be the only option for that kind of thing.
I think the read-only use case you mention is a compelling one. I’m working on a developer product that makes it easy to prematerialize data in the way you describe.
It works by consuming data out of upstream APIs and then publishes a new, faster version of that API in edge locations. The underlying data there is stored in SQLite (plus some custom bits), but that’s just an implementation detail.
This is where we are at. SQLite took care of us for ~8 years, and will certainly continue to do so in many cases going forward. But we are finding that as we scale up beyond customer #5, the idea of having tons of small databases scattered across our customer base becomes concerning.
I am looking at moving us towards 1 gigantic SQL Server Hyperscale DB for most things. Force our clients into a PaaS solution over time, enforce one standardized product model, etc.
Where you probably don't want to be is somewhere in the middle where half your data lives in some centralized place and the other half is in your edge/client nodes. Synchronizing across these domains can rapidly become a nightmare.
You are unfairly associating the inconvenience of "tons of small databases" with SQLite only because SQLite is good enough to make such an architecture practically plausible; it shouldn't be assumed that it is a good architecture in the first place, or that problems are SQLite's fault.
If you start having trouble "beyond customer #5", you have a problem of labor-intensive manual changes and insufficient automation, not a task that a different DBMS could do better.
I use it to provide persistence to an in-memory counter server. It is written to in batches of 4096 (25k/second total). To restart the server, the server gets a signal, stops listening, flushes the remaining writes to SQLite, and exits. On startup, it reads the SQLite database back and then starts processing requests again. It was a couple hundred lines of Go and makes the server much better operationally. It is not involved in request servicing, just this backup persistence.
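A toy sketch of that shutdown path (in Python rather than Go, with the listener logic elided and the upsert table invented for illustration):

    import signal
    import sqlite3
    import sys
    from collections import Counter

    pending = Counter()  # in-memory counts not yet persisted
    con = sqlite3.connect("counters.db")
    con.execute("CREATE TABLE IF NOT EXISTS counters (key TEXT PRIMARY KEY, n INTEGER)")

    def flush():
        with con:  # single transaction per batch
            con.executemany(
                "INSERT INTO counters VALUES (?, ?) "
                "ON CONFLICT(key) DO UPDATE SET n = n + excluded.n",
                pending.items())
        pending.clear()

    def shutdown(signum, frame):
        # stop listening (elided), flush the remaining writes, then exit
        flush()
        sys.exit(0)

    signal.signal(signal.SIGTERM, shutdown)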
Recently I got curious about SQLite in serverless settings and eventually figured out a way to combine GCP Cloud Run with Seafowl, an early stage database. SQLite has been quite helpful!
Did a write up [0] in case anyone is interested. (also about to submit it for my first Show HN.)
Interesting how little TFA says about libSQL, even though TFA is ostensibly about libSQL, and how little the comments here say (so far nothing) about libSQL. Does anyone use or care about libSQL?
Well, not only lately; SQLite has always been very popular.
My personal opinion is that the relational model is the best option for modeling data; any attempt to avoid it, replace it, or mimic it adds more complexity than it solves.
Linked tables (relations) are the ultimate data modeling tool.
"Arguably the main advantage of SQLite is that you don’t need to spend dozens of hours trying to set up a distributed system in order to even start using it."
Last time I set up MySQL it took me around 10 minutes. I don't know what this person is smoking.
Also, I think the idea of “edge” doesn’t make a ton of sense. What we really need is code and data that move around as needed for the best performance. See: https://blog.cloudflare.com/announcing-workers-smart-placeme...
What people call “edge” is a single optimization bringing code near the user. I think we should go far beyond that. Sure, use something like Cloudflare Workers for your “edge” needs (ie bringing your React app close to the end user/doing server side rendering). But don’t stop there because where data resides, what APIs you call are all going to matter.
That's the vision of something I called the "Supercloud": https://blog.cloudflare.com/welcome-to-the-supercloud-and-de...