"A while" is underselling it. As long as you have people who are half-decent with SQL, "just put Postgres on a big db server" will get you to 50 million row tables before you have to start thinking about even hiring a real DBA.
50M is something that can easily be handled by a single dev who understands that sql is more than select, insert and update, armed with the manual, google and chatgpt.
You can get really damn far with a fat postgres box.
The problem with these discussions is that you can always get a lot out of any given architecture, structure, db approach or whatnot. Can. There's just a lot of daylight between "can" and "likely will."
Ultimately, everything has its limitations and tradeoffs. If we respect them, it's generally a smooth ride. Problem is that we rarely do... within a company (startup or otherwise) under real conditions. There's also a dynamic where we build until the point where something stops us. Tech debt, complexity, over-engineering, under-engineering, feature bloat or antagonism between early decisions and current goals.
There's a self-regulating aspect to this: if the architecture is spot on, perfect for the task at hand, we just move faster toward the point where it no longer is.
My point is that postgres is, compared to almost everything else, easy to get from "can" to "likely will" with just somebody with a brain, a manual and google. In absolute terms it of course depends, but the point is relative.
A brain, a manual and google is generally a pretty good way to make solid decisions and respect the limits of your chosen stack.
What happens when there is >1 brain involved... or when brainless, manualless decisions eventually get made... or two pivots from now...
I'm not disagreeing with your approach. I agree with it, especially as starting point. I'm cautioning that resilience against complexity isn't about how easy it is to make good decisions when you understand the spec, read the manual and calmly proceed. Complexity and fragility accumulate when one or all of these are absent. How easily you can (and thus inevitably will) make a mess... not how easily you can keep it clean.
In IRL situations with regular rdbmses, a very common trend seems to be long-term drift between schema and spec. The flexibility and approachability of postgres eventually enables a lot of kludge.
Data stores have this dichotomy between "look how easy" and "is limiting factor" that speaks to difficulties we don't know how to articulate or isolate.
Of course if there are lots of bozos in the startup then even getting to and maintaining 50 million rows is going to be very difficult.
But if you have a small group of folks that are at least as competent as the folks at WhatsApp pre-acquisition, then there really shouldn't be any doubt whatsoever.
> a small group of folks that are at least as competent as the folks at WhatsApp pre-acquisition
That's a success case... beware survivorship bias.
> if there are lots of bozos in the startup
No arguing that quality engineers are fundamental to quality engineering. That said... by this standard, there's no point in having this entire discussion. Every good db/store out there is good. They all work very well if used as they should be, with due respect to tradeoffs. Yet, almost everyone has db problems. Almost every one of these problems occurs well within the technical limits of postgres or whatnot.
"It shouldn't be a problem" when it usually is irl is tunnel vision. There is an empirical reality disagreeing with you. Walking into it with "this shouldn't be a problem unless everyone is a moron" is bad strategy. If you can't think of reasons why architecture can and will become a problem, then just assume that you (or some of you, some of the time) are morons, and try to make it moron proof.
I've been on the sysadmin, development, and hiring sides, and with data models at scales of 50M+ records, devs who "understand that sql is more than select [...]" are rare in my experience; they're effectively a cross between db admins and developers.
Administering databases (in particular, query planning and production operations) with table sizes on the order of 10M records and up is challenging, and requires a skill set that is very different from pure development.
One won't get "really damn far with a fat postgres box", unless they're doing very simple SELECTs, which is not the case with modern web apps.
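To make that concrete, here's a made-up sketch of the kind of query and index work I mean (schema and numbers invented for illustration):

    -- Schema invented purely for illustration.
    CREATE TABLE orders (
        id          bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
        customer_id bigint NOT NULL,
        status      text NOT NULL,
        created_at  timestamptz NOT NULL DEFAULT now(),
        total_cents bigint NOT NULL
    );

    -- A "very simple SELECT" by primary key stays fast at any table size:
    SELECT * FROM orders WHERE id = 42;

    -- The queries a real app runs are a different story. Without a matching
    -- index this is a sequential scan over every row, and reading the plan
    -- below is exactly the DBA-ish skill set I'm talking about:
    EXPLAIN (ANALYZE, BUFFERS)
    SELECT customer_id, count(*) AS n_orders, sum(total_cents) AS revenue
    FROM orders
    WHERE status = 'pending'
      AND created_at > now() - interval '30 days'
    GROUP BY customer_id
    ORDER BY revenue DESC
    LIMIT 20;

    -- The usual fix is a targeted (here, partial) index, not a new database:
    CREATE INDEX orders_pending_recent_idx
        ON orders (created_at, customer_id)
        WHERE status = 'pending';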
> Administering databases (in particular, query planning and production operations) with table sizes on the order of 10M records and up is challenging, and requires a skill set that is very different from pure development.
Is it more or less challenging than the alternatives? Is it less challenging enough to add a new tech to your stack, add the required knowledge to the team, etc?
I mean, knowing "enough-to-perform-CRUD" SQL is table stakes for developing on the back-end, but knowing $CURRENT-FLAVOUR-NOSQL (of which there are multiple products, all with such substantial differences that there is no knowledge transfer between using them) isn't, so there's going to be ramp-up time for every dev, and then every dev that is added to the team.
I'm not disputing your argument, I'm just pointing out that, sometimes, it's easier and faster to upskill your PostgreSQL developer to "scale the DB" than it is to teach them how to properly use, maintain, architect and code for DynamoDB and others.
It's not just $CURRENT-FLAVOUR-NOSQL. It's also doing custom transactions and locking on top and/or thinking in terms of eventual consistency. It's so, so much more complex than just SELECT FOR UPDATE/BEGIN TRANSACTION.
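For anyone who hasn't had to build it by hand, this is the entirety of what you'd otherwise be reimplementing (and getting subtly wrong) on top of an eventually consistent store. Schema made up for illustration:

    BEGIN;

    -- Lock both rows up front so concurrent transfers can't interleave:
    SELECT id, balance FROM accounts WHERE id IN (1, 2) FOR UPDATE;

    UPDATE accounts SET balance = balance - 100 WHERE id = 1;
    UPDATE accounts SET balance = balance + 100 WHERE id = 2;

    -- Both changes become visible atomically; locks are released.
    -- Atomicity, isolation and rollback are the database's problem, not yours.
    COMMIT;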
It's not even funny how many software engineers just don't know that SQL is crazy fast and performant if you have a basic understanding of it. I was once refactoring (or, rather, getting rid of) a microservice that was just JSON blob storage on top of Postgres - no schema for the blobs, hundreds of thousands of them, no indices - and the main complaint was that it was slow.
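A sketch of what the fix boils down to in that kind of situation (table and key names invented, not the actual service):

    -- jsonb instead of text/json enables indexing and fast operators:
    ALTER TABLE blobs ALTER COLUMN payload TYPE jsonb USING payload::jsonb;

    -- GIN index for arbitrary containment queries on the blob:
    CREATE INDEX blobs_payload_gin ON blobs USING gin (payload);
    SELECT * FROM blobs WHERE payload @> '{"customer_id": 123}';

    -- Or a plain btree expression index on the one key you always filter by:
    CREATE INDEX blobs_customer_idx ON blobs ((payload ->> 'customer_id'));
    SELECT * FROM blobs WHERE payload ->> 'customer_id' = '123';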
Unpopular opinion: if you're skimping on ops/DBA resources (as you may need to do in a startup), then MySQL is a better default. By all means use postgres if your use case demands it, but personally I find the ops story for MySQL takes less engineering overhead.
Yes, and most successful companies who started in the last ~20 years started (and many continue) with a monolith and a MySQL database.
Only the mega-cap ones started to pursue other options, mostly due to their type of business and bucket-loads of "free" VC money with explicit orders to burn it and get "unicorn" status - which involves hiring thousands of developers in record time, at which point the whole thing turns into a zoo. That's an organizational problem, mostly not a tech one.
Other than the ones we pretend are the whole Universe, there are thousands and thousands of medium to big companies with billions of revenue who started their product with a monolith and a MySQL database and many still do just that.
I agree with this "unpopular opinion". I've worked with both MySQL- and postgres-based mid-scale apps with several thousand users. Postgres is so deeply lauded here on HN, yet it requires two more orders of magnitude of operations work to keep up and running. Vacuuming sucks hard.
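To give a flavour of what that ops work looks like: per-table autovacuum tuning like the below becomes routine once tables get hot (table name and values are illustrative, not a recommendation):

    -- Defaults wait for ~20% dead rows before vacuuming; on a hot 50M-row
    -- table that's 10M dead tuples of bloat. Typical per-table override:
    ALTER TABLE events SET (
        autovacuum_vacuum_scale_factor  = 0.02,  -- kick in at ~2% dead rows
        autovacuum_analyze_scale_factor = 0.02,
        autovacuum_vacuum_cost_limit    = 2000   -- let the worker do more I/O per round
    );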
That is a very specific use case, and might only be a small subset of your actual data. If you don't have these specific requirements (e.g. CRUD apps), you can save yourself a lot of unnecessary headaches by defaulting to MySQL.
My main point is attempting to counter the narrative popular on HN that postgres should be an automatic default. For sure there are many aspects in which postgres is superior, I absolutely do not debate that, especially when it comes to developer experience. But there is much more to it than that when it comes to delivering business value. That's where ops and DBA concerns start to matter, and IMO MySQL is so far ahead in this regard that it outweighs all the other hideous warts of working with it, when you consider the bigger picture of the business as a whole.
The problem isn't storing or inserting 50M rows, it's querying 50M rows in non-trivial ways. And the difference in performance between doing that 'right' and 'wrong' is orders of magnitude.
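A trivial illustration of 'right' vs 'wrong' on the same 50M rows (table and column names invented):

    -- 'Wrong': the function call on the column defeats a plain index,
    -- so every one of the 50M rows gets scanned.
    SELECT count(*) FROM events
    WHERE date_trunc('day', created_at) = '2024-01-15';

    -- 'Right': a range predicate on the raw column can use an ordinary
    -- btree index and touches only the matching rows.
    CREATE INDEX events_created_at_idx ON events (created_at);

    SELECT count(*) FROM events
    WHERE created_at >= '2024-01-15'
      AND created_at <  '2024-01-16';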
Eh, intelligent table design should knock most of that out. If you've got an 8-page query implementing a naïve solution to the knapsack problem (I've seen this in the wild), several mistakes have been made.
I've served 1M users off a database where just one of many similarly sized tables had 300M rows... in 2007... on a single box with spinning rust in it. Heck, my laptop could handle 100x the production load.
It amazes me that my comment (while admittedly flippant) got voted down.
It really is true that your phone can update a 50M row table about 10K times per second!
That people are incredulous of this is in itself a stunning admission that developers these days don't have the faintest idea what computers can or cannot actually do.
Just run the numbers: 50M rows with a generous 1 KB per row is 50 GB. My iPhone has 1TB of flash storage with a random access latency of something like 50 microseconds, which at even modest queue depths works out to on the order of 200K IOPS. An ordinary NVMe laptop SSD can now do 2M. Writing even 10K random locations every second is well within mobile device capability, with 50% headroom to "scale". At 1 KB per row, this is just 10 MB/s, which is hilariously low compared to the device peak throughput of easily a few GB/s.
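If you'd rather measure than downvote, it's an easy experiment on any box with psql (rough sketch only; your numbers will vary with hardware, fsync and checkpoint settings):

    -- 50M rows at roughly 1 KB each, as in the estimate above:
    CREATE TABLE t (id bigint PRIMARY KEY, payload text);
    INSERT INTO t
    SELECT i, repeat('x', 1000)
    FROM generate_series(1, 50000000) AS i;

    \timing on

    -- One batch of ~10K random-row updates (duplicates possible, so slightly
    -- fewer in practice). A single transaction is the optimistic case;
    -- 10K separate commits will cost more fsyncs.
    UPDATE t SET payload = repeat('y', 1000)
    WHERE id IN (SELECT 1 + (random() * 49999999)::bigint
                 FROM generate_series(1, 10000));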
In practice it's usually not that good: PostgreSQL writes data in pages (8 KB by default), and changing 10K random rows in a 50M-row table can be quite close to the worst case of 1 changed page per changed row, so 8x your estimate. You also need to multiply by 2x to account for WAL writes, plus indexes. It's not hard to hit a throughput limit, especially with HDDs or networked storage. Although local SSDs are crazy fast indeed.
Agreed: 80MB/s for the random 8K page updates. However, transaction logs in modern databases are committed to disk in batches, and each log entry is smaller than a page size. So a nice round number would be 100 MB/s for both.[1]
For comparison, that's about 1 gigabit per second, in an era when 200 Gbps networking is becoming common. It's also a small fraction of the SSD write throughput of any modern device, mobile or not. Nobody in their right mind would use HDD storage if scaling was in any way a concern.
[1] Indexes add some overhead to this, obviously, but tend to be smaller than the underlying tables.