
> Well, if you don't fsync, you'll go fast, but you'll go even faster piping customer data to /dev/null, too.

The trouble is that you need to specifically optimize for fsyncs, because usually it is either no brakes or hand-brake.

The middle ground of multi-transaction group-commit fsync seems to not exist anymore because of SSDs and the massive IOPS you can pull off in general; now it is about syscall context switches instead.

Two minutes is a bit too much, though (also fdatasync vs fsync).


IOPS only solves throughput, not latency. You still need to saturate internal parallelism to get good throughput from SSDs, and that requires batching. Also, even double-digit microsecond write latency per transaction commit would limit you to only 10K TPS. It's just not feasible to issue individual synchronous writes for every transaction commit, even on NVMe.

tl;dr "multi-transaction group-commit fsync" is alive and well
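
A minimal sketch of what that looks like (Python for brevity; the WAL file name, batch cap and queue here are illustrative, not any particular engine's design): a background writer drains whatever commits have queued up and covers the whole batch with a single fsync.

    import os
    import queue
    import threading

    # One append-only WAL shared by all transactions (hypothetical file name).
    wal_fd = os.open("wal.log", os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o644)
    pending = queue.Queue()   # holds (record_bytes, done_event) per transaction

    def group_committer():
        while True:
            batch = [pending.get()]        # block until at least one commit arrives
            while len(batch) < 64:         # drain whatever else queued up meanwhile
                try:
                    batch.append(pending.get_nowait())
                except queue.Empty:
                    break
            for record, _ in batch:
                os.write(wal_fd, record)
            os.fsync(wal_fd)               # one fsync makes the whole batch durable
            for _, done in batch:
                done.set()

    threading.Thread(target=group_committer, daemon=True).start()

    def commit(record: bytes) -> None:
        done = threading.Event()
        pending.put((record, done))
        done.wait()                        # returns only after the shared fsync

The win is that N commits arriving close together pay for one syscall and one device flush instead of N.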


> produce something as high-quality as GoT

Netflix is a different creature because of streaming and time shifting.

They don't care whether people are watching a pilot episode or binge-watching the last 3 seasons when a show takes off.

The quality metric is therefore all over the place; it is a mildly moderated popularity contest.

If people watch "Love is Blind", you'll get more of those.

On the other hand, this means they can take a slightly bigger risk than an ad-supported TV network, because you're more likely to switch to a different Netflix show you like and keep paying for it than to switch to a different channel that pays a different TV network.

As long as something sticks, the revenue numbers stay, even if the ROI is shaky.

Black Mirror: Bandersnatch, for example, was impossible to do on TV, but Netflix could do it.

Also, if GoT had been on Netflix, they'd have cancelled it after Season 6 & we'd be lamenting the loss of whatever wonders it would have gotten to by Season 9.


> For double/bigint joins that leads to observable differences between joins and plain comparisons, which is very bad.

This was one of the bigger hidden performance issues when I was working on Hive - the default coercion goes to Double, which has a bad hash code implementation [1] & causes joins to cluster & chain, so every miss on the hashtable had to probe that much further from the original index.

The hashCode itself was smeared so that values within machine epsilon of each other hash to the same bucket and .equals could do its join, but all of this really messed up the folks who needed 22-digit numeric keys (eventually the Decimal implementation handled it with a big fixed-point integer).

Double join keys in a database were one of the red flags in a SQL query - mostly, if you see them, someone messed something up.
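
A quick way to see the 22-digit problem (illustrative key values; Python floats are IEEE 754 doubles, the same width the Double coercion uses):

    from decimal import Decimal

    a = 1234567890123456789012   # two distinct 22-digit keys
    b = 1234567890123456789013

    # A double has only 53 bits of mantissa, so both keys round to the same value
    # and a double-coerced join would treat them as equal.
    print(float(a) == float(b))        # True
    print(Decimal(a) == Decimal(b))    # False - a fixed-point decimal keeps them apart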

[1] - https://issues.apache.org/jira/browse/HADOOP-12217


> trauma that our parents, or grandparents experienced could lead to behavior modifications and poorer outcomes in us

The nurture part of it is already well established; this is the nature part of it.

However, this is not a net-positive for the folks who already discriminate.

The "faults in our genes" thinking assumes that this is not redeemable by policy changes, so it goes back to eugenics and usually suggests cutting such people out of the gene pool.

The "better nurture" proponents for the next generation (free school lunches, early intervention and magnet schools) will now have to swim up this waterfall before arguing more investment into the uplifting traumatized populations.

We need to believe that Change (with a capital C) is possible right away if we start right now.


I would think it's the opposite. Intervention is preventative of further sliding. The alternative - genocide - is expensive; genocides are generally a luxury of states benefiting from a theft-based windfall.


> Can you build a Linux version? :-)

Generally speaking, it is the hardware, not the OS, that makes it easier to build for Macs right now.

Apple Neural Engine is a sleeping giant, in the middle of all this.


Parakeet still runs at 5x realtime on a middle-of-the-road CPU; it should be quite doable (at the cost of some battery life).


> would a fixed line in India typically be above that speed?

My family lives outside of a tier 2 city border, in what used to be farmland in the 90s.

They have Asianet FTTH at 1Gbps, but most of the video/streaming traffic ends at the CDN hosts in the same city.

That CDN push to the edge is why Hotstar is faster to load there - the latency on seeks isn't going around the planet.


That is really cool, but sad to see it's only at around 15% penetration.


The useful part is that duckdb is so easy to use as a client with an embedded server, because duckdb is a great client (+ a library).

Similar to how git can serve a repo from a simple HTTP server with no git installed on it (git update-server-info).

The frozen part is what iceberg promised in the beginning, away from Hive's mutable metastore.

Point to a manifest file + parquet/orc & all you need to query it is S3 API calls (there is no metadata/table server, the server is the client).

> Creating and publishing a Frozen DuckLake with about 11 billion rows, stored in 4,030 S3-based Parquet files took about 22 minutes on my MacBook

Hard to pin down how much of it is CPU and how much is IO from s3, but doing something like HLL over all the columns + rows is pretty heavy on the CPU.
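
As a rough sketch of the "the server is the client" idea, with DuckDB's Python API reading Parquet straight off S3 (the bucket, prefix, region and user_id column are made-up placeholders, and credential setup is elided):

    import duckdb

    con = duckdb.connect()
    con.execute("INSTALL httpfs")   # S3 support lives in the client itself
    con.execute("LOAD httpfs")
    con.execute("SET s3_region = 'us-east-1'")

    # Nothing but S3 GET calls behind this - no metastore, no table server.
    print(con.execute("""
        SELECT count(*),
               approx_count_distinct(user_id)   -- HLL-style sketch, heavy on CPU
        FROM read_parquet('s3://example-bucket/frozen-lake/*.parquet')
    """).fetchall())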


> will try to learn more about normal sockets to see if I could perhaps make them work with the app.

There's a whole skit in the vein of "What have the Romans ever done for us?" about ZeroMQ [1], which has probably been lost to the search index by now.

As someone who has held a socket wrench before and fought tcp_cork and DSACK, WebSockets isn't a bad abstraction to be on top of, especially if you are intending to throw TLS in there anyway.

Low-level sockets are like assembly: you can use them, but they are a whole box of complexity (you might use them completely raw sometimes, like the tickle ACK in the ctdb [2] implementation).

[1] - https://news.ycombinator.com/item?id=32242238

[2] - https://linux.die.net/man/1/ctdb


> I had a friend who would drink a gallon of whole milk a day to maintain weight because he did so much at the gym.

That honestly might be an absorption issue, not an intake issue - you can hit aerobic limits enough for your body to skip digesting stuff & just shove protein directly out of the stomach instead of bothering to break it down.

My experience with this was a brief high altitude climb above 5km in the sky, where eating eggs & ramen stopped working and only glucon-d kept me out of it.

The way I like to think of it is that the fat in your body can be eaten or drunk, but needs to be breathed out as CO2 to leave.

The rate at which you can put it in and the rate of letting it go are completely different.


GOMAD (Gallon of milk a day) has been a standard weight gain diet for decades


UUIDv7 is only bad for range partitioning and privacy concerns.

The "naturally sortable" is a good thing for postgres and for most people who want to use UUID, because there is no sorted distribution buckets where the last bucket always grows when inserting.

I want to see something like HBase or S3 paths when UUIDv7 gets used.
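
Something in the spirit of that HBase/S3 trick, sketched out (the bucket count and key format here are made up): salt the key with a short hash prefix so consecutive UUIDv7s spread across ranges instead of all landing in the last one.

    import hashlib
    import uuid

    N_BUCKETS = 16   # arbitrary; pick based on how many ranges you want warm

    def make_row_key(u: uuid.UUID) -> str:
        # Stable hash bucket up front, the sortable UUIDv7 after it.
        salt = int.from_bytes(hashlib.sha256(u.bytes).digest()[:2], "big") % N_BUCKETS
        return f"{salt:02d}/{u}"   # e.g. "07/0192..." - range-partition on the prefix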


> UUIDv7 is only bad for range partitioning and privacy concerns.

It's no worse for privacy than other UUID variants if the "privacy" you're worried about leaking is the creation time of the UUID.

As for range partitioning, you can of course choose to partition on the hash of the UUIDv7 at the cost of giving up cheaper writes / faster indices. On the other hand, that of course gives up locality, which is a common challenge of partitioning schemes. It depends on the end-to-end design of the system, but I wouldn't say that UUIDv7 is inherently good or bad, or better/worse than other UUID schemes.
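
To make the creation-time leak concrete, a tiny sketch (per RFC 9562 the top 48 bits of a UUIDv7 are Unix milliseconds; the UUID below is made up):

    import datetime
    import uuid

    u = uuid.UUID("01890a5d-ac96-774b-bcce-b302099a8057")   # hypothetical UUIDv7
    unix_ms = u.int >> 80                                    # top 48 of 128 bits
    created = datetime.datetime.fromtimestamp(unix_ms / 1000, tz=datetime.timezone.utc)
    print(created)   # the embedded creation instant, readable by anyone holding the ID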


Isn't it at least a bit worse than v4, which has no timestamp at all? There might be concerns around non-secure randomness being used to generate the bits, but I don't feel like it's accurate to claim that's indistinguishable from a literal timestamp.


UUIDv4 doesn't leak creation time.


Why is it bad for range partitioning? If anything, isn't it better? With UUIDv7, you can basically partition on the primary key, and thus have a "global" unique constraint.


confused why it would be worse for range partitioning?

I assume there would be some type of index on the timestamp portion & the uuid portion?

wouldn’t that make it better for partitioning since we’d only need to query partitions that match the timestamp portion

