Also important to remember that Google is years ahead of most other AI shops in that they're running on custom silicon. This makes their inference (and maybe training) cheaper than almost any other company's. People don't always realize this: compared to OpenAI/Anthropic, where most folks are running on NVIDIA GPUs, Google is completely different with its custom TPU platform.
> Also important to remember that Google is years ahead of most other AI shops in that they're running on custom silicon.
Not just the chips: Google's entire datacenter setup seems much more mature (e.g. liquid cooling, networking, etc.). I saw some video of a new Amazon datacenter (https://www.youtube.com/watch?v=vnGC4YS36gU) and it looks like a bunch of server racks in a warehouse.
Google’s datacenters are excellent, from what I’ve seen in my career. They genuinely had so many amazingly talented SMEs pushing boundaries for decades without executive intervention or deterrence, and that’s paid dividends in the subsequent tenure under Pichai and external shareholders (in that they have “infinite” runway and cash reserves to squander on moonshots before risking the company’s core businesses). That said, nothing lasts forever, and if their foray into LLMs doesn’t pay off, their shareholders are going to be pissed.
And they're not just pushing the boundaries; they work with the HW vendors to define them, asking for features and design elements that others don't even see the point of.
Anthropic uses TPUs as well as NVIDIA. Compiler bugs in the tooling around the TPU platform caused most of their quality issues and customer churn this year, but I think they've since announced a big expansion in use:
Where I work, we primarily use Ceph as a K8s-native filesystem. Though we still use OpenEBS for block storage, and we're actively watching OpenEBS Mayastor.
I looked into Mayastor and the NVMe-oF stuff is interesting, but it is so, so far behind Ceph when it comes to stability and features.
Once Ceph has the next-generation Crimson OSD with SeaStore, I believe it should close a lot of the performance gaps.
This is a feature that’s required in government environments. You need a check at runtime to ensure that FIPS mode is set, or you run the risk of breaking compliance, which leads to inevitable audits and endless meetings. I would much prefer a panic causing a 30-minute outage over days of meetings to set up new controls and validations that will make your life more miserable.
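On Linux, one way to implement that runtime gate is to check the kernel's FIPS flag at startup and refuse to run if it isn't set. Here's a minimal Go sketch; the sysctl path is the real kernel flag, but the fail-fast policy and function names are just one way to wire it up:

```go
package main

import (
	"bytes"
	"fmt"
	"os"
)

// fipsEnabled reports whether the kernel is enforcing FIPS mode by
// reading the given sysctl file (normally /proc/sys/crypto/fips_enabled,
// which contains "1" when FIPS mode is on).
func fipsEnabled(path string) bool {
	data, err := os.ReadFile(path)
	if err != nil {
		return false // file absent: kernel has no FIPS support compiled in
	}
	trimmed := bytes.TrimSpace(data)
	return len(trimmed) > 0 && trimmed[0] == '1'
}

func main() {
	if !fipsEnabled("/proc/sys/crypto/fips_enabled") {
		// In a compliance-gated service you would panic here: a short,
		// loud outage beats days of audit meetings.
		fmt.Println("FIPS mode not enabled; a real deployment would refuse to start")
		return
	}
	fmt.Println("FIPS mode confirmed; continuing startup")
}
```

Checking the kernel flag at startup (rather than trusting build-time configuration) is what catches the misprovisioned-host case before it becomes a finding.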
As someone who works for a company that’s transitioning out of the cloud into its own data centers, the supply chain factor is difficult. You have to be really good at forecasting and planning, with a large upfront cost. But the savings are substantial (up to 70%).
And I wouldn't necessarily blame the developer in either scenario - they received a card that says "hey, the channel file will now have an extra field in its schema"... no one said "btw, it's optional".
Calling it a "first year programming mistake" like I'm reading in some media is somewhat incendiary. I see unmarshalling errors happen all the time.
The forest we must not miss for the trees is that the kernel-level driver simply dies with no error recovery and bricks the system.
I think that’s just the nature of kernel programming. Once you’re running in kernel space, there are essentially no safety guards, which is why kernel programming is so difficult. Faults that in user space would just produce a segfault and a core dump have no such safety net in kernel space. And since kernel code generally has to be written in C, it can be quite difficult even for the best engineers to get everything right.
Yeah, my read was that they changed an interface to include an optional parameter but never actually tested the underlying code by providing said optional parameter.
The bug in the clients (sensors) wasn't due to regex; the regex was in their integration unit testing, which also had a bug and never supplied the 21st parameter to the client code.
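A sketch of the failure mode, in Go rather than the sensor's actual kernel C++ (the names and exact field counts here are illustrative): a content rule asks for the 21st input, the parsed record only carries 20, and nothing validates the index before use.

```go
package main

import "fmt"

// fieldForRule returns the input a detection rule references, with the
// bounds check the shipped interpreter reportedly lacked. In kernel C
// the unchecked read is a wild memory access; in Go it would merely
// panic with "index out of range".
func fieldForRule(inputs []string, idx int) (string, error) {
	if idx < 0 || idx >= len(inputs) {
		return "", fmt.Errorf("rule references field %d but record has only %d fields", idx+1, len(inputs))
	}
	return inputs[idx], nil
}

func main() {
	record := make([]string, 20) // the channel file supplied 20 fields
	for i := range record {
		record[i] = fmt.Sprintf("field-%d", i+1)
	}
	// The new rule template referenced a 21st field that testing never exercised.
	if _, err := fieldForRule(record, 20); err != nil {
		fmt.Println("rejected:", err) // a recoverable error instead of a bricked machine
	}
}
```

Validating rule indices against what was actually parsed - instead of trusting the rule definition - is exactly the kind of guard that turns this class of bug into a logged error.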
What you would be looking for is actually one of the forks of Redis that came about after the license change. KeyDB has existed for a while as a Redis alternative, but it's not a drop-in replacement.
The two main ones are:
Valkey - backed by most of the large corporations (AWS, Google, Microsoft, Alibaba, etc.) that used to have developers assigned to do open-source work on the Redis project; those developers now work on this fork
Redict - another fork that seems to have quite a few engineers behind it