Agreed. It's pretty trivial to add a few images to your markdown. I had to hunt for the screenshots, which are full-size grabs of the entire desktop for what is a web app -- odd.
Azure is easily the most expensive, least reliable, and worst cloud available. It's borderline a scam. An example from today: I provisioned high-IOPS SSDs (supposedly), and what is actually attached to the instance? A spinning hard drive! I didn't even know they were still made, but I guess Azure uses them and scams users into thinking they're getting an SSD for $700/mo when it's really an old hard drive.
I would warn anyone, far and wide, to avoid Azure at all costs, especially if you are a startup -- and especially if you are doing any kind of AI, because the only GPUs they have available are ancient and crazy overpriced.
If I cared more, I'd try to migrate away from Azure. But I don't, and that's probably Azure's business model at this point.
I’d love to see proof of your claim that they provisioned a hard disk when you requested an SSD, or, at the very least, tests that showed that the IOPS you requested were not delivered. Can you show us the receipts?
As an SRE who uses Azure, I call BS. You don't see the underlying storage; it's mounted as either a SCSI or NVMe device, presented as one disk. It's obviously backed by a massive fleet of drives, just like EBS.
I was wrong about it being a spinning disk, ROTA=1 is just how Linux reports Azure virtual disks. But the underlying frustration stands: my home NVMe does the same copy in a fraction of the time because it can do 500K+ IOPS with no virtualization overhead. Azure caps this "Premium SSD" at 7,500 IOPS, so a small-file-heavy copy crawls at 85 MB/s despite 250 MB/s provisioned throughput. You're paying SSD prices for artificially throttled performance — the hardware may be SSD, but the performance is just awful. Paying $900/month for the highest level Premium SSD, attached to a large instance, and it's significantly slower than a $200 SSD from 5 years ago.
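For anyone who wants to reproduce the ROTA observation on their own VM, here's a minimal sketch. It only reads the kernel's rotational flag from sysfs (Linux paths assumed); it doesn't measure performance, so it can't tell a fast SSD from a throttled one:

```python
# Print what Linux *reports* for each block device. queue/rotational == "1"
# means "rotational", but on Azure VMs that is an artifact of how the
# hypervisor presents virtual disks, not proof of spinning media.
from pathlib import Path

for flag in sorted(Path("/sys/block").glob("*/queue/rotational")):
    device = flag.parts[3]  # /sys/block/<device>/queue/rotational
    rota = flag.read_text().strip()
    print(device, "rotational" if rota == "1" else "non-rotational")
```

To verify delivered IOPS rather than labels, the usual next step is a benchmark like fio doing small (4 KiB) random reads against the device, since that's the pattern the provisioned IOPS cap throttles.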
Sure, the downside of virtualization is that all disk calls go over the network, which is way slower than a local NVMe call. The upside is that hardware failures are handled quickly.
The solution to this problem is for LLMs to get better at producing code and descriptions that don't look LLM-generated.
It's possible to prompt for this as well, but obviously any of the big AI companies that want to increase engagement with their coding agents, and want to capture the open-source market, should come up with a way for the LLM to produce unique, but still correct, code so that it doesn't look LLM-generated and can evade these kinds of checks.
I wouldn't trust any of these benchmarks unless they are accompanied by some sort of proof other than "trust me bro". Not including the parameters the models were run with (especially for the competing models) also makes it hard to form fair comparisons. They need to publish, at minimum, the code and runner used to produce the benchmarks, along with the logs.
Not including the Chinese models is also obviously done to make it appear like they aren't as cooked as they really are.
The problem with this is context. Whatever examples you provide compete with the content you actually want analyzed. If the problem is sufficiently complex, you quickly run out of context space. You must also describe the response format you want. For many applications, it's better to fine-tune.
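To make that tradeoff concrete, here's a rough sketch. The ~4-characters-per-token rule, the 8k window, and all the numbers are illustrative assumptions, not any particular model's limits:

```python
# Sketch of the few-shot vs. payload tradeoff: examples and the content
# you actually want analyzed share one context window.

def approx_tokens(text: str) -> int:
    return max(1, len(text) // 4)  # crude heuristic, not a real tokenizer

def remaining_budget(window: int, system: str,
                     examples: list[str], reserve_for_answer: int) -> int:
    """Tokens left over for the content you actually want analyzed."""
    used = approx_tokens(system) + sum(approx_tokens(e) for e in examples)
    return window - used - reserve_for_answer

system = "You are a code reviewer."
examples = ["example diff + ideal review " * 100 for _ in range(5)]  # 5 long shots

# Every example you add shrinks the space left for the real input:
print(remaining_budget(8_000, system, examples, reserve_for_answer=1_000))
```

With five long examples, under half the hypothetical window remains for the actual document, and doubling the examples pushes it toward zero -- which is the point at which fine-tuning starts to win.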
It's really easy to overcome that -- just sponsor some indie devs to flood the internet with scripts and tools to migrate all your conversations off OpenAI. Make it easy for people to switch using a simple process, make sure it's well distributed, and BOOM! Watch their user count drop like a rock. People act like a service with a lot of users can't be destroyed. Anyone who has ever worked at a large web company can tell you otherwise. These things can be destroyed in just a few days if they are targeted.
They look like fortresses from the outside, but they are all incredibly vulnerable. That's the truth they don't want people to know or realize just how vulnerable they all are.
I keep hearing people say "but as humans we actually understand". What evidence do you have of a material difference between the understanding an LLM has and the understanding a human has? What processes do we fundamentally perform that an LLM does not or cannot? What definition of "understanding" is being used here that humans presumably satisfy and an LLM currently does not?
Well, a material difference is that we don't input/output in tokens, I guess. We have a concept of gaps and limits in our knowledge, and we have factors like ego, self-preservation, and ambition that feed into our thoughts, whereas an LLM just has raw data. Understanding the implications of a code change means having an idea of a desired structure, some sense of where you want to head and how it all meshes together. An LLM has zero of any of that. Just because it can copy the output that results from the factors I mention doesn't mean it operates the same way.
It's not an easy thing to box up and ship, unfortunately. Also, another problem (and another incentive to strip the hardware and stuff the column with diodes and capacitors) is that it has seen a lot of salt air exposure from a nearby reef tank sump. In short, you don't want this particular one.