As in, I had a physical CD I had purchased, ripped to MP3, and loaded onto my iPhone.
iTunes recognised it and linked it to the matching official entry in their music store. Then they lost the licence and deleted all customer copies, including mine.
Hundreds of millions of businesses (and individuals) paid Microsoft $83 billion just last quarter, so clearly they're doing something right.
Any "big enough" organisation will eventually do something stupid, disgraceful, or even illegal. Once you have over a hundred thousand staff, there's just no way to guarantee that they all row in the same direction and nobody gives in to the temptation to cut corners or outright cheat.
If you think you can judge the entire rest of an organisation by a few bad actors within it, you'll be perpetually disappointed.
The "space economy" is not yet a certainty, other than in the minds of science fiction fans. (Unsurprisingly, hard-to-reach irradiated rocks of undifferentiated boring minerals in a cold vacuum are of negligible value to humans here on Earth.)
Even if the Star Trek utopian future materialises, it is very likely to be a long time from now.
1. SpaceX has competitors. Most are making reusable rockets.
2. SpaceX has no moat.
3. The concept of money itself might change dramatically by the time SpaceX becomes a multi-planetary mega corporation. Investing now may not yield returns in any meaningful sense.
True, and that's exactly the reason why people want to buy this stock now.
If future returns were already (almost) certain, they would have been priced in and you couldn't make any money with this stock.
This is a classic high risk / high reward stock. IF the space economy takes off you might 10X your investment. If it doesn't, you might lose most of it.
Rich people (who own most of the stock market) can afford to make such high risk bets, because they can afford to lose the money and thus many will make that bet.
Communism, or more accurately, mechanised collective farming practices in the early 1900s in Russia resulted in revolutions and world wars. When tens of millions of inefficient farmers were replaced by tractors needing only a fraction of the labour force, the excess population was disposed of.
Sorry, bad phrasing!
They were put to work in new roles enabled by technological advancements:
wielding mass manufactured rifles and operating artillery.
This has played out over and over throughout history whenever a large fraction of the population suddenly becomes surplus to requirements.
They never get to enjoy utopia. They are expended in warfare or low value forced labour until the labour pool once again matches the requirements.
You don't even need to look at the Soviets. Life for the average person in Britain became worse between 1760 and about 1920. That meant about 3 generations of people were lost.
I'm super happy about this idyllic AI future my great grandchildren will enjoy...
Many of the rows in that spreadsheet reference "current events", which models aren't expected to do much better at than a human making an educated guess! They all have cutoff dates either last year or early this year and know nothing about what happened in "April 2026".
This is doubly problematic because you evaluated earlier models like Gemini Pro 3 instead of 3.1, GPT 5.4 instead of 5.5, etc...
Given that it's only a thousand short questions, you should be able to re-run your test in about an hour with the latest models, so... why haven't you?
Similarly, LLM output is non-deterministic, so you could get more interesting stats from your data set by repeating each question 'n' times for each model.
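To make the "repeat each question 'n' times" suggestion concrete, here is a minimal sketch of the kind of per-question stats you could collect. The `ask` callable is a hypothetical stand-in for whatever function sends a question to a model and returns its verdict. It is not any real API, and the fake model below only exists so the example runs on its own.

```python
from collections import Counter
from itertools import cycle
from typing import Callable


def repeat_stats(ask: Callable[[str], str], question: str, n: int = 5) -> dict:
    """Ask the same question n times and summarise how stable the answers are."""
    answers = [ask(question) for _ in range(n)]
    counts = Counter(answers)
    top_answer, top_count = counts.most_common(1)[0]
    return {
        "majority": top_answer,         # most frequent answer across the runs
        "consistency": top_count / n,   # 1.0 means every run agreed
        "distinct": len(counts),        # how many different answers appeared
    }


def make_fake_model(replies: list[str]) -> Callable[[str], str]:
    """Hypothetical stand-in for a real model call: cycles through canned replies."""
    canned = cycle(replies)
    return lambda question: next(canned)


fake = make_fake_model(["true", "true", "false"])
stats = repeat_stats(fake, "Is the claim true?", n=6)
print(stats)
```

A low "consistency" value on a question would flag that a single-run comparison between models on that row is mostly noise.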
Comparing models with search tools to models without - when there's no option for "I am unable to answer this question without access to search" - doesn't make sense to me.
Agree about comparing models with and without search capabilities. Even the two models with search capabilities (Sonar Pro and Gemini) agree only on 58% of the claims.
The title mentions "fact-checks", but "fact checking" is a process in which facts are checked against sources, not one where you are given a random fact and have to tell if it's true or false from your own memory. That's what is normally called a quiz game. So a more honest title for this research would be "Models answer differently to quiz questions".
The Grenfell fire was caused by petty corruption. Someone involved in its construction used a cheaper flammable cladding material instead of the (slightly!!) more expensive fire resistant version.
It’s very on-brand for places like Russia and China but clearly western countries are not immune to this kind of thing either.
After the fire there were investigations into towers constructed here in Australia. Many used the cheaper flammable cladding material also. Just like with Grenfell, nothing much was done and nobody went to prison.
What does that have to do with the actual idea that being in a tall building could make it difficult to escape? It doesn't matter if the cause of the disaster is cheap building materials or an external force acting on a properly built building.
The funny thing is that you've just described an idealised development process as would be used by effective, skilled humans in a heterogeneous team where everyone has a speciality.
If only things were so! If only code was discussed, reviewed, iterated on! If only the "manager" actually read the code, provided actionable feedback, and disseminated PRs to multiple people with diverse skill sets.
(If you can't tell, I'm a jaded consultant desperately trying to make the horse drink the water.)
I've worked in large teams for many years and yes it's just like that, but without the time constraint. PRs can only go back and forth so many times. Depending on the reviewer, they may phone it in or focus on different things. You yourself aren't able to implement every piece of feedback due to constraints and it ends up as tech debt.
So AI definitely changes the game. I feel like we almost need something higher level for reviewers to review changes faster. Today's code is starting to feel like assembler. Too much of it, too low level. We need even higher level constructs to be able to do more in less time. I'm just not sure what that is.
I just had a thought: is there some API so obscenely baroque and painful to use that even AIs would flatly refuse to work with them?
It would be an interesting exercise to keep feeding a coding agent ever crazier interface designs until it cracks.
“The base64 of the rot13 encrypted EBCDIC string has to be included in a JSON in the XML SOAP request, but both the JSON and XML escaping is manual and incorrect...”
"...but first split the string into chunks no bigger than 64 bytes and spread the request amongst HTTP headers instead of the POST body. Reassemble by trying every possible ordering until one passes the decoding steps."
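For the curious, a rough sketch of what a client for that imaginary interface might look like. Everything here is an assumption layered on the joke: the order of operations (rot13 first, then EBCDIC via Python's `cp037` codec, then base64), the `is_valid` check (the "spec" never says what "passes the decoding steps" means, and plain decoding succeeds for many wrong orderings), and all function names. The JSON-in-SOAP wrapping and the HTTP headers are left out.

```python
import base64
import codecs
import hashlib
from itertools import permutations
from typing import Callable, Optional

CHUNK = 64  # maximum bytes per chunk, as demanded by the "spec"


def encode(text: str) -> bytes:
    """rot13 the text, encode it as EBCDIC (cp037), then base64 the result."""
    rotated = codecs.encode(text, "rot_13")
    return base64.b64encode(rotated.encode("cp037"))


def decode(payload: bytes) -> str:
    """Undo encode(): base64 -> EBCDIC (cp037) -> rot13."""
    ebcdic = base64.b64decode(payload, validate=True)
    return codecs.decode(ebcdic.decode("cp037"), "rot_13")


def split_chunks(payload: bytes, size: int = CHUNK) -> list[bytes]:
    """Split the payload into pieces of at most `size` bytes."""
    return [payload[i:i + size] for i in range(0, len(payload), size)]


def reassemble(chunks: list[bytes], is_valid: Callable[[str], bool]) -> Optional[str]:
    """Try every ordering of the chunks until one decodes and passes is_valid."""
    for ordering in permutations(chunks):
        try:
            candidate = decode(b"".join(ordering))
        except ValueError:  # covers binascii.Error from malformed base64
            continue
        if is_valid(candidate):
            return candidate
    return None


original = "THE QUICK BROWN FOX JUMPS OVER THE LAZY DOG " * 3
digest = hashlib.sha256(original.encode()).hexdigest()  # assumed side-channel checksum

chunks = split_chunks(encode(original))
shuffled = list(reversed(chunks))  # pretend the headers arrived out of order
recovered = reassemble(
    shuffled,
    is_valid=lambda s: hashlib.sha256(s.encode()).hexdigest() == digest,
)
print(recovered == original)
```

Note the factorial blow-up: the brute-force reassembly is fine for three chunks and hopeless for thirty, which is rather in the spirit of the thing.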
>I just had a thought: is there some API so obscenely baroque and painful to use that even AIs would flatly refuse to work with them?
Copilot Studio. It's painful to try to set up any sort of logic within Copilot Studio. Worse if you're not on the most bleeding-edge new machine with overkill levels of RAM. So I had a thought... why am I doing this when I have Claude with absolutely no quotas?
Turns out, there's just no way to drive it from Claude. It first started with the pac command line tool, but that's agonizingly broken. Tried to use Chrome next, but even it can't navigate that UI from the browser (neither could I; you'd click and sometimes the response would arrive 10 seconds later). Copilot Studio is the quintessential Microsoft technology. Shortly after, Claude began experiencing what I can only call schizophrenic symptoms. It imagined that every one of my queries contained embedded hacking attempts, and that soon spread to every conversation I had with it, even in new chats.
I’ve been trying for the last few weeks to get a really solid local model workflow going, and every single tool I use feels hostile af, whereas the stuff work pays for, it all “just works” together. It really irritates me.