I keep wondering when this discussion comes up… If I take an apple and paint it like an orange, it’s clearly not an orange. But how much would I have to change the apple for people to accept that it’s an orange?
This discussion keeps coming up in all aspects of society, like (artificial) diamonds and other, more polarizing topics.
It’s weird and it’s a weird discussion to have, since everyone seems to choose their own thresholds arbitrarily.
I feel like these examples are all where human categorical thinking doesn’t quite map to the real world. Like the “is a hotdog a sandwich” question. “hotdog” and “sandwich” are concepts, like “intelligence”.
Oftentimes we get so preoccupied with concepts that we forget that they’re all made-up structures that we put over the world, so they aren’t necessarily going to fit perfectly into place.
I think it’s a waste of time to try and categorize AI as “intelligent” or “not intelligent” personally. We’re arguing over a label, but I think it’s more important to understand what it can and can’t do.
For YouTube, this already exists and I‘m using it. The extension is caller DeArrow and aims to reduce sensationalism via crowdsourcing, though I wouldn’t be surprised if top contributors are bots using LLMs.
Man, that before-after slider on the home page makes me so sad... YouTube used to just be random people sharing cool stuff, and those de-sensationalized titles really brought me back to that time for a second! Cool stuff.
For people like me had tried it in the past and found it annoying, note that it now has a 'casual' mode where it only changes the truly useless titles and leaves reasonable ones alone.
AI in it‘s current phase, definitely. However, we‘ve been seeing the transformer architecture plateauing in the last couple of years. There are still improvements, but open source models are catching up.
I feel like at this point it’s an inevitability that given enough time, capable models will be cheap enough for everyone.
If poor students have capable models but rich students have much better models that go the extra mile for a great mark and do everything in a single prompt, it would still be unfair.
For it to be fair, you would not only need good free models, but actual parity between free models and the highest subscription tier the big AI companies can offer. And I don't think that will happen in the short or mid term future.
When I was in AP classes in high school, you were required to have a TI-89 calculator. If you couldn't afford one, there were assistance programs.
You were not allowed to use a TI-92, which was the next step up. It had built-in solvers for many kinds of problems.
I'm not saying this isn't a concern, but addressing financially-based inequities in learning is not a new problem within certain bounds. There's established ways to deal with it. If we can get AI cheap enough that you can cover a year of education with $100 then we're in a good range.
Having a tool that instantly searches through the first 50 pages of google and comes up with a reasonable solution is just speeding up what I would have done manually anyways.
Would I have learned more about (and around) the system I‘m building? Absolutely. I just prefer making my system work over anything else, so I don’t mind losing that.
The multi week debugging sessions weren't fun, but that doesn't mean they weren't valuable and important and a growth and learning opportunity that we now will no longer experience.
IMO the more salient point is that bugs requiring multiple weeks of human work aren't going away! Claude has actually not been trained on, say, a mystifying and still poorly-explained Java concurrency bug I experienced in 2012, which cost a customer $150,000. Now in 2026 we have language-side tooling that mitigates that bug and Claude can actually help a lot with the rewrite. But we certainly don't have language tooling around the mysterious (but now perfectly well-explained) bug I experienced in 2017 around daylight saving's time and power industry peak/off-peak hours. I guess I haven't asked, but I can almost guarantee Claude would be no help there whatsoever.
Just so many confusing things go wrong in real-world software, and it is asinine to think that Mythos finding a ton of convoluted memory errors in legacy native code means we've solved debugging. People should pay more attention to the conclusion of "Claude builds a C compiler" - eventually it wasn't able to make further progress, the code was too convoluted and the AI wasn't smart enough. What if that happens at your company in 2027, and all the devs are too atrophied to solve the problem themselves?
I don't think we're "doomed" like some anti-AI folks. But I think a lot of companies - potentially even Anthropic! - are going to collapse very quickly under LLM-assisted technical debt.
Seems like there's a good argument to be made that we'll have plenty of opportunities for valuable growth and learning, just about different things. Just like it's always been with technology. The machine does some of the stuff I used to do so now I do some different stuff.
I think this is actually the correct way to move forward.
We should be able to verify facts about people on the internet without compromising personal data. Giving platforms the ability to select specific demographics will, in my view, make the web a better place. It doesn’t just let us age restrict certain platforms, but can also make them more authentic. I think it’s really important to be able to know some things to be true about users, simply to avoid foreign election interference via trolling, preventing scams and so much more.
With this, enforcement would also be increasingly easy: Platforms just have to prove that they’re using this method, e.g. via audit.
The fact that a language model can „reason“ (in the LLM-slang meaning of the term) about 3D space is an interesting property.
If you give a text description of a scene and ask a robot to perform a peg in hole task, modern models are able to solve them fairly easily based on movement primitives. I implemented this on a UR robot arm back in 2023
The next logical step is, instead of having the model output text (code representing movement primitives), outputting tokens in action space. This is what models like pi0 are doing.
I mean semantically language evolved as an interpretation for the material world, so assuming that you can describe a problem in language, and considering that there exists a solution to said problem that is describable in language, then I'm sure a big enough LLM could do it... but you can also calculate highly detailed orbital maps with epicycles if you just keep adding more... you just don't because it's a waste of time and there's a simpler way.
The latter part is interesting. I'm not sure how the performance of one of those would be once they are working well, but my naive gut feeling is that splitting the language part and the driving part into two delegates is cleaner, safer, faster and more predictable.
note that the control systems you were talking about before (i.e. PID) would probably take hold pretty directly in a tiny network, and exactly because of that limitation, be far less likely to contain 'hallucinations'. object avoidance and path planning are likely similar.
since this is a limited and continuous domain, its a far better one for neural training than natural language. I guess this notion that a language model should be used for 3d motion control is a real indicator about the level of thought going into some of these applications.
> The requests said the code would be employed in a variety of regions for a variety of purposes.
This is irrelevant if the only changing variable is the country. From a ML-perspective adding any unrelated country name shouldn’t matter at all.
Of course there is a chance they observed an inherent artifact, but that should be easily verified if you try this same exact experiment on other models.
> From a ML-perspective adding any unrelated country name shouldn’t matter at all.
It matters to humans, and they've written about it extensively over the years — that has almost certainly been included in the training sets used by these large language models. It should matter from a straight training perspective.
> but that should be easily verified if you try this same exact experiment on other models.
Of course, in the real world, it's not just a straight training process. LLM producers put in a lot of effort to try and remove biases. Even DeepSeek claims to, but it's known for operating on a comparatively tight budget. Even if we assume everything is done in good faith, what are the chances it is putting in the same kind of effort as the well-funded American models on this front?
Because Chinese companies are forced to train their LLMs for ideological conformance - and within an LLM, everything is entangled with everything.
Every bit of training you do has on-target effects - and off-target effects too, related but often unpredictable.
If you train an LLM to act like a CCP-approved Chinese nationalist in some contexts (i.e. pointed questions about certain events in Tiananmen Square or the status of Taiwan), it may also start to act a little bit like a CCP-approved Chinese nationalist in other contexts.
Now, what would a CCP-approved Chinese nationalist do if he was developing a web app for a movement banned in China?
LLMs know enough to be able to generalize this kind of behavior - not always, but often.
I keep wondering when this discussion comes up… If I take an apple and paint it like an orange, it’s clearly not an orange. But how much would I have to change the apple for people to accept that it’s an orange?
This discussion keeps coming up in all aspects of society, like (artificial) diamonds and other, more polarizing topics.
It’s weird and it’s a weird discussion to have, since everyone seems to choose their own thresholds arbitrarily.
reply