I'm always surprised by the number of people posting here that are dismissive of...

mvieira38 · 2025-07-18T22:52:06 1752879126

Your examples are not LLMs, though, and don't really behave like them at all. If we take the chess analogy and design an "LLM-like chess engine", it would behave like an average 1400 London spammer, not like Stockfish, because it would try to play like the average human plays in it's database.

It isn't entirely clear what problem LLMs are solving and what they are optimizing towards... They sound humanlike and give some good solutions to stuff, but there are so many glaring holes. How are we so many years and billions of dollars in and I can't reliably play a coherent game of chess with ChatGPT, let alone have it be useful?

charcircuit · 2025-07-19T09:15:02 1752916502

>because it would try to play like the average human plays in it's database.

Why would it play like the average? LLMs pick tokens to try and maximize a reward function, they don't just pick the most common word from the training data set.

throw310822 · 2025-07-18T23:01:25 1752879685

Maybe you didn't realise that LLMs have just wiped out entire class of problems, maybe entire disciplines- do you remember "natural language processing"? What, ehm, happened to it?

Sometimes I have the feeling that what happened with LLMs is so enormous that many researches and philosophers still haven't had time to gather their thoughts and process it.

I mean, shall we have a nice discussion about the possibility of "philosophical zombies"? On whether the Chinese room understands or not? Or maybe on the feasibility of the mythical Turing test? There's half a century or more of philosophical questions and scenarios that are not theory anymore, maybe they're not even questions anymore- and almost from one day to the other.

jpc0 · 2025-07-18T23:31:28 1752881488

> do you remember "natural language processing"? What, ehm, happened to it

There’s this paper[1] you should read, is sparked an entire new AI dawn, it might answer your question

1. https://arxiv.org/abs/1706.03762

mvieira38 · 2025-07-19T01:48:33 1752889713

How is NLP solved, exactly? Can LLMs reliably (that is, with high accuracy and high precision) read, say, literary style from a corpus and output tidy data? Maybe if we ask them very nicely it will improve the precision, right? I understand what we have now is a huge leap, but the problems in the field are far from solved, and honestly BERT has more use cases in actual text analysis.

"What happened with LLMs" is what exactly? From some impressive toy examples like chatbots we as a society decided to throw all our resources into these models and they still can't fit anywhere in production except for assistant stuff

throw310822 · 2025-07-19T08:24:56 1752913496

> Can LLMs reliably (that is, with high accuracy and high precision) read, say, literary style from a corpus and output tidy data?

I think they have the capability to do it, yes. Maybe it's not the best tool you can use- too expensive, or too flexible to focus with high accuracy on that single task- but yes you can definitely use LLMs to understand literary style and extract data from it. Depending on the complexity of the text I'm sure they can do jobs that BERT can't.

> they still can't fit anywhere in production

Not sure what do you mean for "production" but there's an enormous amount of people using them for work.

bigyabai · 2025-07-18T22:02:21 1752876141

People assume (rightly so) that the progress in AI should be self-evident. If the whole thing is really working that great, we should expect to see real advances in these fields. Protein-folding AI should lower the prices of drugs and create competitive new treatments at an unprecedented rate. Photo and video AI should be enabling film directors and game directors to release higher-quality content faster than ever before. Text AI should be spitting out Shakespeare-toppling opuses on a monthly basis.

So... where's the kaboom? Where's the giant, earth-shattering kaboom? There are solid applications for AI in computer vision and sentiment analysis right now, but even these are fallible and have limited effectiveness when you do deploy them. The grander ambitions, even for pared-back "ASI" definitions, is just kicking the can further down the road.

TheBicPen · 2025-07-18T22:24:41 1752877481

The kaboom already happened on user-generated media platforms. YouTube, Facebook, tiktok, and so on are flooded with AI-generated videos, photos, sounds, and so on. The sheer volume of this low-quality slop is because AI lowered the barrier of entry for creating content. In this space the progress is not happening through pushing the upper bound of quality higher but by reducing the cost for minimal quality to down to near-0.

mvieira38 · 2025-07-18T23:00:46 1752879646

Another perspective for the kaboom is search and programming tasks for the average person.

For the average consumer, LLM chatbots are infinitely better than Google at search-like tasks, and in effect solve that problem. Remember when we had to roll our eyes at dad because he asked Google "what are some cool restaurants?" instead of "nice restaurants SF 2018 reddit"? Well, that is over, he can ask that to ChatGPT and it will make the most effective searches for him, aggregate and answer. Remember when a total noob had to familiarize himself with a language by figuring out hello world, then functions, etc? Now it's over, these people can just draft a toy example of what they want to build with Cursor instantly, tell it to make everything nice and simple, and then have ChatGPT guide them through what is happening.

In some industries you just don't need that much more code quality than what LLMs give you. A quick .bat script doesn't need you to know the best implementation of anything, neither does a Python scraper using only the stdlib, but these were locked behind programming knowledge before LLMs

dontlaugh · 2025-07-19T23:49:42 1752968982

So it’s either useless or harmful? Why should we be excited then?

tim333 · 2025-07-21T15:01:21 1753110081

It's like the web in 1998. Everyone could see the potential but nothing actually worked very well.

bwfan123 · 2025-07-19T02:09:37 1752890977

> I'm always surprised by the number of people posting here that are dismissive of AI and the obvious unstoppable progress

Many of us have been through previous hype-cycles like the dot-com boom, and have learned to be skeptical. Some of that learning has been "reinforced" by layoffs in the ensuing bust (reinforcement learning). A few claims in your note like "it's only a matter of time before we have domain-specific ASI" are jarring - as you are "assuming the sale". LLMs are great as a tool for some usecases - nobody denies that.

The investment dollars are creating a class of people who are fed by those dollars, and have the incentive to push the agenda. The skeptics in contrast have no ax to grind.

overgard · 2025-07-18T22:13:33 1752876813

We need to stop calling what we have AI. LLMs can't reliably reason. Until they can the progress is far from unstoppable.

kadushka · 2025-07-19T05:24:27 1752902667

I love it how people are transitioning from “LLMs can’t reason” to “LLMs can’t reliably reason”.

overgard · 2025-07-20T20:13:27 1753042407

Well, I was hedging a bit because I try to not overstate the case, but I'm just as happy to say: LLM's can't reason. Because it's not what they're built to do. They predict what text is likely to appear next.

But even if they can appear to reason, if it's not reliable, it doesn't matter. You wouldn't trust a tax advisor that makes things up 1/10 times, or even 1/100 times. If you're going to replace humans, "reliable" and "reproducible" are the most important things.

kadushka · 2025-07-21T03:36:25 1753068985

Frontier models like o3 reason better than most humans. Definitely better than me. It would wipe the floor with me in a debate - on any topic, every single time.

charleshn · 2025-07-19T10:23:29 1752920609

Frontier models went from not being able to count the number of 'r's in "strawberry" to getting gold at IMO in under 2 years [0], and people keep repeating the same clichés such as "LLMs can't reason" or "they're just next token predictors".

At this point, I think it can only be explained by ignorance, bad faith, or fear of becoming irrelevant.

[0] https://x.com/alexwei_/status/1946477742855532918

bwfan123 · 2025-07-19T15:22:42 1752938562

> At this point, I think it can only be explained by ignorance, bad faith, or fear of becoming irrelevant.

Based on the past history with frontier-math & AIME 2025 [1],[2] I would not trust announcements which cant be independently verified. I am excited to try it out though.

Also, the performance of LLMs was not even bronze [3].

Finally, this article shows that LLMs were just mostly bluffing [4].

[1] https://www.reddit.com/r/slatestarcodex/comments/1i53ih7/fro...

[2] https://x.com/DimitrisPapail/status/1888325914603516214

[3] https://matharena.ai/imo/

[4] https://arxiv.org/pdf/2503.21934

yeasku · 2025-07-20T01:41:53 1752975713

Open AI is 10 years old and and llm just told me a dolar is 1.03 euros.

kadushka · 2025-07-20T02:02:01 1752976921

I don’t know which llm you used - I just asked gpt-4.1 - it did a web search and provided the correct exchange rate. It took about 5 seconds.

yeasku · 2025-07-20T20:44:29 1753044269

That is what we did after chat gpt failed, a web search.

It took us about 5 seconds.

oytis · 2025-07-18T21:57:58 1752875878

It's very different from chess etc. If we could formalise and "solve" software engineering precisely, it would be really cool, and probably indeed just lift programming to a new level of abstraction.

I don't mind if software jobs move from writing software to verifying software either if it makes the whole process more efficient and the software becomes better as a result. Again, not what is happening here.

What is happening, at least in AI optimist CEO minds is "disruption". Drop the quality while cutting costs dramatically.

charleshn · 2025-07-18T22:04:28 1752876268

I mentioned algorithms, not software engineering, precisely for that reason.

But the next step is obviously increased formalism via formal methods, deterministic simulators etc, basically so that one could define an environment for a RL agent.

puchatek · 2025-07-18T22:51:27 1752879087

It's unlikely that LLMs are gonna get us there though. They ingested all relevant data at this point at the net effect might very well kill future sources of quality data. How is e.g. stackoverflow gonna stay alive if the next generation of programmers relies mainly on copilot and vibe coding? And what will the LLMs scrape once it's gone?

bigyabai · 2025-07-18T22:17:02 1752877022

I'll bet you $1,000*10^32 that AI never formalizes a novel FFT algorithm worth more than a dime.

yeasku · 2025-07-20T01:44:58 1752975898

Open AI failed to create a bot that can play Dota. Llms cant learn complex heuristics.

tim333 · 2025-07-21T15:03:39 1753110219

I'm surprised too. You'd think tech people would understand what's going on. But of the prior replies to your comment 7 out of 8 seem dismissive. I'm in the "obvious unstoppable progress" camp but we seem to be a minority.

I guess maybe it isn't that obvious - I've read quite a lot in the area. People saying LLMs aren't very good are a bit like people long ago saying chess programs aren't very good. It was true but there was an inevitable advance as the hardware got better and then that led to enthusiasm to improve the software and computers became better than humans in a rather predictable way. It's driven in the end by hardware improvements. Whether the software is LLM or some other algo is kind of unimportant.

Tainnor · 2025-07-19T11:10:21 1752923421

Mathematics cannot be "solved", that's a consequence of Gödel's First Incompleteness Theorem.

It can already be "cheaply verified" in the sense that if you write a proof in, say, Lean, the compiler will tell if you if it's valid. The hard part is coming up with the proof.

It may be possible that some sort of AI at some stage becomes as good, or even better than, research mathematicians in coming up with novel proofs. But so far it doesn't look like it - LLMs seem to be able to help a little bit with finding theorems (e.g. stuff like https://leansearch.net/), but to my understanding they are rather poor beyond that.

charleshn · 2025-07-19T10:26:54 1752920814

You can now add getting gold at IMO [0] to the above list.

[0] https://x.com/alexwei_/status/1946477742855532918

bwfan123 · 2025-07-19T15:01:36 1752937296

on the surface this is a great achievement - if it holds . alpha-geometry required 1) human formalization of the question and 2) a solver for geometry

If the questions were given as-is (without a human formalizing it) and the llm didnt need domain solvers, and the llm was not trained on it already (which happened with frontier math) - I would be impressed.

Based on the past history with frontier math [1][2] I remain skeptical. The skeptic in me says that this happens prior to big announcements (GPT-5) to create the hype.

Finally, this article shows that LLMs were just bluffing in the usamo 2025 [3].

[1] https://www.reddit.com/r/slatestarcodex/comments/1i53ih7/fro...

[2] https://x.com/DimitrisPapail/status/1888325914603516214

[3] https://arxiv.org/pdf/2503.21934

rcpt · 2025-07-18T23:26:07 1752881167

Have you ever seen a company say "welp, we wrote all the code. Now we're done?"