
If you only realized how ridiculous your statement is, you never would have stated it.

It's also literally factually incorrect. Pretty much the entire field of mechanistic interpretability would obviously point out that models have an internal definition of what a bug is.

Here's the most approachable paper that shows a real model (Claude 3 Sonnet) clearly having an internal representation of bugs in code: https://transformer-circuits.pub/2024/scaling-monosemanticit...

Read the entire section around this quote:

> Thus, we concluded that 1M/1013764 represents a broad variety of errors in code.

(Also the section after "We find three different safety-relevant code features: an unsafe code feature 1M/570621 which activates on security vulnerabilities, a code error feature 1M/1013764 which activates on bugs and exceptions")

This feature fires on actual bugs; it's not just a model pattern matching saying "what a bug hunter may say next".


Was this "paper" eventually peer reviewed?

PS: I know it is interesting and I don't doubt Anthropic, but I find it fascinating that they get such a pass in science.


Modern ML is old school mad science.

The lifeblood of the field is proof-of-concept pre-prints built on top of other proof-of-concept pre-prints.


Sounds like you agree this “evidence” lacks any semblance of scientific rigor?

(Not GP) There was a well-recognized reproducibility problem in the ML field before LLM-mania, and that's considering published papers with proper peer review. The current state of affairs is in some ways even less rigorous than that, and then some people in the field feel free to overextend their conclusions into other fields like neuroscience.

Frankly, I don't see a reason to give a shit.

We're in the "mad science" regime because the current speed of progress means adding rigor would sacrifice velocity. Preprints are the lifeblood of the field because preprints can be put out there earlier and start contributing earlier.

Anthropic, much as you hate them, has some of the best mechanistic interpretability researchers and AI wranglers across the entire industry. When they find things, they find things. Your "not scientifically rigorous" is just a flimsy excuse to dismiss the findings that make you deeply uncomfortable.


This is more of an article describing their methodology than a full paper. But yes, there are plenty of peer-reviewed papers on this topic: scaling sparse autoencoders to produce interpretable features for large models.

There's a ton of peer-reviewed papers on SAEs from the past two years; some of them have been presented at conferences.

For example: "Sparse Autoencoders Find Highly Interpretable Features in Language Models" https://proceedings.iclr.cc/paper_files/paper/2024/file/1fa1...

"Scaling and evaluating sparse autoencoders" https://iclr.cc/virtual/2025/poster/28040

"Identifying Functionally Important Features with End-to-End Sparse Dictionary Learning" https://proceedings.neurips.cc/paper_files/paper/2024/hash/c...

"Gemma Scope: Open Sparse Autoencoders Everywhere All At Once on Gemma 2" https://aclanthology.org/2024.blackboxnlp-1.19.pdf


> This feature fires on actual bugs; it's not just a model pattern matching saying "what a bug hunter may say next".

You don't think a pattern matcher would fire on actual bugs?


Mechanistic interpretability is a joke, supported entirely by non-peer-reviewed papers released as marketing material by AI firms.

Some people are still stuck in the “stochastic parrot” phase and see everything regarding LLMs through that lens.

Current LLMs do not think. Just because the repetitive actions a model loops through are anthropomorphized does not mean it is truly thinking or reasoning.

On the flip side, the idea that they do has been a very successful indirect marketing campaign.


What does “truly thinking or reasoning” even mean for you?

I don’t think we even have a coherent definition of human intelligence, let alone of non-human ones.


Everyone knows that to really think you need to use your fleshy meat brain; everything else is cheating.

Oh, yes. The trope of "but what does it even mean to think".

If you can't speak, can you think? Yes. Large Language model. Thinking is not predicated on language.

A few good starts for you. Please refute all of these arguments in your spare time to convince me otherwise:

* https://machinelearning.apple.com/research/illusion-of-think...
* https://archive.is/FM4y8
* https://www.raspberrypi.org/blog/secondary-school-maths-show...


My point was not that I’m 100% convinced that LLMs can think or are intelligent.

My point was that we don’t have a great definition for (human) intelligence either. The articles you posted also don’t seem to be too confident in what human intelligence actually entails.

See https://en.wikipedia.org/wiki/Intelligence

> There is controversy over how to define intelligence. Scholars describe its constituent abilities in various ways, and differ in the degree to which they conceive of intelligence as quantifiable.

Given that an LLM isn’t even human but essentially an alien entity, who can confidently say whether they are intelligent or not?

I’m very skeptical of those who are firmly convinced one way or the other.

Are LLMs intelligent in the way that humans are? I’m quite sure they aren’t.

Are LLMs just stochastic parrots? I don’t find that framing convincing anymore either.

Either way, it’s not clear; just look at how this topic has been discussed daily in frontpage threads over the last couple of years.


Amazing idea and execution, the sort of stuff I wish there was more of on HN.


Sure it could have been, if you knew about SSH packet inspectors in Wireshark...

The author didn't, and reached for a general-purpose tool instead - why is that unfortunate?


Stop right there you terrorist antifa leftie commie scum! You are being arrested for thought crime!


> So if you can sell those MT for $1-5, you're printing money.

The IF is doing a lot of heavy lifting there.

I understood the OP in the context of "human history has not produced sufficiently many tokens to be sent into the machines to make the return on investment possible mathematically".

Maybe "token production" accelerates and the need for that much compute materializes; who knows.
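
For a rough sense of scale (all numbers below are hypothetical, just to show the shape of the arithmetic): recouping $100B of infrastructure spend at a $2 gross margin per million tokens means selling on the order of 5e16 tokens.

    capex = 100e9          # hypothetical infrastructure spend, in dollars
    margin_per_mt = 2.0    # hypothetical gross margin per million tokens, in dollars
    tokens_needed = capex / margin_per_mt * 1e6
    print(f"{tokens_needed:.1e} tokens to break even")  # 5.0e+16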


Surely you realize that OpenAI is not a publicly traded company?


nope RIP me


Slowly but surely turning into 4chan /pol/, now with country flags too!


that happened like 3 years ago THOUGH


Less headcount usually means a faster pace: fewer lines of communication, less red tape, etc.

Or at least that's what many C-level people believe to be true. "We need to move like a startup" is a common mantra repeated by executives, even at megacorps like Amazon.

I guess it's true to some degree, though: anecdotally, as an IC at a tech company, I feel like I could move a lot faster if some of the people around me were removed and replaced by automation instead.
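
The "fewer lines of communication" part at least has simple arithmetic behind it: pairwise channels grow roughly quadratically with headcount, so halving a team removes far more than half of the coordination overhead. A toy illustration:

    def channels(n):
        # pairwise communication channels among n people: n choose 2
        return n * (n - 1) // 2

    print(channels(100))  # 4950
    print(channels(50))   # 1225, roughly a quarter of the channels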


But that's not the claim being made. It's about accepting slightly lower output, or even a real reduction in output, in exchange for a much larger reduction in spending.

What they're doing is "doing less with less", as they say.


Don't share your code publicly then?


> I think politically, everyone would want airlines to have working IT-systems and they would probably want to pay $100 (rationally, closer to $1000) amortized over 50 years to pay for that, but apparently humanity is just too stupid to make it work.

Not stupid, just corrupt :)

If we did this, the money would get misappropriated or stolen - most likely completely legally through overpaid consulting fees.

So clearly we should pay someone to prevent that from happening.

Wait a minute...

