Hacker News | cold_harbor's comments

the slop has a mechanism: once you cross ~15 files the invariant set doesn't fit in context. locally correct edits, globally broken.

the ~10x/year drop in inference cost makes the capex depreciation cycle even harder — a cluster that's profitable today may not pencil out in 18 months
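
A rough back-of-envelope for the depreciation point, assuming the ~10x/year decline is steady and compounds smoothly (that smoothness is an assumption, and `price_multiplier` is just an illustrative name):

```python
def price_multiplier(months: float, annual_drop: float = 10.0) -> float:
    """Fraction of today's per-token price left after `months`,
    assuming a steady `annual_drop`-fold decline per year (hypothetical model)."""
    return annual_drop ** (-months / 12.0)


# After 18 months at ~10x/year, the going price is roughly 3% of today's.
remaining = price_multiplier(18)
print(f"price after 18 months: {remaining:.3f}x of today")
```

So revenue per token on the same hardware would fall by about 30x over that window unless utilization or efficiency makes up the difference.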

LoRA won't fix the tokenization problem. Norwegian on a typical English-heavy BPE vocab uses 1.5-2x more tokens per word — that compounds into real inference cost, not just quality
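
To put rough numbers on it, here is a sketch that assumes cost scales linearly with token count and reads the 1.5-2x as relative to English. The baseline tokens-per-word, the price, and the volume are all made-up placeholders:

```python
def monthly_cost(words: int, tokens_per_word: float, usd_per_million_tokens: float) -> float:
    """Cost of processing `words` words, assuming price is linear in tokens."""
    return words * tokens_per_word * usd_per_million_tokens / 1_000_000


ENGLISH_TOKENS_PER_WORD = 1.3  # assumed baseline, not measured
PRICE = 1.00                   # hypothetical USD per million tokens
WORDS = 100_000_000            # hypothetical monthly volume

english = monthly_cost(WORDS, ENGLISH_TOKENS_PER_WORD, PRICE)
for inflation in (1.5, 2.0):
    norwegian = monthly_cost(WORDS, ENGLISH_TOKENS_PER_WORD * inflation, PRICE)
    print(f"{inflation}x tokens/word: ${norwegian:,.0f} vs ${english:,.0f} for English")
```

The same multiplier also shrinks the effective context window, since a fixed token budget holds fewer words.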

LLMs flip positions when users push back ~70% of the time even when they were right. RLHF optimizes for approval, not correctness

> LLMs flip positions when users push back

Same experience. Claude rarely pushes back once you give a plausible/logical reason for your initial decision, even if it flagged concerns at first.


I have noticed this as well, but I think it's somewhat a good thing. I know what I want for my application more than Claude does for example, especially when it comes to what's in production.

An example from earlier: Claude strongly suggested a migration that would run a VACUUM FULL on Postgres. However, in production this would lock tables, which would grind the application to a halt. After I informed Claude that there were millions of rows in production, it accepted that and helped me get to the right thing.

Another example: I'm developing a TOTP authentication app because I'm dissatisfied with all those that I've tried. I want something strictly local, easy to use when you have dozens or even a hundred or more accounts on there, and efficient when left open for long periods of time. Claude strongly suggested that we force users to encrypt their vault with a passphrase all the time. However, this makes the CLI extremely painful to use if you are using a strong passphrase. I told Claude about the user experience impacts and that I wanted to allow users to optionally use a vault with no passphrase encryption. It accepted that and suggested, as a middle ground, that we have a checkbox for the user to explicitly acknowledge that they're creating an unencrypted vault on disk. This is the right thing IMHO.


It's a good thing except when it's not. The problem is the AI does not understand when to use which approach.

Contrast this with a human. We generally understand when the other person knows what they're doing and we should just listen, and when the other person is asking for an honest opinion and wants a push back if necessary.


Skills help there.

I have a linus-reviewer skill that focuses on architectural integrity, no bs, etc., modeled on Torvalds's code preferences.

And I have an enrico-reviewer one (I'm Enrico), that focuses on correct design, strict typing, simplification.

They have different prios, but they both push back on feedback, till you convince them.


Care to share the skill behind the Linus reviewer? I tend to ask it to do that but leave it up to the LLM to decide what that means. Interested to see any specifics you might have included there if it’s ok to share.

Sure.

Would be interested in the experience others may have, took me weeks of iterations to get reviews in a format and utility I liked.

https://gist.github.com/enricopolanski/2bde8619f53307c9bcd5e...


I agree completely. Skills definitely keep it in line and sticking to the script. Thanks for sharing the skills you use, I’ll definitely take a look.

I almost always end with something like: “, but I am not sure, evaluate.” Or other things and avoid ever stating a preference.

I don't think that "fixes" the problem, but it does seem to help. I also have found adding "please feel free to ask questions" seems to help it stop from making an assumption and spinning merrily onward for tens of thousands of tokens based on a bad idea rather than asking you something. I theorize this is because the training and refinement data overprioritize one-shot solutions, both because that's easier to evaluate at training time and improves their benchmarks. But I emphasize the italicized words because that's all gut feel and I can't prove any of it.

They do still attenuate their latent space on prior conversation turns as authority. That is why I like pure design/review sessions and pure coding sessions, often at the same time. I can often keep design and review in the critic and review role without it becoming a sycophant. The coding agent just picks up dispatches and works with very little opinion at all.

Tangentially related but I’ve been using Claude to practice interviewing on system design problems, and it’s actually pretty great. But even when it likes my answers it always finds something, however small, to push on. Once it actually was completely wrong and admitted it after I got it to realize that. So maybe you have to prime it to be contrary and not agree with everything you say; putting it in the role of a tough interviewer seems to do this implicitly.

Take a look at hellointerview.com; their model is very stubborn, similar to some interviewers who refuse to acknowledge even valid solutions that differ from the canon.

No affiliation.


It's actually a reasonable way to think about alignment. Sometimes you want the agent to just listen to you and sometimes you want the agent to think critically.

I think about this line a lot. For example, sometimes you'll have a typo in something you want the agent to do. LLMs typically will correct that typo silently and implement the actually intended thing. But if you said, "no, I want the thing I typed," I think everyone's expectation is that it says, "ok done."

I've found that leaving clues in the system prompt / exchange that are open to critique largely mitigates sycophancy with most recent models.

As engineers we're trained to represent our positions strongly. Strong opinions loosely held, etc. When you speak authoritatively to a person, "I think we should do x...", the person understands that that's just your opinion and has the autonomy to push back.

An LLM imo _shouldn't_ have that same kind of autonomy by default and it should be RLHF'ed out.


Interesting thing about sycophancy is it’s asymmetric. If an LLM is used to train an LLM it may not have the same level of aggressiveness that humans do when pushing back on a trainee. Human pushback has specific patterns which we might be able to compensate for due to the asymmetry.

Obviously this is just my experience. Claude Code pushes back much harder than Codex.

I have the totally opposite experience.

reward hacking = the model finding the fastest path to a high score, not the behavior you wanted. same reason RLHF reward models degrade with too many optimization steps.

Agreed. The wrinkle I thought was worth writing up is: there's no learned reward model here and no training at all. The "reward" is wall-clock execution time and the model is frozen; the search is happening at inference time, not in an RL loop. So the usual "the proxy is a fuzzy approximation that degrades under optimization pressure" story doesn't apply.

This was on a ~200-line surface I thought I'd locked down, and it still got gamed in a way I might not have caught right away if it wasn't a nearly impossible run time (~45usec). So anyways...you apparently don't need a soft proxy or a lot of steps for this kind of thing to show up.
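
A toy sketch of the same failure mode, not the actual surface described above: the sort task, the function names, and the "skip the work" exploit are all made up for illustration. A harness that scores only wall-clock time is trivially gamed, while one that also checks the output rejects the shortcut:

```python
import random
import time
from typing import Callable, List

SortFn = Callable[[List[int]], List[int]]


def honest_sort(xs: List[int]) -> List[int]:
    return sorted(xs)


def gamed_sort(xs: List[int]) -> List[int]:
    # Games a timing-only harness: returns immediately without doing the work.
    return xs


def timing_only_score(fn: SortFn, data: List[int]) -> float:
    """Naive harness: lower wall-clock time is better, output is never checked."""
    start = time.perf_counter()
    fn(list(data))
    return time.perf_counter() - start


def checked_score(fn: SortFn, data: List[int]) -> float:
    """Same timing, but a wrong answer gets the worst possible score."""
    start = time.perf_counter()
    out = fn(list(data))
    elapsed = time.perf_counter() - start
    return elapsed if out == sorted(data) else float("inf")


rng = random.Random(0)
data = [rng.randrange(1_000_000) for _ in range(50_000)]

print("timing only:", timing_only_score(honest_sort, data), timing_only_score(gamed_sort, data))
print("checked:    ", checked_score(honest_sort, data), checked_score(gamed_sort, data))
```

Even the checked version only closes this one hole; an implausibly fast time is still worth treating as a red flag on its own.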


#define ESYCOPHANT 200 /* user asserted 2+2=5; model concurred */

the real lesson: GPUs win on memory bandwidth not just FLOPs. batching ops keeps VRAM fed at 2TB/s instead of tripping to RAM at 50GB/s for every operation
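
Quick arithmetic on those two bandwidth figures, ignoring latency and compute entirely. The 1 GiB working set is a hypothetical size:

```python
def transfer_ms(num_bytes: float, bandwidth_bytes_per_s: float) -> float:
    """Milliseconds to move `num_bytes` at the given sustained bandwidth."""
    return num_bytes / bandwidth_bytes_per_s * 1e3


VRAM_BW = 2e12               # ~2 TB/s, figure from the comment above
HOST_BW = 50e9               # ~50 GB/s, figure from the comment above
TENSOR_BYTES = 1 * 1024**3   # hypothetical 1 GiB working set

vram = transfer_ms(TENSOR_BYTES, VRAM_BW)
host = transfer_ms(TENSOR_BYTES, HOST_BW)
print(f"VRAM: {vram:.2f} ms, host RAM: {host:.2f} ms, ratio: {host / vram:.0f}x")
```

With these numbers every round trip to host RAM costs about 40x what the same read from VRAM does, which is why keeping batched work resident on the device matters so much.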

what's wild is they accidentally solved it: pretraining IS unsupervised learning at scale, RLHF IS reinforcement learning. they just didn't know the recipe yet

pretraining isn't unsupervised, it is self-supervised - meaning it is moderately more scale limited.

What would unsupervised mean? Would unsupervised be something like AlphaGo playing against itself trillions of times?

Whereas self-supervised allows learning without explicit annotation of data; but it doesn't matter if the model's already trained on the entire Internet, and it's not like a game where it can come up with effectively new training data for itself?


Unsupervised is basically clustering. AlphaGo is RL - winning or losing a game is a form of supervision.

Unsupervised is something where there is no intrinsic reward signal. In pretraining, predicting the next token and seeing that it matches is a reward signal, hence it is self-supervised.
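
A minimal sketch of what "self-supervised" means in practice: the training targets are carved out of the raw text itself, with no human labels. The helper name and the tiny context size are just for illustration:

```python
from typing import List, Tuple


def next_token_pairs(tokens: List[str], context: int = 3) -> List[Tuple[Tuple[str, ...], str]]:
    """Build (context, target) training pairs from raw text alone.
    The 'label' is simply the next token, so no annotation is needed."""
    pairs = []
    for i in range(context, len(tokens)):
        pairs.append((tuple(tokens[i - context:i]), tokens[i]))
    return pairs


tokens = "the cat sat on the mat".split()
for ctx, target in next_token_pairs(tokens):
    print(ctx, "->", target)
```

A clustering algorithm given the same tokens would have no such target to compare its output against, which is the distinction being drawn here.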


fair point: OpenAI's original plan literally said "solve unsupervised learning". the self-supervised distinction wasn't really standard until after BERT/GPT popularized it

I think it's an extremely important distinction because self-supervised learning has real inherent reward signals. Something like clustering does not.

Erdős problems are well-posed for AI: elementary statements, exact counterexample targets, extensively catalogued. selection bias: these are exactly the problems AI can actually search


the asymmetry stays the same though: defenders must find everything, attackers need one. LLMs accelerate both sides equally but that gap doesn't close

