Hacker News | exfalso's comments

It's failing when there is no data in the training set, and there are no patterns to replicate in the existing code base.

I can give you many, many examples of where it failed for me:

1. Efficient implementation of Union-Find: complete garbage result
2. Spark pipelines: mostly garbage
3. Fuzzer for testing something: half success; the non-replicable ("creative") part was garbage
4. Confidential Computing (niche): complete garbage when starting from scratch, but good at extracting existing abstractions and replicating existing code
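For context on the first item: "Union-Find" here presumably means the classic disjoint-set structure, where an efficient implementation combines path compression with union by rank. A minimal sketch of that technique (names illustrative, not from the original task):

```python
# Disjoint-set (union-find) with path compression and union by rank:
# near-constant amortized time per operation.

class UnionFind:
    def __init__(self, n):
        self.parent = list(range(n))
        self.rank = [0] * n

    def find(self, x):
        # Find the root, then compress: point every visited node at it.
        root = x
        while self.parent[root] != root:
            root = self.parent[root]
        while self.parent[x] != root:
            self.parent[x], x = root, self.parent[x]
        return root

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return False  # already in the same set
        # Union by rank: attach the shallower tree under the deeper one.
        if self.rank[ra] < self.rank[rb]:
            ra, rb = rb, ra
        self.parent[rb] = ra
        if self.rank[ra] == self.rank[rb]:
            self.rank[ra] += 1
        return True
```

Subtle details like the rank update and the compression pass are exactly where a pattern-matching generator tends to go wrong.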

Where it succeeds:
1. SQL queries
2. Following more precise descriptions of what to do
3. Replicating existing code patterns

The pattern is very clear. Novel things don't work: tasks that require deeper domain knowledge, coming up with the to-be-replicated patterns in the first place, or problems with little training data. Everything else works.

I believe the reason for the big split in reception is that senior engineers work on problems that don't have existing solutions - LLMs are terrible at those. What they are missing is that the software and the methodology must be modified to make the LLM work. There are methodical ways to do this, but this shift in the industry is still in its infancy, and we don't yet have a shared understanding of what the methodology is.

Personally I have very strong opinions on how this should be done. But I'm urging everyone to start thinking about it, perhaps even going as far as quitting if this isn't something you can pursue at your current job. The carnage is coming :/


Nope. Especially with these agents, the thinking trace can get very large. No human will ever read it, and the agent will fill up its context with garbage while trying to look for information.

I understand the drive for stabilizing control and consistency, but this ain't the way.


This is a terrible idea


There's a fun hypothesis I've read about somewhere, goes something like this:

As the universe expands the gap between galaxies widens until they start "disappearing" as no information can travel anymore between them. Therefore, if we assume that intelligent lifeforms exist out there, it is likely that these will slowly converge to the place in the universe with the highest mass density for survival. IIRC we even know approximately where this is.

This means a sort of "grand meeting of alien advanced cultures" before the heat death. Which in turn also means that previously uncollided UUIDs may start to collide.

Those damned Vogons thrashing all our stats with their gazillion documents. Why do they have a UUID for each xml tag??
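The joke has a kernel of math: a v4 UUID carries 122 random bits, so by the birthday bound collisions only become likely once the combined civilizations have minted on the order of 2^61 IDs. A back-of-the-envelope check (approximation only, not exact combinatorics):

```python
import math

RANDOM_BITS = 122          # UUIDv4: 128 bits minus fixed version/variant bits
SPACE = 2 ** RANDOM_BITS

def collision_probability(n):
    """Birthday-bound approximation: P ≈ 1 - exp(-n^2 / (2 * space))."""
    return 1.0 - math.exp(-n * n / (2 * SPACE))

# ~50% collision odds near sqrt(2 * ln 2 * space) ≈ 2.7e18 UUIDs
n_half = math.sqrt(2 * math.log(2) * SPACE)
print(f"50% collision threshold: {n_half:.2e} UUIDs")

# A trillion UUIDs from one galaxy: collision odds are still negligible
print(f"P(collision at 1e12):    {collision_probability(10**12):.2e}")
```

So the Vogons really would need a gazillion documents, each with per-tag UUIDs, to thrash the stats.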


It is counterintuitive, but information can still travel between places so distant that the expansion between them is faster than the speed of light. It's just extremely slow (so I still vote for going to the party at the highest-density place).

We do see light from galaxies that are receding from us faster than c. At first, the photons headed in our direction are moving away from us, but as the universe expands, they eventually find themselves in a region of space that is no longer receding faster than c, and they start approaching.
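A quick way to see this is a toy integration of a photon's proper distance in a decelerating universe. The setup below is a sketch under simplifying assumptions (matter-dominated, H(t) = 2/(3t), units with c = 1; all numbers are illustrative, not cosmological data): a photon emitted toward us from a region receding at more than c first drifts away, then turns around and arrives.

```python
# dD/dt = H(t) * D - c: expansion carries the photon out at H*D,
# while it moves toward us at c. In a decelerating universe H falls,
# so H*D can drop below c and the photon starts approaching.

def photon_distance_history(D0=2.0, t0=1.0, dt=1e-4, t_max=10.0):
    """Euler-integrate the proper distance of a photon aimed at the origin."""
    t, D = t0, D0
    history = [(t, D)]
    while D > 0 and t < t_max:
        H = 2.0 / (3.0 * t)          # matter-dominated Hubble rate
        D += (H * D - 1.0) * dt
        t += dt
        history.append((t, D))
    return history

hist = photon_distance_history()
t_start, D_start = hist[0]
print(f"initial recession speed: {(2.0 / (3.0 * t_start)) * D_start:.2f}c")
print(f"max distance reached:    {max(D for _, D in hist):.2f}")
print(f"photon arrives at t ≈    {hist[-1][0]:.2f}")
```

The initial recession speed comes out above c, the distance first grows, and the photon still reaches the origin in finite time.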


That's not exactly it. Light gets redshifted instead of slowing down, because light will be measured to be the same speed in all frames of reference. So even though we can't actually observe it yet, light traveling towards us still moves at c.

It's a different story entirely for matter. Causal and reachable are two different things.

Regardless, such extreme redshifting would make communication virtually impossible - but maybe the folks at Blargon 5 have that figured out.


I think I missed something: how do galaxies getting further away (divergence) imply that intelligent species will converge anywhere? It isn’t like one galaxy getting out of range of another on the other side of the universe is going to affect things in a meaningful way…

A galaxy has enough resources to be self-reliant, there’s no need for a species to escape one that is getting too far away from another one.


You'll run out of resources eventually. Moving to the place with the most mass gives you the most time before you run out.


Yes that's the idea. The expansion simply means that the window of migration will close. Once it's closed, your galaxy is cut off and will run out of fuel sooner than the high-density area.


Well, eventually there are no galaxies, just a bunch of cosmic rays. Some clusters of matter will last longer.

I think for this to work, either life would have to be plentiful near the end, or you'd need FTL travel.


Social aspect. There is no need but it's more fun to spend the end of the Universe with other intelligences than each in its own place.


I think I sense a strange Battle Royale type game…


Assuming these are advanced enough aliens, they'll also be bringing with them all the mass they can, to accentuate the effect? I'm imagining things like Niven's ringworld star propulsion.


> Maybe when AIs are able to say: "I don't know how this works" or "This doesn't work like that at all." they will be more helpful.

Funny you say that, I encountered this in a seemingly simple task. Opus inserted something along the lines of "// TODO: someone with flatbuffers reflection expertise should write this". I actually thought this was better than I anticipated even though the task was specifically related to fbs reflection. And it was because I didn't waste more time and could immediately start rewriting it from scratch.


I have the same experience and still use it. It's just that I learned to use it for simplistic work. I sometimes try to give it more complex tasks but it keeps failing. I don't think it's bad to keep trying, especially as people are reporting insane productivity gains.

After all, it's through failure that we learn the limitations of a technology. Apparently some people encounter that limit more often than others.


> I have the same experience and still use it. It's just that I learned to use it for simplistic work.

OP said "I don't see the productivity boost from AI" and that they don't "believe the hype" without any qualification, but then went on to say that they use it every day. This makes no sense to me.

Isn't this like saying "I don't get anything out of reading books" immediately followed by "I read books for 4 hours every night"?


to be fair, every other comment is usually screaming about how if you aren't able to utilize LLMs effectively, you will be without a job soon. most people want to keep their job, or be employable, so if LLMs are a required tool to know, they're trying to become fluent in it by using it.


> to be fair, every other comment is usually screaming about how if you aren't able to utilize LLMs effectively, you will be without a job soon

I think a lot of these "it's all overhyped crap" posts are hypocritical.

If someone wants to be consistent with their "it's crap" argument, they wouldn't be using it for anything. Period.

If someone says they need it for their job, then they are admitting that it's useful for their job. Because it would otherwise be irrational to use a tool that makes them worse at their job.


Perhaps the out-of-a-job prediction is actually reversed. True, LLMs will become an efficiency-increasing tool. But in terms of job security, doesn't that mean that if your whole job can be driven by an LLM, then demand for that job decreases?

In other words, people claiming these high productivity increases may be the ones at actual risk. Why employ 3 people when 1 can write the prompts?


Exact same experience.

Here's what I find Claude Code (Opus) useful for:

1. Copy-pasting existing working code with small variations. If the intended variation is bigger then it fails to bring productivity gains, because it's almost universally wrong.

2. Exploring unknown code bases. Previously I had to curse my way through code reading sessions, now I can find information easily.

3. Google Search++, e.g. for deciding on tech choices. Needs a lot of hand holding though.

... that's it? Any time I tried doing anything more complex I ended up scrapping the "code" it wrote. It always looked nice though.


>> 1. Copy-pasting existing working code with small variations. If the intended variation is bigger then it fails to bring productivity gains, because it's almost universally wrong.

This does not match my experience. At all. I can throw extremely large and complex things at it and it nails them with very high accuracy and precision in most cases.

Here's an example: when Opus 4.5 came out I used it extensively to migrate our database and codebase from a one-Postgres-schema-per-tenant architecture to a single schema architecture. We are talking about eight years worth of database operations over about two dozen interconnected and complex domains. The task spanned migrating data out of 150 database tables for each tenant schema, then validating the integrity at the destination tables, plus refactoring the entire backend codebase (about 250k lines of code), plus all of the test suite. On top of that, there were also API changes that necessitated lots of tweaks to the frontend.

This is a project that would have taken me 4-6 months easily and the extreme tediousness of it would probably have burned me out. With Opus 4.5 I got it done in a couple of weeks, mostly nights and weekends. Over many phases and iterations, it caught, debugged and fixed its own bugs related to the migration and data validation logic that it wrote, all of which I reviewed carefully. We did extensive user testing afterwards and found only one issue, and that was actually a typo that I had made while tweaking something in the API client after Opus was done. No bugs after go-live.

So yeah, when I hear people say things like "it can only handle copy paste with small variations, otherwise it's universally wrong" I'm always flabbergasted.


Interesting. I've had it fail on much simpler tasks.

Example: I was writing a flatbuffers routine that translated a simple type schema to an fbs reflection schema. I thought: well, this is quite simple, surely Opus will have no trouble with it.

The output looked reasonable, compiled... and was completely wrong. It seemed to just output random but reasonable-looking indices and offsets. It also inserted, in one part of the code, a literal TODO saying "someone who understands fbs reflection should write this". I had to write it from scratch.

Another example: I was writing a fuzzer for testing a certain computation. In this case, there was existing code to look at (working fuzzers for slightly different use cases), but the main logic had to be somewhat different. Opus managed to do the copy-paste and then messed up the only part where it had to be a bit more creative. Again, this shows the limit where it starts breaking. Overall I actually considered this a success, because I didn't have to deal with the "boring" bit.

Another example: colleague was using Claude to write a feature that output some error information from an otherwise completely encrypted computation. Claude proceeded to insert a global backdoor into the encryption, only caught in review. The inserted comments even explained the backdoor.

I would describe a success story if there was one. But aside from throwing together simple react frontends and SQL queries (highly copy-pasteable recurring patterns in the training set) I had literally zero success. There is an invisible ceiling.


I find LLMs to be absolutely worst at "take this content and put (a copy of it) there" tasks. They subtly mutate the content while doing it! I keep having to restore, e.g., explanatory comments.


Highly recommend https://grml.org/zsh/

For nixos users: https://discourse.nixos.org/t/using-zsh-with-grml-config-and...

Super easy to set up, and works very well.


Although it's sad, I have to agree with what you're alluding to. I think there is huge overhead and waste (in terms of money, compute resources and time) hidden in the software industry, and at the end of the day it just comes down to people not knowing how to write software.

There is a strange dynamic currently at play in the software labour market where the demand is so huge that the market can bear completely inefficient coders. Even though the difference between a good and a bad software engineer is literally orders of magnitude.

Quite a few times I encountered programmers "in the wild" - in a sauna, on the bus etc, and overheard them talking about their "stack". You know the type, node.js in a docker container. I cannot fathom the amount of money wasted at places that employ these people.

I also predict that, if we adopt LLMs correctly, these engineers (who I would say constitute a large percentage) will disappear. The age of useless coding and infinite demand is about to come to an end. What will remain is specialist engineering positions (base infra layer, systems, HPC, games, quirky hardware, cryptographers, etc.). I'm actually kind of curious what the effect on salaries will be for these engineers; I can see it going both ways.


Same experience here, using new models. Every time it's a disappointment. Useful for search queries that are not too specialized. That's it.


I get pretty good results with Claude Code, Codex, and to a lesser extent Jules. It can navigate a large codebase and get me started on a feature in a part of the code I'm not familiar with, and do a pretty good job of summarizing complex modules. With very specific prompts it can write simple features well.

The nice part is I can spend an hour or so writing specs, start 3 or 4 tasks, and come back later to review the results. It's hard to be totally objective about how much time it saves me, but it generally feels worth the $200/month.

One thing I'm not impressed by is the ability to review code changes, that's been mostly a waste of time, regardless of how good the prompt is.


Company expectations are higher too. Many companies expect 10x output now due to AI, but the technology has been growing so quickly that there are a lot of people/companies who haven't realized that we're in the middle of a paradigm shift.

If you're not using AI for 60-70 percent of your code, you are behind. And yes, $200 per month for AI is required.


We've been trialing code rabbit at work for code review. I have various nits to pick but it feels like a good addition.

