
I decided to seriously try Sonnet 3.7. I started with a simple prompt on claude.ai ("Do you know claude code? Can you do a simple implementation for me?"). After minimal tweaking from me, it gave me this: https://gist.github.com/sloonz/3eb7d7582c33e95f2b000a0920016...

After interacting with this tool, I decided it would be nice if the tool could edit itself, so I asked (him? it?) to create its next version. It came up with a non-working version of this https://gist.github.com/sloonz/3eb7d7582c33e95f2b000a0920016.... I fixed the bug manually, but that started an interactive loop: I could now describe what I wanted, describe the bugs, and the tool would add the features/fix the bugs itself.

I decided to rewrite it in TypeScript (by that I mean: can you rewrite yourself in TypeScript). And then add other tools (by that: create tools and unit tests for the tools). https://gist.github.com/sloonz/3eb7d7582c33e95f2b000a0920016... and https://gist.github.com/sloonz/3eb7d7582c33e95f2b000a0920016... have been created by the tool itself, without any manual fix from me. Setting up the testing/mock framework? Done by the tool itself too.

In one day (and $20), I essentially had recreated claude-code. And I could improve it just by asking "Please add feature XXX": $2 a feature, with unit tests, on average.



So you’re telling me you spent 20 dollars and an entire day on 200 lines of JavaScript and 75 lines of Python, and this to you constitutes a working re-creation of Claude Code?

This is why expectations are all out of whack.


That amount of output is comparable to what many professional engineers produce in a given day, and they are a lot more expensive.

Keep in mind this is the commenter's first attempt. And I'm surprised he paid so much.

Using Aider and Sonnet I've on multiple occasions produced 100+ lines of code in 1-2 hours, for under $2. Most of that time is hunting down one bug it couldn't fix by itself (reflective of real world programming experience).

There were many other bugs, but I would just point out the failures I was seeing and it would fix them itself. For particularly difficult bugs it would at times even produce a whole new script just to aid with debugging. I would run it and it would spit out diagnostics, which I fed back into the chat.

The code was decent quality - better than what some of my colleagues write.

I could probably have it be even more productive if I didn't insist on reading the code it produced.


The lines of code aren’t the point. The OP claimed they asked Claude to recreate Claude Code and that it was successful. This is obviously an extreme exaggeration. I think this is the crux of a lot of these posts. This code generator output a very basic utility. To some this is a revelation, but it leaves others wondering what all the fuss is about.

It seems to me people’s perspective on code gen largely has to do with their level of experience actually writing code.


It's a very narrow reading of his comment. What he meant to say was that it quickly created a rudimentary version of an AI code editor.

Just as a coworker used it to develop an AI code review tool in a day. It's not fancy - no bells and whistles, but it's still impressive to do it in a day with almost no manual coding.


> In one day (and $20), I essentially had recreated claude-code.

Not sure it’s a narrow reading. This is my point: if it’s a basic or rudimentary version, people should be explicit about that. Otherwise these posts read like hype and only lead to dissatisfaction and disappointment for others.


s/reading/interpretation/

Reading something literally is by definition the narrowest interpretation.


> Using Aider and Sonnet I've on multiple occasions produced 100+ lines of code in 1-2 hours, for under $2. Most of that time is hunting down one bug it couldn't fix by itself (reflective of real world programming experience).

Was this using technologies you aren't familiar with? If not, the output rate seems pretty low (very human-paced, just with an extra couple of bucks spent).


By 100+ I mean 100-300 lines. I think most people aren't churning out 100 lines of code per hour unless it involves boilerplate.

More importantly, the 100-300 lines was very low effort for me. That does have its downsides (skills atrophy).


Remember that input tokens grow quadratically with the length of the conversation, since you re-upload the n previous messages to get the (n+1)-th one. When Claude completes a task in 3-4 shots, that’s cents. When it goes down a rabbit hole, however…
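
For a sense of scale, here is a rough back-of-the-envelope sketch of that quadratic growth (the per-message size and price below are made-up placeholder numbers, not Anthropic’s actual pricing):

    // Each new turn re-sends the whole conversation so far, so total input
    // tokens grow roughly quadratically with the number of turns.
    const tokensPerMessage = 500;          // assumed average message size
    const pricePerMillionInputTokens = 3;  // assumed $ per million input tokens

    function totalInputTokens(turns: number): number {
      let total = 0;
      for (let n = 1; n <= turns; n++) {
        total += n * tokensPerMessage; // turn n re-uploads the n previous messages
      }
      return total; // ~ tokensPerMessage * turns^2 / 2
    }

    for (const turns of [4, 20, 100]) {
      const tokens = totalInputTokens(turns);
      const cost = (tokens / 1_000_000) * pricePerMillionInputTokens;
      console.log(`${turns} turns -> ${tokens} input tokens (~$${cost.toFixed(2)})`);
    }

With those placeholder numbers, 4 turns costs a cent or two, while 100 turns is already several dollars, which is why a rabbit hole gets expensive.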


In aider there's a command to "reset" so it doesn't send any prior chat. Whenever I complete a mini-feature, I invoke the command. It helpfully shows the size of the current context in tokens and the cost, so I keep an eye on it.

Doesn't Claude Code have a similar option?


It does — /clear. It also has /compact to summarize previous tasks to preserve some situational awareness while reducing context bulk.


2200 lines. Half of them are unit tests I would probably have been too lazy to write myself, even for a "more real" project. Yes, I consider $20 cheap for that, considering:

1. It’s a learning experience.
2. Looking at the chat transcripts, many of those dollars are burned for stupid reasons (Claude often fails with the insertLines/replaceLines functions and breaks files due to off-by-one offsets; see the sketch after this list) that are probably fixable.
3. Remember that Claude started from a really rudimentary base with few tools, so the bootstrapping was especially inefficient.
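
(For context, here is a rough sketch of what such insertLines/replaceLines tools might look like, assuming 1-indexed, inclusive line ranges; the actual gist implementation may differ. The 1-indexed-to-0-indexed conversion is exactly where the off-by-one breakage tends to creep in.)

    import { promises as fs } from "fs";

    // Hypothetical line-edit tools; line numbers are 1-indexed, ranges inclusive.
    async function replaceLines(
      path: string,
      start: number,        // first line to replace (1-indexed)
      end: number,          // last line to replace (inclusive)
      replacement: string,
    ): Promise<void> {
      const lines = (await fs.readFile(path, "utf8")).split("\n");
      if (start < 1 || end > lines.length || start > end) {
        throw new Error(`Invalid range ${start}-${end} for a ${lines.length}-line file`);
      }
      // slice() is 0-indexed and end-exclusive, so convert carefully.
      const updated = [...lines.slice(0, start - 1), ...replacement.split("\n"), ...lines.slice(end)];
      await fs.writeFile(path, updated.join("\n"));
    }

    async function insertLines(path: string, after: number, text: string): Promise<void> {
      const lines = (await fs.readFile(path, "utf8")).split("\n");
      if (after < 0 || after > lines.length) {
        throw new Error(`Invalid insertion point ${after} for a ${lines.length}-line file`);
      }
      // after = 0 inserts at the top of the file; after = lines.length appends.
      const updated = [...lines.slice(0, after), ...text.split("\n"), ...lines.slice(after)];
      await fs.writeFile(path, updated.join("\n"));
    }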

Next experiment will be on an existing codebase, but that’s probably for next weekend.


Thanks for writing up your experience and sharing the real code. It is fascinating to see how close these tools can now get to producing useful, working software by themselves.

That said - I'm wary of reading too much into results at this scale. There isn't enough code in such a simple application to need anything more sophisticated than churning out a few lines of boilerplate that produce the correct result.

It probably won't be practical for the current state of the art in code generators to write large-scale production applications for a while anyway, just because of the amount of CPU time and RAM they'd need. But assuming we solve the performance issues one way or another, it will eventually be interesting to see whether the same kind of code generators can cope with managing projects at larger scales, where the hard problems usually have little to do with efficiently churning out boilerplate code.


aider has this great visualisation of "self-written code" - https://aider.chat/HISTORY.html


I suspect it would be somewhat challenging to do, but I'd love to see something like this where the contributions are bucketed into different levels of difficulty. It is often the case for me that a small percentage of the lines of code I write take a large percentage of the time I spend coding (and I assume this is true for most people).



