I basically fully agree with this. I'm not sure yet how to handle the ramifications of it in my day-to-day work. But one habit I have been forming: I often find that even though the cost of writing code is immensely cheap, reviewing and validating that it works in certain codebases (like the millions-of-lines monorepo I work in at my job) is extremely high. So I try to think through, and improve, our testability such that a few-hundred-line change that modifies the DB really can be a couple of hours of work.
Also, I do want to note that these little "Here is how I see the world of SWE given current model capabilities and tooling" posts are MUCH appreciated, given how much you follow the landscape. When a major hype wave is happening and I feel like I am getting drowned on twitter, I tend to wonder "What would Simon say about this?"
> I find that even though the cost of writing code is immensely cheap, reviewing and validating that it works in certain code bases (like the millions of line mono repo I work in at my job) is extremely high.
That is my observation as well. Churning code is easy, but making sure the code is not total crap is a completely new challenge and concern.
It's not like code reviews didn't require work prior to LLMs. Far from it. It's just that the code is now generated in a completely different way, and in some cases with barely any oversight from vibecoders who are trying to punch way above their weight. They generate massive volumes of changes that fail in obvious and subtle ways, and the flow is relentless.
> What tremendously helps is asking the LLM to add a lot of explanations by adding comments to each and every line or function.
No, it doesn't. It's completely useless and unhelpful. These machine-generated comments are just realizations of the same context that already output crap. Dumping volumes of this output adds more work for reviewers, who have to parse through it to figure out the mess presented by vibecoders who didn't even bother to check what they were generating.
You're missing my point. I'm not saying those are good comments; I even said you should probably delete them afterwards.
But as you say, they are a realization of the LLM's context. Their role is not to help you understand what the code is doing, but to show how the LLM understood the problem and how it tried to solve it. You can then compare its understanding with yours.
Now I need to add context myself: I'm not talking about vibe coding entire apps; there, adding verbosity wouldn't help much. My main usage of LLMs is at $JOB, where I need to execute "short" tasks in codebases I barely know most of the time. That's where I use this trick. It also has the side benefit of helping me understand the codebase better.
This is also my feeling. When people keep referring to big jumps or inflection points, I am left confused, because the models have felt good for a long time and feel like they are getting steadily better. This could be biased by what I use them for though.
It is enormously useful for the author to know that the code works, but my intuition is that if you asked an agent to port files slowly, forming its own plan and making a commit per feature, it would still get reasonably close, if not all the way there.
Basically, I am guessing that this impressive output could have been achieved based on how good models are these days with large amounts of input tokens, without running the code against tests.
I think the reason this was an evening project for Simon is that he had both the code and the tests in conjunction. Removing either one would at least 10x the effort, is my guess.
The biggest value I got from JustHTML here was the API design.
I think that represents the bulk of the human work that went into JustHTML - it's really nice, and lifting that directly is the thing that let me build my library almost hands-off and end up with a good result.
Without that I would have had to think a whole lot more about what I was doing here!
See also the demo app I vibe-coded against their library here: https://tools.simonwillison.net/justhtml - that's what initially convinced me that the API design was good.
I used to think this way too. Here are a few ways I've tried to reframe things that have helped.
1. When I work on side projects and use AI, sometimes I wonder "what's the point if I am just copy / pasting code? I am not learning anything" but what I have come to realize is building apps with AI assistance is the skill that I am learning, rather than writing code per se as it was a few years ago.
2. I work in high-scale distributed computing, so I am still presented with ample opportunities to get very low level, which I love. I am not sure how much I care about writing code per se anymore. Working with AI is still tinkering; it hasn't changed that much for me. It is quite different, but the underlying fun parts are still present.
I have been trying to find such an article for so long, thank you!
I think a common reaction to Agents is “well, it probably cannot solve a really complex problem very well”. But to me, that isn’t the point of an agent.
LLMs function really well with a lot of context, and an agent allows the LLM to discover more context and improve its ability to answer questions.
As others have mentioned, please add more docs / details to the README.
I want to mention my current frustration with Cursor and why I would love an OSS alternative that gives me control: I feel Cursor has dumped agentic capabilities everywhere, whether the user wants them or not. Even when I use the Ask function as opposed to Agent, it still seems to run in an agentic loop. It takes longer to have basic conversations about high-level ideas and really kills my experience.
I hope Void doesn't become an agent dumping ground where this behavior is thrust upon the user as much as possible.
Not to say I dislike agent mode, but I like to choose when I use it.
When you say "vibe code" do you mean the true definition of that term, which is to blindly accept any code generated by the AI, see if it works (maybe agent mode does this) and move on to the next feature? Or do you mean prompt driven development, where although you are basically writing none of the code, you are still reading every line and maintain high involvement in the code base?
Kind of in between. I accept a lot of code without ever seeing it, but I check the critical stuff that could cause trouble. Or stuff that I know the AI is likely to mess up.
Specifically for the front end I mostly vibe code, and for the backend I review a lot of the code.
I will often follow up with prompts asking it to extract something to a function, or to not hardcode something.
https://open.substack.com/pub/orangepuff/p/first-impressions...
I used Claude code to get started on a pdf reader I wanted to build. This pdf reader has a built in LLM chat and when you ask a question about the pdf you’re reading, the page text will be automatically prepended to the question.
Nothing fancy or special. It was built with Streamlit in about 150 lines and a single file. But I was impressed that Claude Code one-shot it.
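The core mechanic described above (prepending the page's text to the user's question) is simple enough to sketch. This is illustrative only: the function names (`build_prompt`) and the prompt wording are my assumptions, not taken from the actual project.

```python
# Hypothetical sketch of the "prepend the page text" pattern described above.
# In the real app the page text would come from a PDF library (e.g. pypdf's
# page.extract_text()) and the prompt would go to an LLM chat endpoint.

def build_prompt(page_text: str, question: str) -> str:
    """Prepend the current PDF page's text to the user's question so the
    LLM answers with the page as context."""
    return (
        "You are answering questions about the PDF page below.\n\n"
        "--- PAGE TEXT ---\n"
        f"{page_text}\n"
        "--- END PAGE ---\n\n"
        f"Question: {question}"
    )

if __name__ == "__main__":
    prompt = build_prompt(
        "The mitochondria is the powerhouse of the cell.",
        "What is the powerhouse of the cell?",
    )
    print(prompt)
```

In a Streamlit app, this string would simply be passed as the user message to whatever chat API the app wraps, with the extracted page text refreshed whenever the reader scrolls to a new page.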
May I ask how you got the opportunity to invest in this company? If you're a VC, that makes sense; I'm just wondering how normies can get access to invest in companies they believe in. Thanks.
If you're an accredited investor (make sure you meet the financial criteria) you can cold email seed/pre-seed stage companies. These companies typically raise on SAFEs and may have low minimum investments (say $5k or $10k).
Many companies are likely happy to take your small check if you are a nice person and can be even minimally helpful to them. Note that for YC companies you'll probably have to swallow the pill of a $20M valuation or so.
I do indeed work in VC. But as another reply mentions, any accredited investor can write small checks into startups, and most preseed/seed founders are happy to take angel checks.
I have started using AI for all of my side projects, and am now building stuff almost every day. I did this as a way to ease some of my anxiety about AI progress and how fast it is moving. It has actually had the opposite effect; it's more amazing than I thought.
I think the difficulty in reasoning about 2) is that given what interesting and difficult problems it can already solve, it's hard to reason about where it will be in 3-5 years.
But I am also having more fun building things than perhaps at any point since the earliest days of writing code, which for me was just over 7 years ago now.
Insofar as 1) goes, yes, I never want to go back. I can learn faster and more deeply than I ever could. It's really exciting!