I’m sorry, what is happening with this paragraph:

> As long as the bank authorizations keep coming through, it will push on bug fixes until they're deployed in production, and then start scanning through the user logs to see how well it's doing.

I enjoy using these tools. They help me in my work. But the continual hype makes it impossible to have a genuine discussion about them.

So I ask, genuinely, did I miss the configuration section where you can have it scan your logs for new errors and have it argue with you on PRs? Is he trying to say something else? Or is it just anthropomorphizing hype?



I cannot tell if the original tweet is sarcasm or not. Sections like this make me think yes? It's got to be at least tongue-in-cheek.


My take is it's a mix of both sarcasm and not sarcasm, even in the same sentence. It's a post-truth future with a ton of upvotes.


I haven't gotten around to trying Claude Code yet, but with Cursor and Windsurf you can absolutely have the agent read the output of what it writes and runs, and it can fix the problems it sees. You can also have it review code. It sometimes helps to have it review in a fresh chat with less context. I really think a lot of people on HN aren't pushing on everything that's available. It's magic for me, but I spend a lot of effort and skill manifesting the magic. I'm not doubting other people's experience, but I wonder if they're giving up too fast because, for ego reasons, they don't actually want it to work well.


I’m going to keep at it, because I was trained as an SRE, not a developer, and I have lots of ideas for side projects that have so far taken a long time to get going. But I’ve been struggling: it quickly gets into these infinite-loop situations where it can’t seem to fix a feature and goes back and forth between multiple non-working states. CSS layouts, but even basic stuff like having the right WebSocket routes.

We’ll see; maybe my whole approach is wrong. I’m going to try a simpler project next, since my first attempt was relatively complex.


I can't explain why, but I do get pretty good results with closing a prompt session completely and then initiating a fresh session later on. I have actually seen quite different code from the very same prompt across two sessions.

However, the extra time between sessions does give me the chance to consider where the AI might have gone wrong in the first session and how I could have phrased the initial prompts more effectively.

As others have stated throughout the threads here, I definitely recommend giving it smaller nibbles of problems. When I take this route and get working modules, I start a fresh prompt session, upload the working code module files, and ask it to integrate them into a simple solution with static inputs in a 'main' function. (Providing coherent inputs from the router functions of a web service is simple enough once these basics are covered.)

Basically, I do everything possible to keep the AI from getting distracted or going down a rabbit hole chasing concerns that humans are perfectly capable of taking care of (without paying tokens for it). I will write most of the tests myself and then perhaps afterwards ask it to evaluate the test coverage and add more. That way the new tests are in the format and on the platform of my choosing.
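
To make the "static inputs in a 'main' function" step concrete, here is roughly the shape of what I ask for. The module and function names below are made up for illustration; the point is that the modules were already built and verified in earlier sessions.

    # Hypothetical wiring of two previously-verified modules with hard-coded
    # inputs, before any web-service routing is involved.
    from order_parser import parse_order      # built and tested in an earlier session
    from pricing import compute_total         # same, uploaded as a working file

    def main():
        # Static inputs stand in for whatever the router would eventually pass in
        raw_order = '{"sku": "A-100", "qty": 3}'
        order = parse_order(raw_order)
        total = compute_total(order, tax_rate=0.08)
        print(f"order={order!r} total={total:.2f}")

    if __name__ == "__main__":
        main()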


Oh ok, this makes sense. Because of the ordering of the sentences, I read it as “it pushes the code to production and then monitors it in production”.

I have found that prompting something like “do X and write tests to confirm it works” works well for what you’re describing. Or you can write the tests yourself, and then it will iterate until they pass.
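
For what it's worth, a sketch of that second approach: hand-written tests (the names and the slugify() spec here are made up) that you then ask the model to iterate against until they pass.

    # Hypothetical pytest file written by hand before prompting:
    # "implement slugify() in utils.py and keep running these tests until they pass"
    from utils import slugify

    def test_lowercases_and_hyphenates():
        assert slugify("Hello World") == "hello-world"

    def test_strips_punctuation_and_collapses_spaces():
        assert slugify("  Rock & Roll!  ") == "rock-roll"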


Yes it will. I wrote a quick script for local deployment (and then had Claude improve it) and quickly wrote documentation (and had Claude improve it) on how to deploy and gather logs. It will do those things and follow the logs while I'm clicking around in the app. When starting a new session it reads the docs and knows how to deploy and check logs. If something fails in the Docker build, Claude reads the script output, since it ran it.
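
The script itself is nothing fancy; here's a rough sketch of its shape (the commands are generic Docker Compose calls, and the real script is specific to my app):

    # Hypothetical local deploy-and-watch script; Claude runs it, so build and
    # deploy errors land in the same transcript it is already reading.
    import subprocess
    import sys

    def run(cmd):
        print(f"$ {' '.join(cmd)}")
        result = subprocess.run(cmd, capture_output=True, text=True)
        print(result.stdout)
        if result.returncode != 0:
            print(result.stderr, file=sys.stderr)
            sys.exit(result.returncode)

    run(["docker", "compose", "build"])
    run(["docker", "compose", "up", "-d"])
    # Follow logs while I'm clicking around in the app
    subprocess.run(["docker", "compose", "logs", "-f", "--tail", "50"])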

haven't tried PR stuff yet though



