Hacker News | m132's comments

Appreciate the full prompt history

Well, it ends with "can you give me back all the prompts i entered in this session", so it may be partially the actual prompt history and partially hallucination.

they read like they were done by a 10-year-old

They do, the whole tone and the lack of understanding of Docker, kernel threads, and everything else involved make it sound hilarious at first. But then you realize that this is all the human input that led to a working exploit in the end...

FreeBSD doesn't have Docker. It has jails, which can serve a similar purpose but are not the same in important ways.

Please at least read the context before attempting to correct me...

Here's what I'm referring to: https://github.com/califio/publications/blob/7ed77d11b21db80...


I mean, I get it: vibe-coded software deserves vibe-coded coverage. But I would at least appreciate it if the main part of it, the animation, ran at a speed that makes it possible to follow along and didn't glitch out with elements randomly disappearing in Firefox...

How is this on the front page?


It's on the front page because it looks really cool. You can complain about it being vibe coded, but it still looks good. If you ask Claude to allow the user to slow down the animation, it can do that quite easily, that's just not a problem caused by vibe coding. And I'm on FF and didn't notice anything glitching out.

A Co-Authored-By tag on the commit. It's a standard practice and the meaning is self-explanatory. This is what Claude adds by default too.

If you accept the code generated by them nearly verbatim, absolutely.

I don't understand why people consider Claude-generated code to be their own. You authored the prompts, not the code. Somehow this was never a problem with pre-LLM codegen tools, like macro expanders, IPC glue, or type bundle generators. I don't recall anybody desperately removing the "auto-generated do not edit" comments those tools would nearly always slap at the top of each file or taking offense when someone called that code auto-generated. Back in the day we even used to publish the "real" human-written source for those, along with build scripts!


It's weird, because they should not consider it as their own, but they should take accountability for it.

Ideally, if I contribute to any codebase, what needs to be judged is the resulting code. Is it up to the project's standards? Does the maintainer have design objections?

What tool you use shouldn't matter, be it your IDE or your LLM.

But that also means you should be accountable for it. You shouldn't hide behind "But Claude did this poorly, not me!" I don't care (in a friendly way); just fix the code if you want to contribute.

The big caveat to this is not wanting AI-Generated code for ideological reasons, and well, if you want that you can make your contributors swear they wrote it by themselves in the PR text or whatever.

I'm not really sure how to feel about this, but I stand by my "the code is what matters" line.


Sounds a bit like the label "organic (food)" could be applied to hand-written code?

Some differences with the human source for those kinds of tools: (1) the resultant generated code was deterministic; (2) it was usually possible to get access to the exact version of the tool that generated it.

Since AI tools are constantly obsoleted, generate different output each run, and it is often impossible to run them locally, the input prompts are somewhat useless for everyone but the initial user.
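To make the contrast concrete, here's a minimal sketch (hypothetical tool name and API, not any specific generator) of the kind of pre-LLM codegen being described: the same spec always yields byte-identical output, stamped with the customary "do not edit" header, which is why publishing just the spec plus the generator used to be enough.

```python
# Hypothetical sketch of a pre-LLM-style deterministic code generator.
# "structgen" and generate_accessors() are made-up names for illustration.

HEADER = "# AUTO-GENERATED by structgen 1.0 -- DO NOT EDIT\n"

def generate_accessors(struct_name: str, fields: list[str]) -> str:
    """Emit trivial getter functions for each field of a struct-like spec."""
    lines = [HEADER]
    for field in fields:
        lines.append(
            f"def get_{field}(obj):\n"
            f"    return obj['{field}']\n"
        )
    return "\n".join(lines)

# Rerunning the generator on the same spec reproduces the file exactly,
# so checking in only the spec and the tool version is sufficient --
# the property LLM-based generation lacks.
out1 = generate_accessors("Point", ["x", "y"])
out2 = generate_accessors("Point", ["x", "y"])
assert out1 == out2
```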


But I want to see Claude on the contributor list so that I immediately know if I should give the rest of the repo any attention!

Is there any information about how the "advanced flow" will be implemented? According to keepandroidopen.org, this is going to be handled by Google Play Services. Does it mean it will be automatically installed via the silent, always-on GMS update mechanism and I should root my devices and remove GMS altogether if I don't want this?

A hard read for a skeptic like me. A lot of speculation and extrapolation of a trend, not to say outright exaggeration, but very little actual data. Let's not forget that we're at the tip of an economic bubble, and what you're writing about is at the very center of it!

For what it's worth, I read Anthropic's write-up of their recent 0-day hunt that most of this post seems to be based on, and I can't help but notice that (assuming the documented cases were the most "spectacular") their current models mostly "pattern-matched" their ways towards the exploits; in all documented cases, the actual code analysis failed and the agents redeemed themselves by looking for known-vulnerable patterns they extracted from the change history or common language pitfalls. So, most of the findings, if not all, were results of rescanning the entire codebase for prior art. The corporate approach to security, just a little more automated.

Hence I agree with "the smartest vulnerability researcher" mentioned near the end. Yes, the most impactful vulnerabilities tend to be the boring ones, and catching those fast will make a big difference, but vulnerability research is far from cooked. If anything, it will get much more interesting.


I tend to be skeptical, but I listened to the linked podcast with Carlini and found him very credible: not a sales guy, not an AI doomer, but someone talking about how little work he had to do to find real exploits in heavily-fuzzed code. I think it's still a safe bet that many apps will be cumbersome to attack, but I think it's still going to happen faster than I used to think.

https://securitycryptographywhatever.com/2026/03/25/ai-bug-f...


Nicholas Carlini is the real deal. He was most recently on the front page for "How to win a best paper award", about his experience winning a series of awards at Big 4 academic security conferences, most recently for work he coauthored with Adi Shamir (I'm just namedropping the obvious name) on stealing the weights from deep neural networks. Before all that (and before he got his doctorate), he and Hans Nielsen wrote the back half of Microcorruption.

He's not a sales guy.


Thanks for having him on. It was really nice to hear a sober, experienced voice talking about their work with fellow practitioners.

Thank Nicholas! We'll talk to anyone. :)

Thanks. Watched most of this talk and, unless I missed something, it seems to confirm what I was thinking—most of the strength currently comes from the scale you can deploy LLMs at, not them being better at vulnerability research than humans (if you factor out the throughput). And since this is a relatively new development, nobody really knows right now if this is going to have a greater impact than fuzzers and static analyzers had, or if newer models are ever going to get to a level that'd make computer security a solved problem.

There's a video of a recent talk Nicholas Carlini gave this past week on YouTube. It's eye-opening. If you don't believe that LLMs are going to transform the cybersecurity space after watching that, I can't help you.

It's this talk right here:

https://www.youtube.com/watch?v=1sd26pWhfmg

7 minutes in, he shows the SQLI he found in Ghost (the first sev:hi in the history of the project). If I'd remembered better, I would have mentioned in the post:

* it's a blind SQL injection

* Claude Code wrote an exploit for it. Not a POC. An exploit.


> Not a POC. An exploit.

What's the distinction? A proof of concept is just something that demonstrates that a bug is possible to exploit, by doing so.


Repeatability and/or an actual negative effect.

POC generally means “you can demonstrate unintentional behavior”.

“Exploit” means you can gain access or do something malicious.

It’s a fine line. Author’s point is that the LLM was able to demonstrate some malfeasance, not just unintended consequence. That’s a big deal considering that actual malicious intent generally requires more knowhow than raw POC.


Specifically: the exploit extracted the admin's credentials from the database. A blind SQLI POC would simply demonstrate the existence of a timing channel based on a pathological input.
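The POC-vs-exploit distinction can be sketched in a few lines. This is a simulation, not Ghost's actual bug: the fake oracle below stands in for the boolean/timing side channel (a real attack would measure response times against a live server). A POC stops at showing the oracle leaks one bit; an exploit loops it to pull actual data out of the database.

```python
import string

SECRET = "s3cr3t"  # stands in for the admin credential row in the database

def oracle(guess_prefix: str) -> bool:
    # Simulates asking "does password LIKE 'prefix%'" and observing the
    # answer via a boolean or timing side channel. Demonstrating that this
    # single bit leaks is what a POC does.
    return SECRET.startswith(guess_prefix)

def extract_secret(max_len: int = 32) -> str:
    # Turning the one-bit oracle into full data extraction, character by
    # character, is what makes it an exploit.
    alphabet = string.ascii_lowercase + string.digits
    recovered = ""
    for _ in range(max_len):
        for ch in alphabet:
            if oracle(recovered + ch):
                recovered += ch
                break
        else:
            break  # no character extends the prefix: extraction complete
    return recovered

print(extract_secret())  # recovers the full secret, one oracle bit at a time
```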

One other commenter asked a decent question - does going lighter (Zig) or harder on memory safety (Rust) confer any meaningful advantages against the phenomenon you describe?

I remember open-source projects announcing their intent to leave GitHub in 2018, as it was being acquired by Microsoft. I was thinking to myself back then: "It's really just a free Git hosting service, and Git was designed to be decentralized at its very core. They don't own anything, only provide the storage and bandwidth. How are they even going to enshittify this?".

8 years later, this is where we are. I'm honestly just stunned, it takes some real talent to run a company that does it as consistently well as Microsoft.


This is nothing.

I would bet that soon it will inject ads within the code as comments.

Imagine you are reading the code of a class. `LargeFileHandler`. And within the code they inject a comment with an ad for penis enlargement.

The possibilities are limitless.


If I recall correctly, what sparked the mass migration to GitHub was the controversy around SourceForge injecting ads into installers of projects hosted there. Now that we have tools that can stealthily inject native-looking ads into programs at the source code level...

Same as it ever was. Same as it ever was.

I noticed that there's a developing trend of "who manages to use the most CSS filters" among web developers, and it was there even before LLMs. Now that most of the web is slop in one form or another, and LLMs seem to have been trained on the worst of the worst, every other website uses an obscene amount of CSS backdrop-filter blur, which slows down software renderers and systems with older GPUs to a crawl.

When it comes to DeepL specifically, I once opened their main page and left my laptop for an hour, only to come back to it being steaming hot. Turns out there's a video around the bottom of the page (the "DeepL AI Labs" section) that got stuck in a SEEKING state, repeatedly triggering a pile of NextJS/React crap which would seek the video back, causing the SEEKING event and thus itself to be triggered again.

I wish Google would add client-side resource use to Web Vitals and start demoting poorly performing pages. I'm afraid this isn't going to change otherwise; with the first complaints dating back to the mid-2010s, browsers and Electron apps hogging RAM are far from new, and yet web developers have only been getting increasingly disconnected from reality.


Yup, exactly my thoughts.

To me, this discontinuation is less about the product and more about making a statement. The M2 Mac Pro was a dysfunctional product of an internal conflict of interests, but it cast a ray of hope that the M series would develop past the current scaled-up-but-still-disposable phone/embedded SoCs and that Apple had some interest in bringing them closer to the offerings of the competitors from the workstation/server market. Now, with this move, they've made it clear that they would rather give up an entire segment than make at least a narrow part of their ecosystem open enough for the PCIe slots of the Mac Pro to find any serious use.

