resiros's comments | Hacker News

This seems like a hit job by a competitor. Really ruthless.

> Two months ago, an email went out to a few hundred Delve clients informing them that Delve had leaked their audit reports, alongside other confidential information, through a Google spreadsheet that was publicly accessible.

Who leaked the audit reports? Who sent this email? Who is taking the time to write this analysis and kill the company?

In my opinion, the majority of the points in the article are not news. A compliance SaaS that offers templates for policies? They all do. The AI is a chatbot? Well, who would have thought.

I think the main point is the collusion between Delve and the auditors. Is the evidence for that clear?


The key problem is the audits and the auditors. I have independently verified that our vendors have the same templated SOC2 as all of the leaked reports, which is concerning because it shows the auditors did not actually validate the controls.

SOC2 is supposed to give you an INDEPENDENT evaluation of a company's compliance: "are they doing what they say they are?"

If the SOC2 report is just a pre-populated template, it is meaningless.

The motivation of the "DeepDelver" doesn't really matter - this has implications for every company that relies on vendors that have been "assessed" by Delve.


Really curious what you're going to do going forward. Will you be rejecting vendors whose compliance was certified through Delve? Will you be forcing your vendors to redo compliance?

Hit piece or not, the blatantly fraudulent behavior displayed by Delve is reprehensible.

And they didn't even try. Read this management assertion for one of the (known) affected companies:

> We have prepared the accompanying description of Cluely, Inc., system titled "Cluely is a desktop AI assistant to give you answers in real-time, when you need it." throughout the period June 27, 2025 - September 27, 2025(description), based on the criteria set forth in the Description Criteria DC Section 200 2018 Description Criteria for a Description of a Service Organization’s System in a SOC 2 Report (description criteria).

> The description is intended to provide users with information about the "Cluely is a desktop AI assistant to give you answers in real-time, when you need it." that may be useful when assessing the risks arising from interactions with Cluely, Inc. system, particularly information about the suitability of design and operating effectiveness of Cluely, Inc. controls to meet the criteria related to Security, Availability, Processing Integrity, Confidentiality and Privacy set forth in TSP Section 100, 2017 Trust Services Principles and Criteria for Security, Availability, Processing Integrity, Confidentiality and Privacy (applicable trust services criteria).


There's no need for some conspiracy.

It's a juicy story to talk about, one that checks a lot of the boxes that make something go viral --

  1. the hustle culture they promoted online was gross
  2. they followed the 30u30 Forbes pattern like Liz Holmes, FTX, etc. 
  3. they're a YC co, so there's plenty of popular voices supporting them
The 3rd isn't to slight the program, but folks definitely slam any company that seems to be in the moral gray area as proof that the program is nihilistic and a net negative. People like to shove mistakes in the face of "successful" folks like investors/VCs.

Finally, the security and compliance community is litigious by nature, and this startup, in general, was a net negative for a lot of people who do fractional / consulting work in security.


What's more surprising to me, as a layperson, is that I found this out and investigated their shady auditor network in late December. It didn't take much work.

Insight Partners invested in a 32 MILLION DOLLAR ROUND without any apparent shred of due diligence. What does that say about the VC market writ large?


Not sure I agree with the AI edited comments. Using AI to improve the readability and clarity is fine. Sometimes a well structured comment is much better than a braindump that reads like ramblings. And AI is quite good at it (and probably will get better). To make the point, here is how this comment would have looked if edited:

"I don't fully agree with banning AI-edited comments. Using AI to improve readability and clarity is a reasonable thing to do. A well-structured comment is often much better than a braindump that reads like rambling. AI is quite good at this, and it will probably get better. To illustrate the point, here is how this comment would have looked if edited"


I prefer your non-edited version. My brain automatically starts to zone out with the AI edited version, side effect of having read way too much AI text

I also prefer the original version - the AI version has a strange vibe.

Not to take away from your point, but I like your original one better.

Non-edited is better. It flows and reads faster. The AI sentences feel clinical and sterile. They feel, well, like AI.

I had never noticed the flow of AI text before. It does make the flow of reading feel weird, with a lot of pauses! Thanks for pointing it out.

The edited version is an example of a sterile/canned response. No one talks like that.

While I do edit my comments to fix typos, certain spelling oddities and other peculiarities would still be present.


It's a matter of taste, but your original writing is way better. Your writing has your voice. Like dropping the "I am" from your first sentence, using parentheticals, couching your point in understatement (e.g. "sometimes" meaning often instead of just saying "often").

The AI comment might be clear, but it sounds like a press release, not a person, and there's nothing to engage with.


For all the people saying they prefer the non-edited version: would y'all be saying that if you didn't already know which one was the non-edited version? Be honest.

There's nothing inherently better about the edited version. It's just saying the same thing with synonyms substituted, at a slightly more formal but less personal register. HN comments are not academic text; colloquial turns of phrase are perfectly fine and expected.

> There's nothing inherently better about the edited version.

Easier to read ==> More likely to be read.

No, it's not saying the same thing, especially if the tool is telling you that your statement is ambiguous and should be rephrased.


Easier to read is mostly related to the predictability of the text. Any time the brain mispredicts the next word, you have to go back and re-read.

Unless you have purposely trained on that specific way of expression, it ain't easier to read.


I don't know why this is confusing. If I forget to put the "not" qualifier in a sentence, do we agree that it can confuse (or worse, mislead) the reader?

I never said "confusing". Just not easier to read, in relative terms.

I don't think the edited version is easier to read.

I'll ask the same question I asked someone else:

https://news.ycombinator.com/item?id=47342324

You're saying removing ambiguity does not make it easier to read? You're saying using a word that means nothing like what you meant to say is easier to read than using the correct word?

Really?


What are you referring to? What word did the GP use that means nothing like what they meant to say?

OK. My brain farted: I misunderstood the top post to be saying something else, and I misinterpreted your and others' criticisms.

Now here's the thing. I wrote all my prior comments on a machine with no LLM access. On my personal machine, I had installed a while ago a TamperMonkey script that sends my draft, along with all the parent comments (up to the root), to an LLM for feedback (with a specific prompt). All it does is give feedback (logical errors, etc.). So I tried again with one of my comments, and its feedback found several flaws with my comment, and ended with this suggestion:

"Considering all this, it might be BETTER to either not reply ..."

Had I had this advice when I was writing those comments, it would have saved me and others a fair amount of time.

This is (mildly) useful. It'd be sad to ban such use.
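
For the curious, the flow is simple enough to sketch. Here is a minimal Python version of the same idea (not the actual userscript, which is JavaScript; the model name and prompt wording are placeholders):

  # Minimal sketch: gather the thread from root to draft, ask for critique only.
  from openai import OpenAI  # assumes the openai package; any chat API works

  client = OpenAI()  # reads OPENAI_API_KEY from the environment

  def review_draft(thread: list[str], draft: str) -> str:
      """Send the whole thread plus my draft; ask for feedback, not a rewrite."""
      context = "\n\n".join(f"[{i}] {c}" for i, c in enumerate(thread))
      prompt = (
          "Below is a discussion thread and my draft reply.\n"
          "Point out logical errors, misreadings of parent comments, and "
          "claims that need hedging. Do NOT rewrite the draft.\n\n"
          f"THREAD:\n{context}\n\nDRAFT:\n{draft}"
      )
      resp = client.chat.completions.create(
          model="gpt-4o-mini",  # placeholder model name
          messages=[{"role": "user", "content": prompt}],
      )
      return resp.choices[0].message.content or ""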


More formal register doesn't mean easier to read or understand. For many people, the exact opposite is the case.

> More formal register doesn’t mean easier to read or understand.

And who is advocating for a more formal register?


I don't follow the need to write CLIs for the agent. Why not simply use the API and document it well? The token difference between using an API and a CLI is not that large, and models are trained on REST APIs and understand their patterns, unlike your random CLI.
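
To illustrate what I mean (a hypothetical sketch; the endpoint, tool name, and schema are made up): the tool description the model reads is just the endpoint's documentation, and the executor is a thin HTTP call.

  # Hypothetical sketch: expose a documented REST endpoint to the agent directly.
  import requests

  TOOL_SPEC = {
      "name": "list_invoices",
      "description": "GET /v1/invoices - list invoices, newest first. "
                     "Params: status (draft|open|paid), limit (1-100).",
      "parameters": {
          "type": "object",
          "properties": {
              "status": {"type": "string", "enum": ["draft", "open", "paid"]},
              "limit": {"type": "integer", "minimum": 1, "maximum": 100},
          },
      },
  }

  def list_invoices(status: str = "open", limit: int = 20) -> dict:
      """Thin executor the agent calls; TOOL_SPEC is what the model reads."""
      resp = requests.get(
          "https://api.example.com/v1/invoices",  # placeholder endpoint
          params={"status": status, "limit": limit},
          timeout=10,
      )
      resp.raise_for_status()
      return resp.json()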


I use netbird and can only recommend it


Netbird is very good for my use case. Simple to set up, and just works.


It might simply be that it was not trained as much in Elixir RL environments as Gemini and GPT were. I use it for both TS and Python and it's certainly better than Gemini. For Codex, it depends on the task.


I wonder why AI labs have not worked on improving the quality of their text outputs. Is this, as the author claims, a property of the LLMs themselves? Or is there simply not much incentive to create the best writing LLM?


The argument is that the best writing is the unexpected, while an LLM's function is to deliver the expected next token.


Even more precisely, human writing contains unpredictability that is either more or less intentional (what might be called authorial intent), as well as much that is subconsciously added (what we might call quirks or imprinted behavior).

The first requires intention, something that, as far as we know, LLMs simply cannot truly have or express. The second can be approximated, perhaps very well, but a mass of people using the same models with the same approximations still leads to a loss of distinction.

Perhaps LLMs that were fully individually trained could sufficiently replicate a person's quirks (I dunno), but that's hardly a scalable process.


Yeah, that makes banana.


I remember an article a few weeks back[1] which mentioned the current focus is improving the technical abilities of LLMs. I can imagine many (if not most) of their current subscribers are paying for the technical ability as opposed to creative writing.

This also reminded me that on OpenRouter, you can sort models by category. The ones tagged "Roleplay" and "Marketing" are probably going to have better writing compared to models like Opus 4 or ChatGPT 5.2.

[1]: https://www.techradar.com/ai-platforms-assistants/sam-altman...


That's like asking why McDonald's doesn't improve the quality of their hamburger. They can, but only within the bounds of mass produced cheap crap that maximizes profit. Otherwise they'd be a fundamentally different kind of company.


I mean, there are tons of better-writing tools that use AI, like Grammarly etc. For actual general-purpose LLMs, I don't think there's much incentive in making them write "better" in the artistic sense of the word... if the idea is to make the model good at tasks in general and communicate via language, that language should sound generic and boring. If it's too artistic or poetic or novel-like, the communication would appear a bit unhinged.

"Update the dependencies in this repo"

"Of course, I will. It will be an honor, and may I say, a beautiful privilege for me to do so. Oh how I wonder if..." vrs "Okay, I'll be updating dependencies..."


I wish it would just say "k, updated xyz to 1.2.3 in Cargo.toml" instead of the pages of text it likes to output. I don't want to read all of that!


I used to feel the same, but you can just prompt it to reply with only one word when it's done. Most people prefer it to summarize because it's easier to track, so I guess that's the natural default.


I mean, no one is asking for artistic writing, just not obvious AI slop. The fact that we can all now easily tell that some text has been written / edited by AI is already an issue. No amount of prompting can help.


The article frames this as "semantic ablation" but the underlying mechanism is more specific: it is distributional averaging. RLHF and DPO reward policies optimize for the modal response given a prompt distribution. That is not a bug in the training process, it is the objective function working as designed. The model learns to produce the response that the median annotator would rate highest, and that response is, almost by definition, the least distinctive one.

What is underappreciated is how much stylistic signal lives in what information retrieval people call "burstiness" -- the tendency for distinctive words to cluster rather than distribute evenly. Hemingway's short declarative stacking, DFW's recursive parentheticals, legal writing's formulaic precision -- these are all bursty patterns that a model trained to maximize expected reward will sand down. You can partially recover it with few-shot prompting, but the model is fighting its own reward gradient the entire time.

The practical question is whether you can encode a style prior that survives the decoding process. The research on authorship attribution (stylometry) suggests the feature set is well-understood -- function word frequencies, sentence length distributions, type-token ratios, syntactic complexity metrics. But nobody has built a production system that uses those features as a constraint during generation rather than just detection.
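
To make the detection side concrete, here is a rough Python sketch of those classic features (the function-word list is truncated for brevity; this is an illustration assuming non-empty English text, not a production stylometry system):

  # Rough sketch of classic stylometric features (detection side only).
  import math
  from collections import Counter

  FUNCTION_WORDS = {"the", "of", "and", "to", "a", "in", "that", "it", "is"}

  def style_features(text: str) -> dict:
      # Crude sentence split; real stylometry would use a proper tokenizer.
      normalized = text.replace("!", ".").replace("?", ".")
      sentences = [s for s in normalized.split(".") if s.strip()]
      words = text.lower().split()
      counts = Counter(words)
      lengths = [len(s.split()) for s in sentences]
      mean = sum(lengths) / len(lengths)
      var = sum((n - mean) ** 2 for n in lengths) / len(lengths)
      return {
          "type_token_ratio": len(counts) / len(words),  # vocabulary richness
          "func_word_rate": sum(counts[w] for w in FUNCTION_WORDS) / len(words),
          "mean_sentence_len": mean,
          "sentence_len_stdev": math.sqrt(var),  # crude burstiness proxy
      }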


Yeah, but that's not what I am saying. I am saying its default writing style is for communicating with the user, not producing content/text, hence it has that distinctive style we all recognise. If you want AI writing that's not slop, there are tools that are trying to do that, but the default LLM writing style is unlikely to change imo.


Honestly, just use OpenCode. It works with Claude Code Max, and the TUI is 100x better. The only thing that sucks is compaction.


How much longer is Anthropic going to allow OpenCode to use Pro/Max subscriptions? Yes, it's technically possible, but it's against Anthropic's ToS. [1]

1: https://blog.devgenius.io/you-might-be-breaking-claudes-tos-...


Consider switching to an OpenAI subscription, which allows OpenCode use.


Yeah. OpenAI allows any client, and only one single fixed system prompt. All their control is on the backend, which is worse than Claude.


Doesn't Claude Code have an agent SDK that officially allows you to use the good parts?


Yes, but you can't use a subscription with that.


There are also Azure versions of Opus


I have been unable to use OpenCode with my Claude Max subscription. It worked for a while, but then it seems like Anthropic started blocking it.


What’s 100x better about the TUI?


Nope, OpenCode is nowhere near Claude Code.

It's amazing how much other agentic tools suck in comparison to Claude Code. I'd love to have a proper alternative. But they all suck. I keep trying them every few months and keep running back to Claude Code.

Just yesterday I installed Cursor and Codex, and removed both after a few hours.

Cursor disrespected my setting to ask before editing files. Codex renamed my tabs after I had named them. It also went ahead and edited a bunch of my files after a fresh install without asking me. The heck, the default behavior should have been to seek permission at least the first time.

OpenCode does not allow me to scroll back and edit a prior prompt for reuse. It also keeps throwing up all kinds of weird errors, especially when I'm trying to use free or lower-cost models.

Gemini CLI reads strange Python files when I'm working on a Node.js project, what the heck. It also never fixed the diff display issues in the terminal; it's always so difficult for me to actually see what edits it is trying to make before it makes them. It also frequently throws random internal errors.

At this point, I'm not sure we'll be seeing a proper competitor to Claude Code anytime soon.


Hmmm, I used OpenCode for a while and didn't have this experience. I felt like OpenCode was the better experience.


Same, I still use CC mainly due to it being so wildly better at compaction. The overall experience of using OpenCode was far superior - especially with the LSP configured.


I use OpenCode as my main driver, and I don't experience what you have experienced.

For instance, OpenCode has an /undo command which allows you to scroll back and edit a prior prompt. It also supports forking conversations from any prior message.

I think it depends on the setup. I overwrote OpenCode's default planning agent prompt to fit my own use cases and my own MCP servers. I've been using OpenAI's GPT Codex models and they have been performing very well; I am able to make them do exactly what I ask.

Claude Code may do stuff fast, but in terms of quality and the ability to edit only what I want, I don't think it's the best. Claude Code often takes shortcuts or does extra stuff that I didn't ask for.


5.3 Codex on Cursor is better than Claude Code.


Not in my (limited) experience. I gave CC and codex detailed instructions for reworking a UI, and codex did a much worse job and took 5x as long to finish.


This is quite nice but limited in that it is single-player. In my opinion, the next generation of AI agents will be multi-player. Ramp's background agent is a good example: https://builders.ramp.com/post/why-we-built-our-background-a...

Making this multi-player and creating the right representation for collaborating with agents are, in my opinion, the next bottlenecks. I wrote a short article with my thoughts on this: https://x.com/mmabrouk_/status/2010803911486292154


Very interesting, but the limitation on which libraries you can use is a significant one.

I wonder if they plan to invest seriously in this?

