Hacker News | tabbott's comments

I lead the Zulip project and I'm not aware of any common crash issues with either our server or any of our apps.

Can you share details on what you're experiencing with us? https://zulip.com/help/contact-support.


Thanks for your work on Zulip!

I have some feedback that's annoyingly non-specific.

I used Zulip a few years ago as a contractor. It seemed _fine_, but I didn't love it. Specifically, the UI felt sluggish and the experience was generally somewhat unpolished. Maybe things have changed (a lot happens in a couple of years), but there you go.


Just about every UI component has been redesigned over the last two years. So your experience may be different these days :).


I recommend that anyone responsible for the security of an open-source software project ask Claude Code to do a security audit of it. I imagine that might not work that well for Firefox without a lot of care, because it's a huge project.

But for most other projects, it probably only costs $3 worth of tokens. So you should assume the bad guys have already done it to your project looking for things they can exploit, and it no longer feels responsible to not have done such an audit yourself.

Something that I found useful when doing such audits for Zulip's key codebases is to ask the model to carefully self-review each finding; that removed the majority of the false positives. Most of the rest we addressed by adding comments that help developers (or a model) casually reading the code understand the intended security model for that code path... And indeed, most of those did not show up on a second audit done afterwards.
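A toy sketch of that two-pass flow. Everything here is a placeholder: `run_model` stands in for whatever agent or API you actually call, and the canned responses exist only so the sketch runs standalone.

```python
# Two-pass security audit: collect findings, then have the model
# self-review each one to filter out false positives.

def run_model(prompt: str) -> str:
    # Stub: a real implementation would call your coding agent here.
    # Canned responses for illustration only.
    if prompt.startswith("AUDIT"):
        return "finding: SQL injection in /export\nfinding: XSS in /profile"
    return "confirmed" if "SQL injection" in prompt else "false positive"

def audit(codebase: str) -> list[str]:
    # Pass 1: broad sweep for candidate findings.
    raw = run_model(f"AUDIT {codebase}: list security findings")
    findings = [line.removeprefix("finding: ")
                for line in raw.splitlines() if line.startswith("finding: ")]
    # Pass 2: ask the model to carefully re-examine each finding
    # against the actual code before reporting it.
    return [f for f in findings
            if run_model(f"Re-review this finding against the code: {f}") == "confirmed"]

print(audit("zulip/"))
```

With the stubbed responses above, the second pass drops the XSS candidate as a false positive and keeps only the confirmed finding.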


I have a few skills for this that I plug into `cargo-vet`. The idea is straightforward: where possible, I rely on a few trusted reviewers (Google, Mozilla), but for new deps that don't fall into the "reviewed by humans" bucket and that I don't want to rewrite, I have a bunch of Claude reviewers go at it before making the dependency available to my project.

I'm curious: has someone done a lengthy write-up of best practices to get good results out of AI security audits? It seems like it can go very well (as it did here) or be totally useless (all the AI slop submitted to HackerOne), and I assume the difference comes down to the quality of your context engineering and testing harnesses.

This post did a little bit of that but I wish it had gone into more detail.


OpenAI just released “codex security”, worth trying (along with other suggestions) if your org has access https://openai.com/index/codex-security-now-in-research-prev...

The HackerOne slop is because there's a financial incentive (bug bounties) involved, which means people who don't know what they are doing blindly submit anything that an LLM spots for them.

If you're running the security audit yourself you should be in a better position to understand and then confirm the issues that the coding agents highlight. Don't treat something as a security issue until you can confirm that it is indeed a vulnerability. Coding agents can help you put that together but shouldn't be treated as infallible oracles.


That sounds like the same problem (a deluge of slop) with a different interface (eating straight from the trough rather than waiting for someone to put a bow on it and stamp their name to it)?

Seems very similar to turning on compiler warnings. A load of scary nothings, and a few bugs. But you fix the bugs and clarify the false positives, and end up with more robust and maintainable code.

I've found it's pretty good. It's really not that much of a burden to dig through 10 reports and find the 2 that are legitimate.

It's different from HackerOne because those reports tend to come in with all sorts of flowery language added (or prompt-added) by people who don't know what they are doing.

If you're running the prompts yourself against your own coding agents you gain much more control over the process. You can knock each report down to just a couple of sentences which is much faster to review.


You also probably have a much better idea of where the unsafe boundaries in your application are. Letting the models know this information up front has surfaced a dozen or so legitimate vulnerabilities in the application I work on. And the signal-to-noise ratio is generally pretty good: certainly orders of magnitude better than the terrible Dependabot alerts I have to dismiss every day.

The question remains: will enough useful stuff be included to make it worth digging through the slop? And how do you tune the prompt to get better results?

I assume it's just like asking for help refactoring, just targeting specific kinds of errors.

I ran a small python script that I made some years ago through an LLM recently and it pointed out several areas where the code would likely throw an error if certain inputs were received. Not security, but flaws nonetheless.


That depends on how the tool is used. People who ask for a security vulnerability get slop. People who ask for deeper analysis often get something useful, but it isn't always a vulnerability.

Best way to figure that out is to try it and see what happens.

[claimed common problem exists, try X to find it] -> [Q about how to best do that] -> "the best way to do it is to do it yourself"

Surely people have found patterns that work reasonably well, and it's not "everyone is completely on their own"? I get that the scene is changing fast, but that's ridiculous.


There's so much superstition and outdated information out there that "try it yourself" really is good advice.

You can do that in conjunction with trying things other people report, but you'll learn more quickly from your own experiments. It's not like prompting a coding agent is expensive or time consuming, for the most part.


/security-review really is pretty good.

But your codebase is unique. Slop in one codebase is very dangerous in another.


For those not aware, this is a specific feature available in Claude Code.

https://support.claude.com/en/articles/11932705-automated-se...


that's kinda what I was looking for tbh. I didn't know that was an option, and nothing in the thread (or article) seemed to imply it was.

I was mostly working off "well I could ask claude to look at my code for security problems, i.e. 'plz check for security holes kthx', but is that really going to be the best option?". if "yes", then it would kinda imply that all the customization and prompt-fiddling people do is useless, which seems rather unlikely. a premade tool is a reasonable starting point.


You're either digging through slop or digging through your whole codebase anyway.

We split our work:

* Specification extraction. We have security.md and policy.md, often per module. Threat model, mechanisms, etc. This is collaborative and gets checked in for ourselves and the AI. Policy is often tricky & malleable product/business/ux decision stuff, while security is technical layers more independent of that or broader threat model.

* Bug mining. This is driven by the above. It is iterative: we keep running it to surface findings, adversarially analyze them, and prioritize them, repeating until diminishing returns with respect to priority levels. It often leads to policy & security spec refinements. We use this pattern not just for security, but for general bugs and other iterative quality & performance improvement flows - it's just a simple skill file with tweaks like parallel subagents to make it fast and reliable.

This lets the AI drive itself more easily, and in ways you explicitly care about, rather than generating noise.
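A minimal sketch of that run-until-diminishing-returns loop. All names are made up for illustration; `mine_pass` stands in for one agent run seeded with the checked-in security/policy specs.

```python
# Iterative "bug mining": keep running the auditor until a pass
# produces no new findings, then stop.

def mine_pass(spec: str, known: set[str]) -> set[str]:
    # Stub: a real pass would run parallel subagents over the codebase,
    # guided by security.md / policy.md. Canned findings for illustration.
    all_findings = {"path traversal in uploads", "missing rate limit on login"}
    return all_findings - known

def mine(spec: str, max_rounds: int = 5) -> set[str]:
    known: set[str] = set()
    for _ in range(max_rounds):
        new = mine_pass(spec, known)
        if not new:          # diminishing returns: nothing new surfaced
            break
        known |= new         # adversarially triage, then fold into the spec
    return known

print(sorted(mine("security.md + policy.md")))
```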


No mention of the quality of the engineers reviewing the result?

This is exactly how I would not recommend AI to be used.

“Do a thing that would take me a week” cannot actually be done in seconds. It will produce results that superficially resemble reality.

If you were to pass some module in and ask for finite checks on that, maybe.

Despite the claims of agents… treat it more like an intern and you won’t be disappointed.

Would you ask an intern to “do a security audit” of an entire massive program?


My approach is that, "you may as well" hammer Claude and get it to brute-force-investigate your codebase; worst case, you learn nothing and get a bunch of false-positive nonsense. Best case, you get new visibility into issues. Of _course_ you should be doing your own in-depth audits, but the plain fact is that people do not have time, or do not care sufficiently. But you can set up a battery of agents to do this work for you. So.. why not?

IMO the key behavior is that LLMs are really good at fuzz testing, because they are probabilistic monkeys on typewriters that are much more code-aware than a conventional fuzz tester. They cannot produce a comprehensive security audit or fix security issues in a reliable way without human oversight, but they sure can come up with dumb inputs that break the code.

The results of such AI fuzz testing should be treated as just a science experiment and not a replacement for the entire job of a security researcher.

Like conventional fuzz testing, you get the best results if you have a harness to guide it towards interesting behaviors, a good scientific filtering process to confirm something is really going wrong, a way to reduce it to a minimal test case suitable for inclusion in a test suite, and plenty of human followup to narrow in on what's going on and figure out what correctness even means in the particular domain the software is made for.
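The confirm-then-minimize step can be sketched without any model at all; here the list of candidate inputs stands in for model-proposed "dumb inputs", and `parse_quantity` is a made-up function under test.

```python
# Fuzz-style triage: confirm a candidate input really breaks the code,
# then shrink it to a minimal reproducer suitable for a test suite.

def parse_quantity(s: str) -> int:
    # Hypothetical code under test: raises ValueError on non-numeric input.
    return int(s.strip())

def crashes(s: str) -> bool:
    try:
        parse_quantity(s)
        return False
    except ValueError:
        return True

def minimize(s: str) -> str:
    # Greedy shrink: drop characters while the input still reproduces the crash.
    changed = True
    while changed:
        changed = False
        for i in range(len(s)):
            shorter = s[:i] + s[i + 1:]
            if shorter and crashes(shorter):
                s, changed = shorter, True
                break
    return s

candidates = ["42", "  7 ", "12abc", ""]   # model-proposed inputs (stubbed)
repros = [minimize(c) for c in candidates if c and crashes(c)]
print(repros)
```

The confirming filter and the minimizer are the "good scientific filtering process" part; the human followup on what correctness means still has to happen afterwards.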


>the key behavior is that LLMs are really good at fuzz testing, because they are probabilistic monkeys on typewriters

That's exactly what they're not. Models post-trained with current methods/datasets have pretty poor diversity of outputs, and they're not that useful for fuzz testing unless you introduce input diversity (randomize the prompt), which is harder than it sounds because it has to be semantical. Pre-trained models have good output diversity, but they perform much worse. Poor diversity can be fixed in theory but I don't see any model devs caring much.


What is there to lose in trying?

Basically, don't trust AI if it says "your program is secure", but if it returns results showing how you could break it, why not take a look?

This is the way I would encourage AI to be used; I prefer such approaches (e.g. general code reviews) to having it write software.


Because if you want the work done correctly, you WILL put the time you thought you were saving in. Either up front, or in review of its work, or later when you find out it didn’t do it correctly.

It depends whether anyone was ever actually going to spend that week doing it the "hard" way. Having Claude do it in a few minutes beats doing nothing.

Put another way: I absolutely would have an intern work on a security audit. I would not have an intern replace a professional audit though.

It's otherwise a pretty low stakes use. I'd expect false positives to be pretty obvious to someone maintaining the code.


My point is that it’s one thing to say I want my intern to start doing a security audit.

It’s another thing to say hey intern security audit this entire code base.

LLM’s thrive on context. You need the right context at the right time, it doesn’t matter how good your model is if you don’t have that.


> Would you ask an intern to “do a security audit” of an entire massive program?

Why not?

You can't rely solely on that, but having an extra pair of eyes without prior assumptions about the code is always a good idea.


What makes you want to believe the Trump Administration when it claims it doesn't want to do domestic mass surveillance?

An organization's character really shows through when its values conflict with its self-interest.

It's inspiring to see that Anthropic is capable of taking a principled stand, despite having raised a fortune in venture capital.

I don't think a lot of companies would have made this choice. I wish them the very best of luck in weathering the consequences of their courage.


The problem is that this is a decision that costs money. Relying on a system that makes money by doing bad things to do good things out of a sense of morality when a possible outcome is existential risk to the species is a 100% chance of failure on a long enough timeline. We need massive disincentives to bad behavior, but I think that cat is already out of its bag.

I appreciate that the HN community values thoughtful, civil discussion, and that's important. But when fundamental civil liberties are at stake, especially in the face of powerful institutions and influence from people of money seeking to expand control under the banner of "security", it's worth remembering that freedom has never simply been granted. It has always required vigilance, and at times, resistance. The rights we rely on were not handed down by default; they were secured through struggle, and they can be eroded the same way.

Power corrupts, and absolute power corrupts absolutely.


On a long enough timeline literally everything has 100% chance of failure. I'm not trying to be obnoxious, I just wanna say: we only got this one life and we have to choose what to make of it. Too many people pretend things are already laid out based on game theory "success". But that's not what it's about in life at all.

It's an interesting idea. The current endowment size of less than $1M is immaterial; the question with a project like this will always be how it is able to raise capital.

A way something like this could be interesting is if founders started donating 5% of equity when they started a company to an open source foundation like this one.

It doesn't impact the founder much financially: Success is very binary for founders. But in aggregate, if thousands of startup founders do this, there would be some hits and some of those hits could generate a significant endowment.

(You can also try to get people to donate who feel their success was built on top of open source, but I feel that after 10 years building a company to IPO, one's attention as a founder has likely been on business metrics and spending time with business people, not on technology and spending time with technologists, and that shift in attention can reduce people's feeling of gratitude for the amazing inheritance that is open-source software).


Consider this as a nonprofit startup that has just raised a pre-seed round. The current size of $700K is indeed immaterial, as our plan is to scale it significantly in the coming years.

The closest real-world comparable to what we are building is the Wikimedia Endowment, whose former Director is among OSE's advisors. Like Wikimedia, we aim to be supported not only by large donations but also by contributions from a large community - in our case, 150M+ GitHub users.

Our target audience is diverse - from highly successful founders to everyday developers. The Open Source Endowment is prepared to accept donations in both cash and stock from these groups.

While 5% of equity may be too much, 1% seems achievable. I am personally ready to commit 1% of the carried interest from my own VC fund to the endowment.


What is a preseed round? You guys don't "make" money when the ROI is primarily about funding long term maintenance of open source projects.

"Preseed round" is just a small funding round at a very early stage of the project. We expect to raise more funding as the endowment matures. There is no ROI; it is pure charity.

Running a non-profit with the mentality of SV, what could go wrong?

Definitely something I will actively avoid after the parent comment.


Seems better than the current state... of there not being anything like that? Perfect is the enemy of good.

There are many existing projects like this, I'm not going to pick the one started by a former VC

Ask yourself: if those haven't changed things, why would a VC-run one make things better? The last two decades have shown us what VC-centeredness has brought us.


Can you point out some existing ones with traction? I'm looking more at the list of people who are on board with it ("Trusted by open source creators" section) than who is actually running it, which I think is more important to get buy in than whoever is pulling admin strings in the back.

> Can you point out some existing ones with traction?

That's kind of the point, there are none. The question is why? If people cannot even click a button to support when it's right there...

I don't think people coming out of the VC world are going to fix it, call me cynical if you like


You said: "There are many existing projects like this", directly followed by "That's kind of the point, there are none." when asked for an example. Which one is it?

It seems like a pretty thankless fundraising job, but one where connections to companies and banks, and experience with distributing funds, come in handy. What's in it for a VC? I'd assume incoming deal flow and connections to new open source companies.

Seems more promising to me than a technical open source maintainer stepping up to do it on the side. But time will tell.


there are many existing, none with meaningful traction

It looks like there are no direct connections; they are investing, taking fees, and distributing the leftovers.


Former VC!?

KV ... you gonna take that lying down? :P

> There are many existing projects like this

Also please link, we're not aware of any other endowments exclusively focused on Open Source.



Not the former VC, but an active venture capitalist: https://kvinogradov.com. I earn money by investing in open source / AI / infra software startups, and I spend money by donating to nonprofit open source projects :-)

Also, it is not a VC who runs things, but a team consisting of people with diverse backgrounds (founders/executives/devs x OSS/nonprofit) and the donor community (which everybody can join): https://endowment.dev/community/


It's the VC "class", similar to the Epstein Class, nowhere near as bad or vile, but have definitely been one of the primary reasons the wealth gap and inequality have risen and continue to rise

With your strong feelings against VC, I hope you are aware that HN is the message board of one of the leading VC firms?

> but have definitely been one of the primary reasons the wealth gap and inequality have risen and continue to rise

That's a pretty big leap you are making there.


You can click my handle to see I've been a part of the HN community longer than yourself, I'm fully aware of the many associations

I have /rant'd on YC and the dilution of help to their startups after they stopped heeding their own advice to "do things that don't scale"


> I've been a part of the HN community longer than

Dang, got me beat, too. :) gg


It's not a competition, but it is a faux pas for GGP to make the comment the way they did. I would hope you all would know that, having been here more than a decade.

What are your specific concerns?

By the way, only 1 out of 6 core team members is based in SV.


SV is not a geographical location in the sense I'm using it

Taking capital, using it, taking fees, and then distributing leftovers... sounds like Trumponomics


We all - the OSE donors - are donating personal savings to make this work, and are directly interested in making this org as efficient as possible. Having skin in the game is the best way to keep such nonprofits accountable. There are no leftovers or fees: all investment income from donations goes to open source, except for minimized operating expenses (e.g. accounting). It is run by a team of volunteers without salaries, and we require $1000+/year donations from all directors of this org.

Are there rules on where you park the money between when you get it and when FOSS gets paid?

The README has a 2-3% gap between expected returns and outlays, surely that is not all going to accounting?


You have to give them at least some benefit of the doubt.

I have my own questions, which I haven't yet put into words, about the bylaws and selection criteria. But at least they are proposing a new approach.

I'd at least give them a year's tryout to see what it materializes into.

At the current state of things I'm a bit in doubt about the market, and how that will change over the year. Though it would also be interesting, as an idea, to participate in such a process as a member.


I'm not an expert on equity, but 5% feels a bit high. I like the idea; even 1% would be significant. In general, could we start holding organizations accountable, with public status and tracking of their commitment to the open source software they use and profit from? That might help a lot as well.

We are, in general, too naive, and fail to hold others and ourselves accountable for contributing back when we use resources from the commons. Open source is, IMO, a common welfare/public resource. If others are abusing it, it's time to call out what they are really doing: abusing and stealing from the public. Maybe we need to be more serious about this, change the public access model (perhaps hybrid open source for companies that use OSS), and create systems to legally enforce it.


I'll put a plug for the Open Source Pledge here:

https://opensourcepledge.com/members/

The companies listed there have all paid at least $2000 per engineer on staff per year to OSS maintainers. Real accountability. The Endowment accepts corporate donors but is primarily geared towards individuals at this point; Pledge members are all companies. Both/and ... to the OSS moon!


That list is embarrassingly tiny. Not a single Fortune 100 company on there.

Thanks for the idea, tabbott. Made a ticket to track:

https://github.com/osendowment/foundation/issues/24


I feel like the articles on this have been very negative ... but aren't the Anthropic promises on safety following this change still considerably stronger than those made by the competing AI labs?

Yes, and it is easy to look at the reality of the market and see how this is needed to remain competitive

I encourage you to stop by and report things with screenshots/screencasts so we can get these fixed.

We were able to reproduce the search icon centering issue in mobile web, which is being fixed here:

https://chat.zulip.org/#narrow/channel/9-issues/topic/center...


This article looks rather rushed -- the description of Zulip is not accurate, and I suspect that folks working on the other products may feel the same way about how their projects are described.

I lead the Zulip project, and I'd like to clarify that Zulip's free community pricing does not have user limits, either in Cloud or self-hosting. The 10 user limit for free mobile notifications only applies to workplace/business use. Larger communities are encouraged to submit a simple form to get approved for notifications beyond 10 users.

And this complaint seems quite strange:

> Even for self-hosted plans, anything above the free tier requires a zulip.com account for plan management.

How would a paid subscription work without an account for managing it?

This is an important and timely topic, but I wish a more deeply researched article was the one being widely circulated.


Zulip web is one of the best modern chat apps for intermittently offline use, and we put a lot of effort into making it that way.

For example, if you have it open in a window on your laptop, suspend the laptop, and open it up on a plane, you can read the last few weeks of message history, compose replies that will send when you regain network access, etc. I do this regularly on flights.

We always have ideas for how to improve this further, and the mobile app doesn't do as extensive caching as the web app does, but it's not an issue of technical feasibility. The protocol was designed for mixed online/offline use from the beginning.


Bringing the recent conversations view to mobile is one of our main goals for the next couple of months!

