What's with these journals all being so hostile? I remember years ago I tried to delete my Washington Post account - there was no button anywhere in the settings menu though, only: text if you want your data deleted. I texted them, they asked back, do you live in a GDPR region? I said no, they replied well tough luck. Insane
I understand that this is frustrating for people who mostly write thoughtful emails. But personally I use gmail for exactly the following things: account recovery, system notifications, and b2b email threads. For the latter, I really couldn't care less about form or shape. It's a means to an end, to get a point across. I found the auto writing stuff pretty useless so far (suggestions change the intended tone or even meaning of the email) but summaries are very useful to get a grip on what happened in a larger thread which I should only know the gist of anyway.
I might be in the minority but to me email is an annoying requirement to reach out to people, and that is not due to the AI tools, it's due to: thread management, the horrible noise of unasked-for newsletters, and system messages and updates I theoretically do care about but that are just inconsistently formatted and badly listed. I welcome AI giving me a better overview of what's going on than what I myself have.
I agree, if I were the maintainer this would be extremely tiring community feedback.
People coming in "I encountered a bug, I don't know what the bug is but I thought about it for a second and it's obviously your decision to do xyz".
As a maintainer, what are you supposed to do? It's no more useful than a ticket "somethings wrong idk what" which is useless enough to close without further action. But it puts the burden on the maintainer to a) figure out what's wrong based on basically no data whatsoever, then b) if they find it, figure out why, then c), and that's the tiring part, review their process and create a defense for their approach, or admit that that thing that random user felt after trying out your software for 10 minutes is right, and that you were what? stupid to even think this would ever work? They never asked for any of this, and they're already doing so much work for free.
If the rsync maintainer reads this: You're doing incredible work and humanity appreciates your obviously incredible competence in it, and not everyone feels the way these people do.
Moving to agentic workflows is obviously the right step and it already provides enough benefits to justify it. And mistakes are bound to happen (if the issue is even a mistake!) and there will always be people who cannot comprehend the power of agents and who will point the finger saying "I knew it from the start! I've worked with these tools for 2 hours already and I can see they don't work! Idk why you think they do!". They're wrong. But mistakes will happen that otherwise wouldn't have - but that's the learning experience.
I don’t know the details of this exact instance, but saying that there are reliability issues is valid feedback if reliability plummets.
As far as I know, nobody with data claims that vibe coding doesn’t affect reliability negatively.
People will connect these two things.
Many times they will do so when reliability hasn’t really plummeted. For example, there was hugely negative news about a Samsung phone a few years back, that it easily causes fires. Sales were affected by this. Interestingly, the next year they released basically the same thing under a different name, and complaints were never that loud again. And as far as I know, when they were loud, there was nothing special about that particular model in this regard. So it’s possible that the outrage was not justified at all.
They will also connect these, when reliability plummets, but it’s not because of vibe coding.
And they will connect, when it is the real culprit in general, but their problems are not affected by vibe coding.
And of course also when vibe coding really causes their problems.
In any case, the original statements will be true. Do we really want to make a product less reliable to implement features and bug fixes which we deemed not that important before? Especially with a stable product?
Of course, this is on the maintainers, but it’s interesting that forcing AI and its consequences on us - like Microsoft, Google, etc. do - is the default, and not the other way around, according to many in this thread and others.
Yes, reliability plummeting is valuable feedback, but it should be framed as such, and not attack the maintainer's decision to use agentic tools to write code, and especially not in that condescending way with an undertone of: What you're doing is obviously wrong, did you even think?? Everyone here knows it, are you stupid?
And the maintainer can then choose to use that feedback to incorporate it into the workflow, on their own time. If they so choose (which I'm sure they will, unless they get burnt by the community right now).
If the maintainer used any other tool which is suspected to cause a number of recent problems, it'd be discussed. The tone is a problem but the reaction is equally problematic. It isn't even clear the maintainer hasn't been silently changed if agents are used, depending on the extent. That itself is worthy of discussion, and "maintainer decision" is not the right call in that situation. One comment basically insinuates that with instructions for AI, though it was written as a trollish joke.
It can be discussed but not like this. The tone is problematic and results in the reaction. You're basically saying: I found a few bugs and I saw that you use tool X, thus you're now not worthy of maintaining this software without my supervision. Which is tiring if the initial report doesn't even show what exactly was wrong, or if something was wrong. It's just a feeling of: something doesn't work for me; I don't like AI tools; I see you're using AI tools; therefore I now tell you that you're not capable of maintaining this package, and I have to intervene. Such a stark comment must be done on more than just vibes and feelings.
They're not negated, smarter is smarter, but you have to reach deeper in your pocket. I think this will happen more and more - the smartest models get more expensive. But it won't matter - the current models we have today will get cheaper and can still be used for what they're used today.
I would take all benchmarks with a grain of salt. I don't really use them. What's it supposed to tell me? "5% smarter", what does that mean? My experience will differ. Just try it!
I doubt Anthropic internally sets as a goal to improve this or that benchmark - it's just a way to visualize progress. They probably have much more complex metrics internally.
I can tell you my experience as a js package dev, last tried a few weeks ago. We're building an npm package that's supposed to run on node.js, deno, bun & the web.
This is annoying to do for exactly two platforms: node.js, and deno.
node.js because it requires a workaround whenever something networking comes in: fetch doesn't work the same. So you structure your code around having a node.js workaround. Same story for some other APIs. But you can test if it works!
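A minimal sketch of what "structuring your code around a node.js workaround" can look like. This is an illustration under assumptions, not the package described above: the names `Runtime`, `detectRuntime` and `needsNodeNetworkingWorkaround` are hypothetical, and the checks rely only on the commonly used `Deno`, `Bun` and `process.versions.node` globals.

```typescript
// Hypothetical helper names; nothing here comes from the original package.
type Runtime = "deno" | "bun" | "node" | "browser" | "unknown";

// The global object is a parameter so the function can be tested with fakes.
function detectRuntime(
  g: Record<string, unknown> = globalThis as unknown as Record<string, unknown>,
): Runtime {
  // Check Deno and Bun first: both can expose a Node-style `process`
  // for compatibility, so a Node check alone could misreport them.
  if (g.Deno != null) return "deno";
  if (g.Bun != null) return "bun";

  const proc = g.process as { versions?: { node?: string } } | undefined;
  if (typeof proc?.versions?.node === "string") return "node";

  if (g.window != null && g.document != null) return "browser";
  return "unknown";
}

// Assumption for this sketch: only plain Node needs the special networking path.
function needsNodeNetworkingWorkaround(runtime: Runtime): boolean {
  return runtime === "node";
}
```

Keeping the detection in one place means the rest of the package can branch on a single value instead of scattering `typeof process` checks around.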
Deno is more annoying, you just can't test your package with deno before publishing. Before we released to npm, we installed a tar file and sent those around for testing. Works in node, in vite (node, for browser), works in bun, like a charm. Doesn't work with deno unless you switch to package.json, and you use exactly the subset of the spec that deno supports. You can't "deno install xyz.tar", you have to use npm for that (inserts a single line into package.json), THEN you can use deno to execute. No docs, no hint, just trial & error.
Even more annoyingly, npm & bun both offer 'link': in the package repo, call npm/bun link, in the test repo do npm/bun link @yourpackage, and that's it, it's installed. It creates a symlink to the source's build dir so you can rebuild without packing or sending tars or anything like that, you just build in your package dir and the test project is immediately updated.
Deno doesn't have that. What's worse, they don't tell you they don't have that. Also basically no error messages. It just fails in weird ways. Spent hours trying to do it. Now I just publish without testing for deno and wait for bug reports.
So out of the three: bun just works. That's it. Better than any platform. It just works, and it has a nicer CLI & nicer error messages, and it's faster on startup. It has the web api and the node api (I think) and its own api that's very nice as well, nicer than e.g. node. And e.g. if you run bun link, it tells you exactly what happened: this is what just happened, this is what you have to do to use it elsewhere. Node doesn't have that!
I think deno recognized bun's strategy of using npm dev's backbone as being the better call - that's why they're now slowly introducing node.js features, even though that goes against their original USP.
This article assumes that AI only has an impact on the development phase, which is certainly not true. It can speed up every step of the process, including ideation, legal, documentation, development, and deployment.
Ideation: Throw ideas back & forth, cross reference with knowledge bases, generate design documents. Documentation: Generate large parts of docs. Development: Clear. Deployment: Generate deployment manifests, tooling around testing, knowledge around cloud platforms.
Every single step can be done better & faster with AI. Not every task within each step, but a lot of them.
Even development. Yes some part of your job involves understanding the problem better than anyone & making solutions. But some parts are also purely chores. If you know you need a button doing X, then designing that button, placing it, figuring out edge cases with hover & press states, connecting to the backend etc - this is a chore that can be skipped. Same principle applies to almost all steps.
A typical example of trying to add a new significant capability involves many meetings (days, weeks, months, etc.) with the business to understand how their work flows between systems X, Y and Z as well as all of the significant exceptions (e.g. we handle subset A this way and subset B that way, but for the final step we blend those groups together, except for subset C which requires special process 97).
Then with that understanding comes the system solutioning across multiple systems that can be a blend of internal system or vendor's system, each with different levels of ability to customize, which pushes the shape of the final solution in different directions.
There is certainly value in speeding up coding, but it's just one piece of the puzzle and today LLMs can't help with gathering the domain information and defining a solution.
What I've seen in an AI-forward looking environment is that it's much more common for PM/POs to be knocking up at least a UI prototype now, and experimentation is happening often even before writing the tickets. Similarly when devs are proposing something they often are coming with a couple of prototypes already implemented. Both of those mean decisions are coming a lot quicker.
I wouldn’t discount the value of moving small tasks away from developers, nor the value of fast cheap prototypes.
Product owners can very quickly get, for many problems, an interactive demo without coding. For lots of problems this can be somewhere from a static html page which shows the interactions to a hacked in feature that lets them actually test if it solves the customer need and try several variations before handing over much more concrete specs of what they want to happen. So much time is lost between getting an idea from someone’s head to code to use to then find out it wasn’t communicated well and then finally that the idea didn’t help anyway and we want it in a different way.
Yes yes I know someone is about to say that now there’s pressure to push the prototype out but that’s an organisational level problem that existed anyway.
And small problems can be much faster to solve as well, or even move away from devs. Often people just need some text changed somewhere or some html put together, or some basic code for analysis. They could understand the logic, but the task of writing it from scratch and how to run things may be too much - now you don’t need to prioritise work for a dev to get some sql written and they can spend their time on the larger more software engineering level problems.
"that’s an organisational level problem that existed anyway"
That's very true of many organizations. One cannot just slap an AI tool on it when you are dealing with fundamental organizational problems in the first place.
"they can spend their time on the larger more software engineering level problems"
For sure, devs still need to focus on the right type of work and maintain the balance. I built a tool to do just that: https://worktypefocus.com/
I've seen proposals for Product Managers to define those conditions themselves by speaking with the LLM. A continuously updated architectural diagram is constructed and the graph is updated until all cases are covered, and then the LLM writes the code, writes the validations, pushes to CI environments, runs tests, schedules prod deploy (by looking at company event schedule), gets CAB approval, deploys code, tests in prod, and fixes regressions.
I'm not saying this is the correct thing, but companies are implementing it and it is "working". I don't think keeping our head in the sand is helping.
> I've seen proposals for Product Managers to define those conditions themselves by speaking with the LLM.
But the LLM is not aware of how the business works and why, so someone needs to work with the business to extract the information. Typically it's not well documented.
> someone needs to work with the business to extract the information. Typically it's not well documented.
LLM extraction of the information from the Product Owner is becoming the way to overcome poorly-documented business context.
Non-technical folk are using things like `/grill-me` [0] to seed the LLM with the long-tail complexities that they didn't know they didn't know they needed to put out.
They can ask, they can do a back and forth and they can write documentation to be used from that point onwards and write it in a common style and structure.
These are language models, being able to talk through something with them and have them extract some information is what they excel at. Given that you’d probably get a halfway decent result with a literal fixed set of questions (an Eliza level docbot) gpt 5.5 is going to nail that as a task.
Is it working though? The main outcome we've seen with companies that drink the AI Kool-Aid en masse is buggy, unstable systems. Clearly there's a level of rigor that's being missed for ship velocity.
All of the above points align with our organization’s experience. But there is one more thing happening as well: we have more people in more roles able to create software solutions for issues that used to be brute forced via physical processes. (We are a small manufacturing business.) While these aren’t big giant enterprise projects that require deep swe experience, they are simple software tools that are improving process and productivity everywhere. It is pretty amazing what happens when your head of shipping can build a bespoke tool to solve a problem that previously they dealt with through burning through a lot of labor hours.
One of my beliefs about AI, for small / medium sized companies it allows them massive speed ups and generally increases their capability (I'm also in this space), existing employees of all types essentially get massive speed boosts / opens pathways not available before. For big companies, they are likely to have a bunch of problems due to size, communication pathways, management structures, decision making structures, etc.
I would be really interested in the details of these kind of tools that are improving processes and productivity.
Are they reasonably documented/audited/put into any sort of version control like a lot of internal tooling? Or are they the kind of thing that gets whacked together on the fly, in a "move spreadsheet data from A to B", "I want a list of people's schedules with custom highlighting" kind of way?
Not doubting your productivity increase, I'm just curious how people quantify that when they say it.
One of our BAs created a site that tests the effectiveness of copy / layout adjustments. I don't even know exactly what that's called but he's able to do statistical analysis much faster on what works and what doesn't. It's really cool to watch him thrive and I feel like some of the thinkers that were not devs are going to find themselves to be one but in their specific domain in a few years
Yes. In the same way that spreadsheets are the dev tools for non-devs, LLMs could step into that role, but with a much more powerful end result. With the caveat that in the same way you can create a powerful foot-gun with a spreadsheet you can probably create a foot-cannon with an LLM.
yeah the Coinbase CEO gleefully pointed that out as well and now the market thinks they are totally incompetent every time some UX quirk is found
looks like orgs have to have engineers on for optics. Otherwise it's like having a legal staff with no lawyers, or a cybersecurity staff with no IT or certified people. Software has famously not needed state licenses or industry certification, but maybe that's a direction to consider to give utility to company optics.
The article pretty much plays out what's happening in our place: heavy use of AI in software development, but we don't see ourselves shipping faster, about the same or perhaps slower (for other reasons). It's a weird feeling as we're waiting for this utopia to kick in but it's not, and we can't fully put our fingers on it.
The article and the AI skepticism crowd on HN read like the blind leading the blind to me.
I'm at a FAANG. My org is moving much more quickly, maybe between 3-10x more quickly than we were pre-AI. We aren't seeing a spike in reliability issues. Things just get done faster. An org as large as mine has no right to move as fast as it does.
I’ve been back through your post history (not entirely) - you mention multiple times you work at a FAANG - so you work at one of 5 very public companies.
You have been asked multiple times by multiple commenters to provide a single example of something that reflects this incredible boost achieved by <massive tech org>, you have ignored every request for this, and I suspect will ignore this one as well. HN is going to die unless we all start calling these constant deceptive practices out. I’ll leave others to parse your history and make their own judgements.
Not going to break NDA and give up our competitive advantage for HN, sorry! I can tell you it's been useful for us, but thinking about how to use it is an exercise left for the reader.
Perfect excuse for avoiding any substance (except no one asked you to give up the advantage or secret sauce of using LLMs, only the end examples of what has been achieved with it). It is also funny how you always leave out the name of the company you allegedly work at. It is perfectly clear why you do that, though: no matter what company you name, its actual employees on HN will quickly disprove all the ridiculous claims you have made regarding LLMs and AI. Keeping the name ambiguous lets you get away with it.
It's highly team dependent. In short, the more "coding monkey" the work is, the more velocity you can get with AI. As soon as you need to interface with customers and extract requirements, that becomes the bottleneck.
The main problem is, it's not a one-size-fits-all tool, you need to understand what it speeds up to benefit from the speed up.
And if it is a chore, we already have some tools to speed it up, only if it is worth it though. Placing a button is actually easy if you get all the design system down usually with a component library, visual regression automation and testing automation.
If a team doesn't have tools and automation in place, AI might speed them up a little but it adds a layer of complexity, i.e. everyone has to manage their own workflow and tools. And when you try to align the team, you get the tools and automation that the team is supposed to have in the first place.
As for ideation, the problem isn't the speed of information ingestion but the ability to connect and understand different parts of the information, which requires thinking. More information at times is just going to hinder the ability to think. For example, it is obvious to developers why there is a rate limit for the APIs but for PMs it might not be obvious. They might ask the AI whether or not a rate limit can be removed easily and how many days it takes if you vibe code it, and ignore the possibility that the missing rate limit might be abused by users, just to improve a feature because it is too slow.
We are still doing a lot of work with new tools but old methods though, it will be interesting to see how far we can go if we forget about the old rules and embrace the chaos entirely.
Indeed. I suspect most effective AI users are quietly making real progress toward their objectives.
Anecdotally, I see a lot of problems/solutions content about AI that doesn't reflect at all the challenges I face. But trying to tell people that there are other ways of doing things, especially when it conflicts with token-maxxing, is a lost cause
I know and I agree. It sounds incredibly arrogant but it's frankly a bit sad to see how much HN is lagging behind on AI adoption. It's been 90% noise over the last 3-6 months about problems that aren't truly problems if you really look hard at what AI is already capable of doing today. It's mostly people & process problems. I could post a comment like the one above below almost every article on AI. But it is what it is. It's an opportunity for anyone who doesn't buy into the cynical tone here for sure.
The HN AI skeptics are just bizarre to me. They are insisting to us that, no, the productivity gains we're experiencing every day, simply don't exist!
It's not that they're using the tool wrong, it's that the tool just isn't capable of what we see before our own eyes! I guess our eyes and ears are simply lying to us?
And then they ask for how we are managing to make things move faster. When you refuse to breach NDA and give up your competitive advantage on HN, this somehow confirms their belief that AI is useless.
Precisely. People don't realize that it's all numbers. Given the average IQ of people involved in a project is 140, an AI with an IQ of 150 can replicate each and every such individual in the pipeline. People saying AI can't do this or AI can't do that should come to terms with the fact that this IQ gap is monotonically increasing.
1: When was the last time you worked on a project where you thought the average IQ was 140? I don’t even think I have worked on a project where the maximum IQ was 140.
2: Who thinks the IQ of people on the project determines its success? There’s so much more to it than just “high capability team members” (to give IQ a generous interpretation).
3: (math joke) A sequence like (AI IQ - Human IQ) can be negative and monotonically increasing and still never reach 0.
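To make point 3 concrete, here is one illustrative sequence (chosen for this example, not taken from the thread), with d_n standing for the gap (AI IQ - Human IQ) at step n:

```latex
d_n = -\frac{1}{n}, \qquad n = 1, 2, 3, \dots
\\[4pt]
d_{n+1} - d_n = \frac{1}{n} - \frac{1}{n+1} = \frac{1}{n(n+1)} > 0,
\qquad d_n < 0 \ \text{for all } n .
```

The gap grows at every step, yet it stays strictly negative forever; "monotonically increasing" on its own says nothing about ever crossing 0.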
Pattern matching against millions of IQ test questions from a training set in order to score 150 on an IQ test doesn't give you an intelligence equivalent to 150.
I agree. Inexperienced people (not necessarily "dumb") are likely to accept everything at face value, not apply critical thinking skills, and not even check the AI generated output.
What IQ "means" is separable from what it is. IQ is a measure of performance on IQ tests. That's literally what it is. If a computer system can complete IQ tests, it has an IQ.
The issue is that IQ means less than you want it to.
I don't want IQ to mean anything. pkoird clearly wants it to mean something.
IQ is a terribly flawed measure of human intelligence. But it measures nothing when you apply an LLM that contains multiple IQ tests in its corpus. IQ is deeply flawed, but the point is not to "measure performance on IQ tests". If someone cheats on an IQ test and scores 200, no reasonable person would say they have 200 IQ.
Enforcement is secondary and is allowed to take weeks / months / never at all if nobody reads the paper. It's about being able to ban if an issue arises; not about keeping the database strictly clean.
Looking at it from Europe, this definitely also happens. It depends on the situation. I know of people who were kept because the parting was in good faith (which was less a firing and more an agreement that parting is in everyone's interest), but I also know of people who had their access revoked before firing because it wasn't. The latter had unilateral system access as well, which added to it. It's not about humane or inhumane, it's about risk. The 3-6 months being nice is also a fairytale that I have only ever heard in a positive light from employees who are not particularly ambitious or awake or in any way satisfied with their jobs or the prospect of a future job. On the other hand, from the perspective of employers it's consistently hard to effectively restructure, it's expensive and awkward to have to pretend to want to keep someone around that you or they don't want around.
It's just one of these rules that unfortunately in Europe allow people to view life purely as the time between jobs. I'd never tell that to someone's face but it's simply a fact that the world stops if people don't work and no matter what the ideal world looks like in your dreams, working is the only real way forward for anything. It's part of the reason why Europe is falling behind on everything.
Europe is not falling behind on anything that is not reasonable.
The increased growth in the USA over the last decade has largely been created by means that one day will be quite costly for you (debt).
The USA under MAGA is falling apart. EU and others are actively minimizing risk by selecting non-US IT providers. EU and others are actively selecting non-US defence systems.
I say that it is very positive to protect your citizens. Russia (sending their citizens en masse to a certain death on the front lines) and USA have more in common politically than USA and EU.
I agree with everything you said, it's great that they're trying to detach from US IT providers & find alternatives, and I do think Europe is doing a lot of things better than the US.
But there's nothing like AWS, Google Cloud, Facebook, Azure, ChatGPT, Tesla, etc etc the list goes on and is very long, in Europe. They're switching way too late. Why did it not happen before? Why do we have very limited IT providers, for example? Due to the culture and regulation that doesn't incentivize it sufficiently.
I'm European too btw and live in the EU and I'm happy about a lot of things we have that the US doesn't, I'm just personally worried that we're setting priorities wrong. Having a chill life in the park is good as an ideal, it's just detached from what's needed to make a state run; and it will end in the EU having even less power than it has now, resulting in fewer moral values being carried into the world.
> It's part of the reason why Europe is falling behind on everything.
I read a news article that Orange Telecom in France was being sued by a woman they had on payroll for the last 20 years doing nothing. Due to a medical condition she suffered, she became unable to do her job, and since they couldn't fire her due to French unions and labor laws, nor did they have any available job that could fit her current condition, they just kept paying her for 20 years to do nothing at work, and now she's suing them for the depression she got from being paid for no work.
It felt like reading a Monty Python skit.
But Europe is failing due to a myriad of compounding issues and structural deficits, not just because firing workers can be a Kafkaesque nightmare in some countries. European workers' unions and labor protections were even stronger 20-25 years ago and in 2004 the Euro stock market was worth more than the US stock market, while now it's worth half the US one. But that's a whole different discussion where pages have to be written to encompass the whole context and cover all aspects of European economic decline. Boiling it down to crazy labor protections would be reductionist and incorrect.
They couldn't find anything for her to do? Hard to believe, but if there's a reason not to fire her then pay her the money she's owed and stop demanding she show up. Making someone come in with no tasks assigned is fun for a week and quickly turns into punishment detail. Putting someone on punishment detail because you're not allowed to fire them is Bad.
Unless she was allowed to stay home, in which case I take most of that back and it falls on her to go outside and find something to do. I can't find any articles with enough detail. But I'm still skeptical they actually couldn't find a job for her to do. It was 'just' paralysis on one side.
>They couldn't find anything for her to do? Hard to believe,
If a person's now disabled, what can a company give them to do profitably, that isn't already optimized, automated or offshored?
There's plenty of civil servants whose jobs are just moving one paper from one room to the next, just to keep more useless people employed that nobody would hire in the private sector. But this doesn't really exist as much in the private sector.
I don't think they offshored the entire office, but if they did they'd probably be able to fire her at that point.
If I found the right article, the disability is epilepsy and paralysis on one side.
Which means she can do pretty much any office job fine. She already was doing office work, so the disability should not have changed things all that much. I'm sure she typed slower, but that can be worked around and mitigated.
>Which means she can do pretty much any office job fine.
Honestly, I doubt it. If you show up to an interview of "any office job" with "epilepsy and paralysis on one side" nobody will hire you simply because you won't be as productive as those without such disabilities.
Also, "epilepsy and paralysis on one side" is the legal medical diagnosis, but in practice the impact can be much greater, especially with age, which is why ageism is a thing even among people who are legally in full health, because in practice your body isn't the same as when you were 19-25.
But given that they already hired her, if she's going at 30-90% speed depending on task then it should be very easy to keep giving her tasks. And she can practice things like one-handed typing to improve the average.
She doesn't need the equivalent of "moving paper from one room to the next". She lost some number of dollars per hour worth of productivity, but it sounds like she was still capable of being reasonably productive.
>they just kept paying her for 20 years to do nothing at work, and now she's suing them for the depression she got to get paid for no work.
It's called "mise au placard" and it's illegal. It's a technique to get people to quit by themselves, so companies don't have to deal with the hassle of firing them. The lawsuit is 100% justified.
The anomaly there is that France Télécom was a public company at the time of the hiring, and through privatisation public servant benefits were upheld for existing employees, which blocked most unpythonesque solutions.
If she had been hired after, it would have taken time but she would have been found unfit for work (she had epilepsy and hemiplegia), her contract terminated, and she would have most likely received a handicap pension instead.