
Would definitely tend to agree. Whenever I read complaints about the accuracy of LLMs with complex systems, it has generally been from people who aren't thinking very critically about how they're using them in the first place. If you were to replace that LLM with a real human junior, would you really walk away for a few weeks and then assume the solution you got back was correct by default? Obviously not. So you identify and gatekeep the most critical parts ahead of time, make error correction part of the process, and chunk the Giant, Complex Thing into Smaller, Achievable, Verifiable Things.

LLMs are proving to be very much force multipliers of the kind of developer you already are, and those who report a 10x increase in productivity are probably all being genuine. Whether that 10x is of careful, thoughtful choices or reckless, roughshod slop, though, is really an artifact of the developers themselves. I've been saying from the beginning that your effectiveness with LLMs is roughly equivalent to your ability to get effective results out of a real team of human contractors.


If $40k is the barrier to entry for "impressive", that doesn't really sell the use case of local LLMs very well.

For the same price in API calls, you could fund AI-driven development across a small team for quite a long while.

Whether that remains the case once those models are no longer subsidized, TBD. But as of today the comparison isn't even close.
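
For a rough sense of scale, here's a back-of-envelope sketch in Python (every number here is an assumption for illustration, not anyone's actual pricing):

  # Hypothetical back-of-envelope: how long could the $40k of hardware
  # money instead fund a small team's plan/API spend?
  hardware_cost = 40_000      # USD, the local rig discussed above
  per_dev_month = 200         # assumed all-in cost per developer per month
  team_size = 4
  months = hardware_cost / (per_dev_month * team_size)
  print(f"{months:.0f} months")   # -> 50 months, roughly 4 years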


It’s what a small business might have paid for an on-prem web server a couple of decades ago, before clouds caught on. I figure if a legal or medical practice saw value in LLMs, it wouldn’t be a big deal to shove $50k into a closet.


You would still have to do some pretty outstanding volume before that makes sense over choosing the "Enterprise" plan from OpenAI or Anthropic if data retention is the motivation.

Assuming, of course, that your legal team signs off on their assurance not to train on or store your data with said Enterprise plans.


At least with the server you know what you are buying.

With Anthropic you're paying for "more tokens than the free plan", which has no concrete meaning.


With an M3 Max with 64GB of unified RAM you can code with a local LLM, so the bar is much lower.


But why? Spending several thousand dollars to run sub-par models when the break-even point could still be years away seems bizarre for any real use case where your goal is productivity over novelty. Anyone who has used Codex or Opus can attest that the difference between those and a locally available model like Qwen or Codestral is night and day.

To be clear, I totally get the idea of running local LLMs for toy reasons. But in a business context the sell on a stack of Mac Pros seems misguided at best.


I ran the Qwen 3.5 35B A3B Q4 model locally on a Ryzen server with a 64k context window, getting 5-8 tokens a second.

It is the first local model I've tried that could reason properly, similar to Gemini 2.5 or Sonnet 3.5. I gave it some tools to call and asked Claude to order it around (download quotes, print charts, set up a GNOME extension); even Claude was sort of impressed that it could get the job done.

Point is, it is really close. It isn't Opus 4.5 yet, but it's very promising given the size. Local is definitely getting there, even without GPUs.

But you're right, I see no reason to spend right now.


Getting Opus to call something local sounds interesting, since that's more or less what it's doing with Sonnet anyway if you're using Claude Code. How are you getting it to call out to local models? Skills? Or paying the API costs and using Pi?


I just start the llama.cpp server with the GGUF, which creates an OpenAI-compatible endpoint.

The session so far is stored in a file like /tmp/s.json as a messages array. Claude reads that file, appends its response/query, sends it to the API, and reads the response.

I simply wrapped this process in a Python script and added tool calling as well. Tools run on the client side. If you have Claude, just paste this in :-)
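
For the curious, a minimal sketch of that relay might look like this (the path, port, and function name are assumptions, not the commenter's exact script; llama.cpp's bundled server does expose an OpenAI-compatible /v1/chat/completions endpoint):

  # Rough sketch of the relay described above, under assumed names.
  # Assumes a llama.cpp server is already running, e.g.:
  #   llama-server -m model.gguf --port 8080
  import json
  import urllib.request

  SESSION = "/tmp/s.json"  # running messages array, as described above
  URL = "http://localhost:8080/v1/chat/completions"

  def ask_local(prompt):
      # Load the conversation so far (or start fresh) and append the query.
      try:
          with open(SESSION) as f:
              messages = json.load(f)
      except FileNotFoundError:
          messages = []
      messages.append({"role": "user", "content": prompt})

      # Send the full history to the local model's endpoint.
      body = json.dumps({"messages": messages}).encode()
      req = urllib.request.Request(
          URL, data=body, headers={"Content-Type": "application/json"})
      with urllib.request.urlopen(req) as resp:
          reply = json.load(resp)["choices"][0]["message"]

      # Persist the reply so the next call sees the full context.
      messages.append(reply)
      with open(SESSION, "w") as f:
          json.dump(messages, f)
      return reply["content"]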


Sometimes you can't push your working data to a third-party service: by law, by contract, or by preference.


I started doing it to hedge against the inevitable disappearance of cheap inference.


Sure, but now double the team size. Double it again.

Suddenly that $40k is quite reasonable, because you’ll never pay another dollar for at least 2-3 years.


Would you?

2-3 years ago people were fantasizing about running local models on a consumer Nvidia RTX GPU.


It's not. I've got a single one of those 512GB machines and it's pretty damn impressive for a local model.


Assuming you worked your way up from what you could fit in 32 or 64GB previously, how noticeable is the difference between the models you could run then vs. what the 512GB machine runs now?

I've been working my way up from a 3090 system and I've been surprised by how underwhelming even the finetunes are for complex coding tasks, once you've worked with Opus. Does it get better? As in, noticeably and not just "hallucinates a few minutes later than usual"?


I don't know that I care much for the mythologization of effective developers as "Wolves" and "10x-ers", which are this decade's equivalent of Ninja / Rockstar / Guru. But a similar, less tech-centric version of that is just the concept of the "Maverick" within any organization, and the parallels aren't too far off regardless of the industry you're talking about: outsized impact in undersold roles, with a lot of heavy-swinging soft power earned through merit.

It's strange to intentionally try to place or manufacture mavericks within your org for (at least) two reasons:

1. They're emergent phenomena. It's probably more valuable on average to examine WHY someone skipping all of your processes is effective than it is to make the conditions right for someone to become that maverick. Theoretically anyone CAN be that person, but unless something is actively going wrong it probably won't happen.

2. Process exists because it makes your org more efficient. When you start building your teams around the idea of someone explicitly being the maverick(s), ask yourself: "Who exactly is going to reconcile all of this against the framework that the entire rest of the company runs on? Is the rest of that person's team relegated to damage control and cleanup crew, and is that actually more effective than having an equivalent number of mid-level performers all pulling in the same direction?"

In the world of tech, the alleged 10x-er often manifests itself as: Tech Debt, but at High Volume™!


You are confusing concepts.

What the original article described is an engineer who could not stand by and let a painful problem with an obvious solution go unsolved. The key point of the so-called wolf is the obviousness of the solution. It was not obvious to anyone else, and to anyone else it would have been a major investment. The 10x does not come from frantic coding; it comes from a comprehensive and unique understanding that translates to code quickly due to motivation and understanding.

Process does not make an org more efficient; it makes it more consistent. If the baseline efficiency is low, the consistency of an improved set of work practices will of course improve efficiency.

What a process often does is overfit: overfitting to the most common business need, sometimes overfitting to the noisiest pathologies seen.

The problem with process overfitting is that it excludes efficient solutions for problems that don't fit the previous set of business needs, or are not at risk of the previous set of pathologies. Sometimes the process has a good pressure valve for this: pull the andon cord, do some kaizen, fire up the CMM level 5 KPAs. But sometimes just applying bespoke judgment is better.

I have been the wolf he describes. I have also been the manager he describes, who lets the wolf have space and stand up for themselves. I have also been the manager who creates process and workflows and alignment and blah blah to dampen the noise of individual agency.

Tech debt is an orthogonal concern.


I don't think the concepts are as unrelated as you're suggesting; they both tend to operate on the premise that they can be more effective than others because they're able to bypass the lanes that everyone else is taking.

And you are highlighting exactly what I'm pointing out, which is that if your process is so rigid and overfit that your org is regularly missing out on obvious solutions, then the thing you should be fixing is the process rather than trying to create "wolves". A team needing someone who consistently "breaks the rules" so that you can do the right thing is a glaring red flag that you have a bigger-picture problem.


My point was that the efficiency from bypassing/cutting corners is different from the efficiency from understanding and synthesizing problems and solutions differently.

the "obviousness" in the first is seen by everyone, the "obviousness" in the second is seen only by people able to break out of a collective mindset and unground their thought processes.

In the first, the "wolf" is missing some obvious things, in particular the negative externality of their action. In the second, the "wolf" is generally working on maximizing the positive externality by generalizing the problem space and solution space outside of the conventional fitting.


> Process exists because it makes your org more efficient

That is one hell of an assumption.


> Process exists because it makes your org more efficient

Nah. Process mostly exists because management doesn't have visibility into what engineering is doing, so they have to poke vertical holes through the org to know what everyone is up to.

Process is often pitched as improving coordination between teams, but that's more of a fringe benefit than the actual reason for process.


I was part of a user study on Azure back when it first rolled out-- they were looking for seniors with an AWS background to participate in UX research, and I remember walking out of that study with imposter syndrome for the very first time. I spent 60 minutes totally unable to do the thing I wanted to do on my first introduction to Azure, and I remember thinking... am I a fraud?

No! Not this time, at least. In hindsight everything was named and organized terribly and it hasn't improved much since.


I maintain to this day that the Zune was one of the best-designed hardware and software platforms I've ever used. Probably the only truly design-forward product that MS ever produced.


The Zune hardware was slick, particularly the solid-state players. The music store worked great, and their music licensing was so much better than Apple's: $10 a month for unlimited streaming, unlimited downloads (rentals) to Zune devices, and 10 free MP3 downloads to own.

Their only misstep was making one of their colorways poop brown! That, and being too late to market with a phone that used the same design language.


There was also the fact that Microsoft introduced it 3 months before Apple announced the product that would kill the iPod, leading with the HDD model (a direct competitor to what would become known as the iPod Classic line) when Apple’s real flagship was the iPod nano.

There was also the crap that was Windows Media Player 11 which I tried to like for about a month.

There was also the incompatibility with Microsoft’s own DRM ecosystem, PlaysForSure, which was full of subscription music services, some of which were quite popular with exactly the kind of people inclined to buy a Zune: folks in Microsoft’s ecosystem who had passed on an iPod and used something from SanDisk, Creative, Toshiba or iRiver instead. The incompatibility existed because Microsoft wanted to replicate the entire iPod+iTunes model.

The 2006 lineup of iPods was also particularly strong, and included the first aluminum iPod nanos. When Microsoft announced and released the Zune, they were counter-programming against that, right into the holiday season, with a new brand that had no name recognition and a product that was just like the iPod, couldn’t play any of your music from iTunes or Rhapsody, but came with… HD Radio.

More than a few missteps were made.


  > Their only misstep was making one of their colorways poop brown
I think the other big issue was calling it a 'Zune', but that's just me...


“…you’re absurd, what’s a Zune?!”

https://youtube.com/watch?v=Jkrn6ecxthM


Name or color had nothing to do with it, imho (I like the brown personally). It was all timing. They were entering a market with a well-established leader (the iPod) that was nearly as good, as good, or better depending on who you ask. On top of that, phones themselves were taking over the music player market at the same time, which is where Microsoft really dropped the ball.

I mean, iPhone is a really ridiculous name as well if you stop to think about it.


You'd think having a dumb name would be a negative, but one of the biggest bands in the world is called Metallica.


The Zune software 2.0 remains the pinnacle of Microsoft design.


It's arguable, even if you're right, that the net loss to humanity is still far greater without these restrictions than with them. Modern social media is producing multiple generations of emotionally stunted, non-verbal children, many of whom literally struggle to read.

If you haven't seen it in person, it is now incredibly common for children as young as 1 or 2 to be handed an iPad and driven down an algorithmic tunnel of AI generated content with multiple videos overlaid on top. I've seen multiple examples of children scrolling rapidly through videos of Disney characters getting their heads chopped off to Five Nights at Freddy's music while laughing hysterically. They do this for hours. Every day. It's truly horrifying.

Parents are just as poorly equipped to deal with this as the children are, the difference being that at least their brains have already fully developed, so there is no lasting permanent damage.


Yes, this is the true dividing factor for me. The battery life of the new ARM laptops is an astounding upgrade from any device I have ever used.

I've been a reluctant MacBook user for 15 years now, thanks to it being the de facto hardware of tech, but for the first time ever, after adopting first an M1 Pro and then an M2 Pro, I find myself thinking: I could not possibly justify buying literally any other laptop so long as this standard exists.

Being able to run serious developer workflows silently (full Kubernetes clusters, compilers, VSCode, multitudes of corpo office suite products, etc.) for multiple days at a time on a single charge is baffling. And if I leave it closed for a week at 80% battery, not only does that percentage remain nearly the same when resumed-- it wakes instantly! No hibernation wake-time shenanigans. The only class of device that even comes close is high-end e-ink readers, and an e-ink reader STILL loses on wake time by comparison.

I'm at the point now where I'm desperately in need of an upgrade for my 8-year-old personal laptop, but I'm holding off indefinitely until I find something with a similar level of battery performance that can run Linux. As I understand it, the firmware behind that insane battery life, specifically the suspend functionality that lets the machine draw nearly zero power when closed, isn't supported by any Linux distro, or I would have already purchased another MacBook for personal use.


As I'm sure the author now realizes: truly elite skill among those working in the trades is in wildly high demand compared to what someone coming from the software industry might expect.

If just 1% of all software developers are writing near-flawless code to spec, that's still about 287,000 people in the world. They're relatively accessible, and the chances of being able to work with one on a short timetable are actually pretty high.

By comparison: GCs, architects, and builders at that level are far, far more rare by the numbers, highly localized, and usually mired in many years-long projects simultaneously. They do not need your business, are paid whatever price they ask, and are usually booked far in advance.

Even so! If you get even a hint that someone in that situation is willing to work with you, it will save you far more time and money to wait for that person than to go with someone available whom you merely feel alright about. If they're readily available, it's because they are not in demand; think about why that might be. If you can afford to, waiting for the person you actually want to work with is the better option every. single. time.


This comment makes me feel so sad. I lack the words to describe what critical essence this question is missing, but technology used to mean a hacker ethos of just doing things because they seemed cool and worth doing and even just the ask of this feels parasitic by comparison. Sign of the times.


I'm gonna drop a sad truth on you: even the greatest hackers of old had to make money somehow.


Eh, I think this falls right into the traditional hacker ethos of doing what seems cool; it's just that what you think is cool may be different from what I think is cool.

I want to make games, but I know how much time that takes, so I understand that to make something cool I need funding to be able to focus on that cool thing. Crypto can be a tool in this case, and I personally would prefer mining to watching ads.

Hackers are great at analyzing systems and figuring out what they might support, despite the original designer's intentions.


Absolutely. The value proposition for me with rideshare services has ALWAYS been the conversations and experiences you get to have with a diverse cross section of humanity. I'd take the bus / train otherwise.

