There's a lot more involved in senior dev work beyond producing code that works.
If the stakeholders knew what they needed to build and how, they could use LLMs themselves, but translating complex requirements into code is something these tools are not even close to cracking.
> there's a lot more involved in senior dev work beyond producing code that works.
Completely agree.
What I don't agree with is statements like these:
> LLM’s never provide code that pass my sniff test
To me, these (false) blanket dismissals of chatbot capabilities are rehashed so frequently that they derail every conversation about using LLMs for dev work. You'll find similar statements in nearly every thread about LLMs and coding tasks.
It's provably true that LLMs can produce working code. It's also true that an increasingly large portion of coding is being offloaded to LLMs.
In my opinion, developers need to grow out of this attitude that they are John Henry and they'll outpace the mechanical drilling machine. It's a tired conversation.
> It's provably true that LLM's can produce working code.
You've restated this point several times, but the reason it's not more convincing to many people is that simply producing code that works is rarely the actual goal on many projects. On larger projects it's much more about producing code that is consistent with the rest of the project, easily extensible, readable for your teammates, easy to debug when something goes wrong, testable, and so on.
The code working is a necessary condition, but is insufficient to tell if it's a valuable contribution.
The code working is the bare minimum. The code being right for the project and context is the basic expectation. The code being _good_ at solving its intended problem is the desired outcome, which is a combination of tradeoffs between performance, readability, ease of refactoring later, modularity, etc.
LLMs can sometimes provide the bare minimum. And then you have to refactor and massage it all the way to the good bit, but unlike looking up other people's endeavors on something like Stack Overflow, with the LLM's code I have no context for why it "thought" that was a good idea. If I ask it, it may parrot something from the relevant training set, or it might be bullshitting completely. The end result? This is _more_ work for a senior dev, not less.
Which is why it has never passed my sniff test. Its code is at best of a quality that even junior developers wouldn't open a PR for yet. Or if they did, they'd be asked to explain how and why, and would quickly learn not to open code for review before they've properly considered the implications.
> It's provably true that LLM's can produce working code.
This is correct - but it's also true that LLMs can produce flawed code. To me, the cost of telling whether code is correct or flawed is larger than the cost of just writing correct code myself. This may be an AuDHD thing, but I can better comprehend the correctness of a solution if I'm watching (and doing) the making of that solution than if I'm reading it after the fact.
As a developer, while I do embrace intellisense, I don't copy/paste code, because I find typing it out is a fast path to reflection and finding issues early. Copilot seems to be no better than mindlessly copy/pasting from StackOverflow.
From what I've seen of copilots, while they can produce working code, they don't offer much beyond the surface-level stuff that is fast enough for me to type myself. I am also deeply perturbed by some interviews I've done for senior candidates recently who are using them and, when asked to disable them for a collaborative coding task, completely fall apart because they've built a dependency on the tool in place of knowledge.
This is not to say I do not see value in AI, LLMs or ML (I very much do). However, I code broadly at the speed of thought, and that's not really something I think will be massively aided by it.
At the same time, I know I am an outlier in my practice relative to lots around me.
While I don't doubt other improvements that may come from LLM in development, the current state of the art feels less like a mechanical drill and more like an electric triangle.
Code is a liability, not an asset. It is a necessary evil to create functional software.
Senior devs know this, and factor code down to the minimum necessary.
Junior devs and LLMs think that writing code is the point and will generate lots of it without worrying about things like leverage, levels of abstraction, future extensibility, etc.
The code itself, whether good or bad, is a liability. Just like a car is a liability: in a perfect world you'd teleport yourself to your destination, but instead you have to drive. And because of that, roads and gas stations have to be built, you have to take care of the car, and so on. It's all a huge pain. The code you write, you will have to document, maintain, extend, refactor, relearn, and a bunch of other activities. So you do your best to have only the bare minimum to take care of. Anything else is just future trouble.
Sure, I don’t dispute any of that. But it's not a given that using LLMs means you’re going to have unnecessary code. They can even help to reduce the amount of code. You just have to be detailed in your prompting about what you do and don’t want, and work through multiple iterations until the result is good.
Of course if you try to one shot something complex with a single line prompt, the result will be bad. This is why humans are still needed and will be for a long time imo.
I'm not sure that's true. An LLM can code because it is trained on existing code.
Empirically, LLMs work best at coding when doing completely "routine" coding tasks: CRUD apps, React components, etc. Because there's lots of examples of that online.
I'm writing a data-driven query compiler and LLM code assistance fails hard, in both blatant and subtle ways. There just isn't enough training data.
Another argument: if an LLM could function like a senior dev, it could learn to program in a new programming language given the language's syntax, docs, and API. In practice they cannot. It doesn't matter what you put into the context; LLMs just seem incapable of writing in niche languages.
Which to me says that, at least for now, their capabilities are based more on pattern identification and repetition than they are on reasoning.
Have you tried new or niche languages with Claude Sonnet 3.5? I think if you give it docs with enough examples, it might do OK. Examples are crucial. I’ve seen it do well with CLI flags and arguments when given docs, which is a somewhat similar challenge.
That said, you’re right of course that it will do better when there’s more training data.
> It's provably true that LLM's can produce working code
ChatGPT, even now in late 2024, still hallucinates standard-library types and methods more often than not whenever I ask it to generate code for me. Granted, I don’t target the most popular platforms (i.e. React/Node/etc.); I’m currently in a .NET shop, which is a minority platform now, but ChatGPT’s poor performance is surprising given the overall volume and quality of .NET content and documentation out there.
My perception is that “applications” work is more likely to be automated away by LLMs/copilots, because so much of it is so similar to everyone else’s. So I agree with those who say LLMs are only as good as the number of examples of something online; asking ChatGPT to write something for a less-trodden area, like Haskell or even a Windows driver, is frequently a complete waste of time, as whatever it generates is far beyond salvaging.
Beyond hallucinations, my other problem lies in the small context window which means I can’t simply provide all the content it needs for context. Once a project grows past hundreds of KB of significant source I honestly don’t know how us humans are meant to get LLMs to work on them. Please educate me.
I’ll declare I have no first-hand experience with GitHub Copilot and other systems because of the poor experiences I had with ChatGPT. As you’re seemingly saying that this is a solved problem now, can you please provide some details on the projects where LLMs worked well for you? (Such as which model/service, project platform/language, the kinds of prompts, etc?). If not, then I’ll remain skeptical.
> still hallucinates standard-library types and methods more-often-than-not whenever I ask it to generate code for me
Not an argument, just unsolicited advice: my guess is you are asking it to do too much work at once. Make much smaller changes. Try to ask for roughly as much as you would put into one git commit (per best practices) -- for me that's usually editing a dozen lines of code or fewer.
> Once a project grows past hundreds of KB of significant source I honestly don’t know how us humans are meant to get LLMs to work on them. Please educate me.
Edit: The author of aider puts the percentage of the code written by LLMs for each release. It's been 70%+. But some problems are still easier to handle yourself. https://github.com/Aider-AI/aider/releases
Thank you for your response - I've asked these questions before in other contexts but never had a reply, so pretty much every online discussion about LLMs feels like I'm surrounded by people role-playing being on LinkedIn.
> It's provably true that LLM's can produce working code
Then why can't I see this magical code that is produced? I mean a real big application with a purpose and multiple dependencies, not yet another ReactJS todo list. I've seen comments like that a hundred times already but not one repository that could be equivalent to what I currently do.
For me, the experience of LLMs is a bad tool that calls functions that are obsolete or don't exist at all; not exactly earth-shattering.
> if the stakeholders knew how to do what they needed to build and how, then they could use LLMs, but translating complex requirements into code is something that these tools are not even close to cracking.
They don't have to replace you to reduce headcount. They could increase your workload so that where they needed five senior developers, they can do with maybe three. That's like six one way and half a dozen the other, except two developers lost a job, right?
Yeah. Code that works is a fraction of the aim. You also want code that a good junior can read and debug in the midst of a production issue, that is robust against new or updated requirements, that performs at least as well as the competitors', and that uses appropriate libraries sparingly. You also need to be able to state when a requirement would loosen the conceptual cohesion of the code, and to push back on requirements that can already be achieved just as easily another way.
I'd guess "huge investment" in this case is relative. The maintainer is not spending a ton of time building features for the CLA tool since it's mostly "done" and so investing more time to build support for Gitlab would require many more hours of development than they're probably dedicating right now.
And I can imagine that maybe they didn't abstract communication with GitHub enough and would need to refactor the system to handle that as well.
Generally, I think it's not reasonable to expect them to do more free work to support use cases the maintainer does not need. Since it's open source, we're all welcome to contribute back.
I support remote work, but moving out to a rural area is putting yourself at risk if you expect to get high software engineer pay.
There’s a lot more competition for remote jobs and you’re also competing with highly qualified candidates in lower cost of living places outside the US.
Flipside: Staying in an expensive metro area puts you at risk to maintain your quality of life and pay mortgage/taxes/insurance on everything, when a lot of hybrid jobs don't pay enough to cover those expenses, and are also at risk of layoffs, which can be catastrophic when you have a high cost of living. If you're remote/rural, you don't need to make nearly as much just to skate by if something goes south. I moved out of the Bay Area, and I cannot imagine ever going back. Too much risk. I don't care if the pool of jobs is smaller and the pay is lower.
When it comes to your quality of life, there are a lot of factors besides the cost of living - social circle, weather, job opportunities, hobbies, cultural compatibility etc. For a lot of people, expensive metro areas can also be the only places where they have friends & family or where they can pursue their hobbies. As an immigrant of color, I simply cannot see myself living happily in rural Ohio even if that could be a wonderful place for someone else.
I agree. But there are a lot of places in the US that check 90% of those boxes at a fraction of the cost of the Bay Area. There are a lot of great second-tier cities that are very welcoming to people of color.
I think the key part of moving elsewhere is that you have much less access to the in-person segment, assuming moving represents a significant hurdle to you.
Totally agree with this, and one of the biggest problems I’ve dealt with at early-stage startups is employees from big companies who couldn’t deal with bypassing certain processes like this.
Totally agree. Now that React has matured, I have dealt with a ton of shitty React codebases, because it really is more of a library that doesn't tell you how to architect your app, so lots of folks went in completely different directions. Additionally, because some of React's pitfalls can be unintuitive, they are tough to unravel in a larger codebase.
However, I will say it has been much easier to make updates to a bad React codebase than a bad jQuery codebase. The way React works tends to make it less difficult to reason about, even if the app is poorly built.
> Turbopack is built on Turbo: an open-source, incremental memoization framework for Rust. Turbo can cache the result of any function in the program. When the program is run again, functions won't re-run unless their inputs have changed. This granular architecture enables your program to skip large amounts of work, at the level of the function.
Which would make that multiple an extremely rough estimate, highly dependent on the situation. But it's still an exciting development.
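The memoization idea the quote describes can be sketched in miniature. This is purely illustrative: Turbo's real framework is in Rust, tracks inputs far more granularly, and persists its cache; here we just key a cache on a function's input so unchanged inputs skip the work.

```typescript
// Toy input-keyed memoization (illustrative only, not Turbo's API).
const cache = new Map<string, string>();
let workCount = 0; // counts how often the real work actually runs

function memoize(transform: (src: string) => string) {
  return (src: string): string => {
    const hit = cache.get(src);
    if (hit !== undefined) return hit; // input unchanged: skip the work
    const out = transform(src);
    cache.set(src, out);
    return out;
  };
}

// Stand-in for an expensive build step (e.g. transpiling one module).
const compile = memoize((src) => {
  workCount++;
  return src.toUpperCase();
});

compile("const x = 1"); // runs the work
compile("const x = 1"); // served from cache; work skipped
```

An incremental rebuild then amounts to re-invoking every function and paying only for the ones whose inputs actually changed.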
It's not very hard to stick to a pure-ish style even in languages that don't statically enforce it; JavaScript's MobX framework has the same constraint, as do React function components and computed properties in Vue. You don't really get any static help with functional purity in those cases, but thousands of programmers do it successfully every day.
And while Rust doesn't strictly enforce purity, it does make it a lot easier to stick to in practice with its explicit `mut` keyword for local variables, references, and function arguments (including for the `self` argument on methods). There are various ways like RefCell to get around this if you're really trying to, but you have to go out of your way.
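The discipline described above can be shown with a hand-rolled "computed" value in plain TypeScript. Nothing here enforces purity; the convention is simply that the derivation reads only its inputs, returns a new value, and mutates nothing, which is the same contract MobX computeds, React render functions, and Vue computed properties expect. The `Todo` shape is just an example:

```typescript
interface Todo {
  title: string;
  done: boolean;
}

// Pure: same input, same output, no side effects, no mutation.
// `readonly` gives a little static help, but mostly this is discipline.
const remainingTitles = (todos: readonly Todo[]): string[] =>
  todos.filter((t) => !t.done).map((t) => t.title);

const todos: Todo[] = [
  { title: "write tests", done: false },
  { title: "ship", done: true },
];
const remaining = remainingTitles(todos); // derived value; todos untouched
```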
The answer seems to be caching (https://turbo.build/pack/docs/core-concepts), lots and lots of caching. Which enables it to reuse a lot of already computed stuff for incremental rebuilds.
Vite uses esbuild for transpiling and rollup for bundling. esbuild (written in Go) is already pretty fast for transpiling, but there may be a lot of room to optimize rollup, since rollup is written in JavaScript.
My guess is some combination of dropping support for older browsers, supporting fewer features, and optimizing for the common case at the expense of edge cases. In a few releases, once all those are added back in, it will be no faster than Vite.
Hanoi feels more novel to Americans than Ho Chi Minh, which feels like a fairly generic city. At least that's how I felt visiting the two.
Another thing you have to remember is that there are a lot of southern Vietnamese in the US, who brought their food with them. The average American has experienced more of south Vietnam than north without ever having visited.
Seeing the difference between a city that I perceived to be kind of emulating western culture (Ho Chi Minh), versus Hanoi, which has a culturally distinct feel about it, can reasonably lead a person to see a touristy city as a more cultural experience.
From an ameri-centric point of view, HCM is inundated with a kind of unpleasant or generic tourism (eat at these places, eat on a boat, go to this market, go to this tower, go to these museums, go to these "palaces," climb into Vietnamese tunnels), compared to Hanoi, where a lot of the tourism is related to both food and how beautiful the country is. It's kind of the difference between "this food is objectively good" and "this food is new and interesting."
I have used protobufs and gRPC on the web before. Maybe this project is the magic bullet that makes it easy to do, but in the past the TypeScript support from Google and third parties (Improbable Eng) was lacking, and there was little interest in making simple changes to make it easier to use.
On top of that, the increase in JS shipped to users due to the size of the generated protobuf objects was unacceptable for anything that cares about keeping bundle size manageable. Have you done anything to address this issue?
Not sure if it is a magic bullet, but it was definitely written by TypeScript developers, for TypeScript developers.
The generated TypeScript code is already pretty minimal because all serialization ops are implemented with reflection instead of generated code (which is only marginally slower than generated code in JS).
But you can also switch to generating JavaScript plus TypeScript declaration files, which is truly minimal: JavaScript is a dynamic language, so we actually only generate a small snippet of metadata in the .js output and create a class at run time with a function call. The generated typings (.d.ts) give you type safety, autocompletion in the IDE, and so on.
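The pattern described can be sketched like this. To be clear, this is not the project's real generated output and all names here are hypothetical; it only illustrates the idea of shipping a small metadata snippet in the .js file and constructing the message class at run time, with types supplied separately by a generated .d.ts:

```typescript
// Hypothetical, minimal field descriptor (real metadata carries much more).
type FieldInfo = { no: number; name: string };

// Builds a message class at run time from a metadata snippet.
function makeMessageClass(typeName: string, fields: FieldInfo[]) {
  return class {
    static readonly typeName = typeName;
    static readonly fields = fields;
    [key: string]: unknown;
    constructor(init: Record<string, unknown> = {}) {
      for (const f of fields) {
        this[f.name] = init[f.name];
      }
    }
  };
}

// A generated .js module would contain roughly one such call per message,
// which is why the emitted JavaScript stays tiny.
const User = makeMessageClass("example.User", [{ no: 1, name: "name" }]);
const u = new User({ name: "Ada" });
```

Reflection-based serialization then walks `fields` at run time instead of relying on per-message generated serializer code.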