Okay, how about this situation, which one of my junior devs hit recently:
Coding in an object-oriented language in an enormous code base (big tech). A junior dev is making a new class and starts it off with LLM generation. The LLM adds three separate abstract classes to the inheritance structure, for a total of seven inherited classes. Each of these inherited classes ultimately comes with several required classes that are individually trivial to add but end up requiring another hundred lines of code, mostly boilerplate.
Tell me how you, without knowing the code base, get the LLM to not add these classes? Our language model is already trained on our code base, and it just so happens that these are the most common classes a new class tends to inherit. Junior dev doesn't know that the classes should only be used in specific instances.
Sure, you could go line by line and ask "what does this inherited class do, do I need it?" and actually, the dev did that. It cut the extra abstract classes from three down to two, but the remaining two were missed because the dev didn't understand, on the product side, why they weren't needed.
Fast forward a year: those abstract classes are still inherited, nobody knows why or how because there was never any comprehension behind them, and now we want to refactor the model.
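To make the shape of it concrete, here's a rough sketch of what came out versus what was needed; every name here is made up and stands in for our internal types:
---
// Stand-ins for the framework abstractions the LLM reached for.
abstract class BaseHandler { abstract void handle(); }
interface Auditable    { String auditTag(); }
interface Retryable    { int maxRetries(); }
interface Configurable { String configKey(); }

// Roughly what the LLM generated: every contract is trivially
// satisfiable, but each one adds boilerplate and permanent coupling.
class GeneratedOrderSync extends BaseHandler
        implements Auditable, Retryable, Configurable {
    @Override void handle() { /* the actual work */ }
    @Override public String auditTag()  { return "order-sync"; } // boilerplate
    @Override public int maxRetries()   { return 3; }            // boilerplate
    @Override public String configKey() { return "order.sync"; } // boilerplate
}

// What the task actually needed: one concrete class, no inherited contracts.
class MinimalOrderSync {
    void run() { /* the actual work */ }
}
---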
True, true.
I remember another example from Linus Torvalds, who at a conference used a trivial case of simplifying a function to illustrate why he's good at what he does, or what makes a good lead developer in general.
It went something along the lines of:
"Well we have this starting function which clearly can solve the task at hand. Its something 99 developers would be happy with, but I can't help but see that if we just reformulate it into a do-while instead we now can omit the checks here and here, almost cutting it in half."
Now obviously that doesn't suffice as a real-world example, but scaled up it's a good illustration of the waste that can accumulate at the macro level. I would say the ability to do this is tied to a kind of survival instinct, one which will undoubtedly be touted as something baked into the 'next iteration' of the model. I don't think it's strictly something that can be trained for, the way pattern matching can, but it's clearly not achievable yet, as your example above shows.
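A toy reconstruction of the kind of rewrite he meant (not his actual code, and the names are mine, just the shape of the argument): hoisting the read into a do-while removes the duplicated priming step.
---
import java.util.Scanner;

class DoWhileDemo {
    static boolean isValid(String s) { return !s.isBlank(); } // stand-in check

    // Before: the prompt and the read are duplicated, once to prime the
    // loop and once inside it. It works, and 99 developers would move on.
    static String readBefore(Scanner in) {
        System.out.print("> ");
        String line = in.nextLine();
        while (!isValid(line)) {
            System.out.print("> ");
            line = in.nextLine();
        }
        return line;
    }

    // After: a do-while runs the body at least once, so the priming read
    // disappears and the function is nearly half the size.
    static String readAfter(Scanner in) {
        String line;
        do {
            System.out.print("> ");
            line = in.nextLine();
        } while (!isValid(line));
        return line;
    }

    public static void main(String[] args) {
        System.out.println(readAfter(new Scanner(System.in)));
    }
}
---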
> Tell me how you, without knowing the code base, get the LLM to not add these classes?
Stop talking to it like a chatbot.
Draft, in your editor, the best contract-of-work you can, as if you were writing one on behalf of NASA to ensure the lowest bidder delivers the minimum viable product without cutting corners.
---
Goal: Do X.
Sub-goal 1: Do Y.
Sub-goal 2: Do Z.
Requirements:
1. Solve the problem at hand in a direct manner with a concrete implementation instead of an architectural one.
2. Do not emit abstract classes.
3. Stop work and explain if the aforementioned requirements cannot be met.
---
For the record: Yes, I'm serious. Outsourcing work is neither easy nor fun.
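And if you're driving the model through an API rather than a chat window, the same contract can be assembled programmatically. A minimal sketch (the class is hypothetical and the transport is deliberately left out; plug in whatever client you actually use):
---
// Assembles the work order above into a single prompt string.
class ContractPrompt {
    static String build(String goal, String[] subGoals, String[] requirements) {
        StringBuilder sb = new StringBuilder("Goal: ").append(goal).append('\n');
        for (int i = 0; i < subGoals.length; i++)
            sb.append("Sub-goal ").append(i + 1).append(": ").append(subGoals[i]).append('\n');
        sb.append("Requirements:\n");
        for (int i = 0; i < requirements.length; i++)
            sb.append(i + 1).append(". ").append(requirements[i]).append('\n');
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(build(
            "Do X.",
            new String[] { "Do Y.", "Do Z." },
            new String[] {
                "Solve the problem at hand in a direct manner with a concrete implementation instead of an architectural one.",
                "Do not emit abstract classes.",
                "Stop work and explain if the aforementioned requirements cannot be met."
            }));
    }
}
---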
Every time I see something like this, I wonder what kind of programmers actually do this. For the kind of code I write (domain-specific, and generating real value), describing "X", "Y", and "Z" is a very non-trivial task.
If describing those is easy, then I would assume the software isn't that novel in the first place. Maybe get something COTS instead.
I've been coding for 25 years. It is easier for me to describe what I need in code than it is to do so in English. May as well just write it.
20 years here, mostly in C; a mixture of systems programming and embedded work.
My only experience with vibe-coding is when working under a time-crunch very far outside of my domain of expertise, e.g., building non-transformer-based LLMs in Python.
I mean, unless you just don't know how to program, I struggle to see what value the LLM is providing. By the time you've broken it down enough for the LLM, you might as well just write the code yourself.
Curious about the mechanics here: when you say the model was "trained on our code base", was that an actual fine-tune of the weights (e.g. LoRA/adapter or full SFT), or more of a retrieval/indexing setup where the model sees code snippets at inference? Always interested in how teams distinguish between the two.