My experience so far is that local models are (a) pretty slow and (b) prone to making broken tool calls. Because of (a), the iteration loop slows down enough that I wander off to other tasks, which makes (b) far more problematic because I don't notice the breakage for who knows how long.
This is, however, a major improvement over ~6 months ago, when even a trivial `hi` prompt from an agentic CLI could take >3 minutes to get a response. I suspect the parallel processing in LM Studio 0.4.x and better tuning of the initial context payload are responsible.
Open models are trained more generically to work with "any" tool.
Closed models are specifically tuned for the tools their provider wants them to work with (for example, the specific tools in Claude Code), and hence they perform better.
I think this will always be the case, unless someone tunes an open model to work with the tools their coding agent uses.
> Open models are trained more generically to work with "any" tool. Closed models are specifically tuned for the tools their provider wants them to work with (for example, the specific tools in Claude Code), and hence they perform better.
Some open models do have specific training for defined tools. A notable example is OpenAI's GPT-OSS and its "built-in" tools for browser use and Python execution (they are called built-in tools, but they are really tool interfaces the model is trained to use if they are made available). And closed models are also trained to work with generic tools in addition to their built-in ones.
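To make the "tool interface" point concrete, here is a rough sketch in OpenAI-style function-calling JSON. The tool name and schema are hypothetical, and GPT-OSS's actual built-in tools use their own (harmony) format; the sketch only illustrates that a "built-in" tool is just an interface the model never sees unless the request includes it.

```python
# Hedged sketch: a generic, OpenAI-style tool definition. The name
# "browser_search" and its parameter schema are made up for illustration.

def browser_tool_definition():
    """Return a tool-interface description a model could be offered."""
    return {
        "type": "function",
        "function": {
            "name": "browser_search",  # hypothetical name
            "description": "Search the web and return result snippets.",
            "parameters": {
                "type": "object",
                "properties": {
                    "query": {"type": "string", "description": "Search query."}
                },
                "required": ["query"],
            },
        },
    }

# If this definition is omitted from the request, the model has no way
# to call the tool, no matter how much it was trained to use one like it.
tools = [browser_tool_definition()]
```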
I see this as the next great wave of work for me and my team. We sustained our business for a good 5–8 years rescuing legacy code from offshore teams as small-to-medium-sized companies re-shored their contract devs. We're currently in a demand lull as these same companies have started relying heavily on LLMs to "write" "code", but as long as we survive the next 18 months, I see a large opportunity as these businesses start to feel the weight of the tech debt they accrued by trusting Claude when it says "your code is now production ready."
Looking forward to trying this out. An early comment: I would love to be able to override tool descriptions and system prompts from a config. Especially when working with local models, context management is king and the tool descriptions can be a hidden source of uncontrollable context.
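To sketch what that override could look like, a hypothetical config might be as simple as the following (this is a mockup of the requested feature, not something the tool actually supports; all keys are invented):

```json
{
  "system_prompt": "You are a concise coding assistant.",
  "tools": {
    "read_file": {
      "description": "Read a file. Return at most 200 lines.",
      "enabled": true
    },
    "web_search": { "enabled": false }
  }
}
```

Trimming or disabling verbose tool descriptions this way matters most for local models, where every token of hidden context eats into an already small usable window.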
This is exactly my dream too, especially after seeing the Apple Watch prototype dumbphone cases they used to conceal them in public[0]. It would be a glorious repurposing of the Apple Watch to serve the most diehard fans of the discontinued iPhone mini.
Sadly the Apple Watch doesn't do proper external text input. You can connect a bluetooth keyboard, but it works by sending all input via the VoiceOver accessibility feature, which is slow and fidgety.
Like my co-founder said, we're a small company based in the US, and hiring in foreign jurisdictions is both expensive and time-consuming. We're currently set up to hire in the US and Canada, and while we'd be willing to expand our footprint for the right candidate, the easiest thing for us is to look for candidates in our current operating jurisdictions.
We're seeking a skilled frontend-focused full-stack engineer who thrives on building beautiful, functional user interfaces but is still comfortable on the back end.
While we’re looking for developers with strong technical skills, we don’t typically hire for experience in a particular framework or technology. We’re mostly seeking generalists who enjoy working in new technical stacks and have exceptional communication skills; because we’re a small company, everyone here takes on many roles, and strong relationships with our clients are essential to our success.
We offer a 20-hour work week, retirement and health benefits, a competitive salary, and unlimited vacation and parental leave policies. You can read more about our work philosophy here: https://www.apsis.io/mission.
If you're interested, please reach out to us with any questions or with your resume at contact@apsis.io.
I agree with the sentiment elsewhere in this thread that this represents a "hideous theft machine", but I think even if we set that aside, this is still bad.
It's very clear that generative AI has abandoned the idea of creativity; image production that merely replicates the training data only serves to further flatten our idea of what the world should look like.
First, that only works for potential biases you already know about and can anticipate, or can spot in the output. If the result is an egregious ripoff of an artist you’ve never heard of, or is the likeness of a model or actor you’re not aware of, how would you know?
Second, it doesn’t create the kind of originality we want. It just limits the kind of unoriginality we are getting.
6 months from now, who knows?