By desk you mean that "Mac mini"? Because it is pricey. In my country it is 1000 USD (from Apple for basic M4 with 24GB). My desk was 1/5th of that price.
And considering that this Mac mini won't be doing anything else, is there a reason not to just buy a subscription from Claude, OpenAI, Google, etc.?
Are those open models more performant than Sonnet 4.5/4.6? Or do they at least have a bigger context?
Right now, open models that run on hardware that costs under $5000 can get up to around the performance of Sonnet 3.7. Maybe a bit better on certain tasks if you fine tune them for that specific task or distill some reasoning ability from Opus, but if you look at a broad range of benchmarks, that's about where they land in performance.
You can get open models that are competitive with Sonnet 4.6 on benchmarks (though some people say they focus a bit too heavily on benchmarks, so they may be slightly weaker on real-world tasks than the benchmarks indicate). But you need >500 GiB of VRAM to run even pretty aggressive quantizations (4 bits or less), and to run them at any reasonable speed they need to be on multi-GPU setups rather than the now-discontinued 512 GiB Mac Studio.
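The VRAM figure follows from simple arithmetic: weight storage is roughly parameter count times bits per weight. A minimal sketch, using a hypothetical 1-trillion-parameter model as the example (real frontier-class open models vary in size):

```python
def weight_memory_gib(params_billions: float, bits_per_weight: float) -> float:
    """Approximate weight storage in GiB (ignores KV cache and activations)."""
    bytes_total = params_billions * 1e9 * bits_per_weight / 8
    return bytes_total / 2**30

# A 1-trillion-parameter model at 4-bit quantization:
print(round(weight_memory_gib(1000, 4), 1))  # ~465.7 GiB, before KV cache
```

The KV cache and activations come on top of that, which is why "4 bits or less on >500 GiB" is about the floor for models in this class.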
The big advantage is that you have full control. You're not paying a $200/month subscription and still being throttled on tokens, you're guaranteed that your data is not being used to train models, and you're not financially supporting an industry that many people find questionable. Also, if you want to, you can use "abliterated" versions, which strip away the censoring that labs add to make their models refuse certain questions, or you can use fine-tunes adapted for other purposes, like improving certain coding abilities, making the model better for roleplay, etc.
You don't need that much VRAM to run even the very largest models; these are MoE models where only a small fraction of the weights is active at any given time. If you run with multiple GPUs and have enough PCIe lanes (such as on a proper HEDT platform), CPU-GPU transfers start to become a bit less painful. More importantly, streaming weights from disk becomes feasible, which lets you save on expensive RAM. The big labs only avoid this because at scale it costs more power than keeping weights in DRAM, but that aside it's quite sound.
While you can run with weights in RAM or even on disk, it gets a lot slower. Even though only a fraction of the weights is used on any given token, that fraction can change with each token, so there is a lot of traffic transferring weights to the GPU, which is much slower than having them directly in GPU RAM, and slower still if you stream from disk. Possible, yes, and maybe OK for some purposes, but you might find it painfully slow.
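You can put a rough ceiling on this with bandwidth arithmetic: each token has to read the active experts' weights from wherever they live. A sketch under assumed ballpark numbers (~37B active parameters at 4-bit, i.e. ~18.5 GB of weight traffic per token, and typical bandwidths for HBM, PCIe 5.0 x16, and a fast NVMe drive); real systems will overlap transfers and cache hot experts, so treat these as upper bounds on the weight-traffic bottleneck alone:

```python
def max_tokens_per_sec(active_bytes_per_token: float, bandwidth_gb_s: float) -> float:
    """Upper bound on decode speed if every token must stream its active weights."""
    return bandwidth_gb_s * 1e9 / active_bytes_per_token

ACTIVE = 18.5e9  # assumed: ~37B active params at 4 bits ≈ 18.5 GB per token
for name, bw in [("GPU HBM", 2000), ("PCIe 5.0 x16", 64), ("NVMe SSD", 7)]:
    print(f"{name} (~{bw} GB/s): ~{max_tokens_per_sec(ACTIVE, bw):.1f} tok/s ceiling")
```

The orders-of-magnitude gap between the three tiers is the whole story: streaming from disk caps you at well under one token per second, PCIe at a few, while VRAM-resident weights leave bandwidth to spare.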
I have the same setup (M4 Pro, 24GB). The e4b model is surprisingly snappy for quick tasks. The full 26B is usable but not great — loading time alone is enough to break your flow.
Re: subscriptions vs local — I use both. Cloud for the heavy stuff, local for when I'm iterating fast and don't want to deal with rate limits or network hiccups.
I care because every second of that startup time is lost productivity and focus, for me and for every developer on my team. Hot reload only works if no class or method has changed, so it's not a solution. I've worked on codebases of similar size and complexity in many languages, and a compile-and-restart cycle of under five seconds is a game-changing developer experience.
I can imagine a full build of the project(s) in TFA is on the order of several minutes the first time, and maybe every time. I remember working on projects before SSDs were common that would take on the order of half an hour or more... the layers of abstraction were such that you literally had to thread through 15+ projects in two different solutions just to add a single parameter to a query, and it would take a couple of weeks to develop and test.
That said, I did catch up on my RSS feeds during that job.
AfD is a far right populist party in the EU's biggest economic powerhouse country, whose explicit goals are to leave the EU (they probably can't due to the German constitution), exit the eurozone, withdraw from the Paris climate deal, leave NATO, and cozy up with Russia.
It's not hard to imagine what kind of damage they could do to the EU if they took power in Germany and started working with Hungary to block EU legislation, veto sanctions, defund programs, etc.
Sure, those EIOs will only hold if Hungary starts executing the EIOs it receives (e.g. Poland's former Minister of Justice, who awaits trial, sits comfortably in Hungary).
Let's hope elections there will replace Orban with something saner.
Trends vary. Poland removed its right-wing government two years ago (and yes, elected a right-wing president a few months ago). Romania elected a pro-European president.
We can go on. The EU is not a single country, nor a single community of people.
It's happening in the EU too, just not at such a fast pace as in other regions. And it's still far from authoritarianism.
Currently it's just smaller pieces, and no bigger agenda is visible (or even exists). But there are constantly new regulations that would make an authoritarian coup (like the one currently underway in the US) easier.
Thanks for referencing the docs. For me, an agent is an entity that you can ask something; it talks to you and tries to do what you asked it to do.
In this case, if you have a server with an endpoint, you can run opencode when the endpoint is called and pass it the prompt. Opencode then thinks, plans, and acts according to your request, possibly using tools, skills, calling endpoints, etc.
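A minimal sketch of that server-endpoint setup, assuming opencode offers a non-interactive invocation along the lines of `opencode run "<prompt>"` (check your version's CLI for the actual entry point); the HTTP wiring uses only the Python standard library:

```python
import subprocess
from http.server import BaseHTTPRequestHandler, HTTPServer

def run_agent(prompt: str, cmd=("opencode", "run")) -> str:
    """Hand the prompt to the agent CLI and return whatever it prints."""
    result = subprocess.run([*cmd, prompt], capture_output=True, text=True)
    return result.stdout

class PromptHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # Read the raw request body as the prompt.
        length = int(self.headers.get("Content-Length", 0))
        prompt = self.rfile.read(length).decode()
        output = run_agent(prompt)
        self.send_response(200)
        self.end_headers()
        self.wfile.write(output.encode())

if __name__ == "__main__":
    HTTPServer(("localhost", 8080), PromptHandler).serve_forever()
```

Then `curl -X POST localhost:8080 -d "summarize the open issues"` would trigger an agent run. For anything beyond a toy you'd want authentication and a job queue, since agent runs can take minutes.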
I'm still kind of confused, but opencode itself comes with several agents built-in, and you can also build your own. So what does it mean to use opencode itself as an agent?