That makes sense. I think of it like visual movement: a difference in position over time. Even a single step is a change in position, if only over a very small time increment. The transition is the animation; the duration would be two frames: up, and down.
In a nutshell: put two different frames in sequence, and you have an animation.
But the up and down really consists of two user actions: pressing the mouse button, and releasing it again. See drag-and-drop, for example, where that distinction is important. It’s even important for simple buttons: you can generally abort a button press by moving the mouse pointer outside the button area before releasing the mouse button. In that case, the button action isn’t triggered. The pressed-down state visualizes that the action will be triggered if you release the mouse button while still inside the button area.
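To make that concrete, here's a minimal sketch of that press/release logic (a hypothetical `Button` class, not any particular toolkit's API):

```python
class Button:
    """Triggers its action only if the mouse is released inside the button."""

    def __init__(self, x, y, w, h, action):
        self.bounds = (x, y, w, h)
        self.action = action
        self.pressed = False  # drives the pressed-down visual state

    def contains(self, px, py):
        x, y, w, h = self.bounds
        return x <= px < x + w and y <= py < y + h

    def on_mouse_down(self, px, py):
        if self.contains(px, py):
            self.pressed = True          # frame 1: draw as pressed

    def on_mouse_up(self, px, py):
        if self.pressed and self.contains(px, py):
            self.action()                # released inside: trigger the action
        self.pressed = False             # frame 2: draw as released either way
```

Moving the pointer out before releasing just means `contains` fails on mouse-up, so the press is aborted without the action ever firing.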
Animation is when more than one consecutive step happens on its own. I’d argue that even a tooltip appearing and then disappearing after a timeout doesn’t constitute an animation, because the disappearance isn’t immediately consecutive with the appearance, and (maybe more importantly) the intervening state, the tooltip being shown, is meaningful to the user as a distinct state.
When does a woodworker cease to be one? When he uses a handsaw? A circular saw? A sawmill?
> When a tool blurs the line between who performed the task
Who saws the wood? He who operates the tool, or the tool performing its function? What is the value of agency in a business that, supposedly, sells product? Code authorship isn't like writing, is it? Should it be?
Or is the distinction not in the product, but in the practice? Is it the difference between woodworking and lumber processing?
Or is it about expectation? e.g. when we no longer expect a product to be made by hand due to strong automation in the industry, we prepend terms such as "hand-made" or "artisanal". Are we currently still in the expectation phase of "software is written by hand"?
I have no dog in this race, really. I like writing software, and I like exploring technology. But I'm very confused and have a lot of questions that I have trouble answering. Your comment resonated though, and I'm still curious about how to interpret it all.
There's also the perception of time. How long did it take you to write that email/comment/code? Did you laboriously pore over every word, every line, for hours before you hit send, regardless of whether you used an LLM or not? Or did you spend barely five minutes and just paste whatever ChatGPT shat out?
That's the real question that people are trying to suss out.
Have you considered the Framework Desktop setup they mentioned in their announcement blog post[0]? Just marketing fluff, or is there any merit to it?
> The top-end Ryzen AI Max+ 395 configuration with 128GB of memory starts at just $1999 USD. This is excellent for gaming, but it is a truly wild value proposition for AI workloads. Local AI inference has been heavily restricted to date by the limited memory capacity and high prices of consumer and workstation graphics cards. With Framework Desktop, you can run giant, capable models like Llama 3.3 70B Q6 at real-time conversational speed right on your desk. With USB4 and 5Gbit Ethernet networking, you can connect multiple systems or Mainboards to run even larger models like the full DeepSeek R1 671B.
I'm futzing around with setups, but adding up the specs would give 384GB of VRAM and 512GB of total memory, at a cost of about $10,000-$12,000. This is all highly dubious napkin math, and I hope to see more experimentation in this space.
There's of course the moving target of cloud costs and performance, which makes analysing break-even time even more precarious. Even if this sort of setup worked, its cost-effectiveness is a mystery to me.
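To show just how dubious: the break-even sketch itself is trivial, it's the inputs that are guesses. Every number below is a placeholder assumption:

```python
# Dubious napkin math: months until local hardware beats paying per token.
# Every number here is an assumption; plug in your own.
hardware_cost = 11_000          # USD, midpoint of the $10k-$12k guess above
power_cost_per_month = 30       # USD, electricity (assumed)
cloud_cost_per_mtok = 3.00      # USD per million tokens (assumed blended rate)
tokens_per_month = 50_000_000   # assumed monthly usage

cloud_per_month = cloud_cost_per_mtok * tokens_per_month / 1_000_000
saving_per_month = cloud_per_month - power_cost_per_month
print(f"cloud: ${cloud_per_month:.0f}/mo, "
      f"break-even: {hardware_cost / saving_per_month:.1f} months")
```

With these made-up numbers the break-even is over seven years, and every one of the inputs moves month to month.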
Strix Halo does not run a 70B Q6 dense model at real-time conversational speed; its real-world memory bandwidth is about 210 GB/s. A 40GB Q4 will clock just over 5 tok/s, and a Q6 would be slower.
It will run some big MoEs at a decent speed (e.g., Llama 4 Scout 109B-A17B Q4 at almost 20 tok/s). The other issue is its prefill: only about 200 tok/s, due to very under-optimized RDNA3 GEMMs. From my testing, you usually have to trade off prompt processing (pp) for token generation (tg).
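Those figures line up with the usual bandwidth-bound napkin math: decode speed tops out at roughly memory bandwidth divided by the bytes read per generated token (all weights for a dense model, only the active ones for a MoE). A sketch, where the ~10 GB active read for Scout is my assumption:

```python
def est_decode_tok_s(mem_bw_gb_s, active_weights_gb):
    """Upper bound for bandwidth-bound decode: each generated token
    streams the active weights through memory once."""
    return mem_bw_gb_s / active_weights_gb

STRIX_HALO_BW = 210  # GB/s, real-world figure from above

# Dense ~70B Q4, ~40 GB of weights: just over 5 tok/s
print(est_decode_tok_s(STRIX_HALO_BW, 40))   # ~5.3

# Llama 4 Scout 109B-A17B Q4: ~17B active params, assume ~10 GB read/token
print(est_decode_tok_s(STRIX_HALO_BW, 10))   # ~21, i.e. "almost 20 tok/s"
```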
If you are willing to spend $10K on hardware, I'd say you are much better off with EPYC and 12-24 channels of DDR5, plus a couple of fast GPUs for shared experts and TFLOPS. But unless you are doing all-night batch processing, that $10K is probably better spent paying per token or even renting GPUs (especially when you take power into account).
Of course, there may be other reasons you'd want to inference locally (privacy, etc).
Yeah, it's only really viable for chat use cases. Coding is the most demanding in terms of generation speed: to keep the workflow usable, it needs to spit out corrections in seconds, not minutes.
I use local LLMs as much as possible myself, but coding is the only use case where I still entirely defer to Claude, GPT, etc., because you need both max speed and bleeding-edge model intelligence for anything close to acceptable results. When Qwen-3-Coder lands, running it on RunPod might be a viable low-end alternative, but likely still a major waste of time when you actually need to get something done properly.
I love Framework but it's still not enough IMO. My time is the most valuable thing, and a subscription to $paid_llm_of_choice is _cheap_ relative to my time spent working.
In my experience, something like Llama 3.3 works really well for smaller tasks. For "I'm lazy and want to provide minimal prompting for you to build a tool similar to what is in this software package already", paid LLMs are king.
If anything, I think the best approach for free LLMs would be to run on rented GPU capacity. I feel bad knowing that I have a 4070 Ti Super that sits idle 95% of the time. I'd rather share an A1000 with a bunch of folks and have that run at close to max utilization.
That is, in every sense of the term, their problem.
I will switch to whatever is best for me at a good price, and if that's not sustainable then I'll be fine too; I was a developer before these existed at all, and local models only help from there.
The Framework Desktop isn't really that compelling for work with LLMs; its memory bandwidth is very low compared to GPUs and Apple Silicon Max/Ultra chips, and you'd really notice how slow LLMs are on it, to the point of frustration. Even a 2023 MacBook Pro with an M2 Max chip has twice the usable bandwidth.
The mentioned CPU uses unified memory for its built-in GPU/NPU, i.e. some portion of what would ordinarily be system RAM is given to the GPU instead of the CPU.
The memory bandwidth is crap, and you’ll never run anything close to Claude on it, unfortunately. They should have shipped something 8x faster, with at least 2 TB/s of bandwidth.
Their work has molded a lot of my views on technology, such a breath of fresh air when I first found out about them! They really inspired me to look at my own work and ask how to make it more resilient, how to decrease dependencies.
From my experience, achieving provider independence boils down to: own your stack, work offline-first, test failure modes constantly.
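In code, the smallest version of "offline-first, test failure modes" is a provider call with a local fallback. A toy sketch, where both functions are hypothetical stand-ins rather than any real API:

```python
# Offline-first sketch: prefer the hosted provider, never depend on it.
# hosted_complete / local_complete are hypothetical stand-ins, not a real API.

def hosted_complete(prompt: str) -> str:
    raise ConnectionError("pretend the provider is down")  # simulated outage

def local_complete(prompt: str) -> str:
    return f"[local model] reply to: {prompt}"  # stand-in for a local LLM

def complete(prompt: str) -> str:
    try:
        return hosted_complete(prompt)
    except ConnectionError:           # the failure mode you test constantly
        return local_complete(prompt)

print(complete("hello"))  # still answers while the provider is "down"
```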
Been trying to get a setup going with NixOS + local AI + custom CLI tools for development work, and I never would have thought to pursue this sort of thing if I hadn't found these people. Great stuff!
Oh, and ORCA is a LOT of fun! Give it a shot if you're into sound design or generative electronic music stuff: https://100r.co/site/orca.html
Offline solutions, or at least ones that aren't totally internet-dependent, can bring a lot of value to users. So many things are webapps that could easily work offline. Sure, the web is easier and has more reach, but when sites, apps, or games vanish, I start to miss the '90s-'00s CD days.
But again, what is best for the user is probably not the best business idea...
Thanks for sharing your setup! I'm also very interested in running AI locally. In which contexts are you experiencing decent success? E.g. debugging, boilerplate, or some other task?
I'm running qwen via ollama on my M4 Max 14 inch with the OpenWebUI interface, it's silly easy to set up.
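For anyone curious, talking to the same ollama server from Python is about as trivial; a minimal sketch, assuming you've pulled a qwen tag like qwen2.5 (pip install ollama):

```python
import ollama  # client for the local ollama server

# "qwen2.5" is an assumed tag; use whichever qwen build you pulled.
reply = ollama.chat(
    model="qwen2.5",
    messages=[{"role": "user", "content": "Say hi in five words."}],
)
print(reply["message"]["content"])
```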
Not useful though; I just like the idea of having so much compressed knowledge on my machine in just 20GB. In fact I disabled all the Siri features cause they're dogshit.
Well then. I'm Belgian, and I was considering exploring security more professionally. But I think I'll just stick to hobby hacking, and pray I never discover a vulnerability. Yikes!
The culpability we share for "churning out shitty code" is spot-on imo. There's been so much incentive to ship "good enough" that even the definition of "good enough" has been backsliding, sometimes to the point of "whatever we can get away with", in the name of speed of delivery.
That friction has always been there, in my experience, but this is the first time I'm seeing it happen around me. LLMs are so divisive, and yet the more extreme positions on either side seem to be digging their heels in, as if the tech weren't in flux.
Your improv comparison resonates. I'd add that it also reflects the positives of improv: the freedom, being present, lower inhibitions/impediments, the positive feedback loop of AI's "yes, and!". These are useful for the same reason improv is useful to written material: as a sketchbook for trying ideas and finding the belly laughs you then write into the stage routine.
Ooh, definitely trying this out! I ended up homebrewing a whole context maintenance ritual, but that was a pain to get an AI agent to apply consistently, so it spun out into building a whole project management... thing.
This looks much more thought out, thanks for sharing!
That's an interesting idea! I struggle with the same issues you've mentioned, that space between the IDE-integrated option and a pure CLI. Your comment sparked an idea of using something like vim, where you can edit the config on the fly and reload it. I wonder how hard it would be to bolt a prompt interface onto the front and have it build the editor for you?
It would likely devolve quickly into typical editor-config bikeshedding, only AI-powered? At least for me; maybe someone smarter could streamline it enough to be useful!