Fedify is really fun to mess around with. The fedify tutorial was also really great for learning about developing with ActivityPub and the fediverse in general.
I don't use Discord generally, but the fedify Discord is particularly useful, and I see how some discussions there have evolved into features in this release which is nice too!
I think the appeal and use case for GrapheneOS and similar OSes, for most users, is the Google/privacy/ownership type argument.
I do understand your point that people at risk of state level attacks might get a false surface level appearance of defence from this. But then anyone who's a target of state level attacks and is making OS decisions based on a surface level understanding of the tech is not going to have a good time anyway.
Personally I run an ollama server. Models load pretty quickly.
There's a distinction between tokens per second and time to first token.
Delays come for me when I have to load a new model, or if I'm swapping in a particularly large context.
Most of the time, since the model is already loaded and I'm starting with a small context that builds over time, tokens per second is the biggest factor.
It's worth noting I don't do much fancy stuff beyond a tiny bit of agent work; I mainly use qwen-coder 30a3b or qwen2.5 coder instruct/base 7b.
I'm finding more complex agent stuff, where multiple agents are used, can really slow things down if they're swapping large contexts. ik_llama has prompt caching, which helps speed this up when swapping between agent contexts, up to a point.
tldr: loading weights each time isn't much of a problem, unless you're having to switch between models and contexts a lot, which modern agent stuff is starting to require.
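To make the time-to-first-token vs tokens-per-second distinction concrete, here's a minimal sketch that separates the two from a stream of token arrival timestamps. The trace numbers are made up for illustration, not measurements from any particular server:

```python
def stream_metrics(token_times, start_time):
    """Given per-token arrival timestamps, compute time-to-first-token
    (dominated by model load + prompt processing) and steady-state
    decode tokens/sec (what you feel once generation is underway)."""
    ttft = token_times[0] - start_time
    decode_time = token_times[-1] - token_times[0]
    tps = (len(token_times) - 1) / decode_time if decode_time > 0 else float("inf")
    return ttft, tps

# Toy trace: request at t=0, first token at t=2.0 (load + prefill),
# then one token every 0.05 s (a 20 tok/s decode rate).
start = 0.0
times = [2.0 + 0.05 * i for i in range(101)]
ttft, tps = stream_metrics(times, start)
print(round(ttft, 2), round(tps, 1))  # → 2.0 20.0
```

The point of splitting the two: a big context swap or model load inflates only the first number, while the second stays constant for a given model and hardware.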
I always felt the idea of trying to align your code, policy, software and infrastructure so it's easy to do compliance is the bread and butter of devops and devsecops in a regulated environment.
Is this an article by someone who's just done ISO 27001 for the first time and realised that?
I think it might be adults ignoring established grammar rules to make a statement about how they identify a part of a group of AI evangelists.
Kind of like how teenagers do nonsensical things like wear thick, heavy clothing regardless of the weather to indicate how much of a badass they and their other badass coat-wearing friends are.
To normal humans, they look ridiculous, but they think they're cool and they're not harming anyone so I just leave them to it.
make a statement about how they identify a part of a group
That’s what it is. A shibboleth. They’re broadcasting group affiliation. The fact that it grates on the outgroup is intentional. If it wasn’t costly to adopt it wouldn’t be as honest of a signal.
On a scale of costliness, from the purest virtue signaling (not lifting a finger beyond striking a keyboard) to putting one's money where one's mouth is, this shibboleth is about as costly as the tidal zone is dry land.
You convey tone through word choice and sentence structure - trying to convey tone through casing or other means is unnecessary and often just jarring.
Like look at the sentence "it has felt to me like all threads of conversation have veered towards the extreme and indefensible." The casing actually conflicts with the tone of the sentence. It's not written like a casual text - if the sentence was "ppl talking about this are crazy" then sure, the casing would match the tone. But the stodgy sentence structure and use of more precise vocabulary like "veered" indicates that more effort has gone into this than the casing suggests.
Fair play if the author just wants to have a style like this. It's his prerogative to do so, just as anyone can choose to communicate exclusively in leetspeak, or use all caps everywhere, or write everything like script dialogue, whatever. Or if it's a tool to signal that he's part of an in-group with certain people who do the same, great. But he is sacrificing readability by ignoring conventions.
That's politicians and media influencers of all ages, not the general public
The new generation of TikTok/podcast "independent journalists" is a serious case of what you describe. Many of them do zero journalism and just repeat propaganda, some paid by countries like Russia (e.g. Tim Pool and that whole crew that got caught and never faced consequences).
I run Qwen3-Coder-30B-A3B-Instruct GGUF on a VM with 13GB RAM and a 6GB RTX 2060 mobile GPU passed through to it with ik_llama, and I would describe it as usable, at least. It's running on an old (5 years, maybe more) Razer Blade laptop that has a broken display and 16GB RAM.
I use opencode and have done a few toy projects and little changes in small repositories, and can get a pretty speedy and stable experience up to a 64k context.
It would probably fall apart if I wanted to use it on larger projects, but I've often set tasks running on it, stepped away for an hour, and had a solution when I return. It's definitely useful for smaller projects, scaffolding, basic bug fixes, UI tweaks, etc.
I don't think "usable" is a binary thing though. I know you write a lot about this, but it'd be interesting to understand what you're asking the local models to do, and what it is about what they do that you consider unusable on a relative monster of a laptop?
I've had usable results with qwen3:30b, for what I was doing. There's definitely a knack to breaking the problem down enough for it.
What's interesting to me about this model is how good it allegedly is with no thinking mode. That's my main complaint about qwen3:30b, how verbose its reasoning is. For the size it's astonishing otherwise.
Honestly I've been completely spoiled by Claude Code and Codex CLI against hosted models.
I'm hoping for an experience where I can tell my computer to do a thing - write some code, check for logged errors, find something in a bunch of files - and I get an answer a few moments later.
Setting a task and then coming back to see if it worked an hour later is too much friction for me!
This is interesting! Do you have any more info? I just discovered sunshine/moonlight work surprisingly well on Quest 3 for remote desktop to linux, I'd not really considered Termux X11 natively though.
> Hypergrowth is a synonym for unsustainable growth.
No it's not. It's often a recognition that just one or two, maybe three companies will end up dominating a particular market simply due to economies of scale and network effects... and so the choice is between hypergrowth to try to attain/keep the #1 or #2 position, or else go out of business and lose all the time, money, and effort you already put into it.
Nothing whatsoever makes it unsustainable. You might be offering cheaper prices during hypergrowth -- those are unsustainable -- but then you raise prices back to sustainable levels afterwards. And consumers got to benefit from the subsidized prices, yay! The business is entirely sustainable, however.
Uber is the poster child of hypergrowth. They became profitable in 2023. And their stock price has ~doubled since. Totally sustainable.
> Hypergrowth is a synonym for unsustainable growth. The headline here is business breaks tech, again.
That just isn't true. Plenty of services do just fine after experiencing hypergrowth, and a few outages are not an example of tech breaking. That's a fairly common occurrence.
I'm not saying companies can't do fine in many respects after experiencing hypergrowth, but like you said, that's after hypergrowth - the hypergrowth isn't sustainable.
And I disagree: outages are a fairly literal example of tech breaking. A few outages aren't catastrophic though, and I agree are fairly common. I know it's cliche, but "move fast and break things" might get growth, but it also gets broken things along the way.
Hypergrowth is growth and churn at the expense of sustainability and stability. It can definitely be fun though!
I use the built-in DERP server. I ran a standalone DERP server, hackily deployed, for a month; it worked fine but didn't provide much benefit over the built-in one. It was basically just a Go package. If you're familiar with running Go code, it's straightforward to run, though it's very, very light/unproductionised.
I have a todo task to integrate DERP into my headscale deployment properly ("finish ansible role"), but when I picked it up last month, I noticed tailscale had released relay nodes. They seem like they'd be better suited than dedicated DERP nodes, but headscale hasn't implemented support for them yet.
tldr: not too hard to host DERP, it just needs a publicly facing endpoint (incl. LetsEncrypt), but the built-in one is fine. Relay nodes look like they'll be a better option for most, though, and I'd guess they'll be implemented in headscale sometime this year.
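For reference, a rough sketch of what running the standalone server involves, using the `derper` command from the tailscale repo (flags from memory, so check `derper --help` before relying on them; `derp.example.com` is a placeholder):

```shell
# Install the standalone DERP server binary (needs a Go toolchain)
go install tailscale.com/cmd/derper@latest

# Run it on a host with a public DNS name. letsencrypt cert mode
# provisions TLS automatically, which is why a publicly reachable
# endpoint on 443 is required.
derper -hostname derp.example.com -certmode letsencrypt
```

After that it's mostly just pointing your tailnet (or headscale) at the new node via a custom DERP map.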
I did a lot of postgraduate research around crypto from 2011 - 2016. There are a lot of parallels, and your message adds to them.
"x is different because we can actually do useful stuff with it" is what every x enthusiast deep in an x bubble or pump n dump says about x.
When the next big tech bubble comes along in 10 - 15 years, there will be people saying exactly what you just said: "NextBigTech you can actually use to build useful things in the world, and NextBigTech thing actually does that building, not just what LastBigTech thing (AI) did, that obviously didn't deliver the utopia it promised".
I wonder what it'll be. AGI? Quantum computing? Brain computer interfaces?
I'd love to pickup this conversation again with you in 15 years.
The difference is, for the claims of blockchain, it was trivially easy to look at and say, "This could have been a database".
Almost every single blockchain "product" (outside of the peer-to-peer trustless currency) could have been a database.
This time the cost of entry of small software products has cratered.
For example, I was able to knock up a tool for a guide-maker for a niche game I play that gets about 500 peak daily players on steam.
The entire motivation for the tool is because I personally struggle to follow their well written guide. It takes a reasonable amount of focus and care to adjust a bunch of settings between "runs" based on the guide as written. Getting one of these wrong can set you back a bunch of time without even realising what went wrong.
These settings have an import/export feature in game, but that only allows for a few saved presets, and isn't easy to share.
So I've made a tool that lets people create, organise and share these presets.
Literally the only user is likely to be this single guide maker. Possibly a few others might use it to consume their guides.
Without claude-code, it would never have been reasonable for me to invest the time to make the tool. It would have been an idle dream sitting on my "I wish I had the discipline to make this" pile.
But I don't have the discipline to make that kind of project. I'm too easily distracted, and I'd have got bored of the idea before I'd finished establishing all the boilerplate, let alone before ironing out all the bugs. I also don't have the front-end talent to make things look pretty with CSS.
The LLM doesn't get demotivated. It doesn't get bored, and it compressed the building of the prototype down to a day or two. Enough to keep my interest until feedback arrived. A week later, and it's shipped with 50+ issues raised and fixed.
> The difference is, for the claims of blockchain, it was trivially easy to look at and say, "This could have been a database".
Yes, and it's trivial now to look at so many LLM startups and say "that could be a complex if/else statement" or "that could be an Alexa skill" or "I can do that already with my mobile phone".
Everything you've just described about the friction in your work, and how AI has solved it, is essentially what crypto promised and delivered for a certain subsector of finance, which is why crypto still has market caps in the trillions.
AI will do the same: make a notable change in a certain subsector of work.
My point isn't that AI is useless, or that it won't add value. It's hugely valuable and will change the world in ways people don't even realise, just like dotcom and crypto did and do. Right now though, the disruption and investment is disproportionate and speculative, which is why it has parallels to crypto and dotcom.
Crypto only looked like it solved friction in places with messed up banking.
To people in the EU/UK who had free faster payments before Bitcoin was a thing, it never looked like an improvement at all.
The solution to expensive and slow banking was always political, not technical.
Crypto was purely speculative, because it was never solving real problems.
I'm not speculating about problems being solved, I'm out there solving real problems. No-one in "blockchain" ever got to say the same. It was always a promise of things being better. And for many people, things already were better than what was being promised.
> Crypto only looked like it solved friction in places with messed up banking.
AI only solved friction in places where work was messed up, like developers not being given enough time to program stuff.
> To people in the EU/UK who had free faster payments before Bitcoin was a thing, it never looked like an improvement at all.
To tech companies who were already content with their development team's velocity, AI never looked like an improvement at all.
> The solution expensive and slow banking was always political, not technical.
The solution to developers not coding fast enough was always political, not technical.
> Crypto was purely speculative, because it was never solving real problems.
AI was purely speculative, because it was never solving any problems. (Sorry, I have to point out here that you listed higher up a bunch of problems that crypto was solving, and now you're saying it was also speculative, which is exactly the parallel with crypto you were trying to argue against.)
> I'm not speculating about problems being solved, I'm out there solving real problems. No-one in "blockchain" ever got to say the same. It was always a promise of things being better. And for many people, things already were better than what was being promised.
Again, either you're right above when you said crypto solved problems where banking was bad, or you're right here where you're saying blockchain never solved anything.
You're going round in circles trying to find a way that AI isn't like crypto whilst giving more examples of how AI is like crypto.
Remittance, micropayments, unbanked people, unstable economies: all of these did, can and do have problems solved by blockchain.
No, AI is different because we're actively doing useful stuff with it. It's not "this will replace x soon", it's "I don't x anymore because it would be crazy not to use AI for this, which is what I do on a daily basis already."
"No, NewBigTech _is_ different, trust me, I'm an expert in all the things this tech does for us now."
Crypto was doing stuff in 2012, it contributed to a huge amount of global remittance payments even then, and probably still does now.
I was working with intelligence agencies, and crypto was being widely used in a variety of crimes too. Both of those are probably still true, and there's now probably an entire industry shipping literally billions of $ around the world every day as settlement between exchanges in crypto.
As someone who was approached as an expert at the time, I was saying all the things you're saying to me now at the time about Crypto.
The point is I was right at the time: crypto was being used, and still is. You're right, AI is being used, and still is.
The problem, or the bubble or the pump/dump/parallel element is that the amount of attention and capital flowing around the area is vastly more than the current use cases and is therefore largely speculative.
This is true of AI too. Yes, people are using it already daily, but if everyone is already using AI for everything, then why do we need a few hundred billion dollars more of datacentres, chips, RAM and powergen? What's that for...? "Future AI stuff..." soooo.... speculative...?
> if everyone is already using AI for everything, then why do we need a few hundred billion dollars more of datacentres, chips, RAM and powergen
Because everyone is already using AI for everything. That proves its value.
But of course the future isn't evenly distributed yet and only a tiny fraction of 1% of us are using AI all day so far. But once somebody gets converted they don't / can't go back to the old way. And converting them is pretty much instant.
A big difference between crypto and AI: crypto could only paint a better future once we had rebuilt most of our transactional infrastructure and persuaded a quorum to move onto it, whereas me benefiting from AI day by day, building tools and infrastructure for my life, work, businesses and finances, only requires me to accept the change. Everyone else in the world could reject AI-augmented engineering, and I will still be tremendously better off.
AI will offer us a utopia when we've finished rebuilding all of our electricity infrastructure and finally got enough AI datacenters, and stopped muggle humans buying memory and GPUs because AI needs them more.
I'm pretty certain I read the sentence above about crypto sometime around 2015.
I am guessing this is sort of ProgrammableWeb 2.0.
Disintermediation is the common thread in all of this.
Will be interesting to see solutions arising for developers to monetize their open-source contributions. Pull Request = Push Demand so perhaps there should be a cost attached to that especially knowing that AI will eventually train on it.
In absolute numbers I’m not sure that’s true. But 4GLs aren’t what replaced assembly for anyone. C, C++ and Pascal were the most common assembly replacements.
As for C and C++, there definitely aren’t fewer of them in absolute terms. And even in relative terms they are still incredibly popular.
All of that is beside the point though. The hype around 4GLs wasn’t that they would replace older programming languages. The hype was that they’d replace programming as a profession in general. You wouldn’t need programming specialists because domain experts could handle programming the computer themselves.
There was also a mini bubble around social media aggregators and RSS feeds culminating in sites like gada.be
I see the dynamic as follows (be warned, cynical take)
1) there are the youth who are seeking approval from the community - look I have arrived - like the person building the steaming pile of browser code recently.
2) there are the veterans of a previous era who want to stay relevant in the new tech and show they've still got mojo (gastown etc)
In both cases, the attitude is not one of careful deep engineering, craftsmanship or attention to the art, instead it reflects attention mongering.
I don't know how relevant to the world another B2B SaaS platform is. You could easily grab one from GitHub, which is where AI got the data to build one in the first place.
Meanwhile crypto offers an alt banking platform used by many who have been debanked.
I could be building a game, a home blog, any sort of OSS.
The point I'm making is that crypto exists purely as this alt investment, trading tokens to get rich. I'm skeptical it's actually being used as a currency in any real currency fashion; it now seems to be stuck in pump and dump schemes.
Meanwhile, AI is enabling people right now, today, to build and learn things they normally wouldn't.
"Im skeptical its actually being used as a currency in any real currency-fashion."
I don't know what qualifies as real currency fashion to you, but you can purchase things and services from many different places. At times when credit card payment processors are down, you can use bitcoin to pay. It doesn't have to replace a currency; it can be used as another way to spend or collect money.
It's also good in situations where you want to accept money but not open yourself up to risk, like a donation button. Selling a product via credit card carries chargeback risk; this method removes that risk.
In practice it works well and is being used. It's not replacing the dollar but it doesn't need to.
Internet -> Obvious value, more efficient communication, knowledge sharing, transactions etc.
AI -> Value is very obvious to me as a developer.
Blockchain -> ? What is the actual value? Something about decentralized finance and not having to trust anyone? And the tradeoff is every transaction costs $10 or more. It was always a dubious proposition with its "value" driven by speculative investment which fueled the hype machine.
Yeah there are parallels in that in all cases people got really excited about something tech and poured a bunch of money in, but the outcomes and actual amount of value derived can be wildly different.
You see that you're assessing AI from the depth of the AI bubble and coming to the same conclusion about AI as people who assess crypto from the depths of a crypto bubble came to, right?
The dotcom bubble was due to all the useless, speculative stuff people were doing with the internet, not the useful bits you referred to that are still around which we use today.
The AI bubble is coming from all the useless, speculative stuff people are doing with AI, not the useful bits you referred to that are still around which we use today.
... You see where I'm going with this, right?
Crypto use cases that are still around and get used today are not hard to find for anyone sincerely willing to accept that they exist. I've already listed a few in other posts. That's not my point though.
My point is there's a speculative bubble around AI, and that's got a lot of parallels to the speculative bubbles around crypto and dotcom. Everything you've said supports the idea that you're unaware that you're talking from inside a bubble.
> It was always a dubious proposition with its "value" driven by speculative investment which fueled the hype machine.
Explain to me - without speculation or hype - why we still need trillions more datacentres, power, water, money and everything else for AI, if we're already using it and it's already here and we're already getting the most out of it?
I said explain without speculation. You've just given 2 points that both reduce down to "potential but unquantifiable future benefits", or "speculation".
> There is no reason to believe this will slow down any time soon
All investment is kind of speculative: you're betting on the future, but typically for a reason.
A bubble, IMO, is what emerges when lots of people bet on the future purely because they see others betting on the future. People often don't realise they're doing it, like the people building AI SaaS apps. They think they're going to get rich because they think everyone is using the bubble tech.
Most of the apps are rubbish and could be implemented with something other than AI, same as a lot of crypto apps or dotcom websites in the bubble periods.
They look like they're useful in the bubble, because they're getting regular customers (as everyone comes in to try this newfangled AI/Crypto/dotcom tech) but once everyone's tried it, the only people who come back are the ones with the actual use for it, and there's never enough use to support the hype created in bubbles.
> The model absolutely can be run at home. There even is a big community around running large models locally
1tln parameters and 32bln active seems like a different scale to what most are talking about when they say local LLMs. Totally agree there will be people messing with this, but IMO the real value in local LLMs is that you can actually use them and get value from them on standard consumer hardware. I don't think that's really possible with this model.
Local LLMs are just LLMs people run locally. It's not a definition of size, feature set, or what's most popular. What the "real" value is for local LLMs will depend on each person you ask. The person who runs small local LLMs will tell you the real value is in small models, the person who runs large local LLMs will tell you it's large ones, those who use cloud will say the value is in shared compute, and those who don't like AI will say there is no value in any.
LLMs whose weights aren't available are an example of what isn't a local LLM; a model merely being large isn't.
> LLMs whose weights aren't available are an example of what isn't a local LLM; a model merely being large isn't.
I agree. My point was that most people aren't thinking of models this large when they talk about local LLMs. That's what I said, right? This is supported by the download counts on HF: the most downloaded local models are significantly smaller than 1tln, normally 1 - 12bln.
I'm not sure I understand what point you're trying to make here?
Mostly a "We know local LLMs as being this, and all of the mentioned variants of this can provide real value regardless of which is most commonly referenced" point. I.e. large local LLMs aren't only something people mess with, they often provide a lot of value for a relative few people rather than a little value for a relative lot of people as small local LLMs do. Who thinks which modality and type brings the most value is largely a matter of opinion of the user getting the value, not just the option which runs on consumer hardware or etc alone.
You're of course accurate that smaller LLMs are more commonly deployed, it's just not the part I was really responding to.
32B active is nothing special; there are local setups that will easily support that. 1T total parameters ultimately requires keeping the bulk of them on SSD. This need not be an issue if there's enough locality in expert choice for any given workload; the "hot" experts will simply be cached in available spare RAM.
When I've measured this myself, I've never seen a medium-to-long task horizon that would have expert locality such that you wouldn't be hitting the SSD constantly to swap layers (not to say it doesn't exist, just that in the literature and in my own empirics, it doesn't seem to be observed in a way you could rely on it for cache performance).
Over any task that has enough prefill input diversity and a decode phase that's more than a few tokens, it's at least intuitive that experts activate nearly uniformly in the aggregate, since they're activated per token. This is why, when you run something at more than bs=1, you see forward passes light up the whole network.
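That intuition can be illustrated with a toy simulation (entirely hypothetical numbers, not a real MoE router): when per-token routing is near-uniform, an LRU cache of experts hits at roughly, or a bit under, cache_size / n_experts, so caching a third of the experts only avoids about a third of the SSD reads:

```python
import random

def cache_hit_rate(n_experts, cache_size, n_tokens, experts_per_token, seed=0):
    """Simulate an LRU cache over experts when routing picks experts
    uniformly at random per token (toy model of 'no expert locality')."""
    rng = random.Random(seed)
    cache = []  # most-recently-used expert at the end
    hits = total = 0
    for _ in range(n_tokens):
        for e in rng.sample(range(n_experts), experts_per_token):
            total += 1
            if e in cache:
                hits += 1
                cache.remove(e)  # refresh recency
            elif len(cache) >= cache_size:
                cache.pop(0)  # evict least-recently-used expert
            cache.append(e)
    return hits / total

# Cache ~a third of the experts: hit rate lands around a third,
# i.e. no better than the cache's share of the expert pool.
rate = cache_hit_rate(n_experts=128, cache_size=48, n_tokens=2000, experts_per_token=8)
print(round(rate, 2))
```

If routing had strong locality, the hit rate would climb well above cache_size / n_experts; under uniform routing it can't, which is why the SSD gets hammered.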
Thing is, people in the local llm community are already doing that to run the largest MoE models, using mmap such that spare-RAM-as-cache is managed automatically by the OS. It's a drag on performance to be sure but still somewhat usable, if you're willing to wait for results. And it unlocks these larger models on what's effectively semi-pro if not true consumer hardware. On the enterprise side, high bandwidth NAND Flash is just around the corner and perfectly suited for storing these large read-only model parameters (no wear and tear issues with the NAND storage) while preserving RAM-like throughput.
I've tested this myself often (as an aside: I'm in said community, I run 2x RTX Pro 6000 locally, 4x 3090 before that), and I think what you said re: "willing to wait" is probably the difference maker for me.
I can run Minimax 2.1 in 5bpw at 200k context fully offloaded to GPU. The 30-40 tk/s feels like a lifetime for long horizon tasks, especially with subagent delegation etc, but it's still fast enough to be a daily driver.
But that's more or less my cutoff. Whenever I've tested other setups that dip into the single and sub-single digit throughput rates, it becomes maddening and entirely unusable (for me).
Bits per weight: it's an average precision across all the weights. When you quantize these models, they don't just use a fixed precision across all model layers/weights. There's a mix, and it varies per quant method. This is why you can get bit precisions that aren't "real" in a strict computing sense.
e.g. a 4-bit quant can have half the attention and feed-forward tensors in Q6, and the rest in Q4. Due to how block-scaling works, those k-quant dtypes (specifically for llama.cpp/GGUF) have a larger bpw than their names suggest: Q4 is around ~4.5 bpw, and Q6 is ~6.5.
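A quick back-of-the-envelope of how those mixed precisions average out (the parameter counts are illustrative, not an actual model's tensor split):

```python
def average_bpw(tensor_mix):
    """Weighted-average bits per weight over (param_count, bpw) pairs,
    i.e. total bits stored divided by total parameters."""
    total_bits = sum(n * bpw for n, bpw in tensor_mix)
    total_params = sum(n for n, _ in tensor_mix)
    return total_bits / total_params

# Toy split: half the parameters in ~6.5 bpw blocks (Q6_K effective
# size) and half in ~4.5 bpw (Q4_K effective size).
mix = [(500_000_000, 6.5), (500_000_000, 4.5)]
print(average_bpw(mix))  # → 5.5
```

So a quant advertised as "4-bit" can easily land at 5+ effective bits per weight once the higher-precision tensors and block scales are counted.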
I was trying to correct the record on the idea that a lot of people will be running models of this size locally just because the local LLM community exists.
The most commonly downloaded local LLMs are normally <30b (e.g. https://huggingface.co/unsloth/models?sort=downloads). The things you're saying, especially when combined together, make it not usable by a lot of people in the local LLM community at the moment.