Hacker News | soganess's comments

There are fairly mainstream devices with decent Vulkan support but poor hardware decode coverage for the codecs people actually get on the web. Polaris-era Radeons have H.264 and HEVC decode, but VP9 support is absent (or not exposed in many common Linux paths) so YouTube is sloppy. The Raspberry Pi 5 is another example: it has hardware HEVC decode, but YouTube 4K is generally VP9 or AV1 rather than HEVC, and Pi 5 does not advertise VP9 hardware decode.

I think Vulkan Video is just another API for accessing those hardware decoders. It's not going to bring codec support to hardware that lacks it.

You are 100% right! My mistake. It is too late for me to edit my previous comment. But I appreciate the correction.

Yeah, I'm most hopeful for some of my lowest-end devices. Those as-cheap-as-possible CPUs tend to have a very strange set of accelerators for codecs.

>but VP9 support is absent (or not exposed in many common Linux paths) so YouTube is sloppy [...] but YouTube 4K is generally VP9 or AV1 rather than HEVC

I installed Linux yesterday. YouTube doesn't let you fall back to VP9 in the user profile settings. It serves AV1 by default now for all resolutions. Bummer if you're on older and/or low-end hardware.

Maybe some browser extensions can force VP9?


I believe h264-ify still does. And many of the fancy "remake YouTube to not suck" extensions do as well.

>I believe h264-ify still does

To h264 yes, but not to VP9, no?


Yup, the name is a bit of a misnomer at this point: https://addons.mozilla.org/en-US/android/addon/enhanced-h264...

What you really do is disallow everything save vp9...
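A rough sketch of that "disallow everything save VP9" decision, written in Python only to show the logic. It assumes the extension works by answering the player's "can you play this format?" queries (in a browser that would be something like `MediaSource.isTypeSupported`); the thread doesn't say how the linked add-on is actually implemented, and `site_may_use` is a made-up name.

```python
# Sketch only: a real extension would do this in JavaScript inside the browser.
# `site_may_use` is a hypothetical helper, not any extension's actual API.

# Tags that identify VP9 in the `codecs=` parameter of a MIME type string.
VP9_TAGS = ("vp09", "vp9")


def site_may_use(mime_type: str) -> bool:
    """Return True if the player should be told this format is supported."""
    lowered = mime_type.strip().lower()
    if not lowered.startswith("video/"):
        return True  # leave audio queries alone so sound keeps working
    # For video, report support only when the codec string names VP9.
    return any(tag in lowered for tag in VP9_TAGS)


for query in (
    'video/webm; codecs="vp09.00.10.08"',  # VP9
    'video/mp4; codecs="av01.0.05M.08"',   # AV1
    'video/mp4; codecs="avc1.64001F"',     # H.264
    'audio/webm; codecs="opus"',           # audio, untouched
):
    print(query, "->", site_may_use(query))
```

With AV1 and H.264 both reported as unsupported, the player has nothing left to pick but VP9.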


> Pi 5 does not advertise VP9 hardware decode

because it does not have it


I figured, but all I knew for certain was that it did not advertise it.

Maybe it was silently present in silicon but lacked the software bits. It's a pretty big omission considering the Pi 5's release date.


On the Pi 5 they even removed the hardware-accelerated H.264 encoder. The SoC used in the Raspberry Pi is basically whatever scraps Broadcom allows them to have. For example, it took until the Pi 5 to add accelerated AES.

1.5 Billion? Less than that 1.776B slush fund.


Since we out here doing ad hominems: if you don't think the code you wrote a few months ago is shit, you're already cooked, and judging by that comment, I'm betting you're crispy.

Even the best code I've ever written rots, not because it changes but because I get better. Now... I know thinking out of the box is hard... but one can get better a lot of different ways, and call me an optimist, but I'm betting folks can get better at producing tool-assisted code, too. Assuming how we do it now is how it will be forever is silly.

We're in the middle of figuring out the next level of mediated engineering. You-know-what or get off the pot, but stop pretending being a dinosaur is still in vogue. It's gauche, and trust me, we've seen it all before...

... back in my day we didn't have that fancy IDE autocomplete; we memorized every function in a library. IDEs?! ... Back in my day we didn't even have debuggers; we just knew how the code worked. Pish posh, back in my day the compiler didn't even produce error messages that made sense. Compilers? The faux luxury of it all! Back in my day, if you actually cared about your code, you wrote the assembly by hand.


The code I write is pretty good, and it stays good throughout the years because it was already good. If you're frequently finding month old code to be shit, your code is just shit to begin with.


Or... you are working in an unstable environment. If the system you take as a base changes frequently (like the web), your code becomes bad, outdated code, no matter how shiny it was before. But if it was good, migrating and maintaining it is easier.


React is 13 years old


So?

Are you implying I should have switched to React 13 years ago, when it was just another of the frameworks that come and go?

And then my experience would have been stable?

(I kind of doubt it, when my unstable web experience was caused by changes in how canvas, WebRTC, IndexedDB, WebGL, and WebGPU work. I don't make simple UIs.)

Or well, just that "let" was introduced makes all my beautiful old code with only "var" bad code by modern standards.


"let" is 11 years old.

My point is that the web is not really an unstable platform anymore; that's just something people say that used to be true. Also, using var and other superseded constructs doesn't make the code bad; that's not what bad code is. I think you know this.


I think I know that different people have different metrics for what bad code is.

My main metric is, if it is readable. So yes, I also don't consider my old code with var bad. But other people do.


I do believe we call that a pilot program. If it is legal for 50, why not 5 million?


If this works out then 2027 might actually be the year of desktop Linux... but like in the saddest way possible.


The GamersNexus arc is such a YouTube success story: from a dude uploading videos, to superbly rigorous methodology in top tier hardware reviews, to occasional news summaries, to journalistic longforms, to straight up advocacy, and somehow back to a humble written tech site.

It’s like a tech Benjamin Button story. Thanks, Steve!

...oh heck, I can't handle it anymore, take my money already.


I understand their points. But to me they do too much rage baits these days that I can't bring myself to watch their stuff. Why would I let myself get angry lol.


> they do too much rage baits these days

Yeah, I agree, but given the state of YouTube and the fact that GN has an actual office, lab and employees, I also understand why they have no choice but to play the game at least a bit. YouTube has repeatedly cut the rev rates to the point that even fairly large channels are still under pressure if they have meaningful recurring expenses.

Basically, I just skip the rage bait and watch the good stuff. I also love that GN still publishes print articles, which has essentially zero upside for them.


Better to report on reality than to report positively. TV isn’t always about entertainment, after all!


Reality has both negatives and positives. Gamers Nexus clearly lies at the other end of the spectrum, where they overwhelmingly choose the negative stuff, hence rage bait.


I cannot name any positives from the past twelve months in the field of PC hardware, that Gamers Nexus operates in and around with their work. Perhaps I’ve overlooked something?


Yes you overlooked a lot. For example two weeks ago they made a feel good video about building PCs for two people who lost everything in a house fire.

https://www.youtube.com/watch?v=kboQ0quk1uM


Yes; while that doesn’t have bearing on business or technical matters of the PC hardware market, it’s certainly positive-feelgood content that the comment I originally replied to might appreciate as good TV!


GN aren't the rage baiters. It's the companies that are handing this content to them on a silver platter that are. Steve and team have done excellent work over the years. Have things turned more directed at the enshittification? Of course, that's what they're reporting on. He's been iced out of pre-reviews for ending up on these corporate naughty lists plenty of times. But Steve is at least saying it out loud and as it is. Just look at the DRAM cartel v2 and AMD going against their own corporate & political donations policies. Sorry, but Steve isn't the rage baiter in this story.


Getting so close to good!

I consider Gemma 4 31B (dense / no MoE), the new baseline for local models. It's obviously worse than the frontier models, but it feels less like a science experiment than any previous local model I’ve run, including GPT OSS 120B and Nemotron Super 120B.

On my M5 Max with 128 GB of RAM and the full 256K context window, I see RAM use spike to about 70 GB, with something like 14 GB of system overhead. A 64 GB Panther Lake machine with the full Arc B390, or a 48 GB Snapdragon X2 Elite machine, could probably run it with a 128K to 256K context window. Maybe you can squeeze it into 32GB (27.5GB usable) with a 32K context window?
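A back-of-the-envelope check on the 32 GB question, counting weights only. The 31B parameter count and the 27.5 GB usable figure come from the comment above; the bits-per-weight values are illustrative assumptions, and the KV cache and runtime overhead are deliberately left out because they depend on model details not given here.

```python
# Weight-only memory estimate; KV cache and runtime overhead are NOT included.
PARAMS = 31e9      # "31B" from the model name
USABLE_GB = 27.5   # usable RAM on a 32 GB machine, per the comment above


def weight_gb(params: float, bits_per_weight: float) -> float:
    """Approximate size of the weights in GB (1 GB = 1e9 bytes)."""
    return params * bits_per_weight / 8 / 1e9


# Bits per weight are assumed round numbers, not measured quant sizes.
for bits in (16, 8, 4):
    size = weight_gb(PARAMS, bits)
    print(f"{bits:>2} bits/weight: ~{size:.1f} GB, headroom {USABLE_GB - size:+.1f} GB")
```

At roughly 4 bits per weight the weights alone leave about 12 GB for context and everything else; whether a 32K window fits in that is the part this sketch can't answer.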

Even last year, seeing this kinda performance on a mainstream-ish/plus configuration would have seemed like a pipe dream.


Gemma 4 IS good. I've literally had it get a thing right that Opus 4.7 missed. The edges are ragged, and I'm reliably finding use cases where it's basically equivalent. Ultimately the metric is "what can I RELY on it to do". Opus definitely knows a lot more and can sometimes do much more complex tasks, but especially when you're good about feeding the context, Gemma is amazing. The difference between the sets of things I trust the two models to do is surprisingly small. I've had some insanely good runs recently working on my personal tooling as well as random projects. It's the first local model that can reliably be left to implement features in agentic mode on non-trivial projects.

https://thot-experiment.github.io/gradient-gemma4-31b/

This is a relatively complex piece of tooling built entirely by Gemma 4 inside OpenCode where I manually intervened maybe only 4 times over the course of a few hours.

running Q6_K_XL, 128k context @ q8 ~ 800tok/s read 16tok/sec write

eagerly awaiting turboquant and MTP in llama.cpp, should take me to 256k and 25-30tok/s if the rumors are true


Re-posting this from a buried comment for visibility because it's just so fucking impressive to me.

I went to the store to buy mixers, and while I was out Gemma 4 31b got pretty far along with reverse engineering the bluetooth protocol of a desk thermometer I have. I forgot to turn on the web search tool, so it just went at it, writing more and more specific diagnostic logging/probing tools over the course of like 8 turns. It connected to the thermometer, scanned the characteristics and made a dump of the bluetooth notification data. When I got back it was theorizing about how the data might be encoded in the bluetooth characteristics, and it had gotten into an infinite loop. (local models aren't perfect and i never said they were) I turned on the web search tool and told it to "pick up the project where it left off"; it read the directory, did a couple googles and had a working script to print temperature, humidity and battery state in like 3 turns. Reading back through its chain of thought, I'm pretty sure it would have been able to get it eventually without googling.

idk, I thought I was a cool and smart engineer type for being able to do stuff like this, if my GPUs being able to do this more or less unsupervised isn't impressive I guess fuck me lol.


Had a very similar experience recently.

Built a basic authentication handler for this test just so it wouldn't be in the training data of either model. It had deliberately planted bugs. One was a hardcoded secret, another was a wrap-on-0xFFFFFFFF bug as a result of a malloc(length+1).
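For anyone who hasn't run into that second bug class, here is a small simulation of the wraparound. The handler itself isn't shown in this comment, so this assumes `length` is a 32-bit unsigned value; both function names are made up for illustration.

```python
# Simulates malloc(length + 1) when `length` is a 32-bit unsigned integer.
# Assumption: the original C code used 32-bit unsigned arithmetic.
UINT32_MAX = 0xFFFFFFFF


def alloc_size_32bit(length: int) -> int:
    """Size the allocator would actually be asked for after wraparound."""
    return (length + 1) & UINT32_MAX


def checked_alloc_size(length: int) -> int:
    """Same computation, but refusing lengths whose +1 would not fit."""
    if length < 0 or length >= UINT32_MAX:
        raise OverflowError("length + 1 does not fit in 32 bits")
    return length + 1


print(alloc_size_32bit(16))          # ordinary case: 17 bytes requested
print(alloc_size_32bit(UINT32_MAX))  # wraps to 0: a tiny or empty buffer
```

The danger is what happens next: the code goes on to copy `length` bytes into a buffer that was sized from the wrapped value.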

Qwen 3.6 found both, alongside two other issues I hadn't even considered, and the location of the magic value. GPT-5.4, though, missed the malloc issue (flagging memory exhaustion as the only risk), it missed a separate timing bug (it explicitly said the function was safe), and it hallucinated the location of the magic value. Qwen correctly identified the integer overflow. GPT-5.4 did not.

I then compared basic research between them using SearXNG for web search. For example, the current status of MTP in llama.cpp. Qwen 3.6 27B found the current PR, but flagged a related issue that shows the current implementation can be slower than just using a draft model right now. GPT-5.5 Thinking found the same PR, but didn't flag the downsides.

In a similar comparison, I asked both models how I should get started with ESPHome as a total beginner. ChatGPT suggested an ESP32-S3 and a BME280, which is... just not a good idea. It also talked about the ESP32-P4 not having Wi-Fi, and installing with HA or Docker. Meanwhile, Qwen3.6 27B said regular ESP32, DHT22, and mentioned HA, Docker, and pip as installation methods. While GPT was good, it was just throwing out jargon for a prompt that explicitly requested it for a beginner.

It kind of blew my mind that in all three of these, Qwen landed it better.


It definitely is, and just a few years ago it was unheard of.

And we progress on so many different frontiers in parallel: Agent harness, Agent model, hardware etc.


A technology indistinguishable from magic.


The small Qwen 3.6 models handle context a little better than Gemma 4, but Gemma 4 26B in particular has such small and efficient solutions which are really smart for its weight class. I was so impressed with its performance in our benchmark upon release that I wrote a blog post about it [0], although its position on the leaderboard later fell a bit as we ran it in more long context agentic coding environments.

[0] https://gertlabs.com/blog/gemma-4-economics


Here's a great explanation why:

https://www.youtube.com/watch?v=_A367W_qvc8

Google's messing with the context. LOTS of speed for a little worse long-context performance.


i use a smaller model, gemma e2b, for most of my editing and it works surprisingly well. Workflow is planning with sota models and execution via small models. If you plan properly and don't leave ambiguity for the smaller model, it works well.


Out of curiosity have you tried other small models? The e2b for me was unusable. Llama3.2 3b was better and that thing is a year old and I rarely use it now too.


yes i keep on trying small models. i have also tried qwen 3.5 0.8B, 2B, 4b and gemma4 e4B models, but they either did not work reliably (thinking loop, issues following instructions) or there were performance issues (prompt speed, tg speed, too much ram). e2b was the sweet spot where i could give it a plan and it could edit files properly.


That makes sense; it sounds like your computer isn't super powerful. Whatever works for you.


How did e2b compare to e4b?


i did not see much improvement for my use case i.e. file editing tasks but with e4b tg/s is lower so i stick with e2b.


Could you please share your time to first token and tok/s?


M4 Pro 64GB (14 CPU / 20 GPU), Gemma 4 31B Q4_K_M GGUF, LM Studio: time to first token 0.92s, 11.56 tokens/s.

Edit: For comparison with the other poster, same setup as above, but with Gemma 4 31B Instruct 8bit MLX (not sure if exactly the same model): time to first token 4.62s, 7.20 tokens/s; with a different prompt, 1.17s and 7.24 tokens/s.
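To put those numbers in wall-clock terms, a quick estimate of how long a reply takes. The timings are the ones measured above; the 500-token reply length is an arbitrary assumption, and the formula treats generation as a constant rate after the first token, which is only an approximation.

```python
def response_seconds(ttft_s: float, tokens_per_s: float, output_tokens: int) -> float:
    """Rough end-to-end time: wait for the first token, then steady generation."""
    return ttft_s + output_tokens / tokens_per_s


REPLY_TOKENS = 500  # hypothetical reply length, not from the measurements

runs = {
    "Q4_K_M GGUF": (0.92, 11.56),
    "8bit MLX": (4.62, 7.20),
}
for name, (ttft, tps) in runs.items():
    print(f"{name}: ~{response_seconds(ttft, tps, REPLY_TOKENS):.0f} s for {REPLY_TOKENS} tokens")
```

For replies of any real length the tokens/s figure dominates; the time to first token barely moves the total.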


Could you (or anyone with the same hardware) try antirez's ds4 and report how gracefully it degrades with only the 64GB RAM? Obviously it's going to be dog slow at best for any single inference flow, but can you meaningfully improve on that by running many sessions in parallel? (Ideally you'd need roughly on the order of model sparsity in order to get meaningful sharing of MoE weights, but whether that's genuinely achievable is anyone's guess!)


I’m on an M2 Max and get 10 tok/s with Gemma 4 8bit MLX


It's great, but I wish I could use these things without it feeling like my laptop is going to melt through the desk.


Does gemma work better than qwen3 in your experience?


Not in mine. I see a lot of people talking about Gemma on here but in my circles pretty much everyone else is running qwen.


What's your opinion with Gemma 4 vs Qwen3.6?


I'm sure Andreessen thinks about your well-being just as often as you think about his!

I agree poking fun at someone's appearance is low (and this is particularly savage), but it is hard to have sympathy for Andreessen, and I'm not going to strain myself trying.


Why is it hard to have sympathy for him?


That is your problem right there. Instead of PCI compliance you needed that sweet, sweet IBM MCA compliance.

Rookie mistake by your AI; otherwise it did a flawless job, and the glaze it's been giving you is 100% accurate. You are the bestest.

If one more AI calls me "insightful" or says that my question "really cuts through the noise" or "gets to the heart of the matter"...


You're totally right!


Everyone seems to be objecting to the arithmetic error, but what’s more interesting is the underlying assumption: that “Sergey Brin owes $50B” is so morally dispositive that it would end the argument.

Brin is not a sympathetic marginal case. He is the standard: founder equity, $1 salary, unrealized appreciation, deferral, and access to liquidity without needing wage income.

It is a little strange to look at a very large tax bill landing at his feet and decide that is where the scandal starts. Is the number supposed to make the idea disqualifying because it is drastically more than most people make in many lifetimes?

Said another way, the revealing part is not that someone on the internet multiplied wrong. The revealing part is how quickly the error resolves into a fairness argument on behalf of Sergey Brin, a man whose fortune is almost a laboratory specimen of the thing the regular income tax keeps failing, decade after decade, to reach.


it's not an arithmetic error, it's literally written in the tax bill to value voting ownership shares as a percentage of the company if they are not available on the open market.

Section 50303(c)(3)(C) says "For any interests that confer voting or other direct control rights, the percentage of the business entity owned by the taxpayer shall be presumed to be not less than the taxpayer's percentage of the overall voting or other direct control rights."

Sergey Brin is not 'hoarding' anything. There is not a bank vault of jewels or wheat that the rest of the world is deprived of having. Stock 'wealth' is not a fixed quantity of wealth. To advocate divesting people of ownership is egregious and immoral.

You are advocating for laws you do not read, regarding concepts you have neither read about nor seem able to comprehend, and people like that are my enemy.


Not only this, but a person with this much wealth cannot possibly consume enough to pay their fair share in taxes. They've already used all the loopholes to avoid income taxes, which aren't even progressive for billionaires, as mentioned by Buffett.

