
You don’t know what you’re going to think next. And you can’t stop it.

Not dissimilar from biological entities. Some stimulus starts the whole thing.

Is anyone here experimenting seriously with Diffusion for text generation? I’d love to learn about your experiences!

https://www.inceptionlabs.ai/

This startup seems to have been at it a while.

From our look into it - amazing speed, but challenges remain around time-to-first-token user experience and overall answer quality.

Can absolutely see this working if we can get the speed and accuracy up to that “good enough” position for cheaper models - or non-user facing async work.

One other question I’ve had: is it possible to set a huge amount of text to diffuse as the output, using a larger body to mechanically force greater levels of reasoning? I’m sure there’s some incredibly interesting research taking place in the big labs on this.


The overall speed rather than TTFT might start to be more relevant as the caller moves from being a human to another model.

However, quality is really important. I tried that site and clicked one of their examples, "create a javascript animation". Fast response, but while it starts like this

```
Below is a self‑contained HTML + CSS + JavaScript example that creates a simple, smooth animation: a colorful ball bounces around the browser window while leaving a fading trail behind it.

<!DOCTYPE html>
<html lang="en">
<head>
  <meta charset="UTF-8">
  <title>JavaScript Bounce Animation</title>
  <style>
    body, html { margin: 0; padding: 0;
```

the answer then degrades to

``` radius: BALL_RADIUS, color: BALL_COLOR, traivD O] // array of previous {x,y} positions }; ```

Then more things start creeping in

``` // 3⃣ Bounce off walls if (ball.G 0 ball.radius < 0 || ball.x + ball.radius > _7{nas.width) { ball.vx *= -1; ibSl.x = Math.max(ball.radius, Math.min(ball.x, canvbbF4idth - ball.radius)); } if

```

and the more it goes on the worse it gets

``` Ho7 J3 Works 0 Atep | Description | ```

and

``` • prwrZ8}E6on 5 jdF wVuJg Ar touc> 2ysteners ,2 Ppawn \?) balls w>SFu the 8b$] cliM#]9 ```

This is for the demo on the front page, so I expect this is a pretty good outcome compared to what else you might ask.


Weird; I clicked through out of curiosity and didn't get any corruption of the sort in the end result.

I also asked it some technical details about how diffusion LLMs could work and it provided grammatically-correct plausible answers in a very short time (I don't know the tech to say if it's correct or not).


Mercury 2 is better than that in my testing, but it does have trouble with tool calling.

I've found the latency and pricing make Mercury 2 extremely compelling for some UX experiments focused on automated note tagging/interlinking. Far more than the Gemini Flash Lite I used before, it made some interactions nearly frictionless, very close to how old-school autocomplete/T9/autocorrect works: users don't even think about the process behind it.

Sadly, it does not perform at the level of e.g. Haiku 3.5 for tool calling, despite their own benchmarks claiming parity with Haiku 4.5, but it does compete with Flash Lite there too.

Anything with very targeted output, sufficient existing input, and that benefits from a seamless feeling lends itself to dLLMs. I could see a place in tab-complete too, though Cursor's model seems to be sufficiently low latency already.


If you like Mercury 2 you should try Xiaomi Mimo-v2-flash.

I have an agentic benchmark, and it shows Mercury 2 at 19/25 in 58 seconds and Mimo v2 Flash at 22/25 in 109 seconds.

https://sql-benchmark.nicklothian.com/?highlight=xiaomi_mimo... (flip to the Cost vs Performance tab to see speed more graphically too)


Thanks for the recommendation and for sharing your evals; I will take a closer look at them. Yes, the Mimo models are very interesting, especially in terms of end-to-end pricing, though in my tool-call runs, GLM 4.7 Flash did slightly better at roughly equal speed and full-run cost. It is of course very task-dependent, and both are amazing options in the price range, but latency-wise, nothing feels like Mercury 2 at the moment.

Yeah the speed is super impressive.

https://chatjimmy.ai/ from Taalas seems down at the moment, but if you really want speed... 18,000 tps is something to experience.


Did you get a chance to evaluate coding performance?

Yes, nothing to write home about. It's all relative of course (what stack, what goal, what approach, and which models perform best on them), but for regular day-to-day coding I do not find it usable given the alternatives.

Kimi, MiniMax and GLM models provide far more robust coding assistance, sometimes at no cost (financed via data sharing) or for very cheap. Output quality, tool-calling reliability and task adherence tend to be far better across all three than Mercury 2, so if you consider the end-to-end time to get usable code (including reviews, manual fixes, different prompting attempts, etc.), you'll be faster.

The only "coding" task where I have found Mercury 2 to have a place is a browser desktop with simple generated applets. Think artefacts/canvas output, but via a search field if the applet has been generated previously.

With other models, I need to hide the load behind a splash screen, but Mercury 2 is so fast that it can feel frictionless. The demo at this point is limited by the fact that, venturing beyond a simple calculator or todo list, the output becomes unpredictable, and I struggle to get Mercury 2 to rely on pre-made components, etc. to ensure consistent appearance and a11y.

Despite the benchmark, cost and speed figures suggesting otherwise, I have had the best overall results with Haiku 4.5, simply because GPT-5.4-nano is still unwilling to play nice with my approach to UI components. I am currently experimenting with some routing: using different models for different complexity, then showing loading spinners only for certain models. But even if that works reliably, any model that I cannot force to rely on UI components in a consistent manner isn't gonna work, so for the time being it would just route between less expensive and more expensive Anthropic models.

Coding-wise, one more exception can be in-line suggestions, though I have no way to fairly compare that because the tab models I know about (like Cursor's) are not available via API. Mercury 2 seems to perform solidly there, at least in Zed for a TS code base.

Basically, whether code or anything else, unless your task is truly latency dependent, I believe there are better options out there. If it is, Mercury 2 can enable some amazing things.


It's being explored right now for speculative decoding in the local-LLM space, which I think is quite interesting as a use case.

https://www.emergentmind.com/topics/dflash-block-diffusion-f...
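For readers unfamiliar with the idea: in speculative decoding, a cheap draft model proposes a block of tokens and the expensive target model verifies them, keeping the longest agreeing prefix. Here's a toy TypeScript sketch of just that accept/reject loop; the two "models" below are deterministic stand-ins, not DFlash's actual block-diffusion drafter, and a real implementation would verify all positions in a single batched forward pass.

```typescript
// Toy speculative decoding: a cheap "draft" model proposes a block of
// tokens; the slower "target" model keeps the longest verified prefix
// and emits its own correction token at the first mismatch.

type Model = (context: number[]) => number; // next-token predictor

function speculativeStep(
  target: Model,
  draft: Model,
  context: number[],
  blockSize: number
): number[] {
  // 1. Draft proposes blockSize tokens cheaply.
  const proposed: number[] = [];
  let ctx = [...context];
  for (let i = 0; i < blockSize; i++) {
    const t = draft(ctx);
    proposed.push(t);
    ctx = [...ctx, t];
  }

  // 2. Target verifies the proposals left to right.
  const accepted: number[] = [];
  ctx = [...context];
  for (const t of proposed) {
    const want = target(ctx);
    if (want !== t) {
      accepted.push(want); // reject the draft, keep the target's token
      return accepted;
    }
    accepted.push(t);
    ctx = [...ctx, t];
  }
  return accepted; // entire block accepted
}

// Toy models: the target sums the last two tokens mod 10; the draft
// agrees except when the context has exactly 3 tokens.
const targetModel: Model = (c) =>
  ((c[c.length - 1] ?? 0) + (c[c.length - 2] ?? 0)) % 10;
const draftModel: Model = (c) =>
  c.length === 3 ? (targetModel(c) + 1) % 10 : targetModel(c);

console.log(speculativeStep(targetModel, draftModel, [1, 2], 4)); // [ 3, 5 ]
```

With a well-matched drafter, most proposed tokens are accepted, which is why a fast diffusion drafter can speed up an autoregressive target without changing its outputs.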


DFlash immediately came to my mind.

There are several Mac implementations of it that already show >2x faster Qwen3.5 decoding.


I have. It requires a distinct intuition compared to a normal language model. Very well suited to certain problems.

Can you tell us more?

I've been playing with a Swift implementation of a diffusion language model (WeDLM), but performance is not yet acceptable, and it still generates roughly left-to-right like an autoregressive model (just within a sliding window rather than strictly token-by-token... but that doesn't matter much when the sliding window is only about 16 tokens).
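For anyone curious what that sliding-window schedule looks like structurally, here's a heavily simplified TypeScript sketch. The "denoiser" is a stand-in: a real diffusion LM refines all window positions jointly over several iterations, whereas this toy does a single deterministic pass, so only the window scheduling is illustrated.

```typescript
// Heavily simplified sliding-window decoder: the output starts fully
// masked, a "denoiser" fills the positions inside a small window, and
// then the window slides right.

const MASK = -1;

// Predicts the token at `pos` given the (partially masked) sequence.
type Denoiser = (tokens: number[], pos: number) => number;

function slidingWindowDecode(
  denoise: Denoiser,
  length: number,
  windowSize: number
): number[] {
  const out: number[] = new Array(length).fill(MASK);
  for (let start = 0; start < length; start += windowSize) {
    const end = Math.min(start + windowSize, length);
    for (let pos = start; pos < end; pos++) {
      out[pos] = denoise(out, pos);
    }
  }
  return out;
}

// Toy denoiser: each token is the previous token plus one.
const demoDenoiser: Denoiser = (tokens, pos) =>
  pos === 0 ? 0 : tokens[pos - 1] + 1;

console.log(slidingWindowDecode(demoDenoiser, 6, 3)); // [ 0, 1, 2, 3, 4, 5 ]
```

The point of the comment above is visible here too: with a small window, decoding is still effectively left-to-right, just in chunks.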

Any ambitious web app needs to manage state, so you need to solve for that. Rolling your own is of course totally doable, but it carries an opportunity cost: time not spent solving the unique user problems in your app. State management, and the other things your app will need, are commoditized, so it is better to focus on the unique value you have to bring.

On the language front, TypeScript gives you a more modern, yet flexible, language.
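To make the opportunity-cost point concrete, here's roughly the sort of typed store plumbing you end up hand-rolling before a library gives it to you for free. All names here are illustrative, not from any particular library.

```typescript
// Minimal hand-rolled typed store: the kind of plumbing a state
// management library commoditizes.

type Todo = { id: number; text: string; done: boolean };
type State = { todos: Todo[] };

// A discriminated union keeps every action exhaustively type-checked.
type Action =
  | { kind: "add"; text: string }
  | { kind: "toggle"; id: number };

function reduce(state: State, action: Action): State {
  switch (action.kind) {
    case "add":
      return {
        todos: [
          ...state.todos,
          { id: state.todos.length + 1, text: action.text, done: false },
        ],
      };
    case "toggle":
      return {
        todos: state.todos.map((t) =>
          t.id === action.id ? { ...t, done: !t.done } : t
        ),
      };
  }
}

let state: State = { todos: [] };
state = reduce(state, { kind: "add", text: "ship it" });
state = reduce(state, { kind: "toggle", id: 1 });
console.log(state.todos[0].done); // true
```

Every action shape, immutability convention and update path above is boilerplate that existing state libraries have already solved, tested and documented.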


Typescript was a Microsoft attempt to hijack the Node community because they had nothing to offer it. It makes nothing better and most things worse, and no you don’t need it anyway.

Qwen, according to the article, is also fast surpassing DeepSeek.

> The difference is that fibre is infrastructure, LLMs are an application

When I zoom out, I see “token generation” as an infrastructural layer, with applications built upon it.


What you say is accurate. Just remember that generating code is not the only way for an engineer to amplify themselves.

Using the author’s logic, it is Google then that will lead.

Unlike Apple, they have even more devices in the field PLUS they have strong models PLUS Apple uses Google models.


Google is an advertisement company at the end of the day and that's a conflict of interest with user privacy.

So is Apple. Worse, Apple is a company that is comfortable with the idea of restricting user control, so you can't get privacy even if you want it.

> Apple uses Google models

Source?


The article itself? lol

> You can't keep marginalizing people and expecting stability.

People who shoot someone or throw bombs at someone even though that someone never did something against them, should be marginalized. In prison.


>People who shoot someone or throw bombs at someone even though that someone never did something against them

I think the point is that there's going to be an increasingly large percentage of the populace who think that the AI bosses / billionaire class did indeed do something against them.


This has always been the case, hasn't it? There have always been groups of people who perceive technology change as a negative, or they are in fact negatively impacted.

But they didn't ask the rest of us if we're ok for them to murder someone on our behalf.

Personally I hope that AI will be a step change for the positive. I think it is inevitable that it will progress from here, in the Darwinian sense that someone else on this thread mentioned.

With that in mind, we should all be pushing for it to be used to our benefit, rather than detriment. And like almost all technological advances in the past, I think this can happen.

So if people are saying violence against Sam Altman is expected, then they're also saying violence against me is expected, because I am hopeful and vaguely supportive of the technology. That's quite scary.


The last time we had a serious labor movement in the US, which made enormous progress towards dignity for workers, it involved guns and bombs.

Okay

> <...> even though that someone never did something against them, <...>

Many tech billionaires openly, publicly and loudly said something along the lines of: "I/we/my company/tech-bros are building the torment nexus; it will take your job and/or kill people and/or shut up political opponents. You are powerless to stop this."

There are some of those billionaires willing to put their name and face in front of billions of people in the world. You will have no trouble finding people who think that X or Y tech bro is personally responsible for some poor person's problems.

Especially when there's a bunch of news like "layoffs due to AI", "record investments due to AI", etc.

I am not supporting violence; I have never done it and never considered it. Though it's not surprising when talking heads of political/economic extremes get threats from people who have nothing to lose.


> Netanyahu and Putin are two war criminals according to International Court of Justice.

You mean the ICC, not the ICJ. The latter is a separate body that handles disputes between states.

Both have warrants for arrest from the ICC, but note that neither Putin nor Netanyahu has been tried or convicted by the International Criminal Court.

