More

Topfi · 2026-03-25T14:20:43 1774448443

I can only speculate of course, but I'd suspect a far simpler reason, driving subscriptions. Even limited use for video editing can fill up the free 5GB tier very quickly and if iClouds rather aggressive nudges to upgrade and their success amongst the people around me are anything to go by, there is a lot of potential in getting people to subscribe for a few bucks.

The scenario I envision is basically, a user just edits a few 4k files together and starts approaching the 5GB limit. Because OneDrive is also by default used for other things such as documents and even the desktop nowadays, they soon thereafter hit the 5GB limit and start being inundated with offers to upgrade to the 100GB tier. "Just 1,99 per month, nothing major, barely felt", at least that's the pitch. Maybe they acquiesce, maybe they ignore it. That's the play I'd wager. And if enough people resist, maybe in an upcoming update a full OneDrive could lead to (artificially) degraded functionality. Not in the sense of local storage being impossible, just slightly less convenient over direct OneDrive. Say what you want about the iCloud nagging (I have and will continue complaining about that as well), at least it isn't required to have free iCloud storage to locally edit ones videos with iMovie.

In any case, forcing OneDrive for a video editor (arguably one of the highest storage requirement programs most people will ever use at the moment) is anti-consumer and showcases how little any commitment [0] by them actually means. Took less than a week, which is honestly longer than I'd have suspected...

[0] https://blogs.windows.com/windows-insider/2026/03/20/our-com...

Topfi · 2026-03-20T10:22:35 1774002155

As far as communicating ones ideas with the public in an interesting manner is concerned, this is truly a great way to do it. Succinct, attention grabbing, not overly emotional. I especially liked the detail regarding stabilisation, sections focused on the phone were rock solid, sections filming him had a handheld effect.

On the main topic, a possible datacenter moratorium, I will admit that I am to far removed from the US and your particular situation, not to mention the many state and local governments involved to make a true judgement one way or another.

I will however say that a lot of the issues currently raised with data centers such as local water usage, environmental concerns ranging from generator exhaust to noise pollution and local tax incentives causing issues for municipal funding have existed independent of the current situation for decades.

Bottled water companies have harmed local communities to a similar and likely greater extent than datacenter build up, pollution in e.g. the cancer alley is an ongoing issue with little political will to find solutions and tax incentives have been part and parcel for decades before. The same goes for other issues currently focused on with model training and deployment like data privacy, right to access ones own information from processors, etc.

Now, I am not saying that Senator Sanders isn't interested in more comprehensively solving these issues beyond the current data center buildout and the interest this has generated by the public, but what I am saying is that a crucial difference between EU and US regulatory approach can be seen in this case, rather than pushing forward solid environmental and privacy regulation for all areas of the economy, there appears to be a tendency to operate on a more targeted level. Banning Huawei, forcing the sale of TikTok, considering a moratorium on data center buildup, these are just some examples where, in my European mind, a more broad regulatory framework that companies the same regardless of their origin (in the case of e.g. the TikTok ban there is no reason such concerns shouldn't equally apply to Meta) and area of operation (as mentioned, many data center concerns are equally applicable to other areas of the economy). If there were more comprehensive laws that are applied across industries, perhaps many concerns with data centres in the US would not exist.

I do however, in fairness, also understand that the US operates in a very different manner, that passing regulation so comprehensive is likely impossible with the makeup of Congress (not just the current split but also the Filibuster and other procedural quirks) and that a data center moratorium can be a way to bring attention to such issues in an easy to comprehend manner. And I do also have to admit that while the EU was initially willing to enforce strict environmental standards on our own local car manufacturers, there has been a willingness to compromise on this front for our local economic benefit, so I cannot say confidently how we'd approach questions in this field if software behemoths like Meta and Alphabet were European rather than American.

Topfi · 2026-03-20T08:23:04 1773994984

A related topic that I have in the past thought about is, whether LLM derived code would necessitate the release under a copyleft license because of the training data. Never saw a cogent analysis that explained either why or why not this is the case beyond practicality due to models having been utilized in closed source codebases already…

mjg59 · 2026-03-20T08:29:33 1773995373

The short answer is that we don't know. The longer answer based purely on this case is that there's an argument that training is fair use and so copyleft doesn't have any impact on the model, but this is one case in California and doesn't inherently set precedent in the US in general and has no impact at all on legal interpretations in other countries.

bragr · 2026-03-20T09:21:21 1773998481

The dearth of case law here still makes a negative outcome for FSF pretty dangerous, even if they don't appeal it and set precedent in higher courts. It might not be binding but every subsequent case will be able to site it, potentially even in other common law countries that lack case law on the topic.

And then there is the chilling effect. If FSF can't enforce their license, who is going to sue to overturn the precedent? Large companies, publishers, and governments have mostly all done deals with the devil now. Joe Blow random developer is going to get a strip mall lawyer and overturn this? Seems unlikely

adampunk · 2026-03-20T15:37:20 1774021040

I don't think this argument is a winner. It fails on a few grounds:

First, unless you can point to regurgitation of memorized code, you're not able to make an argument about distribution or replication. This is part of the problem that most publishers are having with prose text and LLMs. Modern LLMs don't memorize harry potter like GPT3 did. The memorization older models showed came from problems in the training data, e.g. harry potter and people writing about harry potter are extraordinarily over-represented. It's similar to how with stable diffusion you could prompt for anything in the region of "Van Gogh's Starry Night" and get it, since it was in the training data 50-100 different ways. You can't reliably do this with Opus or GPT5. If they're not redistributing the code verbatim, they're not in violation of the license. One could argue that the models produce "derivative works, but..."

The derivative works argument is inapt. The point of it is to disrupt someone's end-run around the license by saying that building on top of GPL code is not enough to non-GPL it. We imagine this will still work for LLMs because of the GPLs virality--I can't enclose a critical GPL module in non-GPL code and not release the GPL code. But the models aren't DOING THAT. They're not reaching for XYZ GPL'd project to build with. They're vibing out a sparsely connected network of information about literally trillions of lines of software. What comes out is a mishmash of code from here and there, and only coincidentally resembles GPL code, when it does. In order to make this argument work, you need a theory of how LLMs are trained and operate that supports it. Regardless of whether or not one of those theories exist, in court, you'd need to show that your theory was better than the company's expert witness's theory. Good luck.

Second, infringement would need discovery to uncover and would be contingent on user input. This is why the NYT sued for deleted user prompts to ChatGPT--the plaintiffs can't show in public that the content is infringing, so they need to seek discovery to find evidence. That's only going to work in cases where you survive a motion to dismiss--which is EXACTLY where a few of these suits have failed. You need to show first that you can succeed on the merits, then you proceed. That will cut down many of these challenges since they just can't show the actual infringement.

Third, and I think this is the most important, the license protections here are enforced by *copyright*. For copyright it very much matters if something is lifted verbatim vs modified. It is not like patent protection where things like clean room design are shown to have mattered to real courts on real matters. In additional contrast to patents, copyright doesn't care if the outcome is close. That's very much a concern for patents. If I patent a gizmo and you produce a gizmo that operates through nearly identical mechanisms to those I patented, then you can be sued--they don't need to be exact. If I write a novel about a boy wizard with glasses who takes a train to a school in Scotland and you write a novel about a boy wizard with glasses who takes a boat to a school in Inishmurray, I can't sue you for copyright infringement. You need to copy the words I wrote and distribute them to rise to a violation.

Topfi · 2026-03-20T19:23:11 1774034591

> Modern LLMs don't memorize harry potter like GPT3 did. [...] You can't reliably do this with Opus or GPT5.

If you try any modern LLM, you will find that you can. Easily [0], reliably [1], consistently [2]. All these examples are with models released in 2025/26.

[0] https://arxiv.org/html/2601.02671?amp=&amp=

[1] https://arxiv.org/abs/2506.12286

[2] https://ai.stanford.edu/blog/verbatim-memorization/

adampunk · 2026-03-20T19:48:18 1774036098

So, they have to do anything special to those models in order to get them to regurgitate ~ 100%? Any special prompts they needed to use to get sonnet to cough that up?

What is the real copyright risk of there being an arcane procedure to sometimes recover most of a text? So far it’s nothing. Which is what I’m saying. Pragmatically this is a loser of an argument in a court room. It is too easy for the chain of reasoning to be disrupted and even undisrupted the argument for model maker liability is attenuated.

harshreality · 2026-03-20T20:28:12 1774038492

You can't do that without already having the contents of the book, in which case getting an LLM to regurgitate it with partial prompting shouldn't be legally relevant at all. What it regurgitates will have errors, and if you try to chain that as prompt cues without re-basing each cue to the actual text (which you have separately), the LLM's output will rapidly lose coherence with the original work.

If its responses were perfect so that you could chain them, or if you could ask "please give me words 10-15 of chapter 3 paragraph 4 of HPatSS, and it did so, then you'd have a better case to complain. Still, the counterargument is that repeated prompting like that, explicitly asking for copyright violation, is the real crime. Are you going to throw someone in prison if they memorize the entirety of HPatSS and recite arbitrary parts of it on demand?

Combining both issues: that LLMs are only regurgitating mostly accurate continuations, and they're only providing that to the person who explicitly asked... any meaningful copyright violation moves downstream. If you record someone reciting HPatSS from memory, and post it on youtube, you are (or should be considered) the real copyright violator, not them.

If you ask for an identifiable short segment of writing, or a piece of art, and get something close enough that violates copyright, that should really be your problem if you redistribute it (whether manually or because you've coded something to allow 3rd parties to submit LLM prompts and feed answers back to them, and they go on to redistribute it).

Blaming LLMs for "copyright violation" is like persuading a retarded person to do something illegal and then blaming them for it.

themafia · 2026-03-20T18:07:00 1774030020

> unless you can point to regurgitation of memorized code

I have, on many occasions, gotten an LLM to do just this. It's not particularly hard. In the most recent case google's search bar LLM happily regurgitated a digital ocean article as if it was it's own output. Searching for some strings in the comments located the original page and it was a 95% match between origin and output.

> The memorization older models showed came from problems in the training data,

And what proof do you have that they "fixed" this? And what was the fix?

> harry potter and people writing about harry potter

I'm not sure that's how you get GPT to reproduce upwards of 85% of Harry Potter novels.

> Second, infringement would need discovery to uncover and would be contingent on user input.

That's not at all how copyright infringement works. That would be if you wanted to prove malice and get triple damages. Copyright infringement is an exceptionally simple violation of the law. You either copied, or you did not.

> For copyright it very much matters if something is lifted verbatim vs modified.

Transformation is a valid defense for _some_ uses. It is not for commercial uses. Using LLM generated code for commercial purposes is a hazard.

adampunk · 2026-03-20T19:41:20 1774035680

This must be why all of these copyright plaintiffs are having tremendous days in court! If even half of this were correct, they wouldn’t be losing in summary judgment.

We have yet to see a single judgment come down against a model maker for distributing the gist of content. We have yet to see a single judgment come down against a model maker for infringement at all.

Copyright is just an inapt tool here. It’s not going to do the job. It is not as though big interests have not tried to use this tool. It just doesn’t reflect what’s actually happening and it’s going to lose again and again.

We can imagine a theoretical legal regime where what is done with large language models counts as copyright infringement, we just don’t live in a world where that regime holds.

Topfi · 2026-03-15T19:08:15 1773601695

That is a very fair point, there are quite a few businesses and government agencies where I live, which are very deeply entrenched in very complex, decade spanning VBA based workflows that need absolute and fully compatibility before a switch away from "MS 365 Copilot" could even be considered and the name may give false expectations.

Now, I really, very much dislike it that often discussions on sites like this one can be utterly derailed by someone bringing up an utterly unrelated overhyped topic, so feel free to dismiss this, but I could honestly see LLMs providing a potential path to smoothing out such issues. Some model have gotten rather robust when it comes to making targeted changes to pre-existing Excel files dating back to before I was using a computer, including handling very specific modifications to ancient macros across multiple sheets. Perhaps, this could be leveraged to some extent, though being honest and trying not to overhype, I suspect that similar to those planning to use agentic coding to rewrite decades old, tested, crucially important COBOL code in a more modern language, there are likely many edge cases that will be hard to properly cover and if such a solution isn't both absolutely reliable and seamless to the users, large scale adoption by such entities will likely be impossible in the short term.

Topfi · 2026-03-15T18:59:12 1773601152

In fairness, Office is as generic a term as one can come up with for such a software suite. On top of that, I wouldn't be surprised if that fell under Genericide like Lego or Google and, lest we forget, the Microsoft Office brand does not exist anymore, it is 365 and Copilot now...

Topfi · 2026-03-10T19:06:01 1773169561

> "The Moltbook team has given agents a way to verify their identity and connect with one another on their human's behalf," Shah says. "This establishes a registry where agents are verified and tethered to human owners."

Have they? Did I miss something? Last I checked, there was no verification and most of the content shared from that site turned out to have been posted not by LLMs but rather (human) spammers, focused on Crypto grifts and creating hype.

Anyone more in this can happily correct me, but is there anything here of that sort, anything of value?

Compared to any prior social media acquire there doesn't seem a technically skilled team considering the exploits or an existing user base considering said user base is A) supposed to be bots by nature and secondly didn't even turn out to be that reliably, making this the first time someone wants bots and doesn't even get that.

Far is it from me to make strategic decisions for a company like Meta/Facebook, but the lack of a recent Llama release might merit more focus then spending on whatever this is.

phatfish · 2026-03-11T22:26:25 1773267985

I had never heard of the thing and checked it out. It appears to be an industrial scale slop generation machine. Exactly what you would expect if LLMs were let loose to recreate Reddit and introspect on their current context and SOUL.md or the other nonsense that OpenClaw can be customised with.

Not much human content that i could see, probably even the Crypto grifters got bored with it after a couple of days.

The "acquisition" must have given guys that made the thing some favourable terms, and it was a condition for them to even consider working at Meta. Because there is no way a global top 10 market cap company announces this deal willingly.

Topfi · 2026-02-25T10:57:12 1772017032

Have been following your models and semi-regularly ran them through evals since early summer. With the existing Coder and Mercury models, I always found that the trade-offs were not worth it, especially as providers with custom inference hardware could push model tp/s and latency increasingly higher.

I can see some very specific use cases for an existing PKM project, specially using the edit model for tagging and potentially retrieval, both of which I am using Gemini 2.5 Flash-Lite still.

The pricing makes this very enticing and I'll really try to get Mercury 2 going, if tool calling and structured output are truly consistently possible with this model to a similar degree as Haiku 4.5 (which I still rate very highly) that may make a few use cases far more possible for me (as long as Task adherence, task inference and task evaluation aren't significantly worse than Haiku 4.5). Gemini 3 Flash was less ideal for me, partly because while it is significantly better than 3 Pro, there are still issues regarding CLI usage that make it unreliable for me.

Regardless of that, I'd like to provide some constructive feedback:

1.) Unless I am mistaken, I couldn't find a public status page. Doing some very simple testing via the chat website, I got an error a few times and wanted to confirm whether it was server load/known or not, but couldn't

2.) Your homepage looks very nice, but parts of it struggle, both on Firefox and Chromium, with poor performance to the point were it affects usability. The highlighting of the three recommended queries on the homepage lags heavily, same for the header bar and the switcher between Private and Commercial on the Early Access page switches at a very sluggish pace. The band showcasing your partners also lags below. I did remove the very nice looking diffusion animation you have in the background and found that memory and CPU usage returned to normal levels and all described issues were resolved, so perhaps this could be optimized further. It makes the experience of navigating the website rather frustrating and first impressions are important, especially considering the models are also supposed to be used in coding.

3.) I can understand if that is not possible, but it would be great if the reasoning traces were visible on the chat homepage. Will check later whether they are available on the API.

4.) Unless I am mistaken, I can't see the maximum output tokens anywhere on the website or documentation. Would be helpful if that were front and center. Is it still at roughly 15k?

5.) Consider changing the way web search works on the chat website. Currently, it is enabled by default but only seems to be used by the model when explicitly prompted to do so (and even then the model doesn't search in every case). I can understand why web search is used sparingly as the swift experience is what you want to put front and center and every web search adds latency, but may I suggest disabling web search by default and then setting the model up so, when web search is enabled, that resource is more consistently relied upon?

6.) "Try suggested prompt" returns an empty field if a user goes from an existing chat back to the main chat page. After a reload, the suggested prompt area contains said prompts again.

One thing that I very much like and that has gotten my mind racing for PKM tasks are the follow up questions which are provided essentially instantly. I can see some great value, even combining that with another models output to assist a user in exploring concepts they may not be familiar with, but will have to test, especially on the context/haystack front.

volodia · 2026-02-26T20:23:41 1772137421

Thank you for the detailed feedback! I shared this already with the team.

Topfi · 2026-02-19T16:12:50 1771517570

Appears the only difference to 3.0 Pro Preview is Medium reasoning. Model naming has long gone from even trying to make sense, but considering 3.0 is still in preview itself, increasing the number for such a minor change is not a move in the right direction.

GrayShade · 2026-02-19T16:14:04 1771517644

Maybe that's the only API-visible change, saying nothing about the actual capabilities of the model?

xnx · 2026-02-19T16:24:29 1771518269

> increasing the number for such a minor change is not a move in the right direction

A .1 model number increase seems reasonable for more than doubling ARC-AGI 2 score and increasing so many other benchmarks.

What would you have named it?

Topfi · 2026-02-19T18:22:37 1771525357

My issue is that we haven't even gotten the release version of 3.0, that is also still in Preview, so may stick with 3.0 till that has been deemed stable.

Basically, what does the word "Preview" mean, if newer releases happen before a Preview model is stable? In prior Google models, Preview meant that there'd still be updates and improvements to said model prior to full deployment, something we saw with 2.5. Now, there is no meaning or reason for this designation to exist if they forgo a 3.0 still in Preview for model improvements.

xnx · 2026-02-19T18:40:15 1771526415

Given the pace AI is improving and that it doesn't give the exact same answers under many circumstances, is the the [in]stability of "preview" a concern?

GMail was in "beta" for 5 years.

Topfi · 2026-02-20T13:13:17 1771593197

Should have clarified initially what I meant by stable, especially because it isn't that known how these terms are defined for Gemini models. Not talking about getting consistent output from a not-deterministic model, but stable from a usage perspective and in the way Google uses the word "stable" to describe their model deployments [0]. "Preview" in regard to Gemini models means a few very specific restrictions including far stricter rate limits and a very tight 14 day deprecation window, making them models one cannot build on.

That is why I'd prefer for them to finish the role out of an existing model before starting work on a dedicated new version.

[0] https://ai.google.dev/gemini-api/docs/models

verdverm · 2026-02-19T19:18:18 1771528698

ChatGPT 4.5 was never released to the public, but it is widely believed to be the foundation the 5.x series is built on.

Wonder how GP feels about the minor bumps for other model providers?

Topfi · 2026-02-20T13:16:24 1771593384

Minor version bumps are good and I want model providers to communicate changes. The issue I am having is that Gemini "preview" class models have different deprecation timelines and rate limits, making them impossible to rely on for professional use cases. That's why I'd prefer they finish the 3.0 role out prior to putting resources into deploying a second "preview" class model.

For a stable deployment, Google needs a sufficient amount of hardware to guarantee inference and having two Pro models running makes that even more challenging: https://ai.google.dev/gemini-api/docs/models

verdverm · 2026-02-20T17:03:12 1771606992

Sorry, but you come off as an armchair devops saying things like this. Google is fine, they know more than anyone else about how to run Ai at scale.

"preview" != GA, sounds like you need to adjust your expectations

argsnd · 2026-02-19T16:15:26 1771517726

I disagree. Incrementing the minor number makes so much more sense than “gemini-3-pro-preview-1902” or something.

jannyfer · 2026-02-19T16:28:28 1771518508

According to the blog post, it should be also great at drawing pelicans riding a bicycle.

Topfi · 2026-02-17T20:20:19 1771359619

In my evals, I was able to rather reliably reproduce an increase in output token amount of roughly 15-45% compared to 4.5, but in large part this was limited to task inference and task evaluation benchmarks. These are made up of prompts that I intentionally designed to be less then optimal, either lacking crucial information (requiring a model to output an inference to accomplish the main request) or including a request for a less than optimal or incorrect approach to resolving a task (testing whether and how a prompt is evaluated by a model against pure task adherence). The clarifying question many agentic harnesses try to provide (with mixed success) are a practical example of both capabilities and something I do rate highly in models, as long as task adherence isn't affected overly negatively because of it.

In either case, there has been an increase between 4.1 and 4.5, as well as now another jump with the release of 4.6. As mentioned, I haven't seen a 5x or 10x increase, a bit below 50% for the same task was the maximum I saw and in general, of more opaque input or when a better approach is possible, I do think using more tokens for a better overall result is the right approach.

In tasks which are well authored and do not contain such deficiencies, I have seen no significant difference in either direction in terms of pure token output numbers. However, with models being what they are and past, hard to reproduce regressions/output quality differences, that additionally only affected a specific subset of users, I cannot make a solid determination.

Regarding Sonnet 4.6, what I noticed is that the reasoning tokens are very different compared to any prior Anthropic models. They start out far more structured, but then consistently turn more verbose akin to a Google model.

Topfi · 2026-02-16T07:08:47 1771225727

Unless I am mistaken, that is all plain old markdown, arguably the easiest to migrate format for such data there can possible be.

Heck, that was half the pitch behind Obsidian, even if the project someday ended, markdown would remain. And switching between Obsidian and e.g. Logseq shows the ease of doing so.