bluegatty's comments | Hacker News

> this is wrong

It is not. I would suggest engaging in the other branch of this thread, because people who agreed with you voiced their opinion and they were proven utterly wrong.

Humanity does not understand how LLMs work. This is definitive.


That's not how training works - adjusting model weights to memorize a single data item is not going to fly.

Model weights store abilities, not facts - generally.

Unless the fact is very widely used and widely known, with a ton of context around it.

The model can learn the day JFK died because there are millions of scattered examples of that information out in the world, but when you're working on a problem, you might have just one concern to 'memorize'.

That's going to be something different than adjusting model weights as we understand them today.

LLMs are not mammals either; it's a helpful analogy in terms of 'what a human might find useful', but not necessarily applicable to actual LLM architecture.

The fact is - we don't have memory sorted out architecturally - it's either 'context or weights' and that's that.

Also critically: Humans do not remember the details of a face. Not remotely. They're able to associate it with a person and name 'if they see it again' - but that's different from some kind of excellent recall. Ask them to describe the features in detail and most can't do it.

You can see in this instance that this may be related to a kind of 'soft lookup', aka associating an input with other bits of information which 'rise to the fore' as possibly useful.

But overall, yes, it's fair to take the position that we'll have to 'learn from context in some way'.


Also, with regards to faces, that's kind of what I'm getting at - we don't have grid cells for faces, there seem to be discrete, functional, evolutionary structures and capabilities that combine in ways we're not consciously aware of to provide abilities. We're reflexively able to memorize faces, but to bring that to consciousness isn't automatic. There've been amnesia and lesion and other injury studies where people with face blindness get stress or anxiety, or relief, when recognizing a face, but they aren't consciously aware. A doctor, or person they didn't like, showing up caused stress spikes, but they couldn't tell you who they were or their name, and the same with family members- they get a physiological, hormonal response as if they recognized a friend or foe, but it never rises to the level of conscious recognition.

There do seem to be complex cells that allow association with a recognizable face, person, icon, object, or distinctive thing. Face cells apply equally to abstractions like logos or UI elements in an app as they do to people, famous animals, unique audio stings, etc. Split brain patients also demonstrate amazing strangeness with memory and subconscious responses.

There are all sorts of layers to human memory, beyond just short term, long term, REM, memory palaces, and so forth, and so there's no simple singular function of "memory" in biological brains, but a suite of different strategies and a pipeline that roughly slots into the fuzzy bucket words we use for them today.


It's not just faces. When recognizing objects in the environment, we normally filter out a great number of details going through the visual cortex - by the time information from our eyes hits the level of conscious awareness, it's more of a scene graph.

Table; chair behind and a little to the left of the table; plant on table

Most people won't really have conscious access to all the details that we use in recognizing objects - but that is a skill that can be consciously developed, as artists and painters do. A non-artist would be able to identify most of the details, but not all (I would be really bad compared to an actual artist with colors and spatial relationships), and I wouldn't be able to enumerate the important details in a way that makes any kind of sense for forming a recognizable scene.

So it follows from that that our ability to recognize faces is not purely - or even primarily - an attribute of what we would normally call "memory", certainly in the sense of conscious memory where we can recall details on demand. Like you alluded to re: mammals and spaces, we're really good at identifying, categorizing, and recognizing new forms of structure.


I suspect we're going to need hypernetworks of some sort - dynamically generated weights, with the hypernet weights getting the dream-like reconsolidation and mapping into the model at large, and layers or entire experts generated from the hypernets on the fly, a degree removed from the direct-from-weights inference being done now. I've been following some of the token-free latent reasoning and other discussions around CoT, other reasoning scaffolding, and so forth, and you just can't overcome the missing-puzzle-piece problem elegantly unless you have online memory. In the context of millions of concurrent users, that also becomes a nightmare. What you want is a pipeline with a sort of intermediate memory - constructive and dynamic, to allow resolution of problems that require integration with memorized concepts and functions, but held out for curation and stability.
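To make the 'dynamically generated weights' idea concrete, here's a minimal hypernetwork sketch in PyTorch - not anyone's production architecture, just the bare mechanism: a small net maps a context embedding (a stand-in for distilled user memory) into the weights of a layer the base model then runs through.

    import torch
    import torch.nn as nn

    class HyperLinear(nn.Module):
        """A linear layer whose weights are generated on the fly by a small
        hypernetwork from a context embedding, instead of being fixed parameters."""

        def __init__(self, ctx_dim: int, in_dim: int, out_dim: int):
            super().__init__()
            self.in_dim, self.out_dim = in_dim, out_dim
            # The hypernetwork proper: context vector -> weight matrix + bias.
            self.weight_gen = nn.Linear(ctx_dim, in_dim * out_dim)
            self.bias_gen = nn.Linear(ctx_dim, out_dim)

        def forward(self, x: torch.Tensor, ctx: torch.Tensor) -> torch.Tensor:
            # ctx: (batch, ctx_dim) - a distilled summary of the user's history (assumed)
            W = self.weight_gen(ctx).view(-1, self.out_dim, self.in_dim)
            b = self.bias_gen(ctx)
            # Per-example generated weights: (B, out, in) @ (B, in, 1) -> (B, out)
            return torch.bmm(W, x.unsqueeze(-1)).squeeze(-1) + b

    # Toy usage: the "memory" lives in ctx and in the generator, not in a static layer.
    layer = HyperLinear(ctx_dim=32, in_dim=64, out_dim=64)
    x = torch.randn(4, 64)      # hidden states from the base model
    ctx = torch.randn(4, 32)    # hypothetical per-user memory embedding
    print(layer(x, ctx).shape)  # torch.Size([4, 64])

The point is that the memory lives in ctx and in the generator, which is what would let it be reconsolidated offline without retraining the whole model.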

It's an absolutely enormous problem, and I'm excited that it seems to be one of the primary research efforts kicking off this year. It could be a very huge capabilities step change.


Can I subscribe to your newsletter? You seem to be pretty plugged in to current research.

Yes, so I think that's a fine thought, but I don't think it fits into LLM architecture.

Also, weirdly, even LeCun et al. are barely talking about this; they're thinking about 'world models' etc.

I think what you're talking about is maybe 'the most important thing' right now, and frankly, it's almost like an issue of 'Engineering'.

Like - it's when you work very intently with the models that this 'issue' becomes much more prominent.

Your 'instinct' for this problem is probably an expression of 'very nuanced use' I'm going to guess!

So in a way, it's as much Engineering as it is theoretical?

Anyhow - so yes - but - probably not LLM weights. Probably.

I'll add a small thing: the way that Claude Code keeps the LLM 'on track' is by reminding it! Literally, it injects little 'TODO reminders' with some prompts, which is kind of ... simple!

I worked a bit with 'steering probes' ... and there's a related opportunity there - to 'inject' memory and control operations along those lines. Just as a starting point for at least one architectural motivation.
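As a rough illustration of the reminder-injection trick (not Claude Code's actual implementation - the message shape and the reminder text here are made up):

    # Hypothetical sketch: before every model call, append a synthetic TODO
    # reminder to the context the model sees.
    todos = ["1. write failing test", "2. fix parser", "3. update docs"]

    def with_reminder(history: list) -> list:
        reminder = {
            "role": "user",
            "content": "<reminder>Current TODO list:\n" + "\n".join(todos) + "</reminder>",
        }
        # Injected into what the model sees, but never shown to the human
        # or persisted in the real conversation history.
        return history + [reminder]

    history = [{"role": "user", "content": "Please refactor the parser module."}]
    print(with_reminder(history))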


Not to forget, we will need thousands of examples for the models to extract abilities; the sample efficiency of these models is quite poor.

> That's not how training works - adjusting model weights to memorize a single data item is not going to fly.

Apologies; I think I got us all kind of off-track in this comment thread by stretching the definition of the term "fine-tuning" in my ancestor comment above.

Actual fine-tuning of the base model's weights (as one would do to customize a base model into a domain-specific model) works the way you're talking about, yes. The backprop from an individual training document would be a drop in the ocean; a "memory" so weak that, unless it touched some bizarre part of the latent vector-space that no other training document has so far affected (and so is until then all-zero), it would be extremely unlikely to affect output, let alone create specific recall of the input.

And a shared, global incremental fine-tune of the model to "add memories" would be a hare-brained idea, anyway. Not even just that it wouldn't work, but that if it did work, it would be a security catastrophe, because now the model would be able to recall all this information gleaned from random tenant users' private chat transcripts, with nothing to differentiate that info from any other info to enable the model (or its inference framework) to compartmentalize it / prevent cross-tenant info leaks.

But let me rephrase what I was saying before:

> there's a way to take many transcripts of inference over a period, and convert/distil them together into an incremental-update training dataset (for memory, not for RLHF), that a model can be fine-tuned on as an offline batch process every day/week, such that a new version of the model can come out daily/weekly that hard-remembers everything you told it

As:

> for a given tenant user, there's a way to take all of their inference transcripts over a given period, and convert/distil them together into an incremental-update training dataset (for memory, not for RLHF), that a LoRA can be rebuilt (or itself fine-tuned) on. And that the work of all of these per-tenant LoRA rebuilds can occur asynchronously / "offline", on a batch-processing training cluster, gradually over the course of the day/week; such that at least once per day/week (presuming the tenant-user has any updated data to ingest), each tenant-user will get the effect of their own memory-LoRA being swapped out for a newer one.
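To make that rephrased claim concrete, here's a minimal sketch of the per-tenant batch job using Hugging Face peft. The base model name and the transcript-to-training-data "distillation" step are hand-waved placeholders; the library calls are real, but this is a sketch, not a production pipeline.

    # Hypothetical per-tenant "memory LoRA" rebuild job.
    from datasets import Dataset
    from peft import LoraConfig, get_peft_model
    from transformers import (AutoModelForCausalLM, AutoTokenizer,
                              DataCollatorForLanguageModeling, Trainer,
                              TrainingArguments)

    BASE = "meta-llama/Llama-3.1-8B"  # stand-in base model

    def rebuild_memory_lora(tenant_id: str, transcripts: list) -> str:
        tok = AutoTokenizer.from_pretrained(BASE)
        tok.pad_token = tok.pad_token or tok.eos_token
        model = AutoModelForCausalLM.from_pretrained(BASE)
        model = get_peft_model(model, LoraConfig(
            r=16, lora_alpha=32, target_modules=["q_proj", "v_proj"]))

        # "Distillation" step: in reality you'd convert transcripts into
        # declarative memory statements first; here we just tokenize raw text.
        ds = Dataset.from_list([{"text": t} for t in transcripts]).map(
            lambda ex: tok(ex["text"], truncation=True, max_length=1024),
            remove_columns=["text"])

        Trainer(
            model=model,
            args=TrainingArguments(output_dir=f"loras/{tenant_id}",
                                   per_device_train_batch_size=1,
                                   num_train_epochs=1),
            train_dataset=ds,
            data_collator=DataCollatorForLanguageModeling(tok, mlm=False),
        ).train()

        model.save_pretrained(f"loras/{tenant_id}")  # swapped in at inference time
        return f"loras/{tenant_id}"

Each run of this job replaces the tenant's previous memory-LoRA; the base model weights are never touched, which is what keeps the memories compartmentalized per tenant.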

---

Note how this is essentially what Apple claimed they would be doing with Apple Intelligence, re: "personal context."

The idea (that I don't think has ever come to fruition as stated—correct me if I'm wrong?) is that Apple would:

1. have your macOS and iOS devices spend some of their idle-on-charge CPU power to extract and normalize training fulltexts from whatever would be considered the user's "documents" — notes, emails, photos, maybe random text files on disk, etc.; and shove these fulltexts into some kind of iCloud-persisted database, where the fulltexts are PKI-encrypted such that only Apple's Private Compute Cloud (PCC) can decode them;

2. have the PCC produce a new/updated memory LoRA (or rather, six of them, because they need to separately imbue each of their domain-specific model "adapter" LoRAs with your personal-context memories);

3. and, once ready, have all your iCloud-account-synced devices download the new versions of these memory-imbued adapter LoRAs.

---

And this is actually unnecessarily complex/circuitous for a cloud-hosted chat model. The ChatGPT/Claude/etc version of this architecture could be far simpler.

For a cloud-hosted chat model, you don't need a local agent to extract context from your devices; the context is just "past cloud-persisted chat transcripts." (But if you want "personal context" in the model, you could still get it, via an OpenClaw-style "personal agent"; such agents already essentially eat your files and spit them out as external memories/RAGs/etc; the only change would be spitting them out into plain-old hidden-session chat transcripts instead, so as to influence the memories of the model they're running on.)

And you don't need a special securely-oblivious cluster to process that data, since unlike "Apple looking at the data on your computer" (which would upset literally everybody), nobody has any kind of expectation that e.g. OpenAI staff can't look at your ChatGPT conversation transcripts.

And cloud-hosted chat models don't really "do" domain-specific adapters (thus the whole "GPT" thing); so you only need to train one memory-LoRA per model. (Though I suppose that might still lead to training several LoRAs per user, if you're relying on smart routing to different models within a model family to save costs.)

And you don't need to distribute the memory-LoRAs back to client devices; they can just live in an object store and get just-in-time loaded by the inference framework on a given node at the moment it begins an inference token-emission loop for a specific user. (Which might thus cause the inference cluster's routing to benefit from sticky sessions in a way it didn't before - but you don't need it; the LoRAs would likely be small enough to fetch and load within the ~second of delay it takes these cloud-hosted models to allocate you a node.)
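On the serving side, per-request adapter loading already exists in some open-source stacks; here's a sketch along the lines of vLLM's multi-LoRA interface. The object-store fetch and the per-user bookkeeping are hypothetical.

    # Sketch of just-in-time per-user LoRA loading at inference time.
    from vllm import LLM, SamplingParams
    from vllm.lora.request import LoRARequest

    llm = LLM(model="meta-llama/Llama-3.1-8B-Instruct", enable_lora=True)

    def fetch_lora(user_id: str) -> str:
        # Hypothetical helper: download e.g. s3://memory-loras/<user_id>/ to
        # local disk and return the directory path.
        return f"/tmp/loras/{user_id}"

    def chat(user_id: str, prompt: str) -> str:
        # LoRARequest takes (name, stable int id, local adapter path).
        lora = LoRARequest(f"memory-{user_id}",
                           abs(hash(user_id)) % 10**6,
                           fetch_lora(user_id))
        out = llm.generate([prompt], SamplingParams(max_tokens=256), lora_request=lora)
        return out[0].outputs[0].text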


This is a fine thought, but I'm reluctant about it. It could work; I don't think it's obvious. It's very, very hard to know what to train for and what not to, and this still leaves the 'fact vs. skill' problem - even LoRA won't enable a model to remember your favourite lunch place.

This is kind of an existential problem with context I think. Maybe we need new architectures.


Email history caches. They could also have provided requirements to provide communications, etc.

No, that's not at all how this works.

They obviously have a court order to collect evidence.

You have offered zero evidence to indicate there is 'political pressure' and that statement by prosecutors doesn't hint at that.

'No crime was prevented by harassing workers' is essentially a non sequitur in this context.

It could be that this is political nonsense, but there would have to be more details.

These issues are really hard but we have to confront them. X can alter electoral outcomes. That's where we are at.


It's amazing that all those other companies have not figured out that their apps are generally bloat, and that they release all sorts of models compared to Apple's fairly tight lineup.

The winning example of tight product management is right there for them, but they continue to act like 'feature factories' without any conscious 'whole product' design philosophy.

Probably many people within these organizations are aware, but they don't have the power to resist ingrained operational culture.


There is a possible winning strategy in trying to cover bases Apple isn’t interested in. Apple has shown that they’ll make phones that seem to be successful to some degree (the mini) but just aren’t successful enough by whatever internal metric Apple is using. And there are some things they just don’t have right now like foldable phones.

(I’m aware of the rumors)

That doesn’t mean you can’t go overboard. I don’t know Samsung’s current lineup, but I think we’ve all seen PC manufacturers who make 75 different models that are all just ever so slightly different for seemingly no reason.


They make them for channels, not consumers, and it's partly an 'East Asian' supply-chain business-culture thing. They're not thinking about how the brand/product appears as a simple form in consumers' minds, but about deliveries, parts, channel customers, optimizations, national differentiations.

It takes an incredible amount of organizational discipline to do what Apple does and without that ingrained into culture it has zero chance of working.

And yes - they are trying to fill a lot of holes - all sorts of holes, in all sorts of different ways.

It may be true that this is actually an optimal second-place strategy. Samsung may possibly be doing the right thing, and consumer confusion is the price we pay for not paying a few extra $ for an iPhone.


First, this issue has nothing to do with what Carney is talking about, second - nobody in Canada wants anything to do with your 'ethno nationalist wars', third - the frequency with which this issue is brought up and pigeon-holed into everything is absurd, but fourth - and most critically - you're lying: the 'murderers' by all accounts were Indian nationals, and the link you provided literally indicates that 'Karan Brar, age 22, Kamal Preet Singh, age 22, and Karan Preet Singh, age 28' - arrested for murder - are Indian nationals on temporary visas in Canada.


> nobody in Canada wants anything to do with your 'ethno nationalist wars'

Absurd. These are YOUR 'ethno nationalist wars' because your country has given them a safe haven. This problem does not exist in India. Not one Sikh I know sympathizes with these separatists, and I have plenty of Sikh friends, been to their homes, been to their hometowns.


These are literally murders by Indian nationals of other Indian nationals, involving the Indian government.

We want nothing to do with this.

Nobody is getting 'safe haven' - we have 'laws' and 'citizenship' so we respect those things, otherwise, we'd prefer all of you who want to continue your infighting to go home. Totally unwelcome.

Crucially - has nothing to do with this post.


> These are literally murders by Indian nationals on other Indian nationals

They are all in your immigration pipeline or already through it. The crimes are all on Canadian soil. Who has jurisdiction in the so-called "rules-based international order"?

> involving Indian government

This is your fantasy. You're playing fast and loose with accusations, just like Carney and Trudeau were while calling it "rules-based international order".

> We want nothing to do with this.

Then stop providing asylum. Stop courting them for votes. Prosecute criminals.

> Crucially - has nothing to do with this post.

Refer to the first line that I quoted.


"They are all in your immigration pipeline or already through it. The crimes are all on Canadian soil. "

India Logic: "We go somewhere else to commit crimes, it's their fault"

I don't want to say anything too offensive, but this is 'garbage logic'.

On the subject of migration - it's literally the 'garbage logic' that the majority of 'good people' are trying to escape.

Stop trying to defend the indefensible.


Canada logic: "Let's take in people who have links to Canadian crime gangs, and when things go bad, let's just blame India"

> it's literally the 'garbage logic' that the majority of 'good people' are trying to escape.

Thank you very much!! Please take more 'good people'! I heard "asylum crackdown began in Canada" by someone else right here. Please go protest it. I suppose these people are all upstanding model citizens of Canada now. You are most welcome to blame the murder of Harpreet Singh Uppal on India too. Just keep taking more 'good people'.

> Stop trying to defend the indefensible.

OK. I will personally accept all future blame, just like Jesus Christ. Only if you promise to keep taking more 'good people'.


> This problem does not exist in India. Not one Sikh I know sympathizes with these separatists

Then problem solved! If there are no separatists there is nobody to offer asylum to!


It is that simple. Canada is not learning. https://pbs.twimg.com/media/G76aXJOWkAA4CNy?format=jpg&name=...


Most of the asylum claims came before CY2025, which is when the false asylum crackdown began in Canada [0].

A major issue was the Trudeau-era diplomatic spat that led to the expulsion of Canadian [1] and Indian [2] diplomatic staff who cooperated on background checks, along with an MP in Punjab who ran a "cash for asylum claim" racket [3].

After Carney became PM and Anand became MFA, the Canada-India relationship went back on track, and Trudeau era appointees were largely sidelined.

[0] - https://indianexpress.com/article/cities/chandigarh/canada-c...

[1] - https://www.canada.ca/en/global-affairs/news/2024/10/ministe...

[2] - https://www.mea.gov.in/press-releases.htm?dtl/38420/India+ex...

[3] - https://theprint.in/ground-reports/punjabi-illegal-migration...


The government has to mandate it on some level with purchasing power.

If the government switched away from Microsoft and refused to accept MS document formats for any legal reason - then things might shift.

Most businesses just don't care; they want the easy button.

A law firm does not want to screw around, they just click 'buy' on Word, Outlook, Teams.

There's a deep psychology to it.

I remember a developer telling me that Oracle 'was the only real database'.

It's not so much propaganda, just the propagandistic power of incumbency. People who only know one thing are hard pressed to believe there could be something else.

This is more than 50% brand, narrative etc.

We techies tend to underestimate the power of perception, even when it's of our own creation, e.g. people fighting over Linux and its various distros.


Every time you send a request to a model you're already providing all of the context history along with it. To edit the context, just send a different context history. You can send whatever you want as history, it's entirely up to you and entirely arbitrary.

We only think in conversational turns because that's what we've expected a conversation to 'look like'. But that's just a very deeply ingrained convention.

Forget that there is such a thing as 'turns' in an LLM convo for now; imagine that it's all 'one-shot'.

So you ask A, it responds A1.

But when you ask B, and expect B1 - which depends on A and A1 already being in the convo history - consider that you are actually sending all of that again anyhow.

Behind the scenes when you think you're sending just 'B' (next prompt) you're actually sending A + A1 + B aka including the history.

A and A1 are usually 'cached', but that's not part of the simplest way to think about it - the caching is an optimization.

Without caching the model would just process all of A + A1 + B and B1 in return just the same.

And then A + A1 + B + B1 + C and expect C1 in return.

It just so happens it will cache the state of the convo at your previous turn, and so it's optimized but the key insight is that you can send whatever context you want at any time.

After you send A + A1 + B + B1 + C and get C1, if you want to then send A + B + C + D and expect D1 ... (basically sending the prompts with no responses) - you can totally do that. It will have to re-process all of that, aka no cached state, but it will definitely do it for you.

Heck you can send Z + A + X, or A + A1 + X + Y - or whatever you want.

So in that sense, what you are really sending (if you're using the simplest form of the API) is 'a bunch of content', and you're 'expecting a response'. That's it. Everything is actually 'one shot' (prefill => response) and that's it. It feels conversational, but that's structural and operational convention.

So the very simple answer to your question is: send whatever context you want. That's it.
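A concrete way to see this with a chat-completions-style API (a sketch - the model name is just a placeholder): the 'history' is simply a list you assemble however you like on each call.

    # Sketch with the OpenAI Python client: the "conversation" is nothing more
    # than the messages list you choose to send on each request.
    from openai import OpenAI

    client = OpenAI()
    MODEL = "gpt-4o-mini"  # placeholder

    # Turn 1: send A, get A1.
    history = [{"role": "user", "content": "A: What is a KV cache?"}]
    a1 = client.chat.completions.create(model=MODEL, messages=history)
    history.append({"role": "assistant", "content": a1.choices[0].message.content})

    # Turn 2: you resend everything plus B; the server "remembers" nothing
    # beyond what you include here.
    history.append({"role": "user", "content": "B: Why does it speed up decoding?"})
    b1 = client.chat.completions.create(model=MODEL, messages=history)

    # Or send an edited, arbitrary history: drop A1, reorder, inject whatever.
    edited = [
        {"role": "user", "content": "A: What is a KV cache?"},
        {"role": "user", "content": "X: Answer in one sentence."},
    ]
    x1 = client.chat.completions.create(model=MODEL, messages=edited)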


Bigger context makes responses slower.

Context is limited.

You do not want the cloud provider running a context compaction if you can control it a lot better.

There are even tips on when to ask the question, like "send the content first, then ask the question" vs. "ask the question, then send the content".


When history is cached, conversations tend not to be slower, because the LLM can 'continue' from a previous state.

So if there was already A + A1 + B + B1 + C + C1 and you ask 'D' ... well, [A->C1] is saved as state. It costs 10ms to prepare. Then, they add 'D' as your question, and that will be processed 'all tokens at once' in bulk - which is fast.

Then - when they generate D1 (the response), they have to do it one token at a time, which is slow. Each token has to be processed separately.

Also - even if they had to redo all of [A->C1] 'from scratch' - it's not that slow, because the entire block of tokens can be processed in one pass.

'prefill' (aka A->C1) is fast, which by the way is why it's 10x cheaper.

So prefill is 10x faster than generation, and cache is 10x cheaper than prefill as a very general rule of thumb.
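A back-of-the-envelope illustration of why decode dominates wall-clock time while prefill dominates the cacheable cost (the throughput numbers below are made up for illustration, not measurements):

    # Illustrative only - assumed throughputs, not benchmarks.
    prompt_tokens, output_tokens = 10_000, 500
    prefill_tps = 5_000   # tokens/s processed in parallel during prefill (assumed)
    decode_tps  = 50      # tokens/s generated one at a time (assumed)

    uncached = prompt_tokens / prefill_tps + output_tokens / decode_tps
    cached   = 0.01 + output_tokens / decode_tps  # ~10ms to restore cached state

    print(f"uncached: {uncached:.1f}s, cached: {cached:.1f}s")
    # uncached: 12.0s, cached: 10.0s -> decode dominates either way; what the
    # cache saves is the prefill compute (and how it is billed).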


That's only the case with a KV cache, and we do not know how, and for how long, providers keep it.


Prefill is 10x faster than generation without caching, and 100x faster with caching - as a very crude measure. So it's not a matter of 'only the case'; those are different scenarios. Some hosts are better than others with respect to managing caching, but the better ones provide a decent SLA on that.


This is how I view it as well.

And... and...

This results in a _very_ deep implication, which big companies may not be eager to let you see:

they are context processors

Take it for what it is.


What you are trying to say is they are plagiarists and training on the input?

We know that already; I don't know why we have to be quiet or hint at it - in fact, they have been quite explicit about it.

Or is there some other context to your statement? Anyway that’s my “take that for what you will”.


"It starts by believing that there are distinct human races (which there are not). . That alone makes most US Americans racist based on language alone. "

Sorry, but no.

The scientific community has moved away from 'race' in the biological sense (although there is debate) but the sociological construct of race, which is what we refer to in this context, obviously exists.

When a person 'self-identifies' as Black, or Asian, or White - that is 'race' in the 'social construct' sense, and it's perfectly accepted and normal - the recognition of that does not make one racist.


> but the sociological construct of race, which is what we refer to in this context, obviously exists.

I doubt that something built on self-identification yields a meaningful concept of racism.


It's clear as day, and it's hard to understand how someone could be confused by this.

It's literally on the census form.

'Race' is a cultural euphemism for broader ethnicity.

AKA 'European = White' - 'African = Black' - more or less.

These are not arbitrary groups of 'self identification' like 'emo' or 'punk'.

These groups are even self organizing - every single US city is built around small enclaves of groups - they pop right out on urban maps.

We've been fighting tribal wars since the dawn of time; it's not hard to imagine how 'Flemish vs. Dutch' extends to 'European vs. African'.

Elon Musk, on Twitter, 2 days ago, was interjecting on this horrible bit of 'race war' nonsense, talking about 'blacks eviscerate whites' etc.

Again - while there's feeble support for the notion of 'race' in the field of biology (although I think it's more controversial than stated), we obviously have cultural foundations around those concepts.

Honestly - this kind of argument is plausibly the 'worst thing' about HN. I don't understand how something so common and obvious could be denied in the face of some odd, hair-splitting rhetoric.


It's obviously racist - but people have to stop assuming that word means one thing.

In that statement, it's not disdain for another group, it's disdain and resignation over racial politics.

He seems to in fact have empathy, but has become maligned for some reason.

He seems to be 'giving up' on the cause and suggesting people go their separate ways.

It's frankly much more cynical than it is racist.

That's nothing near a traditional racist view.

It's the posture of a cynical, old, angry man - not some kind of White Nationalist.

I'm not justifying anything, but I am indicating that these things are obviously nuanced.

That said - I'm reflecting on a single comment, not his entire body of ugly commentary.

