
I've always felt that you can't think in anything besides thought. Words, images, symbols, etc., are all side-effects. They absolutely bend back and influence the thought process, but they are always secondary and indirect. Thought itself is ineffable.


Interesting work, but this strikes me as a somewhat quixotic fight against inevitable tendencies of statistical models. Reinforcement learning has a single goal: an agreeable mean. It stops when the LLM produces agreeable responses more often than not; the only way to achieve absolute certainty here would be to tune for an infinite amount of time. I also don't see how this method couldn't be subsumed by something simpler like dynamic temperature adjustment. Transformers are fully capable of generating unpredictable yet semantically coherent text based on a single hyperparameter. Maybe it would make more sense to simply experiment with different temperature settings; usually it's a fixed value.
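
For reference, temperature is just a scalar applied to the logits before sampling, so "dynamic temperature" amounts to a one-line change in the sampling loop. A minimal sketch, assuming a `logits` vector straight from the model (the schedule itself is purely hypothetical):

    import numpy as np

    def sample_token(logits: np.ndarray, temperature: float = 1.0) -> int:
        """Sample one token id from raw logits at a given temperature.
        temperature < 1 sharpens the distribution (more predictable text),
        temperature > 1 flattens it (more surprising text)."""
        scaled = logits / max(temperature, 1e-8)           # rescale logits
        probs = np.exp(scaled - scaled.max())              # numerically stable softmax
        probs /= probs.sum()
        return int(np.random.choice(len(probs), p=probs))  # draw one token id

    # A "dynamic" schedule could simply vary temperature per step,
    # e.g. hotter early in a response and cooler toward the end.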


I think there's a critical flaw in Anthropic's approach to memory, which is that they seem to hide it behind a tool call. This creates a circularity issue: the agent needs to "remember to remember." Think how screwed you would be if you were consciously responsible for knowing when you had to remember something. It's almost a contradiction in terms. Recollection is unconscious and automatic; there's a constant auto-associative loop running in the background at all times. I get the idea of wanting to make LLMs more instrumental and leave it to the user to invoke or decide certain events: that's definitely the right idea in 90% of cases. But for memory it's not the right fit. In contrast, OpenAI's approach, which seems to resemble more generic semantic search, leaves things wanting for other reasons. It's too lossy.
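
To make the circularity concrete, here's a caricature of the two approaches; `llm` and `memory` are hypothetical placeholders, not Anthropic's or OpenAI's actual APIs:

    # Tool-call memory: the model must first decide to consult memory ("remember to remember").
    def tool_call_memory(user_msg: str, llm, memory) -> str:
        decision = llm("Do you need to consult memory to answer this? yes/no\n\n" + user_msg)
        notes = memory.search(user_msg) if decision.strip().lower().startswith("yes") else ""
        return llm(notes + "\n\n" + user_msg)

    # Auto-associative memory: retrieval runs on every turn, no decision required of the model.
    def automatic_memory(user_msg: str, llm, memory) -> str:
        notes = memory.search(user_msg)
        return llm(notes + "\n\n" + user_msg)

The first variant fails silently whenever the model doesn't think to ask; the second never has that problem, at the cost of a retrieval pass on every turn.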


I wish more comp sci curricula would sprinkle in more general courses in logic and especially 20th century analytic philosophy. Analytic philosophy is insanely relevant to many computer science topics especially AI.


I've been thinking lately about how AGI runs up against the No Free Lunch Theorem. I highly recommend mathematician David Wolpert's work on the topic; I think he inadvertently proved that ASI is physically impossible. Certainly he proved that AOI (artificial omniscient intelligence) is impossible. This is what irritates me: science is not determining the narrative. Money is.

One thing he showed is that you can't have a universe with two omniscient intelligences (as it would be intractable for them to predict the other's behavior.)

It's also very questionable whether "humanlike" intelligence is truly general in the first place. I think cognitive neurobiologists would agree that we have a specific "cognitive niche", and while this symbolic niche seems sufficiently general for a lot of problems, there are animals that make us look stupid in other respects. This whole idea that there is some secret sauce special algorithm for universal intelligence is extremely suspect. We flatter ourselves and have committed to a fundamental anthropomorphic fallacy that seems almost cartoonishly elementary for all the money behind it.


TIL there actually is something called "no free lunch in search and optimization"[1].

Note, however, that the theorem is quite weak. It requires, e.g., the assumption that the search space has no structure; the article even gives the example of quadratic problems. It appears to me to be a mostly useless observation.

[1] https://en.m.wikipedia.org/wiki/No_free_lunch_in_search_and_...


AGI can't be defined, because it's the means by which definitions are created. You can only measure it contemporaneously by some consensus method such as ARC.

You can't define AGI, any more than you can define ASA (artificial sports ability). Intelligence, like athleticism, changes both quantitatively and qualitatively. The Greek Olympic champions of 2K years ago wouldn't qualify for high school championships today; yet they were once regarded as great athletes.


ASI is as different from AOI as BB(8) is from infinity. The impossibility of AOI says bupkis about ASI.


When hasn't money determined the narrative?


Ironically, you missed the point. The person you're responding to is saying the opposite: the narrative (that there is something unique about how our brain solves problems) determines the money (invested).


Diffusion is just the logically optimal behavior for searching massively parallel spaces without informed priors. We need to think beyond language modeling, however, and start to view this in terms of drug discovery and the like. A good diffusion model plus the laws of chemistry could be god-tier. I think language modeling has the AI community in its grip right now, and they aren't seeing the applications of the same techniques to real-world problems elsewhere.


Actually, in most deep learning schemes for science, adding in the "laws of nature" as constraints makes things much worse. For example, all the best weather prediction models utilize basically zero fluid dynamics. Even though a) global weather can in principle be predicted using the Navier-Stokes equations and b) deep learning models can be used to approximately evaluate the Navier-Stokes equations, we now know that incorporating physics into these models is mostly a mistake.

The intuitive reason might be that unconstrained optimization is easier than constrained optimization, particularly in high dimensions, but no one really knows the real reason. It may be that we are not yet at the end of the "bigger is better" regime, and at the true frontier we must add the laws of nature to eke out the last remaining bits of performance possible.
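
For what it's worth, "incorporating physics" usually means a PINN-style penalty term added to the training loss; a toy PyTorch sketch (the governing equation here is a stand-in, not anything a real weather model would use):

    import torch

    def physics_informed_loss(model, x, y_obs, lam=0.1):
        """Data-fit loss plus a penalty for violating a governing equation."""
        x = x.clone().requires_grad_(True)
        y_pred = model(x)
        data_loss = torch.mean((y_pred - y_obs) ** 2)

        # Toy "law of nature": penalize deviation from dy/dx + y = 0.
        dy_dx = torch.autograd.grad(y_pred.sum(), x, create_graph=True)[0]
        physics_residual = torch.mean((dy_dx + y_pred) ** 2)

        # lam = 0 recovers the unconstrained model that, per the above, tends to win.
        return data_loss + lam * physics_residual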


Well, diffusion models have long since made the jump to biology, at least. ESM3 and AlphaFold 3 are both diffusion-based.


Years ago, in my writings, I talked about the dangers of "oracularizing AI". From the perspective of those who don't know better, the breadth of what these models have memorized begins to approximate omniscience. They don't realize that LLMs don't actually know anything; there is no subject of knowledge that experiences knowing on their end. ChatGPT can speak however many languages, write however many programming languages, give lessons on virtually any topic that is part of humanity's general knowledge. If you attribute a deeper understanding to that memorization capability, I can see how it would throw someone for a loop.

At the same time, there is quite a demand for a (somewhat) neutral, objective observer to look at our lives outside the morass of human stakes. AI's status as a nonparticipant, as a deathless, sleepless observer, makes it uniquely appealing and special from an epistemological standpoint. There are times when I genuinely do value AI's opinion. Issues with sycophancy and bias obviously warrant skepticism. But the desire for an observer outside of time and space persists. It reminds me of a quote attributed to Voltaire: "If God didn't exist it would be necessary to invent him."


A loved one recently had this experience with ChatGPT: paste in a real-world text conversation between you and a friend, without real names or context. Tell it to analyze the conversation, but say that your friend's parts are actually your own. Then ask it to re-analyze with your own parts attributed to you correctly. It'll give you vastly different feedback on the same conversation. It is not objective.


Good to know. It probably makes sense to ask for personal advice as "for my friend".


That works on humans too.


"Oracularizing AI" has a lot of mileage.

It's not too much to say that AI, and LLMs in particular, satisfy the requisites to be considered a form of divination, i.e.:

1. Indirection of meaning - certainly less than the Tarot, I Ching, or runes, but all text is interpretive. Words, in a Saussurean sense, are always signifiers of the signified[1], or, per Barthes's death of the author[2], precise authorial intention is always inaccessible.

2. A sign system or semiotic field - obvious in this case: human language.

3. Assumed access to hidden knowledge - in the sense that LLM datasets are popularly known to contain all the world's knowledge, this necessarily includes hidden knowledge.

4. Ritualized framing - Approaching an LLM interface is the digital equivalent to participating in other divinatory practices. It begins with setting the intention - to seek an answer. The querent accesses the interface, formulates a precise question by typing, and commits to the act by submitting the query.

They also satisfy several of the typical but not necessary aspects of divinatory practices:

5. Randomization - The stochastic nature of token sampling naturally introduces randomness into every response.

6. Cosmological backing - There is an assumption that responses correspond to the training set and, indirectly, to the world itself. Meaning embedded in the output corresponds in some way - perhaps not obviously - to meaning in the world.

7. Trained interpreter - In this case, as in many divinatory systems, the interpreter and querent are the same.

8. Feedback loop - ChatGPT for example is obviously a feedback loop. Responses naturally invite another query and another - a conversation.

It's often said that sharing AI output is much like sharing dreams - only meaningful to the dreamer. In this framework, sharing AI responses is more like sharing Tarot card readings. Again, only meaningful to the querent. They feel incredibly personalized, like horoscopes, but it's unclear whether that meaning is inherent to the output or simply the querent's desire to imbue the output with meaning by projecting their own onto it.

Like I said, I feel like there's a lot of mileage in this perspective. It explains a lot about why people feel a certain way about AI and about hearing about AI. It's also a bit unnerving; we created another divinatory practice, and a HUGE chunk of people participate and engage with it without calling it such, simply believing it, mostly because it doesn't look like Tarot or runes or the I Ching, even though ontologically it fills the same role.

Notes: 1. https://en.wikipedia.org/wiki/Signified_and_signifier

2. https://en.wikipedia.org/wiki/The_Death_of_the_Author


I think the answer to the professor's dismay is quite simple. Many people are in university to survive a brutal social darwinist economic system, not to learn and cultivate their minds. Only a very small handful of them were ever there to study Euler angles earnestly. The rest view it as a hoop they have to jump through to hopefully get a job that might as well be automated away by AI anyway. Also, viewed from a conditional-reinforcement perspective, all the professor has to do is start docking grade points from students who are obviously cheating. Theory predicts they will either stop doing it, or get so good at it that it becomes undetectable, possibly an in-demand skill for the future.


Whose system though?

I agree, it's weird for parents to say, "Jump through these hoops, and for every dollar you earn grinding sesame for some company, we'll give you an additional two."

Working and educating yourself is decent and dignified, no? Is this a bad deal?


> Working and educating yourself is decent and dignified, no?

I think that depends radically on the nature of the work. I hold a BS in Computer Science but am at an organization that requires me to use LLMs as part of my performance evaluation; I could protest, but it puts my immigration status at risk (my employer has sponsored me into my current country). I view the things asked of me (using LLMs) as degrading, but I'm unable to effectively protest that despite being well-regarded as an engineer (by peers and past employers) and credentialed (BS in CS).

Put differently, most people do A Job because they need to put food on the table. One of my partners used to work in the veterinary field, which took an immense physical toll on them. They're much happier being (f)unemployed currently, being able to work in the garden and make good food and produce art, but our finances are suffering for it; they're hunting for jobs, but most of the current openings are pretty bad in terms of work/life balance and future opportunity.

Working is not inherently necessary; in our current economic system it's exploitatively-required in order to live any sort of decent and dignified life, and there's loads of stories about people who work but aren't treated with dignity (thru healthcare or housing or food strife).


>Whose system though?

clearly the billionaires who made it so a decent job isn't even guaranteed to cover rent.


Nitpick: he's not a professor, just a grad student at the same place he did his undergrad, and he mostly attended university during covid. At least per his page here: https://claytonwramsey.com/about/

It's not like professors get real pedagogical training either, but the guy doesn't seem to have gotten any at all.

I guess what I'm driving at is that this guy is awfully young and the essay was a hot take. We should judge it accordingly.


I would love it if LLMs told me I'm wrong more often and said "actually no I have a better idea." Provided, of course, that it actually follows up with a better idea.


The author makes good general points but seems to be overloading MCP's responsibilities imo. My understanding of MCP is that it just provides a ready-made "doorway" for LLMs to enter and interact with externally managed resources. It's a bridge or gateway. So is it really MCP's fault that it:

>makes it easier to accidentally expose sensitive data.

So does the "forward" button on emails. Maybe be more careful about how your system handles sensitive data. How about:

>MCP allows for more powerful prompt injections.

This just touches on the wider principle of only working with trusted service providers, which developers should abide by generally. As for:

>MCP has no concept or controls for costs.

Rate limit and monitor your own usage. You should anyway. It's not the road's job to make you follow the speed limit.

Finally, many of the other issues seem to be more about coming to terms with delegating to AI agents generally. In any case it's the developer's responsibility to manage all these problems within the boundaries they control. No API should have that many responsibilities.
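
For example, a simple client-side token bucket wrapped around whatever function actually dispatches tool calls covers most of the cost/rate concern; a rough sketch, not tied to any particular MCP client:

    import time

    class TokenBucket:
        """At most `rate` calls per second, with bursts up to `capacity`."""
        def __init__(self, rate: float, capacity: int):
            self.rate, self.capacity = rate, capacity
            self.tokens, self.last = float(capacity), time.monotonic()

        def acquire(self) -> None:
            while True:
                now = time.monotonic()
                self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
                self.last = now
                if self.tokens >= 1:
                    self.tokens -= 1
                    return
                time.sleep((1 - self.tokens) / self.rate)  # wait for the next token

    bucket = TokenBucket(rate=2, capacity=5)  # ~2 tool calls per second

    def call_tool_with_limit(call_tool, *args, **kwargs):
        bucket.acquire()                      # block before every outbound tool call
        return call_tool(*args, **kwargs)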


Yeah. That's another in a long line of MCP articles and blogposts that's been coming up over the past few weeks, that can be summarized as "breaking news: this knife is sharp and can cut someone if you swing it at people, it can cut you if you hold it the wrong way, and is not a toy suitable for small children".

Well, yes. A knife cuts things, it's literally its only job. It will cut whatever you swing it at, including people and things you didn't intend to - that's the nature of a general-purpose cutting tool, as opposed to e.g. a safety razor or plastic scissors for small children, which are much safer, but can only cut a few very specific things.

Now, I get it, young developers don't know that knives and remote access to code execution on a local system are both sharp tools and need to be kept out of reach of small children. But it's one thing to remind people that the tool needs to be handled with care; it's another to blame it on the tool design.

Prompt injection is a consequence of the nature of LLMs, you can't eliminate it without degrading capabilities of the model. No, "in-band signaling" isn't the problem - "control vs. data" separation is not a thing in nature, it's designed into systems, and what makes LLMs useful and general is that they don't have it. Much like people, by the way. Remote MCPs as a Service are a bad idea, but that's not the fault of the protocol - it's the problem of giving power to third parties you don't trust. And so on.

There is technical and process security to be added, but that's mostly around MCP, not in it.


Well. To repurpose your knife analogy, they (we?) duct-taped a knife onto an erratic, PRNG-controlled roomba and now discover that people are getting their Achilles tendons sliced. Technically, it's all functioning exactly as intended, but: this knife was designed specifically to be attached to such roombas, and apparently nobody stopped to think whether it was such a great idea.

And admonishments of "don't use it when people are around, but if you do, it's those people's fault when they get cut: they should've been more careful and probably worn some protective foot-gear", while technically accurate, miss the bigger problem. That is, that somebody decided to strap a sharp knife to a roomba and then let it whiz around in a space full of people.

Mind you, we have actual woodcutting table saws with built-in safety measures: they instantly stop when they detect contact with human skin. So you absolutely can have safe knives. They just cost more, and I understand that most people value (other) people's health and lives quite cheaply indeed, and so don't bother buying, designing, or even considering such frivolities.


This is a total tangent, but we can't have 100% safe knives because one of the uses for a knife is to cut meat. (Sawstop the company famously uses hot dogs to simulate human fingers in their demos.)


Yes. Also, equally important is the fact that table saws are not knives. The versatility of a knife was the whole point of using it as an example.

--

EDIT: also no, your comment isn't a tangent - it's exactly on point, and a perfect illustration of why knives are a great analogy. A knife in its archetypal form is at the highest point of its generality as a tool. A cutting surface attached to a handle. There is nothing you could change in this that would improve it without making it less versatile. In particular, there is no change you could make that would make a knife safer without making it less general (adding a handle to the blade was the last such change).

No, you[0] can't add a Sawstop-like system to it, because, as you[1] point out, it works by detecting meat - specifically, by detecting the blade coming in contact with something more conductive than wood. Such a "safer" knife thus can't be made from non-conductive materials (e.g. ceramics), and it can't be used to work with fresh food, fresh wood, in humid conditions, etc.[2]. You've just turned a general-purpose tool into a highly specialized one - but we already have a better version of this: it's the table saw!

Same pattern will apply to any other idea of redesigning knives to make them safer. Add a blade cage of some sort? Been done, plenty of that around your kitchen, none of it will be useful in a workshop. Make the knife retractable and add a biometric lock? Now you can't easily share the knife with someone else[3], and you've introduced so many operational problems it isn't even funny.

And so on, and so on; you might think that with enough sensors and a sufficiently smart AI, a perfectly safe knife could be made - but then, that also exists: it's called you, the person wielding the knife.

To end this essay that my original witty comment has now become, I'll spell it out: like a knife, LLMs are by design general-purpose tools. You can make them increasingly safer by sacrificing some aspects of their functionality. You cannot keep them fully general and make them strictly safer, because the meaning of "safety" is itself highly situational. If you feel the tool is too dangerous for your use case, then don't use it. Use a table saw for cutting wood, use a safety razor for shaving, use a command line and your brain for dealing with untrusted third-party software - or don't, but then don't go around blaming the knife or the LLM when you hurt yourself by choosing too powerful a tool for the job at hand. Take responsibility, or stick to Fisher-Price alternatives.

Yes, this is a long-winded way of saying: what's wrong with MCP is that a bunch of companies are now trying to convince you to use it in a dangerous way. Don't. Your carelessness is your loss, but their win. LLMs + local code execution + untrusted third parties don't mix (neither do they mix if you remove "LLMs", but that's another thing people still fail to grasp).

As for solutions to make systems involving LLMs safer and more secure - again, look at how society handles knives, or how we secure organizations in general. The measures are built around the versatile-but-unsafe parts, and they look less technical, and more legal.

(This is to say: one of the major measures we need to introduce is to treat attempts at fooling LLMs the same way as fooling people - up to and including criminalizing them in some scenarios.)

--

[0] - The "generic you".

[1] - 'dharmab

[2] - And then if you use it to cut through wet stuff, the scaled-down protection systems will likely break your wrist; so much for safety.

[3] - Which could easily become a lethal problem in an emergency, or in combat.


The problem with the “knife is sharp” argument is that it’s too generic. It can be deployed against most safety improvements. The modern world is built on driving accident rates down to near-zero. That’s why we have specialized tools like safety razors. Figuring out what to do to reduce accident rates is what postmortems are for - we don’t just blame human error, we try to fix things systematically.

As usual, the question is what counts as a reasonable safety improvement, and to do that we would need to go into the details.

I’m wondering what you think of the CaMeL proposal?

https://simonwillison.net/2025/Apr/11/camel/#atom-everything


Some of the other issues are less important than others, but even if you accept “you have to take responsibility for yourself”, let me quote the article:

> As mentioned in my multi-agent systems post, LLM-reliability often negatively correlates with the amount of instructional context it’s provided. This is in stark contrast to most users, who (maybe deceived by AI hype marketing) believe that the answer to most of their problems will be solved by providing more data and integrations. I expect that as the servers get bigger (i.e. more tools) and users integrate more of them, an assistants performance will degrade all while increasing the cost of every single request. Applications may force the user to pick some subset of the total set of integrated tools to get around this.

I will rephrase it in stronger terms.

MCP does not scale.

It cannot scale beyond a certain threshold.

It is Impossible to add an unlimited number of tools to your agents context without negatively impacting the capability of your agent.

This is a fundamental limitation with the entire concept of MCP and needs addressing far more than auth problems, imo.

You will see posts like “MCP used to be good but now…” as people experience the effects of having many MCP servers enabled.

They interfere with each other.

This is fundamentally and utterly different from installing a package in any normal package system, where not interfering is a fundamental property of package management in general.

Thats the problem with MCP.

As an idea, it is different from what people naively expect of it.


I think this can largely be solved with good UI. For example, if an MCP or tool gets executed that you didn't want to get executed, the UI should provide an easy way to turn it off or to edit the description of that tool to make it more clear when it should be used and should not be used by the agent.

Also, in my experience, there is a huge bump in performance and real-world usage abilities as the context grows. So I definitely don't agree about a negative correlation there; however, in some use cases and with the wrong contexts it certainly can be true.


I don't think that could be sufficient to solve the problem.

I'm using Gemini with AI Studio and the size of a 1 million token context window is becoming apparent to me. I have a large conversation, multiple paragraphs of text on each side of the conversation, with only 100k tokens or so. Just scrolling through that conversation is a chore where it becomes easier just to ask the LLM what we were talking about earlier rather than try to find it myself.

So if I have several tools, each of them adding 10k+ context to a query, and all of them reasonable tool requests - I still can't verify that it isn't something "you [I] didn't want to get executed" since that is a vague description of the failure states of tools. I'm not going to read the equivalent of a novel for each and every request.

I say this mostly because I think some level of inspectability would be useful for these larger requests. It just becomes impractical at larger and larger context sizes.


> For example, if an MCP or tool gets executed that you didn't want to get executed, the UI should provide an easy way to turn it off or to edit the description of that tool to make it more clear when it should be used and should not be used by the agent.

Might this become more simply implemented as multiple individual calls, possibly even to different AI services, chained together with regular application software?


I don't understand your question at all

If you are saying why have autonomous agents at all and not just workflows, then obviously the answer is that it just depends on the use case. Most of the time workflows that are not autonomous are much better, but not always, and sometimes they will also include autonomous parts in those workflows


Simple: if the choice is getting overwhelming to the LLM, then... divide and conquer - add a tool for choosing tools! Can be as simple as another LLM call, with prompt (ugh, "agent") tasked strictly with selecting a subset of available tools that seem most useful for the task at hand, and returning that to "parent"/"main" "agent".

You kept adding more tools and now the tool-master "agent" is overwhelmed by the amount of choice? Simple! Add more "agents" to organize the tools into categories; you can do that up front and stuff the categorization into a database and now it's a rag. Er, RAG module to select tools.

There are so many ways to do it. Using cheaper models for selection to reduce costs, dynamic classification, prioritizing tools already successfully applied in previous chat rounds (and more "agents" to evaluate if a tool application was successful)...

Point being: just keep adding extra layers of indirection, and you'll be fine.
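
Something like this, where `llm` is a placeholder for whatever cheap model handles the selection pass:

    import json

    def select_tools(task: str, tool_catalog: dict, llm, k: int = 5) -> list:
        """Ask a cheap model to pick the k tools most relevant to the task,
        so the main agent's prompt only carries those definitions."""
        prompt = (
            "Task: " + task + "\n"
            "Available tools (name: description):\n"
            + "\n".join(f"- {name}: {desc}" for name, desc in tool_catalog.items())
            + f"\nReturn a JSON list of at most {k} tool names most useful for this task."
        )
        return json.loads(llm(prompt))  # e.g. ["read_file", "run_tests"]

    # main_agent(task, tools=select_tools(task, catalog, cheap_llm))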


The problem is that even just having the tools in the context can greatly change the output of the model. So there can be utility in the agent seeing contextually relevant tools (RAG as you mentioned, etc. is better than nothing) and a negative utility in hiding all of them behind a "get_tools" request.


"Sequential thinking" is one that I tried recently because so many people recommend it, and I have never, ever, seen the chatbot actually do anything but write to it. It never follows up any of it's chains of thoughts or refers to it's notes.


In which client and with which LLM are you using it?

I use it in Claude Desktop for the right use case, it's much better than thinking mode.

But, I admit, I haven't tried it in Cursor or with other LLMs yet.


> It is Impossible to add an unlimited number of tools to your agents context without negatively impacting the capability of your agent.

Huh?

MCP servers aren't just for agents, they're for any/all _clients_ that can speak MCP. And capabilities provided by a given MCP server are on-demand, they only incur a cost to the client, and only impact the user context, if/when they're invoked.


> they only incur a cost to the client, and only impact the user context, if/when they're invoked.

Look it up. Look up the cross server injection examples.

I guarantee you this is not true.

An MCP server is, at its heart, some 'thing' that provides a set of 'tools' that an LLM can invoke.

This is done by adding a 'tool definition'.

A 'tool definition' is content that goes into the LLM prompt.

That's how it works. How do you imagine an LLM can decide to use a tool? It's only possible if the tool definition is in the prompt.

The API may hide this, but I guarantee you this is how it works.

Putting an arbitrary amount of 3rd party content into your prompts has a direct, tangible impact on LLM performance (and cost). The more MCP servers you enable, the more you pollute your prompt with tool definitions, and, I assure you, the worse the results get.

Just like pouring any large amount of unrelated crap into your system prompt does.

At a small scale, it's ok; but as you scale up, the LLM performance goes down.
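
To make that concrete: an MCP-style tool definition looks roughly like the dict below (field names per my reading of the spec; exactly how a given provider serializes it into the prompt is their business), and every enabled tool ships one of these with your requests.

    import json

    # One MCP-style tool definition, written as a Python dict for illustration.
    send_message_tool = {
        "name": "send_message",
        "description": "Send a message to a contact via the connected messaging service.",
        "inputSchema": {
            "type": "object",
            "properties": {
                "recipient": {"type": "string", "description": "Contact name or ID"},
                "body": {"type": "string", "description": "Message text"},
            },
            "required": ["recipient", "body"],
        },
    }

    # Very rough estimate at ~4 characters per token:
    print(len(json.dumps(send_message_tool)) // 4)  # on the order of 100 tokens per tool

Multiply that by a handful of tools per server and dozens of servers, and you're burning thousands of tokens of context before the user has typed anything.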

Here's some background reading for you:

https://github.com/invariantlabs-ai/mcp-injection-experiment...

https://docs.anthropic.com/en/docs/build-with-claude/tool-us...


I think makers of LLM "chat bots" like the Claude desktop app or Cursor have a ways to go when it comes to exposing precisely what the LLM is being prompted with.

Because yes, for the LLM to find the MCP servers it needs that info in its prompt. And the software is currently hiding how that information is being exposed. Is it prepended to your own message? Does it put it at the start of the entire context? If yes, wouldn't real-time changes in tool availability invalidate the entire context? So then does it add it to the end of the context window instead?

Like nobody really has this dialed in completely. Somebody needs to make a LLM “front end” that is the raw de-tokenized input and output. Don’t even attempt to structure it. Give me the input blob and output blob.

… I dunno. I wish these tools had ways to do more precise context editing. And more visibility. It would help make more informed choices on what to prompt the model with.

/Ramble mode off.

But slightly more seriously: what is the token cost of an MCP tool? The LLM needs its name, a description, parameters… so maybe 100 tokens max per tool? It's not a lot, but it isn't nothing either.


I've recently written a custom MCP server.

> An MCP server is at it's heart some 'thing' that provides a set of 'tools' that an LLM can invoke.

A "tool" is one of several capabilities that a MCP server can provide to its callers. Other capabilities include "prompt" and "resource".

> This is done by adding a 'tool definition'. A 'tool definition' is content that goes into the LLM prompt. That's how it works. How do you imagine an LLM can decide to use a tool? It's only possible if the tool definition is in the prompt.

I think you're using an expansive definition of "prompt" that includes not just the input text as provided by the user -- which is generally what most people understand "prompt" to mean -- but also all available user- and client-specific metadata. That's fine, just want to make it explicit.

With this framing, I agree with you, that every MCP server added to a client -- whether that's Claude.app, or some MyAgent, or whatever -- adds some amount of overhead to that client. But that overhead is gonna be fixed-cost, and paid one-time at e.g. session initialization, not every time per e.g. request/response. So I'm struggling to imagine a situation where those costs are anything other than statistical line noise, compared to the costs of actually processing user requests.

> https://docs.anthropic.com/en/docs/build-with-claude/tool-us...

To be clear, this concept of "tool" is completely unrelated to MCP.

> https://github.com/invariantlabs-ai/mcp-injection-experiment...

I don't really understand this repo or its criticisms. The authors wrote a related blog post https://invariantlabs.ai/blog/whatsapp-mcp-exploited which says (among other things) that

> In this blog post, we will demonstrate how an untrusted MCP server ...

But there is no such thing as "an untrusted MCP server". Every MCP server is assumed to be trusted, at least as the protocol is defined today.


> But that overhead is gonna be fixed-cost, and paid one-time at e.g. session initialization, not every time per e.g. request/response.

I don't work for a foundational model provider, but how do you think the tool definitions get into the LLM? I mean, they aren't fine-tuning a model with your specific tool definitions, right? You're just using OpenAI's base model (or Claude, Gemini, etc.). So at some point the tool definitions have to be added to the prompt. It is just getting added to the prompt auto-magically by the foundation provider. That means it is eating up some context window, just a portion of the context window that is normally reserved for the provider, a section of the final prompt that you don't get to see (or alter).

Again, while I don't work for these companies or implement these features, I cannot fathom how the feature could work unless it was added to every request. And so the original point of the thread author stands.


You're totally right, in that: whatever MCP servers your client is configured to know about, have a set of capabilities, each of which have some kind of definition, all of which need to be provided to the LLM, somehow, in order to be usable.

And you're totally right that the LLM is usually general-purpose, so the MCP details aren't trained or baked-in, and need to be provided by the client. And those details are probably gonna eat up some tokens for sure. But they don't necessarily need to be included with every request!

Interactions with LLMs aren't stateless request/response, they're session-based. And you generally send over metadata like what we're discussing here, or user-defined preferences/memory, or etc., as part of session initialization. This stuff isn't really part of a "prompt" at least as that concept is commonly understood.


I think we are confusing the word "prompt" here leading to miscommunication.

There is the prompt that I, as a user, send to OpenAI, which then gets used. Then there is the "prompt" which is being sent to the LLM. I don't know how these things are talked about internally at the company, but they take the "prompt" you send them and add a bunch of extra stuff to it. For example, they add in their own system message and they will add your system message. So you end up with something like <OpenAI system message> + <User system message> + <user prompt>. That creates a "final prompt" that gets sent to the LLM. I'm sure we both agree on that.

With MCP, we are also adding in <tool description> to that final prompt. Again, it seems we are agreed on that.

So the final piece of the argument is that this "final prompt" (or whatever the correct term is) is growing. It is the size of the provider system prompt, plus the size of the user system prompt, plus the size of the tool descriptions, plus the size of the actual user prompt. You have to pay that "final prompt" cost for each and every request you make.

If the size of the "final prompt" affects the performance of the LLM, such that very large "final prompt" sizes adversely affect performance, then it stands to reason that adding many tool definitions to a request will eventually degrade the LLM's performance.


> With MCP, we are also adding in <tool description> to that final prompt. Again, it seems we are agreed on that.

Interactions with a LLM are session-based, when you create a session there is some information sent over _once_ as part of that session construction, that information applies to all interactions made via that session. That initial data includes contextual information, like user preferences, model configuration as specified by your client, and MCP server definitions. When you type some stuff and hit enter that is a user prompt that may get hydrated with some additional stuff before it gets sent out, but it doesn't include any of that initial data stuff provided at the start of the session.


> that information applies to all interactions made via that session

Humm.. maybe you should run an llama.cpp server in debug mode and review the content that goes to the actual LLM; you can do that with the verbose flag or `OLLAMA_DEBUG=1` (if you use ollama).

What you are describing is not how it works.

There is no such thing as an LLM 'session'.

That is a higher level abstraction that sits on top of an API that just means some server is caching part of your prompt and taking some fragment you typed in the UI and combining them on the server side before feeding them to the LLM.

It makes no difference how it is implemented technically.

Fundamentally, any request you make that can invoke tools will be transformed, at some point, into a prompt that includes the tool definitions before it is passed to the LLM.

That has a specific, measurable cost on LLM performance as the number of tool definitions go up.

The only solution to that is to limit the number of tools you have enabled; which is entirely possible and reasonable to do, by the way.

My point is that adding more and more and more tools doesn't scale and doesn't work.

It only works when you have a few tools.

If you have 50 MCP servers enabled, your requests are probably degraded.


> There is no such thing as an LLM 'session'.

This matches my understanding too, at least how it works with Open AI. To me, that would explain why there's a 20 or 30 question limit for a conversation, because the necessary context that needs to be sent with each request would necessarily grow larger and larger.


I think the author's point is that the architecture of MCP is fundamentally extremely high-trust, not only between your agent software and the integrations, but across the (n choose 2) relationships between all of them. We're doing the LLM equivalent of loading code directly into our address space and executing it. This isn't a bad thing in itself (dlopen is incredibly powerful for the same reason), but the problem being solved with MCP just doesn't warrant that level of trust.

The real level of trust is on the order of OAuth flows, where the data provider has a gun sighted on every integration. Unless something about this protocol and its implementations changes, I expect every MCP server to start doing side-channel verification like getting an email: "hey, your LLM is asking to do thing, click the link to approve." In this future, it severely inhibits the usefulness of agents, in the same vein as Apple's "click the notification to run this automation."


Sure, at first, until the users demand an "always allow this ..." kind of prompt and we are back in the same place.

A lot of these issues seem trivial when we consider having a dozen agents running on tens of thousands of tokens of context. You can envision UIs that take these security concerns into account. I think a lot of the UI solutions will break down if we have hundreds of agents each injecting 10k+ tokens into a 1m+ context. The problems we are solving for today won't hold as LLMs continue to increase in size and complexity.


> Rate limit and monitor your own usage. You should anyway. It's not the road's job to make you follow the speed limit.

A better metaphor is the car, not the road. It is legally required to accurately tell you your speed and require deliberate control to increase it.

Even if you stick to a road; whoever made the road is required to research and clearly post speed limits.


Exactly. It is pretty common for APIs to actually signal this too: headers that show usage limits or rates, good error codes (429) with actual documentation on backoff timeframes. If you instrument your service to read and respect the signals it gets, everything moves more smoothly. Baking stuff like that back into the MCP spec, or at least having common conventions that are applied on top, will be very useful. Similarly for things like tracking data taint, auth, tracing, etc. Having a good ecosystem makes everything play together much nicer.
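
For instance, a caller that honors 429s and the Retry-After header; a minimal sketch using `requests`:

    import time
    import requests

    def call_with_backoff(url: str, payload: dict, max_retries: int = 5) -> requests.Response:
        """POST to a service, respecting 429 responses and their Retry-After hints."""
        delay = 1.0
        for _ in range(max_retries):
            resp = requests.post(url, json=payload, timeout=30)
            if resp.status_code != 429:
                return resp
            # Prefer the server's hint; otherwise back off exponentially.
            delay = float(resp.headers.get("Retry-After", delay))
            time.sleep(delay)
            delay *= 2
        raise RuntimeError("rate limited: retries exhausted")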


Also extending the metaphor, you can make a road that controls where you go and makes sure you don't stray from it (whether by accident or on purpose): it's called rail, and its safety guarantees come with reduced versatility.

Don't blame roads for not being rail, when you came in a car because you need the flexibility that the train can't give you.


Why would anyone accept exposing sensitive data so easily with MCP? Also, MCP does not make AI agents more reliable; it just gives them access to more tools, which can decrease reliability in some cases: https://medium.com/thoughts-on-machine-learning/mcp-is-mostl...


People accept lots of risk in order to do things. LLMs offer so much potential that people want to use them, so they will try, and it is only through experience that we can learn to mitigate the downsides.


Totally agree. Hopefully it's clear closer to the end that I don't actually expect MCP to solve, or be responsible for, a lot of this. It's more that MCP creates a lot of surface area for these issues that app developers and users should be aware of.


Love the trollishness/carelessness of your post. Exactly as you put it: "it is not the road's job to limit your speed".

Like a bad urban planner building a 6-lane city road with a 25mph limit and standing there wondering why everyone is doing 65mph on that particular stretch. Maybe send out the police with speed traps and impose a bunch of fines to "fix" the issue, or put some rouge on that pig, why not.


> Rate limit and monitor your own usage. You should anyway. It's not the road's job to make you follow the speed limit.

In some sense, urban planners do design roads to make you follow the speed limit. https://en.wikipedia.org/wiki/Traffic_calming:

“Traffic calming uses physical design and other measures to improve safety for motorists, car drivers, pedestrians and cyclists. It has become a tool to combat speeding and other unsafe behaviours of drivers”


> It's not the road's job to make you follow the speed limit.

Good road design makes it impossible to speed.

