One of the big problems with Attention Mechanisms is that the Query needs to look over every single key, which for long contexts becomes very expensive.
A little side project I've been working on is training a model that sits on top of the LLM, looks at each key, assigns it a lifespan, and evicts it once that lifespan has expired if it's no longer needed. Still working on it, but my first-pass test cut the number of retained keys by 90%!
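For anyone wondering what the eviction step might look like mechanically, here's a rough sketch (the function, names, and tensor shapes are my own illustration, not the project's actual code):

```python
import torch

def evict_expired(keys, values, ages, predicted_lifespans):
    """Drop KV-cache entries whose age has passed their predicted lifespan.

    keys, values:         (seq_len, d) cached key / value tensors
    ages:                 (seq_len,) steps each token has spent in the cache
    predicted_lifespans:  (seq_len,) per-token lifespan from the side model
    """
    keep = ages < predicted_lifespans   # token is still within its lifespan
    keep[-1] = True                     # never evict the most recent token
    return keys[keep], values[keep], ages[keep]
```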
In general, prediction markets can’t be “correct” or “incorrect” - for instance, if a prediction market says there’s a 60% chance of an event occurring, and it doesn’t occur, was the market right or wrong? Well, it’s hard to say - certainly the market said the event was more likely to occur than not, but only just, and who knows? Maybe the event _only just_ failed to occur, and very nearly did!
So generally we say a prediction market is “correct” if it is “well calibrated”, which is to say that if we took all the events that the market said had a 60% chance of occurring, then approximately 60% of these events occurred (with the same holding true for all other percentages).
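For the curious, a calibration check is simple to compute: bucket the predictions and compare each bucket's stated probability with how often those events actually happened. A rough sketch (the function and the 10% bucket width are my own choices):

```python
from collections import defaultdict

def calibration_table(predictions, outcomes):
    """Bucket predictions to the nearest 10% and compare each bucket's
    stated probability with the fraction of events that actually occurred."""
    buckets = defaultdict(list)
    for prob, happened in zip(predictions, outcomes):
        buckets[round(prob, 1)].append(happened)
    return {f"{bucket:.0%} predicted": f"{sum(hits) / len(hits):.0%} occurred"
            for bucket, hits in sorted(buckets.items())}

# A market that said 60% for ten events, six of which happened, comes out
# as {'60% predicted': '60% occurred'} -- i.e. well calibrated at that level.
```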
On this note, an interesting phenomenon that used to occur was “favorite-longshot bias”, where markets would consistently overestimate the likelihood of longshot events occurring - so events that the market predicted would occur 10% of the time would only occur 5% of the time. What’s fascinating is that once people realized that this bias existed, they began to exploit it by making bets against longshots, which had the effect of moving the market and removing the bias, making the markets well calibrated. It’s a pretty neat example of the efficient market hypothesis in action!
Some of the longshot bias still exists and can't be removed due to technical constraints on the platforms. A lot of the time there is a minimum contract price, which effectively means the probability of unlikely events cannot be modeled as lower than 1% or 0.1% or whatever. But there are contracts for events much less likely than that.
There are also issues with the time value of money for long-shot events. Someone has to be willing to buy a share of "No", and if that works out to a return lower than the risk-free rate (e.g. buying T-bills), there will be no incentive to take the "No" position. That makes anything roughly under 3-4% per year pretty unreliable.
> for instance, if a prediction market says there’s a 60% chance of an event occurring, and it doesn’t occur, was the market right or wrong? Well, it’s hard to say - certainly the market said the event was more likely to occur than not, but only just, and who knows? Maybe the event _only just_ failed to occur, and very nearly did!
For most events like this, you'd want to see the market spike to 0% or 100% as the deadline approached. And in particular for an event that happens, you want to see the spike to 100% before it happens. Remaining at 60% until after the fact is wrong because the occurrence of the event becomes more certain as it gets closer.
Being "well-calibrated" as you describe is a very bad quality metric in the sense that two sets of predictions can achieve the same calibration profile while differing markedly in quality. The farther the predictions are from 50%, the better they are, but your calibration metric doesn't take this into account.
The issue there is time. The Nobel prizes will be announced in around 9 months. Buying a share of "No" would currently cost 98.2 cents, working out to a (basically) risk-free return of around 2.4%. Alternatively someone who wants a very low-risk investment product could just buy 1-year t-bills with a return of... ~3.5%. And that doesn't require messing around with buying crypto and the inherent risk of trusting Polymarket with your money.
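Checking that arithmetic (numbers from the comment above; simple rather than compound annualization):

```python
cost = 0.982        # price of one "No" share, in dollars
payout = 1.00       # what it pays if the market resolves "No"
months = 9          # rough time until resolution

period_return = payout / cost - 1            # ~1.8% over nine months
annualized = period_return * 12 / months     # ~2.4% per year

print(f"{period_return:.1%} over {months} months, ~{annualized:.1%}/year")
# Compare with ~3.5% on a 1-year T-bill, with none of the platform risk.
```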
Anything under 3%/year of time until decision is going to have pretty limited predictive value within that range. Anything starting above that range will end up hitting that floor rather than going to zero because of the difficulty of finding a counterparty.
While Polymarket does offer interest through holding rewards, it looks like it doesn't for this particular market.
That doesn't mean there aren't other explanations. It could mean that No holders expect to incur an opportunity cost greater than the risk free rate. Combine that with how there's low liquidity (there's less than $300 on the book buying Yes, and at 2 cents or less), and so we could just be seeing the effect of random fish temporarily distorting the price. It could also mean that the risk of a smart contract failing is making it not worth the hassle for a market maker to come in at such a slim margin and low volume.
They're offering interest on roughly a dozen hand-picked markets, according to their documentation. (I wasn't aware of that, so I stand corrected on the general assertion that they never do.)
> That doesn't mean there aren't other explanations.
Why do you need other explanations, when the observed probability can be precisely and fully explained by opportunity cost?
I don't have to "need" other explanations in order for them to exist. The current price does happen to accurately reflect what the risk free rate would imply. But look at the graph history: it hovered around 1% for a large chunk of December.
How much volume on this bet? Let's ignore black swan events and say it's a guaranteed 3% return. On how much? $1? $10? $1m?
I'd weigh the accuracy by how much money is at stake...
Even then, a "perfect" prediction market need not be accurate, if people use it for hedging. If some low probability event is really bad for me, I may pay over odds (pushing the implied probability up) to get paid if it happens. The equilibrium probability may be efficient, reasonable and biased.
Well, Nobel peace prizes aren't usually awarded to people calling for invasions of their home country either, or cheering for the extrajudicial double-tap killing of smugglers/random fishermen.
Who's to say a dead person can't have done the most to "promote peace conferences" as mentioned in Nobel's will? These days, I'd say dead people make a larger net contribution to peace than most politicians.
To be fair you’re not really providing a hard stance against the estimate. You say it is unlikely, and indeed the prediction is a 3% chance. That’s unlikely.
Well markets are evaluated on a number of different metrics depending on what you’re trying to determine.
If you want to be pedantic about it and select one metric, markets are evaluated on their Brier score or some other proper scoring rule, not accuracy.
However, I prefer calibration as a high level way to explain prediction market performance to people, as it’s more intuitive.
Yeah it's a good way to introduce the idea. But I don't think someone would really grasp it until they understand why both calibration and "discrimination" are necessary in determining if a prediction market is accurate.
I suspect that you are arguing semantics, where parent and grandparent focus on the nuance of what is ACTUALLY being measured. I am saying it like this because, while I have never used prediction markets, I briefly looked into them to see if I could use them well. The question of accuracy came up, which is why I happen to align with the posters above.
Noob question from me: what’s the difference between accuracy and calibration? A well-calibrated market would be more accurate and vice versa, no?
Edit: just found the answer myself: “accuracy measures the percentage of correct predictions out of total predictions, while calibration assesses whether a prediction market's assigned probabilities align with the actual observed frequency of those outcomes”
Suppose there are 1000 events and 500 will have outcome A and 500 will have outcome B. If you predict a 50% chance of A for every event you'll be perfectly calibrated. On the other hand, if you predict a 90% chance of a certain outcome and you're right for 800 events, you're not perfectly calibrated but you have a lower Brier score (lower is better).
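To put numbers on the two scenarios above (the Brier score is just the mean squared error between the stated probabilities and the 0/1 outcomes):

```python
def brier(predictions, outcomes):
    """Mean squared error between predicted probabilities and 0/1 outcomes
    (lower is better)."""
    return sum((p - o) ** 2 for p, o in zip(predictions, outcomes)) / len(outcomes)

# Scenario 1: 50% on every event, 500 of 1000 occur -- perfectly calibrated.
print(brier([0.5] * 1000, [1] * 500 + [0] * 500))   # 0.25

# Scenario 2: 90% on every event, but only 800 of 1000 occur -- miscalibrated,
# yet the sharper forecasts still get the better (lower) score.
print(brier([0.9] * 1000, [1] * 800 + [0] * 200))   # 0.8*0.01 + 0.2*0.81 ≈ 0.17
```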
A forecaster can be calibrated but almost only assign probabilities in the 40-60% range. This is not as useful as one assigning calibrated probabilities over the full range.
We try to measure the increased usefulness of the latter with proper scoring rules.
I know one anecdote is not data, but his investment in BYD all the way back in 2008 does counter that viewpoint somewhat - his investment success in the BYD case isn’t from other investors following him in, it’s from him identifying BYD as a successful company far before any other major investors did.
Overly specific LLM research into KV cache eviction.
The vast majority of tokens in a sequence will be irrelevant to an attention mechanism outside of a very small window.
Right now however we tend to either keep all cache values forever, or dump them all once they hit a certain age.
My theory is that you can train a model to look at the key vectors and, from that information alone, work out how long to keep the token in the cache. Results so far look promising, and it’s easy to add after the fact without retraining the core model itself.
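For concreteness, here is a minimal sketch of what such a bolt-on predictor could look like, assuming a small MLP over the key vectors (my own illustration of the idea, not the actual architecture):

```python
import torch
import torch.nn as nn

class LifespanHead(nn.Module):
    """Tiny MLP mapping each cached key vector to a predicted lifespan,
    i.e. how many future steps the token should stay in the KV cache.
    It trains separately, so the base model's weights stay untouched."""

    def __init__(self, d_key: int, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(d_key, hidden),
            nn.ReLU(),
            nn.Linear(hidden, 1),
            nn.Softplus(),      # lifespans must be non-negative
        )

    def forward(self, keys: torch.Tensor) -> torch.Tensor:
        # keys: (seq_len, d_key) -> (seq_len,) predicted lifespans
        return self.net(keys).squeeze(-1)
```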
I made a tool for this! It's an essay writing platform that tracks the edits and keystrokes rather than the final output, so its AI detection accuracy is _much_ higher than other tools:
https://collie.ink/
I've been exploring this concept in LLMs for the last week or so, to see if I can RL train one into being inherently curious.
I haven't got anything beyond my own working notes and some basic plots, but I've unceremoniously dumped them into a document here in case anyone else finds them interesting. If so, I'd _love_ to chat with you: enjeyw @ google's email provider.
I mean I kind of get it - overgeneralising (and projecting my own feelings), but I think HN favours introducing and discussing foundational concepts over things that are closer to memorising/rote learning. I think AI Math vs Leetcode broadly fits into that category.
I have no reasonable theory as to how Trump/RFK will be able to reveal credible information about Autism that wasn’t already available from public research papers.
About 10 years ago I became more aware that reducing my consumption of meat was good for the world. This was good for Beyond Meat’s prospects.
About 5 years ago I became more aware that reducing my consumption of ultra processed food was good for me. This was very bad for Beyond Meat’s prospects.
A little side project I've been working on is training a model that sits on top of the LLM, looks at each key, assigns it a lifespan, and evicts it once that lifespan has expired if it's no longer needed. Still working on it, but my first-pass test cut the number of retained keys by 90%!
https://github.com/enjeyw/smartkv