
DoctorOetker · 2026-03-20T13:57:54 1774015074

The climbmix 400b dataset looks like it is 600GB. It would be neat if someone could host it in compressed form: given that LLMs can be used as compressors, even a small LLM would compress it better than classical compression algorithms. Why is this approach not used within the ML community?

Or is it a case of "anyone who means anything in the field has access to high bandwidth anyway"?
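To make the compression claim concrete, here is a minimal sketch of why a better predictor is a better compressor: an arithmetic coder driven by a predictive model spends roughly -log2 p bits on each symbol the model assigned probability p. The tiny adaptive byte model below is only a stand-in I made up for illustration (it is not an LLM, and the sample text and `order` parameter are assumptions); it reports the ideal code length rather than producing an actual bitstream.

```python
import math
import zlib
from collections import defaultdict


def ideal_code_length_bits(data: bytes, order: int = 2) -> float:
    """Bits an ideal arithmetic coder would spend when driven by an adaptive
    order-`order` byte model with add-one smoothing (a toy stand-in for an LLM)."""
    counts = defaultdict(lambda: defaultdict(int))  # context -> next byte -> count
    totals = defaultdict(int)                       # context -> symbols seen so far
    bits = 0.0
    for i, symbol in enumerate(data):
        context = data[max(0, i - order):i]
        # Predict first, using only what a decoder would already have seen.
        p = (counts[context][symbol] + 1) / (totals[context] + 256)
        bits += -math.log2(p)
        # Then update the model with the symbol that actually occurred.
        counts[context][symbol] += 1
        totals[context] += 1
    return bits


# Hypothetical sample text; real corpora will behave differently.
sample = b"the quick brown fox jumps over the lazy dog. " * 200
model_bytes = ideal_code_length_bits(sample) / 8
zlib_bytes = len(zlib.compress(sample, 9))
print(f"raw: {len(sample)} B, toy model: {model_bytes:.0f} B, zlib: {zlib_bytes} B")
```

Swapping the toy model for an LLM's next-token probabilities gives the scheme described above. The catch is that decompression needs the exact same model, producing the exact same probabilities, on the receiving side.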

DoctorOetker · 2026-03-20T13:41:40 1774014100

I wonder about the following:

To calculate a gradient step, in practice one doesn't accumulate the gradient for the full corpus, but updates the weights on mini-batches.

Suppose one runs conventional gradient descent on minibatches multiple times with different starting seeds, and then considers the resulting set of pre-trained models M_i.

From a random starting point we thus have an idea of the desired end-region in weight space (let's say a gaussian cloud fit to the final M_i's).

Then it seems like one could score update strategies by how much closer a single iteration gets to the gaussian cloud, scoring the approach on just a number of minibatches or a few update iterations, instead of searching update strategy space by waiting until pretraining has finished for each candidate update strategy. Only the candidate strategies that perform well enough on one or a few iterations would be considered worthy of further consideration; those that pass (a smaller number of candidates) are then inspected for approach to the gaussian target after another round of iterations, etc.

It seems like it should be possible to optimize the optimization iteration loop, by running it just once for many candidates and observing their convergence to the known desired end region.
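A rough sketch of the screening loop on a toy problem, to show what I mean. Everything here is an assumption chosen for illustration: the quadratic loss, the noise level, the diagonal gaussian fit, the candidate "update strategies" (plain SGD with different learning rates), and the halving schedule. It only demonstrates the bookkeeping, not that the idea transfers to real pretraining.

```python
import random
import statistics

# Hypothetical toy loss: 0.5 * sum(c * (w - t)^2), seen through noisy minibatch gradients.
CURVATURE = [1.0, 0.5, 2.0, 1.5]
TARGET = [1.0, -2.0, 0.5, 3.0]


def noisy_grad(w, rng, noise=0.1):
    return [c * (wi - t) + rng.gauss(0.0, noise) for c, wi, t in zip(CURVATURE, w, TARGET)]


def sgd_steps(w, lr, steps, rng):
    w = list(w)
    for _ in range(steps):
        w = [wi - lr * gi for wi, gi in zip(w, noisy_grad(w, rng))]
    return w


def fit_cloud(models):
    """Diagonal gaussian (per-coordinate mean and variance) over the final models M_i."""
    mean = [statistics.fmean(col) for col in zip(*models)]
    var = [statistics.pvariance(col) + 1e-6 for col in zip(*models)]
    return mean, var


def distance_to_cloud(w, cloud):
    mean, var = cloud
    return sum((wi - m) ** 2 / v for wi, m, v in zip(w, mean, var)) ** 0.5


def screen(candidates, start, cloud, budgets, rng):
    """After each budget of extra steps, keep the half of the candidates that
    ended up closest to the reference cloud (successive halving)."""
    state = {lr: list(start) for lr in candidates}
    for steps in budgets:
        for lr in state:
            state[lr] = sgd_steps(state[lr], lr, steps, rng)
        ranked = sorted(state, key=lambda lr: distance_to_cloud(state[lr], cloud))
        state = {lr: state[lr] for lr in ranked[:max(1, len(ranked) // 2)]}
    return list(state)


# Reference runs: conventional SGD to convergence from several seeds gives the M_i.
finals = []
for seed in range(8):
    rng = random.Random(seed)
    init = [rng.uniform(-5.0, 5.0) for _ in TARGET]
    finals.append(sgd_steps(init, 0.1, 500, rng))
cloud = fit_cloud(finals)

# Cheap screening: 2 steps for everyone, then 18 more for the surviving half.
start = [-5.0, 5.0, -5.0, 5.0]
survivors = screen([0.001, 0.03, 0.3, 1.5], start, cloud, budgets=[2, 18], rng=random.Random(123))
print("surviving learning rate(s):", survivors)
```

Even in this toy, a step size that overshoots can still look acceptable on the very first steps, so the later rounds do real work. Whether early approach to the cloud predicts final quality on a real loss landscape is exactly the open question.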

DoctorOetker · 2026-03-20T13:31:18 1774013478

> He is a researcher who understands neural networks and their architectures exceptionally well. That is all.

And that is precisely why he is more qualified on the subject than your average vibe coder!

DoctorOetker · 2026-03-18T08:32:24 1773822744

This probability is drawn in proportion to organism counts, instead of brain cell counts.

Would it not be more representative if the weighting included neuron counts?

In a sense I subscribe to the belief in such a lottery, except that we are all the same "I": we just alternately wake up as physics evaluating the progress for this or that electron, proton, etc. in this or that rock or neuron, and progressing the state indeterministically according to the rules of physics.

Our identity is a pragmatic illusion (just like the illusion that water is a continuous medium is a pragmatic one, as it helps summarize the behavior of water).

Imagine an amnesiac elder in an elderly home who still knows the rules of chess but can't form long-term memories any more: it's his turn, and he's playing black. There is a small notebook with his plans and strategies, jotted down during the earlier turns. He makes some notes and then a move.

The caretakers turn around the chess board, swap the black notebook with the white notebook and leave the amnesiac bewildered for a few minutes. Then he reads his earlier notes in the white notebook, deliberates his options and makes a move, with a white piece.

The caretakers turn the chess board around again.

This is physics, and the "player" is you, me, everyone, and we are physics.

The notebook is the state of your brain, and your move is indeterminate physics (with deterministic probabilities) evolving the state of the universe.

Does identity exist: yes! as a pragmatic summary, even natural selection latched onto this illusion out of necessity.

Weighting by neurons will be more representative of universal experience in the earthly biosphere.

threatofrain · 2026-03-18T14:38:22 1773844702

Identity is statistical propensity… does the framing of illusion actually do anything here?

DoctorOetker · 2026-03-18T18:33:59 1773858839

basically one could ask oneself:

given that our brain is composed of many neurons

and given that a company is composed of many employees

or a nation state composed of many agents;

why then is our subjective experience (which we cannot prove to others, but which most of us are convinced everyone has) such that I perceive my environment at the level of a single brain, and not at the level of a single neuron, and not at the level of a nation state aware of all the state secrets etc.?

Why don't I subjectively experience as if I were a single neuron, with neighbor neurons in this brain?

Why don't I perceive as if I am a nation state?

Natural selection feeds back at the level of a genome, so it has evolved to optimize information transfer primarily within a single organism, not constrained within a single neuron, and more private than sharing all knowledge across brains.

To another extent one could say it's an illusion due to historical biological feedback, or, phrased differently, due to a lack of technology to clone mental states, pause them, fork them, rewind them to an earlier state, etc. Once technology becomes capable of preserving, digitizing and emulating brain states, this concept of identity will blur: two forked instances of the same mental state would remember the same PIN code and other credentials. It will become possible to merge (say with consent) two digitized brain states smoothly, by adding connections between the neurons of one and the other, increasing data bandwidth, and making memories a shared concept. Two mental states could talk over a low-bandwidth channel for ages to communicate what they have observed during a separation (say spies catching up), or they could do this near instantaneously by merging their mental states, drastically increasing survival rates because situational awareness can advance immediately.

a digital mental state could encounter a fork in a road, decide to fork into A and B, each explore one leg of the fork, and then meet up again and merge the experiences.

DoctorOetker · 2026-03-18T08:21:14 1773822074

Extraterrestrial propagation of life, if real, would have evolved to have non-zero survival rates in interstellar radiation regimes and timescales.

The fragility of life-as-we-know-it that has undergone serial passage in an environment largely shielded from radiation, is not necessarily representative of putative life-forms carried by little rocks in space.

I am neither convinced for nor against the idea that life may have been carried over by interstellar rocks. On the one hand, it's a major promiscuity between celestial bodies within star systems, galaxies, etc. On the other hand, since we haven't discovered other life forms yet, we have no idea of the missing probability densities of life in the bulk of the universe, so the Bayesian catapult can swing either way. We just lack the data for now.

DoctorOetker · 2026-03-18T07:59:55 1773820795

> This isn’t cowardice. It’s a calculation: If allied leaders thought that their sacrifice might count for something in Washington, they might choose differently. But most of them have stopped trying to find the hidden logic behind Trump’s actions, and they understand that any contribution they make will count for nothing. A few days or weeks later, Trump will not even remember that it happened.

What a cynical take: as if the life-purpose of the allies in NATO is to be remembered by a US president. Disintegrating the nuclear and ballistic sites and development sites should be plenty of motivation on its own: it's buying a safer future, for the West and for the local neighborhood (and sure, they will have to bite through the death spasms and predelegated attacks for a while).

DoctorOetker · 2026-03-18T07:36:00 1773819360

A non-physicist opines that Iran posed 'no imminent threat'. The timescales for nuclear weapons and the timescales of a thug pulling a knife on you in the street are enormously different. A non-physicist doesn't fully comprehend the threat, and the imminence of the threat, which Iran posed, and resigns? That's a win-win: he believes he did the right thing, while the system continues to deconstruct Iran's nuclear and ballistic development facilities.

DoctorOetker · 2026-03-17T04:15:28 1773720928

This is the sanest reply on the thread.

During WW2, when Britain captured all incoming German spies and ran a fake German spy network, they could redirect the German V1 or V2 bombs by misreporting what they did or didn't hit.

Allowing bets on acts of violence allows the perpetrators to assess their success rate...

DoctorOetker · 2026-03-15T10:03:40 1773569020

On the educational-financial feedback structure of Intelliland.

The recent discovery of a long-separated human civilization on Intelliland, and their advanced technologies, has shocked the whole world: we assumed every square mile was accounted for, since the surface of the globe is essentially a sphere, entirely mapped by satellites.

But to avoid repeating what everyone already knows, first from their own announcement (shared in all spoken languages) and then repeated ad nauseam in the news, we will ignore their motives for staying invisible for so long and instead focus on just one aspect of their civilization: education. In the dataset about their society (their full history, politics, science, law, ...) we have chosen to focus on their education system, precisely because we are all dazzled by their superior technologies. How did they pull this off with such a low population count? Why did we fail to achieve similar scientific and technological growth rates in what is now commonly referred to as the "outer dumbosphere"?

Given the huge size of their documentation of their educational system, we decided to focus on just a single aspect: curriculum determination.

How did the inhabitants of Intelliland decide what information to teach to the next generation, as well as each other?

It turns out that they didn't. You read this right, we also couldn't believe it, but it's true, they didn't. There was no committee, no dictator to decide what the next generation should be indoctrinated with, but also no democratic system to decide what to teach the next generation (and indeed each other, which will become clear to the reader).

Their educational system uses targets so simple, and a mechanism for valuing those targets so minimal, that it boggles the mind.

So how did they decide on what factoids to include in their curricula? To answer that question, we first have to stress they didn't use curricula in the sense that we understand it. But they did use the concept of statements, or factoids as we might call them in the outer dumbosphere.

Each factoid was assigned an income value: as long as you could prove you could reproduce the factoid on their proctoring systems (basically computers in rooms where one isn't allowed to bring in cheat sheets etc.) then that income value would be received by the citizen, on top of whatever they earn otherwise.

An example might be "commutativity of addition: a + b = b + a": any citizen capable of reproducing it might receive, say, 10 cents that month (I made up this example and value for demonstrative purposes).

All knowledge (true or false, factual or fictional) could be entered and assigned some income value. There were no curricula (a note on the combining coefficients later) per se.

The reader, like us, will certainly not be impressed by this explanation: it sounds like a philosophical definition trick, what does one call a "curriculum", and who sets the values?

Again, it turns out no committee sets the income values, nor the values for the combining coefficients.

Every citizen is encouraged to report which statements they know (they can select or indicate those claims at home and schedule the checking for the next time they make time to go to a proctoring facility). Of course they will select the higher value statements in their knowledge, so commutativity of addition is seldom demonstrated in the proctoring facilities (unless one is still just learning as a child).

There is thus an incentive to report your knowledge.

The citizens have historically organized sometimes as a mixture of private entrepreneurs, government employees or private sector employees, in different proportions, but we stayed focused on the "curriculum selection feedback mechanism" so we refer the curious reader to the Intelliland documentation archives if they wish to learn more.

Some citizens earn more than other citizens, and nearly all turn out to work in the knowledge economy, given that their society is highly automated. Working under the assumption that statistically a smarter person will earn money faster, a strong direct feedback is that whatever high-income earners provably understand has a higher likelihood of helping achieve higher income. To translate it more vulgarly: a safety engineer will earn a lot more than a quack shaman, and thus the beliefs of the safety engineer enjoy higher associated rewards than the beliefs of the quack shaman. This encourages the next generation, but also the quack shaman, to learn and demonstrate similar knowledge as the safety engineer in order to enjoy more income. This is the direct feedback term. It is important to stress that for calculating the value of a factoid, one doesn't correlate with a citizen's total income, which comes partly from applying skills and partly from proving knowledge. One correlates only with the income earned by work, not by study, to prevent open-loop amplification of factoid values.
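As a made-up illustration for readers in the outer dumbosphere, the direct feedback term could be sketched as below. The archives give no formula in this text, so the Pearson correlation, the clipping of negative correlations to zero, the citizens, their incomes and the factoid names are all assumptions of mine.

```python
import math

# Hypothetical citizens: income from work, income from proving knowledge,
# and the factoids each one has demonstrated at a proctoring facility.
CITIZENS = [
    {"work": 9000, "study": 300, "knows": {"addition", "continuity_equation", "navier_stokes"}},
    {"work": 8000, "study": 250, "knows": {"addition", "continuity_equation"}},
    {"work": 2000, "study": 900, "knows": {"addition", "crystal_healing"}},
    {"work": 1500, "study": 800, "knows": {"addition", "crystal_healing"}},
]


def pearson(xs, ys):
    """Pearson correlation; defined as 0.0 when either series is constant."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def factoid_values(citizens, scale=1.0):
    """Value of a factoid = clipped correlation between knowing it and WORK income.
    Study income is deliberately ignored so payouts cannot feed back into values."""
    work = [c["work"] for c in citizens]
    factoids = set().union(*(c["knows"] for c in citizens))
    values = {}
    for f in sorted(factoids):
        knows = [1.0 if f in c["knows"] else 0.0 for c in citizens]
        values[f] = scale * max(0.0, pearson(knows, work))
    return values


values = factoid_values(CITIZENS)
for name, value in values.items():
    print(f"{name}: {value:.2f}")
```

Note how a factoid that everybody knows carries no signal at all in this sketch, which matches the observation that commutativity of addition is rarely worth demonstrating.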

There is also an indirect feedback term: since their society tracks the proclaimed most valuable beliefs, and since their society tracks each financial transaction it can backpropagate this value signal instantly (somehow both the belief testing as well as the financial transactions are kept private but still accountable through the use of cryptography, but that was above the feeble mind of the authors of this text...)

As another artificial example for us in the outer dumbosphere: suppose you book a flight with a certain company, then their society knows how much fuel the company buys, what people worked in that fuel industry and what their knowledge was, so every time anyone books a flight the income value for a lot of chemistry factoids is automatically increased. But also metallurgy factoids since the company bought that plane. And Navier-Stokes equations since the plane manufacturer hires physicists and engineers, and so on. The government doesn't need to know the price of the fuel per liter, or the price of airplanes per piece, it just sees money flowed from the company to people with certain provable knowledge of factoids, and the value of such factoids is thus increased according to usage measured in expense rates.
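The flight example can be sketched as a small money-flow propagation. Again this is only an illustration under assumptions I invented: the spending fractions, the entity names, and the rule that a person's receipts are split equally over their proven factoids are not taken from the Intelliland archives.

```python
from collections import defaultdict

# Hypothetical: fraction of every received unit that an organisation passes on to each payee.
SPENDING = {
    "airline": {"fuel_supplier": 0.4, "plane_maker": 0.3, "pilot": 0.3},
    "fuel_supplier": {"chemist": 1.0},
    "plane_maker": {"metallurgist": 0.5, "aero_engineer": 0.5},
}

# Hypothetical: factoids each person has proven at a proctoring facility.
KNOWLEDGE = {
    "pilot": ["navigation"],
    "chemist": ["combustion_chemistry"],
    "metallurgist": ["alloy_metallurgy"],
    "aero_engineer": ["navier_stokes", "continuity_equation"],
}


def credit_factoids(payee, amount, credit):
    """Follow the money from `payee` down to people and credit their factoids."""
    if payee in KNOWLEDGE:
        facts = KNOWLEDGE[payee]
        for fact in facts:
            credit[fact] += amount / len(facts)  # assumed: equal split per person
        return
    for next_payee, share in SPENDING.get(payee, {}).items():
        credit_factoids(next_payee, amount * share, credit)


credit = defaultdict(float)
credit_factoids("airline", 100.0, credit)  # one booking worth 100 units
for fact, amount in sorted(credit.items()):
    print(f"{fact}: {amount:.2f}")
```

Nobody in this sketch needs to know the price of fuel per liter or of a plane per piece. Only the amounts that flowed and the proven knowledge of the people they reached are used.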

When the system was first to be attempted in Intelliland, there was a lot of fear of economists and other elite positions negatively affecting the previously manually selected curricula. Some pointed out that it didn't matter since that would self-correct: to the extent elites were effectively freeloading on the lower classes, the freeloading tricks, and the financial parlance and euphemisms surrounding them, would become decentralized public knowledge, and then their democracy would start closing all the loopholes in their system essentially overnight. That is precisely what happened: the population saw how simple and manipulative the games of the elites were and just closed those loopholes. From then on genuine factoids started rising in value so quickly that even the ex-elites weren't bitter: although they had to get accustomed to life-long study, this actually gave them more self-confidence, and their perceived need to freeload on others diminished as their newly gained knowledge gave them confidence in life.

Just like one can measure the correlation between citizens' knowledge of a factoid and their income, one can correlate combinations of knowledge with their income: reproducing the Navier-Stokes equations without understanding the continuity equation makes no sense, so it is important to also automagically determine the values for combination coefficients. They are actually computed in exactly the same way, even though the formulas look a bit more complicated.

A crucial consequence for explaining their absurd scientific and technological growth rates is that the financial feedback is not just highly reflective of value to society, it is also nearly instantaneous: in Intelliland, if an engineer or physicist (or a group of them) had a crucial insight, then as soon as this factoid was capitalized, its income value would rise, and students could immediately learn from the state of the art. No waiting for generations until the knowledge has entered textbooks, and then waiting until professors hopefully acknowledge the importance and value, and so on. In the outer dumbosphere it could take decades, sometimes centuries, for breakthroughs to end up in widespread study materials; breakthroughs lingered for a long time until enough similar breakthroughs accumulated for a "paradigm shift" to occur, whereas in Intelliland the breakthrough propagated near instantaneously. There is also another component in the value determination algorithm: to prevent new insights from remaining privately held beliefs in order to maintain a monopoly on a certain insight, there is a high reward for contributing factoids that eventually rise in value, and these coefficients are so large that there is no incentive to hold back a new insight. This component is rewarded a bit later, after the fact: the factoid needs to prove its scalable economic value over time before the contributor notices this significant reward.

So it turns out there is no need for committees deciding curricula, like guardians of the sacred truth, and there is no need for experts to assess the value of each factoid. All You Need Is Correlation.

In the next installment we shall discuss how they arrived at this system, and the form of democracy they employ: what Intelliland inhabitants call informed democracy.

DoctorOetker · 2026-03-15T08:52:35 1773564755

what prevents digital holography on writable DVDs from performing such computations optically, even if less efficiently?

imagine each layer in the computation consisting of a DVD + a number of (embedding dimension) light sensors and light sources (or perhaps OPA / external cavity laser setups);

instead of N light sources it could be 1 light source and a ferroelectric FLCOS display like the cheap 320 x 240 monochrome high refresh rate displays in the cheap toy projectors from the past

https://github.com/ElectronAsh/FLCOS-Mini-Projector-ESP32

it doesn't sound too crazy and could permit a low entry cost to a bulky and probably less energy efficient setup, but with updated models you could just burn a new hologram on a fresh DVD

and people wouldn't be tied to advanced semiconductor manufacture.