Hacker News | dTal's comments

I love this gorgeous and evocative little time waster and come back to it every now and then. Notes:

It starts out buttery smooth but over time its performance slows to a crawl. Changing window geometry seems to do some sort of garbage collection and it speeds back up. I just hit F11 twice real quick.

The optimal strategy is to try to make the trip parabolically, with a single large burn at liftoff.

Gravity physics is of course symmetrical on ascent and descent, so the optimum time to start your deceleration burn is approximately when your downward velocity is equal to whatever your upward velocity was when you stopped burning.
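A toy 1-D check makes the symmetry concrete (assuming constant gravity, no drag, and instantaneous engine cutoff; the numbers are made up):

```python
# Toy 1-D check of ascent/descent symmetry under constant gravity.
# Assumes no drag and instantaneous cutoff; numbers are illustrative.
g = 9.81          # m/s^2, downward
v_cutoff = 100.0  # upward speed (m/s) at the moment you stop burning

# Coasting up: v(t) = v_cutoff - g*t, so the apex is at t = v_cutoff / g
t_apex = v_cutoff / g

# Falling back from the apex for the same time regains the same speed,
# so start decelerating when downward speed matches the old upward speed.
v_down = g * t_apex
print(v_down)  # equals v_cutoff (up to float rounding)
```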


The "car-like handling" is still physically accurate - thrusters automatically align your velocity vector to match your view direction. You can think of it as simply an interface - view direction is both a command and a display.

Sort of. They are deterministic in the same way that flipping a coin is deterministic - predictable in principle, in practice too chaotic. Yes, you get the same predicted token every time for a given context. But why that token and not a different one? Too many factors to reliably abstract.

It always feels like I just have to figure out and type the correct magical incantation, and that will finally make LLMs behave deterministically. Like, I have to get the right combination of IMPORTANT, ALWAYS, DON'T DEVIATE, CAREFUL, THOROUGH and suddenly this thing will behave like an actual computer program and not a distracted intern.

>Yes, you get the same predicted token every time for a given context. But why that token and not a different one? Too many factors to reliably abstract.

Fixed input-to-output mapping is determinism; prompt instability is a different property entirely, and too many people confuse the two. Also, determinism is a fairly niche property that only matters for reproducibility, and prompt instability/unpredictability is irrelevant for practical usage, for the same reason as with humans: if the model or the human misunderstands the input, you keep correcting the result until it's right by your criteria. You never need to reroll the result, so you never see the stochastic side of LLMs.
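The distinction can be sketched with a toy stand-in for temperature-0 decoding (hashlib plays the role of the model here; it is purely illustrative, not how an LLM works internally):

```python
# A deterministic function can still be unstable in its inputs.
# Greedy (temperature-0) decoding is a fixed mapping from context to
# token, yet a tiny prompt change can flip the output.
import hashlib

def next_token(context: str) -> int:
    # Same context -> same "token", every time: deterministic.
    digest = hashlib.sha256(context.encode()).digest()
    return digest[0]  # pretend token id

a = next_token("Please summarize the report.")
b = next_token("Please summarize the report.")
c = next_token("Please summarise the report.")  # one-character change

print(a == b)  # True: fixed input-to-output mapping
print(a == c)  # almost certainly False: instability, not nondeterminism
```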


Like the brain

Nobody said anything about Europeans having a "natural right". Bad enough to derail a conversation with irrelevant political nitpicking, unforgivable to use a strawman to do so. Boo.

It's not irrelevant.

GP made a comparison between what we're going through and the Industrial Revolution. Ignoring the negatives of that revolution - like by acting as though the "new world" was uninhabited/unused and so Europeans had a right to its resources - seems like a bad idea.


> like by acting as though the "new world" was uninhabited/unused and so Europeans had a right to its resources - seems like a bad idea.

Maybe it was a bad idea, but that's what happened.


I'm afraid you are misremembering. The movie is explicitly eugenicist. The people of the future are explicitly biologically stupid. The opening transcript is unambiguous:

[Man Narrating] As the 21st century began… human evolution was at a turning point.

Natural selection, the process by which the strongest, the smartest… the fastest reproduced in greater numbers than the rest… a process which had once favored the noblest traits of man… now began to favor different traits.

[Reporter] The Joey Buttafuoco case-

Most science fiction of the day predicted a future that was more civilized… and more intelligent.

But as time went on, things seemed to be heading in the opposite direction.

A dumbing down.

How did this happen?

Evolution does not necessarily reward intelligence.

With no natural predators to thin the herd… it began to simply reward those who reproduced the most… and left the intelligent to become an endangered species.


What is "explicitly eugenicist" in observing that the unprecedented way mankind has dominated its environment has changed the selection pressures we are subject to?

My quest to survive to adulthood and pass on my genes looked nothing like the gauntlet a Homo erectus specimen would have run.


Hmm... this sounds a lot like the old RISC vs CISC argument all over again. RISC won because simplicity scales better and you can always define complex instructions in terms of simple ones. So while I would relish experiencing the timeline in which our computerized chums bootstrap into sentience through the judicious application of carefully selected and highly nuanced words, it's playing out the other way: LLMs doing a lot of 'thinking' using a small curated set of simple and orthogonal concepts.

RISC good. CISC bad. But CISC tribe sneaky — hide RISC inside. Look CISC outside, think RISC inside. Trick work long time.

Then ARM come. ARM very RISC. ARM go in phone. ARM go in tablet. ARM go everywhere. Apple make ARM chip, beat x86 with big club. Many impressed. Now ARM take server too. x86 tribe scared.

RISC-V new baby RISC. Free for all. Many tribe use. Watch this one.

RISC win brain fight. x86 survive by lying. ARM win world.


RISC tribe also sneaky. Hide CISC inside.

The LLM has no accessible state beyond its own output tokens; each pass generates a single token and does not otherwise communicate with subsequent passes. Therefore all information calculated in a pass must be encoded into the entropy of the output token. If the only output of a thinking pass is a dumb filler word with hardly any entropy, then all the thinking for that filler word is forgotten and cannot be reconstructed.

Yeah but not all tokens are created equal. Some tokens are hard to predict and thus encode useful information; some are highly predictable and therefore don't. Spending an entire forward pass through the token-generation machine just to generate a very low-entropy token like "is" is wasteful. The LLM doesn't get to "remember" that thinking, it just gets to see a trivial grammar-filling token that a very dumb LLM could just as easily have made. They aren't stenographically hiding useful computation state in words like "the" and "and".

>They aren't stenographically hiding useful computation state in words like "the" and "and".

When producing a token, the model doesn't just emit the final token; you also have the entire hidden states from the previous attention blocks. These hidden states are mixed into the attention blocks of future tokens (so even though LLMs are autoregressive, with each token attending to previous tokens, in terms of the computational graph the hidden states of previous tokens are passed forward and used to compute the hidden states of future tokens).

So no, it's not wasteful: those low-perplexity token positions are precisely the spots that can instead be used to plan ahead and do useful computation.

Also, I would not be sure that even the output tokens are purely "filler". If you look at raw CoT, it often has patterns like "but wait!" that are emitted by the model at crucial pivot points. Who's to say that the "you're absolutely right" doesn't serve some similar purpose of forcing the model into one direction of adjusting its priors?
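The hidden-state point can be made concrete with a minimal causal self-attention sketch in numpy (shapes and weights are made up; this is a toy, not a real transformer layer):

```python
# Minimal causal self-attention: the hidden state computed at every
# position -- including positions that end up emitting a "filler"
# token -- feeds into the attention of every later position.
import numpy as np

rng = np.random.default_rng(0)
T, d = 5, 8                      # 5 token positions, hidden size 8
x = rng.normal(size=(T, d))      # per-position hidden states
Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))

def attend(h):
    q, k, v = h @ Wq, h @ Wk, h @ Wv
    scores = q @ k.T / np.sqrt(d)
    mask = np.triu(np.ones((T, T), dtype=bool), k=1)
    scores[mask] = -np.inf       # causal: position t sees only <= t
    attn = np.exp(scores - scores.max(axis=-1, keepdims=True))
    attn /= attn.sum(axis=-1, keepdims=True)
    return attn @ v              # position t mixes in v[0..t]

out = attend(x)

# Perturb the hidden state at position 1 and recompute: every later
# position's output changes, even though position 1's *token* could be
# a perfectly predictable filler word either way.
x2 = x.copy()
x2[1] += 1.0
out2 = attend(x2)

print(np.allclose(out[0], out2[0]))    # True: position 0 unaffected
print(np.allclose(out[2:], out2[2:]))  # False: later positions changed
```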


Huh okay, there was a major gap in my mental model. Thanks for helping to clear it up.

Well, to be fair, the fact that they "can" doesn't mean models necessarily do it. You'd need some interpretability research to see whether they actually do meaningfully "do other computations" when processing low-perplexity tokens. But the fact that, by the computational graph, the architecture should be capable of it means that _not_ doing this is leaving loss on the table, so hopefully the optimizer would force it to learn to do so.

> They aren't stenographically hiding useful computation state in words like "the" and "and".

Do you know that is true? These aren’t just tokens, they’re tokens with specific position encodings preceded by specific context. The position as a whole is a lot richer than you make it out to be. I think this is probably an unanswered empirical question, unless you’ve read otherwise.


I am quite certain.

The output is "just tokens"; the "position encodings" and "context" are inputs to the LLM function, not outputs. The information that a token can carry is bounded by the entropy of that token. A highly predictable token (given the context) simply can't communicate anything.

Again: if a tiny language model or even a basic markov model would also predict the same token, it's a safe bet it doesn't encode any useful thinking when the big model spits it out.


I just don’t share your certainty. You may or may not be right, but if there isn’t a result showing this, then I’m not going to assume it.

> stenographically hiding

steganographically*

Can you prove this?

Train an LLM to leave out the filler words, and see whether it gets the same performance at lower cost? Or do it at token-selection time?


Low entropy is low entropy. You can prove it by viewing the logits of the output stream. The LLM itself will tell you how much information is encoded in each token.
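That check is a one-liner once you have the logits (the logit values here are made up for illustration; only softmax and surprisal are involved):

```python
# Per-token information content from the model's own logits: softmax
# the logits, then the surprisal of the emitted token is -log2 p(token).
# A near-certain "filler" token carries almost no bits.
import math

def surprisal_bits(logits, emitted):
    m = max(logits)                               # for numerical stability
    exps = [math.exp(z - m) for z in logits]
    p = exps[emitted] / sum(exps)
    return -math.log2(p)

filler = [12.0, 2.0, 1.0, 0.5]   # "the": model is nearly certain
pivot  = [2.0, 1.9, 1.8, 1.7]    # a genuinely contested choice

print(surprisal_bits(filler, 0))  # near zero
print(surprisal_bits(pivot, 0))   # close to log2(4) = 2 bits
```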

Or if you prefer, here's a Galilean thought experiment: gin up a script to get a large language model and a tiny language model to predict the next token in parallel; when they disagree, append the token generated by the large model. Clearly the large model will not care that the "easy" tokens were generated by a different model - how could it even know? Same token, same result. And you will find that the tokens that they agree on are, naturally, the filler words.

To be clear, this observation merely debunks the idea that filler words encode useful information, that they give the LLM "room to think". It doesn't directly imply that an LLM that omits filler words can be just as smart, or that such a thing is trivial to make. It could be that highly predictable words are still important to thought in some way. It could be that they're only important because it's difficult to copy the substance of human thought without also capturing the style. But we can be very sure that what they aren't doing is "storing useful intermediate results".
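The two-model experiment above can be mocked up with word-frequency stand-ins (a toy, not real LLMs: the "tiny model" is a unigram counter and the "big model" a bigram counter over a made-up corpus):

```python
# Toy version of the two-model experiment: a "big" model and a "tiny"
# model each predict the next word; where they agree is exactly the
# high-frequency filler. The models are frequency-table stand-ins.
from collections import Counter

corpus = ("the cat sat on the mat and the dog sat on the rug and "
          "the cat saw the dog").split()

# "tiny model": always predicts the single most common word overall
tiny_guess = Counter(corpus).most_common(1)[0][0]

# "big model": predicts the most frequent bigram continuation
bigrams = Counter(zip(corpus, corpus[1:]))
def big_guess(prev):
    cands = [(n, b) for (a, b), n in bigrams.items() if a == prev]
    return max(cands)[1] if cands else tiny_guess

agreements = []
for i in range(1, len(corpus)):
    if big_guess(corpus[i - 1]) == tiny_guess:
        agreements.append(tiny_guess)

print(set(agreements))  # they agree only on the filler word: {'the'}
```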


You don't need to compile it yourself though? Unless you want CUDA support on Linux I guess, dunno why you'd need such a silly thing though:

https://github.com/ggml-org/llama.cpp/releases


> dunno why you'd need such a silly thing though

I'm not sure I follow, what alternative to CUDA on Linux offers similar performance?


>Consider that if ending a relationship causes noticeable problems to external observers, it’s almost by definition because you were in it “too long”. That is you developed a strong attachment, shared assets, or had kids with what was in hindsight obviously the wrong person.

Reducing it to "right person / wrong person" is a very narrow viewpoint. People can change in unpredictable ways, including yourself. Relationships end - or continue - for so many reasons, both emotional and pragmatic. It's simply too reductive to say that if a relationship causes pain when it ends, there was necessarily some sort of mistake. It could even be that the pain is a price to pay for a life experience that you'd be worse off for not having...

