Grosvenor's comments | Hacker News

Worthwhile for the hotel and book recommendations.

Obviously he has better food taste than I do, so those too. I will shit like a mink and love it.


What he's getting at is single-level storage. RAM isn't used for loading data & working on it. RAM is a cache. The size of your disk defines the "size" of your system.

This existed in Lisp and Smalltalk systems. Since there's no disk/running program split you don't have to serialize your data. You just pass around Lisp sexprs or Smalltalk code/ASTs. No more sucking your data from Postgres over a straw, or between microservices, or ...

These systems were orders of magnitude smaller and simpler than what we've built today. I'd love to see them exist again.
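Python's `shelve` module gives a faint echo of the idea: user code never writes explicit serialization logic, it just reads and writes what look like ordinary dict entries. (Only a rough imitation; pickling still happens under the hood, unlike a true Lisp or Smalltalk image.)

```python
import os
import shelve
import tempfile

# A faint echo of single-level storage: the program manipulates what look
# like ordinary objects, and persistence is implicit. No serialize/load
# code appears anywhere in user code.
path = os.path.join(tempfile.mkdtemp(), "image")

with shelve.open(path) as image:
    image["config"] = {"theme": "dark", "retries": 3}  # no explicit save call

# A later "run" of the program sees the same objects, as if memory never went away.
with shelve.open(path) as image:
    print(image["config"]["theme"])  # prints: dark
```

In a real single-level store the disk/program split disappears entirely; here it's merely hidden behind the `shelve` API.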


And we have those French/English text corpora in the form of Canadian law. All laws in Canada at the federal level are written in English and French.

This was used to build the first modern language translation systems, testing them by going English -> French -> English. And in reverse.

You could do something similar here, understanding that your language is quite stilted legalese.

Edit: there might be other countries with similar rules in place that you could source test data from as well.


Incredibly, I had not thought to use that data set.

Now I will. Thanks.


Belgian federal law is also written in Dutch, French and German, by the way.

But no English so you might not be interested.


Could this generate pressure to produce less memory hungry models?

There has always been pressure to do so, but there are fundamental bottlenecks in performance when it comes to model size.

What I can think of is that there may be a push toward training exclusively on search-based rewards, so that the model isn't required to compress a large proportion of the internet into its weights. But this is likely to be much slower and to come with initial performance costs that frontier model developers will not want to incur.


> exclusively search-based rewards so that the model isn't required to compress a large proportion of the internet into their weights.

That just gave me an idea! I wonder how useful (and for what) a model would be if it was trained using a two-phase approach:

1) Put the training data through an embedding model to create a giant vector index of the entire Internet.

2) Train a transformer LLM, but instead of only utilising its weights, it can also do lookups against the index.

It's like an MoE where one (or more) of the experts is a fuzzy Google search.

The best thing is that adding up-to-date knowledge won’t require retraining the entire model!
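A toy sketch of the lookup side of that two-phase idea, using a stand-in bag-of-words "embedding" instead of a real embedding model (a production version would use a learned embedder and an approximate-nearest-neighbor index; all names here are hypothetical):

```python
import math
from collections import Counter

def embed(text):
    """Toy bag-of-words 'embedding', standing in for a real embedding model."""
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)  # Counter returns 0 for missing keys
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

class VectorIndex:
    """Phase 1: index the corpus once. Phase 2: the model queries it at inference."""
    def __init__(self, docs):
        self.docs = [(d, embed(d)) for d in docs]

    def add(self, doc):
        # Updating knowledge = appending to the index; no retraining needed.
        self.docs.append((doc, embed(doc)))

    def lookup(self, query, k=1):
        q = embed(query)
        scored = sorted(self.docs, key=lambda dv: cosine(q, dv[1]), reverse=True)
        return [d for d, _ in scored[:k]]

index = VectorIndex([
    "The capital of France is Paris",
    "Water boils at 100 degrees Celsius at sea level",
])
print(index.lookup("what is the capital of France?"))
# -> ['The capital of France is Paris']
```

The `add` method is the point of the last paragraph: fresh knowledge goes into the index, not into the weights.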


Yeah that was my unspoken assumption. The pressure here results in an entirely different approach or model architecture.

If OpenAI is spending $500B, then someone can get ahead by spending $1B that improves the model by >0.2%.

I bet there's a group or three that could improve results a lot more than 0.2% with $1B.


> so that the model isn't required to compress a large proportion of the internet into their weights.

The knowledge compressed into an LLM is a byproduct of training, not a goal. Training on internet data teaches the model to talk at all. The knowledge and ability to speak are intertwined.


I wonder if this maintains the natural language capabilities, which are what make LLMs magic to me. There is probably some middle ground, but not having to know which expressions or idiomatic speech an LLM will understand is really powerful from a user experience point of view.

Or maybe models that are much more task-focused? Like models that are trained on just math & coding?

Isn't that the mixture-of-experts trick all the big players use? A bunch of smaller, tightly focused models.

Not exactly. MoE uses a small router network to select a subset of experts per token at each MoE layer. This makes inference faster but still requires the same amount of RAM, since every expert has to stay loaded.
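A minimal pure-Python sketch of that routing step (toy sizes, random weights; in a real transformer this runs per token inside each MoE layer):

```python
import math
import random

random.seed(0)

D, N_EXPERTS, TOP_K = 4, 4, 2

# Each expert is a tiny linear map. ALL of them must stay resident in memory,
# even though only TOP_K run for any given token -- MoE saves compute, not RAM.
experts = [[[random.gauss(0, 1) for _ in range(D)] for _ in range(D)]
           for _ in range(N_EXPERTS)]
router_w = [[random.gauss(0, 1) for _ in range(D)] for _ in range(N_EXPERTS)]

def matvec(m, x):
    return [sum(w * xi for w, xi in zip(row, x)) for row in m]

def softmax(xs):
    m = max(xs)
    es = [math.exp(v - m) for v in xs]
    s = sum(es)
    return [e / s for e in es]

def moe_layer(x):
    # The router scores every expert, then only the top-k actually compute.
    scores = matvec(router_w, x)
    top = sorted(range(N_EXPERTS), key=lambda i: scores[i], reverse=True)[:TOP_K]
    gates = softmax([scores[i] for i in top])
    out = [0.0] * D
    for g, i in zip(gates, top):
        for j, v in enumerate(matvec(experts[i], x)):
            out[j] += g * v
    return out

token = [1.0, -0.5, 0.3, 0.0]
print(len(moe_layer(token)))  # one output vector, computed by just TOP_K experts
```

Note that `experts` is fully allocated up front regardless of `TOP_K`, which is exactly why MoE doesn't shrink the memory footprint.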

Of course, and then watch those companies get reined in.

I'll counter with the Jaguar XK engine, in production for 43 years.

https://en.wikipedia.org/wiki/Jaguar_XK_engine

I assume the Americans will be by with a pushrod V8 soon.


Actually, I'll offer a V6 instead:

https://en.wikipedia.org/wiki/Buick_V6_engine

1961-2008.


I love browsing who is hiring. Sometimes you come across companies like this (and others) that you just hope do well.

German speakers usually have very good English, but this is already one of their tells.

Fixed that for you.


            .-~~\
           /     \ _
           ~x   .-~_)_
             ~x".-~   ~-.
         _   ( /         \   _
         ||   T  o  o     Y  ||
       ==:l   l   <       !  I;==
          \\   \  .__/   /  //
           \\ ,r"-,___.-'r.//
            }^ \.( )   _.'//.
           /    }~Xi--~  //  \
          Y    Y I\ \    "    Y
          |    | |o\ \        |
          |    l_l  Y T       | 
          l      "o l_j       !
           \                 /
    ___,.---^.     o       .^---.._____
"~~~ " ~ ~~~"

SEEKING WORK - Data scientist, consulting & fractional leadership, US/remote worldwide, email in profile.

All I want for Christmas is some gnarly problems to chew on; otherwise it's coal in my stocking.

I'm a data scientist with 20+ years experience who enjoys gnarly, avant-garde problems. I saved a well known German automaker from lemon law recalls. I've worked with a major cloud vendor to predict when servers would fail, allowing them to load shed in time.

Some of the things I've done:

    - Live chip counting for estimating bets in casinos.
    - Automotive part failure prediction (Lemon law recalls)
    - Server fleet failure prediction allowing load shedding.
    - Shipping piracy risk prediction - routing ships away from danger. 
    - Oil reservoir & well engineering forecasting production.
    - Realtime routing (CVRP-PD-TW, shifts) for on demand delivery. 
    - Legal entity and contract term extraction from documents. 
    - Wound identification & tissue classification.
    - The usual LLM and agent stuff. (I'd love to work on effective executive functioning)
    - Your nasty problem here.
I use the normal stacks you'd expect: Python, PyTorch, Spark/Ray, Jupyter/Marimo, AWS, Postgres, Mathematica, and whatever else is needed to get the job done. Ultimately it's about the problem, not the tools.

I have years of experience helping companies plan, prototype, and productionize sane data science solutions. Get in touch if you have a problem, my email is in my profile.


> Ideally, that thing should be related to your major, like programming competitions for CS. You need an accomplishment you bring to the table that no other applicant exceeds.

I wonder how Linus Torvalds would do in a programming competition.


I think it varies not only by the words, but also by the movie's genre/director.

Every Christopher Nolan movie would need every word in boldface. Can you even hear (or understand) what is happening in Tenet?

