Grosvenor's comments | Hacker News

Worthwhile for the hotel and book recommendations.

Obviously he has better food taste than I do, so those too. I will shit like a mink and love it.


What he's getting at is single-level storage. RAM isn't used for loading data & working on it. RAM is a cache. The size of your disk defines the "size" of your system.

This existed in Lisp and Smalltalk systems. Since there's no disk/running program split you don't have to serialize your data. You just pass around Lisp sexprs or Smalltalk code/ASTs. No more sucking your data from Postgres over a straw, or between microservices, or ...

These systems were orders of magnitude smaller and simpler than what we've built today. I'd love to see them exist again.
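Python's `shelve` module gives a faint echo of the idea: user code never writes explicit serialization logic, it just reads and writes what look like ordinary dict entries. (Only a rough imitation; pickling still happens under the hood, unlike a true Lisp or Smalltalk image.)

```python
import os
import shelve
import tempfile

# A faint echo of single-level storage: the program manipulates what look
# like ordinary objects, and persistence is implicit. No serialize/load
# code appears anywhere in user code.
path = os.path.join(tempfile.mkdtemp(), "image")

with shelve.open(path) as image:
    image["config"] = {"theme": "dark", "retries": 3}  # no explicit save call

# A later "run" of the program sees the same objects, as if memory never went away.
with shelve.open(path) as image:
    print(image["config"]["theme"])  # prints: dark
```

In a real single-level store the disk/program split disappears entirely; here it's merely hidden behind the `shelve` API.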


And we have those French/English text corpora in the form of Canadian law. All laws in Canada at the federal level are written in English and French.

This was used to build the first modern language translation systems, testing them by going English -> French -> English. And in reverse.

You could do something similar here, understanding that your language is quite stilted legalese.

Edit: there might be other countries with similar rules in place that you could source test data from as well.


Incredibly, I had not thought to use that data set.

Now I will. Thanks.


Belgian federal law is also written in Dutch, French and German, by the way.

But no English so you might not be interested.


Could this generate pressure to produce less memory hungry models?

There has always been pressure to do so, but there are fundamental bottlenecks in performance when it comes to model size.

What I can think of is that there may be a push toward training exclusively on search-based rewards, so that the model isn't required to compress a large proportion of the internet into its weights. But this is likely to be much slower and to come with initial performance costs that frontier model developers will not want to incur.


> exclusively search-based rewards so that the model isn't required to compress a large proportion of the internet into their weights.

That just gave me an idea! I wonder how useful (and for what) a model would be if it was trained using a two-phase approach:

1) Put the training data through an embedding model to create a giant vector index of the entire Internet.

2) Train a transformer LLM, but instead of only utilising its weights, it can also do lookups against the index.

It's like an MoE where one (or more) of the experts is a fuzzy Google search.

The best thing is that adding up-to-date knowledge won’t require retraining the entire model!
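A toy sketch of the lookup side of that two-phase idea, using a stand-in bag-of-words "embedding" instead of a real embedding model (a production version would use a learned embedder and an approximate-nearest-neighbor index; all names here are hypothetical):

```python
import math
from collections import Counter

def embed(text):
    """Toy bag-of-words 'embedding', standing in for a real embedding model."""
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)  # Counter returns 0 for missing keys
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

class VectorIndex:
    """Phase 1: index the corpus once. Phase 2: the model queries it at inference."""
    def __init__(self, docs):
        self.docs = [(d, embed(d)) for d in docs]

    def add(self, doc):
        # Updating knowledge = appending to the index; no retraining needed.
        self.docs.append((doc, embed(doc)))

    def lookup(self, query, k=1):
        q = embed(query)
        scored = sorted(self.docs, key=lambda dv: cosine(q, dv[1]), reverse=True)
        return [d for d, _ in scored[:k]]

index = VectorIndex([
    "The capital of France is Paris",
    "Water boils at 100 degrees Celsius at sea level",
])
print(index.lookup("what is the capital of France?"))
# -> ['The capital of France is Paris']
```

The `add` method is the point of the last paragraph: fresh knowledge goes into the index, not into the weights.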


Yeah that was my unspoken assumption. The pressure here results in an entirely different approach or model architecture.

If OpenAI is spending $500B, then someone can get ahead by spending $1B that improves the model by >0.2%.

I bet there's a group or three that could improve results a lot more than 0.2% with $1B.


> so that the model isn't required to compress a large proportion of the internet into their weights.

The knowledge compressed into an LLM is a byproduct of training, not a goal. Training on internet data teaches the model to talk at all. The knowledge and ability to speak are intertwined.


I wonder if this maintains the natural language capabilities, which are what make LLMs magic to me. There is probably some middle ground, but not having to know which expressions or idiomatic speech an LLM will understand is really powerful from a user experience point of view.

Or maybe models that are much more task-focused? Like models that are trained on just math & coding?

Isn't that the mixture-of-experts trick all the big players use? A bunch of smaller, tightly focused models.

Not exactly. MoE uses a small router network to select a subset of experts per token at each MoE layer. This makes inference faster but still requires the same amount of RAM, since every expert has to stay loaded.
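A minimal pure-Python sketch of that routing step (toy sizes, random weights; in a real transformer this runs per token inside each MoE layer):

```python
import math
import random

random.seed(0)

D, N_EXPERTS, TOP_K = 4, 4, 2

# Each expert is a tiny linear map. ALL of them must stay resident in memory,
# even though only TOP_K run for any given token -- MoE saves compute, not RAM.
experts = [[[random.gauss(0, 1) for _ in range(D)] for _ in range(D)]
           for _ in range(N_EXPERTS)]
router_w = [[random.gauss(0, 1) for _ in range(D)] for _ in range(N_EXPERTS)]

def matvec(m, x):
    return [sum(w * xi for w, xi in zip(row, x)) for row in m]

def softmax(xs):
    m = max(xs)
    es = [math.exp(v - m) for v in xs]
    s = sum(es)
    return [e / s for e in es]

def moe_layer(x):
    # The router scores every expert, then only the top-k actually compute.
    scores = matvec(router_w, x)
    top = sorted(range(N_EXPERTS), key=lambda i: scores[i], reverse=True)[:TOP_K]
    gates = softmax([scores[i] for i in top])
    out = [0.0] * D
    for g, i in zip(gates, top):
        for j, v in enumerate(matvec(experts[i], x)):
            out[j] += g * v
    return out

token = [1.0, -0.5, 0.3, 0.0]
print(len(moe_layer(token)))  # one output vector, computed by just TOP_K experts
```

Note that `experts` is fully allocated up front regardless of `TOP_K`, which is exactly why MoE doesn't shrink the memory footprint.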

Of course, and then watch those companies get reined in.

I'll counter with the Jaguar XK engine, in production for 43 years.

https://en.wikipedia.org/wiki/Jaguar_XK_engine

I assume the Americans will be by with a pushrod V8 soon.


Actually, I'll offer a V6 instead:

https://en.wikipedia.org/wiki/Buick_V6_engine

1961-2008.


I love browsing who is hiring. Sometimes you come across companies like this (and others) that you just hope do well.

German speakers usually have very good English, but this is already one of their tells.

Fixed that for you.


            .-~~\
           /     \ _
           ~x   .-~_)_
             ~x".-~   ~-.
         _   ( /         \   _
         ||   T  o  o     Y  ||
       ==:l   l   <       !  I;==
          \\   \  .__/   /  //
           \\ ,r"-,___.-'r.//
            }^ \.( )   _.'//.
           /    }~Xi--~  //  \
          Y    Y I\ \    "    Y
          |    | |o\ \        |
          |    l_l  Y T       | 
          l      "o l_j       !
           \                 /
    ___,.---^.     o       .^---.._____
"~~~ " ~ ~~~"

SEEKING WORK - Data scientist, consulting & fractional leadership, US/remote worldwide, email in profile.

All I want for Christmas is some gnarly problems to chew on; otherwise it's coal in my stocking.

I'm a data scientist with 20+ years experience who enjoys gnarly, avant-garde problems. I saved a well known German automaker from lemon law recalls. I've worked with a major cloud vendor to predict when servers would fail, allowing them to load shed in time.

Some of the things I've done:

    - Live chip counting for estimating bets in casinos.
    - Automotive part failure prediction (Lemon law recalls)
    - Server fleet failure prediction allowing load shedding.
    - Shipping piracy risk prediction - routing ships away from danger. 
    - Oil reservoir & well engineering forecasting production.
    - Realtime routing (CVRP-PD-TW, shifts) for on demand delivery. 
    - Legal entity and contract term extraction from documents. 
    - Wound identification & tissue classification.
    - The usual LLM and agent stuff. (I'd love to work on effective executive functioning)
    - Your nasty problem here.
I use the normal stacks you'd expect: Python, PyTorch, Spark/Ray, Jupyter/Marimo, AWS, Postgres, Mathematica, and whatever else is needed to get the job done. Ultimately it's about the problem, not the tools.

I have years of experience helping companies plan, prototype, and productionize sane data science solutions. Get in touch if you have a problem, my email is in my profile.


> Ideally, that thing should be related to your major, like programming competitions for CS. You need an accomplishment you bring to the table that no other applicant exceeds.

I wonder how Linus Torvalds would do in a programming competition.


I think it varies not only by the words, but also by the movie's genre/director.

Every Christopher Nolan movie would need every word in boldface. Can you even hear (or understand) what is happening in Tenet?

