Hacker News

They didn't compare against the best models because they were trying to do "in class" comparisons, and the 70B model is in the same class as Sonnet (which they do compare against) and GPT-3.5 (which is much worse than Sonnet). If they're beating Sonnet, that means they're going to be within stabbing distance of Opus and GPT-4 for most tasks, with the only major differences probably arising on extremely difficult reasoning benchmarks.

Since Llama is open source, we're going to see fine-tunes and LoRAs, though, unlike Opus.
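A rough sketch of why LoRA fine-tunes are so much cheaper than full fine-tunes: LoRA freezes the pretrained weight matrix and learns only a small low-rank update. The shapes below are illustrative (the hidden size matches Llama-3 8B, but the rank is just a typical choice, not anything Meta uses):

```python
# LoRA idea in one calculation (hypothetical config, not Llama's actual one):
# instead of updating a full d x d weight matrix W, freeze W and learn a
# low-rank update B @ A, where A is (r x d) and B is (d x r), with r << d.

d = 4096   # hidden size (Llama-3 8B's hidden dim; used here just for arithmetic)
r = 16     # LoRA rank, a common small choice

full_params = d * d       # trainable params for a full fine-tune of one matrix
lora_params = 2 * d * r   # trainable params for the LoRA factors A and B

print(full_params)                # 16777216
print(lora_params)                # 131072
print(full_params // lora_params) # 128x fewer trainable parameters per matrix
```

That ~100x reduction in trainable parameters (and optimizer state) is what makes fine-tuning an open-weight model feasible on a single consumer GPU.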



Llama is open weight, not open source. They don’t release all the things you need to reproduce their weights.


Not really that either, if we assume that “open weight” means something similar to the standard meaning of “open source”—section 2 of the license discriminates against some users, and the entirety of the AUP against some uses, in contravention of FSD #0 (“The freedom to run the program as you wish, for any purpose”) as well as DFSG #5&6 = OSD #5&6 (“No Discrimination Against Persons or Groups” and “... Fields of Endeavor”, the text under those titles is identical in both cases). Section 7 of the license is a choice of jurisdiction, which (in addition to being void in many places) I believe was considered to be against or at least skirting the DFSG in other licenses. At best it’s weight-available and redistributable.


Those are all great points, and these companies really need to be called out for open-washing.


It's a good balance IMHO. I appreciate what they have released.


I appreciate it too, and they're of course going to call it "open weights", but I reckon we (the technically informed public) should call it "weights-available".


Has anyone tested how close you need to be to the weights for copyright purposes?


It's not even clear if weights are copyrightable in the first place, so no.


Is it really useful to make an LLM open source when it takes millions of $ to train it?

At that scale, open weights with permissive license is much more useful than open source.


Which large model projects are open source in that sense, i.e. with the full source code, including training material, published?


Olmo from AI2. They released the model weights plus training data and training code.

link: https://allenai.org/olmo


Even if they released them, wouldn't it be prohibitively expensive to reproduce the weights?


It's impossible. Meta itself cannot reproduce the model, because training is randomized and that information is lost. First, samples arrive in random order. Second, there are often dropout layers, which generate random masks that exist only on the GPU for the duration of a single sample; nobody saves them, as it would take far more storage than the training data itself. If someone tried to retrain, the random patterns would differ, resulting in different weights and divergence from the very beginning. The model would converge to something completely different, albeit with similar behavior if training was stable, and LLM training is stable.

So there is no way to reproduce the model bit-for-bit. Making that a requirement for "open source" is absurd: it cannot be reliably met even for small models due to GPU-internal nondeterminism, only for the smallest models trained on a CPU in a single thread. Only academia would be interested in that.
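The sample-order point alone is enough to break reproducibility. Here is a toy illustration (nothing to do with Meta's actual training code): the same data and the same model, trained with SGD under two different sample orders, end at different final weights, even though both land near the same solution:

```python
import random

# Toy demo: identical data and model, but two different SGD sample orders
# produce different (close, but not bit-identical) final weights.

def sgd_fit(order_seed, steps=200, lr=0.1):
    data_rng = random.Random(0)  # same dataset for every run
    data = [(x / 10.0, 3.0 * (x / 10.0) + data_rng.gauss(0, 0.1))
            for x in range(10)]
    order = random.Random(order_seed)  # only the sample order differs
    w = 0.0
    for _ in range(steps):
        x, y = order.choice(data)
        w -= lr * 2 * (w * x - y) * x  # gradient of (w*x - y)^2
    return w

w_a = sgd_fit(order_seed=1)
w_b = sgd_fit(order_seed=2)
print(w_a, w_b)  # both near 3.0, but not equal
```

In a real LLM run, add to this GPU-level nondeterminism (atomics, non-associative float reductions) and unrecorded dropout masks, and exact reproduction becomes hopeless.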


1.3 million GPU hours for the 8B model. It would take you around 150 years to train on a desktop, lol.


Interesting. Llama is trained using 16K GPUs, so it would have taken them around a quarter. An hour of GPU use costs $2-$3, so training a custom solution using Llama should cost at least $15K, up to $1M. I am trying to get started with this. A few guys suggested 2 GPUs were a good start, but I think that would only be good for around 10K training samples.
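A quick sanity check on these figures, using the 1.3M GPU-hour number quoted above for the 8B model and an assumed $2-$3 cloud rate per GPU-hour (a rough rate, not Meta's actual bill):

```python
# Back-of-the-envelope check on the numbers in the thread (rough assumptions):
gpu_hours_8b = 1_300_000    # reported GPU-hours to pretrain Llama-3 8B
cluster_gpus = 16_000       # size of the training cluster mentioned above
rate_usd = (2.0, 3.0)       # assumed cost per GPU-hour, USD

wall_clock_days = gpu_hours_8b / cluster_gpus / 24
pretrain_cost = tuple(r * gpu_hours_8b for r in rate_usd)

print(round(wall_clock_days, 1))  # ~3.4 days of wall-clock time on the cluster
print(pretrain_cost)              # $2.6M-$3.9M to pretrain 8B from scratch
```

So pretraining from scratch is millions of dollars, while a LoRA fine-tune of the released weights is orders of magnitude cheaper, which is why open weights are so valuable even without reproducible training.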


On the topic of LoRAs and finetuning, I have a Colab for LoRA-finetuning Llama-3 8B :) https://colab.research.google.com/drive/135ced7oHytdxu3N2DNe...


"within stabbing distance"

Dunno if English is your mother tongue, but this sounds really good (although a tad aggressive :-) )!


As Mike Judge's historical documents show, this enhanced aggression will seem normal in a few years or even months.


ML Twitter was saying that they're working on a 400B parameter version?


Meta themselves are saying that: https://ai.meta.com/blog/meta-llama-3/



