They didn't compare against the best models because they were trying to do "in c...

blackeyeblitzar · on April 18, 2024

Llama is open weight, not open source. They don’t release all the things you need to reproduce their weights.

mananaysiempre · on April 18, 2024

Not really that either, if we assume that “open weight” means something similar to the standard meaning of “open source”—section 2 of the license discriminates against some users, and the entirety of the AUP against some uses, in contravention of FSD #0 (“The freedom to run the program as you wish, for any purpose”) as well as DFSG #5&6 = OSD #5&6 (“No Discrimination Against Persons or Groups” and “... Fields of Endeavor”, the text under those titles is identical in both cases). Section 7 of the license is a choice of jurisdiction, which (in addition to being void in many places) I believe was considered to be against or at least skirting the DFSG in other licenses. At best it’s weight-available and redistributable.

blackeyeblitzar · on April 18, 2024

Those are all great points and these companies need to really be called out for open washing

amitport · on April 19, 2024

It's a good balance IMHO. I appreciate what they have released.

ikurei · on April 19, 2024

I appreciate it too, and they're of course going to call it "open weights", but I reckon we (the technically informed public) should call it "weights-available".

lumost · on April 19, 2024

Has anyone tested how close you need to be to the weights for copyright purposes?

tdullien · on April 19, 2024

It's not even clear if weights are copyrightable in the first place, so no.

whiplash451 · on April 21, 2024

Is it really useful to make an LLM open source when it takes millions of $ to train it?

At that scale, open weights with permissive license is much more useful than open source.

throwaway4good · on April 19, 2024

Which large model projects are open source in that sense? That its full source code including training material is published.

soccernee · on April 19, 2024

Olmo from AI2. They released the model weights plus training data and training code.

link: https://allenai.org/olmo

ktzar · on April 19, 2024

even if they released them, wouldn't it be prohibitively expensive to reproduce the weights?

zingelshuher · on April 20, 2024

It's impossible. Meta itself cannot reproduce the model. Because training is randomized and that info is lost. First samples a coming at random. Second there are often drop-out layers, they generate random pattern which exists only on GPU during training for the duration of a single sample. Nobody saves them, it would take much more than training data. If someone tries to re-train the patterns will be different, which results in different weight and divergence from the beginning. Model will converge to something completely different. With close behavior if training was stable. LLMs are stable.

So, no way to reproduce the model. This requirement for 'open source' is absurd. It cannot be reliably done even for small models due to GPU internal randomness. Only the smallest trained on CPU in single thread. Only academia will be interested.

lawlessone · on April 19, 2024

1.3 million GPU hrs for the 8b model. Take you around 130 years to train on a desktop lol.

iamlearningai · on April 23, 2024

Interesting. LLAMA is trained using 16K GPUs so it would have taken around a quarter for them. An hour of GPU use costs $2-$3 so training a custom solution using LLAMA should be atleast $15K to $1M. I am trying to get started with this thing. A few guys suggested 2 GPUs were a good start but I think that would only be good for 10K training samples.

danielhanchen · on April 19, 2024

On the topic of LoRAs and finetuning, have a Colab for LoRA finetuning Llama-3 8B :) https://colab.research.google.com/drive/135ced7oHytdxu3N2DNe...

wiz21c · on April 19, 2024

"within stabbing distance"

dunno if english is your mother tongue, but this sounds really good (although a tad aggressive :-) )) !

waffletower · on April 19, 2024

As Mike Judge's historical documents show, this enhanced aggression will seem normal in a few years or even months.

htrp · on April 18, 2024

ML Twitter was saying that they're working on a 400B parameter version?

mkl · on April 18, 2024

Meta themselves are saying that: https://ai.meta.com/blog/meta-llama-3/