What's the benefit for Meta? They are now the true open source AI providers (aft...

radq · on May 24, 2023

Mark Zuckerberg talks about this in their Q1 earnings call.

"I think that there's an important distinction between the products we offer and a lot of the technical infrastructure, especially the software that we -- that we write to support that. And historically, whether it's the Open Compute project that we've done or just open sourcing a lot of the infrastructure that we've built, we've historically open sourced a lot of that infrastructure, even though the products themselves are obviously were not -- we haven’t open sourced the code for our core products or anything like that.

And the reason why I think why we do this is that unlike some of the other companies in the space, we're not selling a cloud computing service where we try to keep the different software infrastructure that we're building proprietary. For us, it's way better if the industry standardizes on the basic tools that we're using and therefore we can benefit from the improvements that others make and others’ use of those tools can, in some cases like Open Compute, drive down the costs of those things which make our business more efficient too.

So I think to some degree we're just playing a different game on the infrastructure than companies like Google or Microsoft or Amazon, and that creates different incentives for us. So overall, I think that that's going to lead us to do more work in terms of open sourcing, some of the lower level models and tools.

But of course, a lot of the product work itself is going to be specific and integrated with the things that we do. So it's not that everything we do is going to be open. Obviously, a bunch of this needs to be developed in a way that creates unique value for our products, but I think in terms of the basic models, I would expect us to be pushing and helping to build out an open ecosystem here, which I think is something that's going to be important."

contrarian1234 · on May 24, 2023

"we can benefit from the improvements that others make and others’ use of those tools can"

I have always had the impression that maintaining an open source project is way more work than you get back from "the community" of users. Is this not true? Are, for instance, the internal Facebook React users benefiting a huge amount from what outside contributors have built on top of React?

I think an unspoken dimension is that kneecapping the other big tech companies' entrenchments and denying them a market is always good for them, especially when, as they point out, it doesn't actually hurt any of their own business interests. Other FAANG companies are always a future threat. Hurting them is always a good business move.

0xB31B1B · on May 24, 2023

Broad use helps uncover bugs and make the software more resilient and reliable. They don’t fix all the bugs, and they don’t build features the community wants for the sake of it, but having users of your tools is a benefit.

liamwire · on May 24, 2023

Not only that, but it also increases the talent pool you can hire from that is familiar with your internal tooling from the start.

snovv_crash · on May 24, 2023

You get less input from the community than what you put in, but you also get different input than you would get from in-house devs who are all in the same bubble.

pawelmurias · on May 24, 2023

> I have always had the impression that maintaining an open source project is way more work than you get back from "the community" of users

You get a ton of valuable work back from quality contributors. There is a vocal minority of people complaining that they feel burned out because of contributions, but attracting some high quality contributors can help a lot. Pretraining new hires is valuable, and the new hires will also train on the open source project docs.

aunty_helen · on May 24, 2023

MS put a cool 10b into OpenAI thinking they would have a massive tech moat. FB leaks llama and now OpenAI only has its status as the bitcoin of LLMs (first, biggest, incumbent).

FB's plan is to F everyone else (MAAG) by making sure they can't make billions off tech that FB have sitting on the shelf, yet is extremely expensive for a true startup competitor to get in on.

robertlagrant · on May 24, 2023

It's noncommercial only though, right? So people can't spin it up and start using it in their work?

mx20 · on May 24, 2023

The software got rewritten already and the model weights are probably not protectable, especially if you use the model weights to train your own model. Why would you be allowed to use copyrighted data to train your models but not other models?

papruapap · on May 24, 2023

Not directly, but people can take it as "inspiration" to create new ones.

johnpublic · on May 24, 2023

Basically: "commoditise your complement" applied to Facebook means they want to commoditise the foundational tech like AI. And open source is the route to that.

dpflan · on May 24, 2023

"For us, it's way better if the industry standardizes on the basic tools that we're using and therefore we can benefit from the improvements that others make and others’ use of those tools can, in some cases like Open Compute, drive down the costs of those things which make our business more efficient too." -- isn't this the Web 2.0 mantra applied to software?

iosjunkie · on May 24, 2023

This is the OSS model that has been around for 30 years. Operating systems, web servers, countless other projects that help build the internet we know today. Now, AI tools from Meta.

arketyp · on May 24, 2023

In the highly interesting recent memo leaked from Google, the argument is made that open source will come out the winner in the AI battle and specifically that

"Paradoxically, the one clear winner in all of this is Meta. Because the leaked model was theirs, they have effectively garnered an entire planet's worth of free labor. Since most open source innovation is happening on top of their architecture, there is nothing stopping them from directly incorporating it into their products.

The value of owning the ecosystem cannot be overstated. Google itself has successfully used this paradigm in its open source offerings, like Chrome and Android. By owning the platform where innovation happens, Google cements itself as a thought leader and direction-setter, earning the ability to shape the narrative on ideas that are larger than itself."

https://www.semianalysis.com/p/google-we-have-no-moat-and-ne...

quaintdev · on May 24, 2023

They did the same thing with numerous projects of theirs, like golang, protobuf, and grpc.

robertlagrant · on May 24, 2023

This makes more sense than the moat argument. Open sourcing with a noncommercial licence means they get to incorporate effort back into their project, but others can't use it in their businesses. All the academic etc effort can be captured in this way.

mx20 · on May 24, 2023

The software part already got reimplemented, and models are used as training data for new models by others. You could argue that if using images and other copyrighted training data is allowed, you can also use a model to train your own model.

spacebacon · on May 24, 2023

Precisely. The old way contains innovation to a platform (fb apps, play, appstore). The new way speaks for itself. Total domination of an ecosystem from the root up.

dragonwriter · on May 24, 2023

> They are now the true open source AI providers

Except neither this model nor several of their recently-lauded “open” releases are open source; they are CC-BY-NC 4.0, aka, you are free to tinker and share, but not to use the work or derivatives for commercial purposes. Any community effort that Meta’s hobbyist-source license attracts is work that isn’t enabling commercial competition, unlike actual open source systems like Suno’s Bark (MIT) or even use-restricted-but-not-non-commercial shared source licenses like Stable Diffusion’s CreativeML Open RAIL-M.

Narew · on May 24, 2023

Some of these models are really open source. For example segment anything is MIT licensed.

stale2002 · on May 24, 2023

> Any community effort that Meta’s hobbyist-source license attracts is work that isn’t enabling commercial competition

So what?

Sure, maybe the Googles of the world aren't building on top of Meta's products, but I can tell you that a lot of startups are.

Does it make these startups vulnerable, to long term future legal action? Sure, but nobody is thinking that far ahead. What people are thinking about is how to get users and show off flashy demos to investors.

Instead, people are just pushing out products, breaking Meta's licenses, and not telling people about it, while they attempt to get traction.

Strict licensing, without enforcement, is not worth the paper that the contract is written on.

So yes, it is still beneficial that the code is released, even with a bad license.

dragonwriter · on May 24, 2023

> So what?

So, that's a reason Meta might wish to release a non-open-source model with this particular license, and one that provides an alternative to “Meta is doing this because they stand to benefit from open source models taking off”: specifically, “Meta is doing this because it stands to benefit from drawing energy away from open source models into ones that cannot legally be used to commercially compete”.

> Does it make these startups vulnerable, to long term future legal action? Sure, but nobody is thinking that far ahead.

Well, the startups may not be, but Meta maybe is, and it's acquiring a zero-cost, upside-only investment in every startup doing that. “Unjust enrichment”.

hnlmorg · on May 24, 2023

This might be an unpopular opinion on HN, but the whole “ask for forgiveness, not for permission” view some take to business feels like pretty bad taste to me.

stale2002 · on May 24, 2023

If it works it works.

Crying about it doesn't change its effectiveness.

And really, I don't think Meta cares either.

They likely are releasing this stuff, with a strict license, just so they don't have any liability, or bad publicity.

But, they likely are happy that everyone is using their stuff.

capitol_ · on May 24, 2023

Aha, yes, the business ethics 'it's only illegal if you get caught' that gave us Elizabeth Holmes and similar.

stale2002 · on May 24, 2023

That is indeed how the law works. The law is nothing without enforcement.

A better example would be Uber, though, a company which is now massively successful despite early legal problems.

Also, once again, Facebook likely wants everyone to be using their product and is unlikely to go after people.

Buttons840 · on May 24, 2023

But I am able to train my own LLM on the output of their LLM, right? Or are the big AI players going to argue that you cannot train an AI on data you don't have a license to? (See the catch 22 here?)

dragonwriter · on May 24, 2023

> But I am able to train my own LLM on the output of their LLM, right?

Sure. And, there's an argument that the license only applies to the code because model weights aren’t subject to copyright anyway. And available-under-any-license is a lot better than OpenAI’s current stance as far as enabling anyone else, since they’ve gone completely closed to the point where even their papers on their models are more PR than reproducible science. There's a continuum from secret sauce to “do what thou wilt”, and I am not a zealot arguing anything not Open Source must be rejected as not a positive step.

mijoharas · on May 24, 2023

I think some of the licenses specifically forbid that (as far as I remember, ChatGPT has that in its terms of service).

KRAKRISMOTT · on May 24, 2023

This is interesting. I remember that Bark was previously license-encumbered because its neural codec, EnCodec (also by Meta), was non-free.

woodson · on May 24, 2023

EnCodec has since been relicensed to MIT.

ftufek · on May 24, 2023

My guess is this isn't their competitive edge; network effects, products, data and distribution are.

In a way, it takes away their competitors' edge while racing to the bottom to compete with open source. At the same time, they establish themselves as experts and keep attracting great talent that wants to publish their work openly. And it benefits all of us, so good marketing amongst developers too.

throwaway20222 · on May 24, 2023

This comment resonates with me and reminds me of T-Mobile making international roaming free; they didn’t really have a ton of business coming from that service, but knew how important it was for their competitors. They made theirs free and forced the industry down that path. (Have since added some fees back but the point is similar to your thoughts)

imwm · on May 24, 2023

As Ben Thompson pointed out recently, unlike Google and OpenAI, Meta benefits from open source AI taking off because that makes everyone better content creators, which accrues further value to their social media platforms.

dragonwriter · on May 24, 2023

If Meta stands to benefit from open source AI taking off, why are its models CC-BY-NC 4.0 instead of open source?

EDIT: On reflection, you can probably extend the content creation argument to say that noncommercial tools enabling that without enabling commercial competition, to the extent that some of the models will be integrated into Meta products, is the best of all worlds for Meta, so the basic argument works even without open source in the strict sense.

wernst · on May 24, 2023

Commoditize your complement https://www.joelonsoftware.com/2002/06/12/strategy-letter-v/

rapsey · on May 24, 2023

Meta has one of the best if not the best open source track record. They do it likely because it does not interfere with their business model. If outsiders find ways to improve their tech it only helps them.

robertlagrant · on May 24, 2023

Google's is pretty incredible too. Golang; VP8; Chromium; K8s; Android.

screye · on May 24, 2023

Open source = no money.

Facebook doesn't want the models to be the money making bit, because they aren't a licensing/subscription service. They are an ads and soon hardware-platform company. They want those bits to be what people pay for. Not the models.

All these models are licensed under a non-commercial license. So their competitors don't gain a real advantage.

Other than OpenAI (who are remarkably tight lipped), ML researchers are pretty chatty in both their papers and watercooler hangouts. So, the information is going to get out either way. Might as well get ahead of it, and look like the good guy in the process.

reaperman · on May 24, 2023

This model is vision-only so it can't be SOTA even if it's #1 performing in many of the original categories of benchmarks, which it is (it's a very very good model).

We've moved on from ImageNet-style tests ("Choose the most appropriate label for this image from 200 possible labels") to much more advanced "Reasoning" tests[0]. PaLI[1] is potentially the SOTA here, but BEiT-3[2] may be a better example for my thesis. Notice that BEiT-3 is trained not just on images but also on natural language. It outperforms purely image-trained models even on pure-image tasks like Object Detection and Semantic Segmentation.

Take a look at the major benchmarks for Segmentation (ade20k) [4]: DINOv2, 11th place. BEiT-3, 4th place. Yes, BEiT-3 has 72% more parameters but it's also basically an entire LLM. Even GPT-4 is a multi-modal model, and actually accepts images as prompt inputs, OpenAI just doesn't expose that ability.

More importantly, the new multi-modal models can understand human questioning like "What type of flowers are in the blue buckets of this image?" and respond intelligently, in English/whatever.

DINOv2 was trained with techniques borrowed from LLM training methods, but is not trained for natural language.

0: https://paperswithcode.com/area/reasoning

1: https://arxiv.org/pdf/2209.06794v2.pdf

2: https://paperswithcode.com/paper/image-as-a-foreign-language...

3: http://www.incompleteideas.net/IncIdeas/BitterLesson.html

4: https://paperswithcode.com/sota/semantic-segmentation-on-ade...

nologic01 · on May 24, 2023

In random order:

1) Positive PR to compensate, e.g., for privacy law violation fines (see yesterday's news cycle)

2) Identifying and nurturing the next generation of hires or acquihires in this very technical area

3) Depriving the competition of alternative business models that might threaten the adtech walled garden they exclusively rely on

4) Collecting ideas about how these models can be improved or used creatively (related to 2)

5) Keeping some key employees happy with intangible rewards (name recognition, career prospects)

When you spend billions on development this type of strategic leakage is just a few marked droplets into a digital ocean

rippeltippel · on May 24, 2023

Following the leak of Llama, they now probably enjoy having a worldwide open-source community improving their models for free, as elaborated in this leaked (sic!) Google document: https://www.semianalysis.com/p/google-we-have-no-moat-and-ne...

bugglebeetle · on May 24, 2023

Purely my speculation: OpenAI is hobbling their products trying to support all kinds of integrations, specifically Microsoft’s. GPT-4 is not performant enough for end user applications, so they’ve had to gimp a lot of its reasoning to make it speedier.

This opens up an opportunity for their competitors to eat into their moat because OpenAI is treading water/downgrading their product, chasing scale. Meta is leveraging this opening to flood the field with amazing open source tools, all of which compete with OpenAI offerings, knowing that the open source community will run with them and further erode OpenAI’s moat.

whiplash451 · on May 24, 2023

A nice side effect for Meta is also to bring the PyTorch vs TensorFlow battle to its final chapter.

samstave · on May 24, 2023

Litmus testing.

Meta is cornered. And it needs to figure its way through its current positioning.

They have some deep tech, and they have deep pools of historic global sentiment, which, if one were to train GPT4 on...

So, I can only surmise nefarious actions shall (are?) be afoot.

i2cmaster · on May 24, 2023

All of the innovation happens for their models first and they don't have to pay for it.