
It's quite funny to me that "open source" has turned out to be a significantly more confusing term than "Free Software", because people don't seem to understand why the word "source" is in there.

The word "source" literally means "where it comes from", for both executable software and large language models. If the training data for a language model is not "open", then the language model is not "open source", full stop. Training data is the source of a language model in the same way code is the source of an executable program.



> people don't seem to understand why the word "source" is in there.

Disagreeing with you is not the same as not understanding. There clearly isn't a consensus, but there is no shortage of people just asserting that others are wrong.


If you have access to any language models, I encourage you to ask this question:

> What is the significance of the word "source" in the expression "source code"?

I landed on this question by first asking it about "open source" rather than "source code", but the answers referred to "source code", which is a bit circular for this conversation. I'll share a truncated version of how Mistral Large replied:

> It is called the "source" because it is the origin or the primary input from which the executable form of the program is derived.

The primary input from which a language model is derived is the training data. That is the source. If the training data for a model is not open, then the model is not open source, because the source of the model is not open.



