This is a really interesting thread to read as a lawyer who has lately been spending his days considering how existing laws apply to AI training and operation.
I'll start by describing my understanding of the allegation, and then speak to some of the points raised.
I don't read Italian, but the English-language statement by the Garante (Italy's data protection authority) is here [0]. It appears to allege that both the original training (creation of the model) and the ongoing improvement via customer inputs were done without a "legal basis." GDPR requires companies to have a legal basis for processing personal data before they do so -- whether that comes from consent, a legitimate interest (very vague), or a task carried out in the public interest.
Italy isn't going into details, but says it is banning the service while it investigates.
Some interesting questions raised in the thread: does the model actually contain personal data? Probably yes. Even if data is converted to integers during training, those integers can be reversed back into the personal data, so GDPR treats the result as "pseudonymized" rather than "anonymized," and pseudonymized data is still subject to GDPR.
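To make the reversibility point concrete, here is a minimal sketch (using OpenAI's open-source tiktoken library; the personal data in it is made up): the integer token IDs a model is trained on map losslessly back to the original text, names and all.

    import tiktoken

    enc = tiktoken.get_encoding("cl100k_base")

    # Hypothetical string containing personal data
    text = "Mario Rossi, born 1980, lives in Milan"
    ids = enc.encode(text)  # just a list of integers

    # The mapping is fully reversible, which is why the
    # integers count as pseudonymized, not anonymized
    assert enc.decode(ids) == text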
Did OpenAI actually fail to get consent? It seems to me they did obtain consent for using user inputs for training, since their terms of service plainly state they will use inputs as training data. But GDPR demands a higher standard of consent (an explicit, affirmative opt-in, i.e. a checkbox), and as far as I know OpenAI didn't use one.
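For contrast, here is a hypothetical sketch of what regulators tend to mean by checkbox-grade consent: an explicit, purpose-specific, logged opt-in that defaults to "no" (the function and field names are mine, not OpenAI's).

    from datetime import datetime, timezone

    def record_training_consent(user_id, checkbox_checked):
        # GDPR Arts. 4(11) and 7: consent must be a freely given,
        # specific, informed, affirmative act; silence or a
        # pre-ticked box does not count
        if not checkbox_checked:
            return None
        return {
            "user_id": user_id,
            "purpose": "use of chat inputs as model training data",
            "timestamp": datetime.now(timezone.utc).isoformat(),
        }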
I think this is pretty interesting to watch play out.
Finally, some facts - the rest of the thread is wild speculation and blind judgement. Yes, a few countries lose access to the strongest available LLM while the investigation is underway. There are bigger things at stake.
I don't think this is about the current model containing personal data. IMO this is about "Open"AI's operation: a US-based company (with the US's lack of regulation and disregard for user privacy) collecting data about the rest of the world. And we are talking about _very_ personal data: the kind people would be reluctant to share even with their doctors and psychologists, but will still mention to a non-judgemental chatbot with a memory they can seemingly wipe at will. This is potentially dangerous. Governments should at least investigate how this data is collected and used. The funny thing is, both sides of the Atlantic benefit when the EU enforces the GDPR.
Italy, France, Germany... that's already a big enough market for a more ethical competitor to step in and provide an alternative LLM. We desperately need another player in this field. (BTW, please sign LAION's petition to democratize AI research.)
[0] - https://www.gpdp.it/web/guest/home/docweb/-/docweb-display/d...