This sounds like some weirdly petty political wrangling that would delight any full-blooded bureaucrat.
The desire to make demands about training data is weird. Open source has never included a requirement to provide documentation of any kind. If there was some requirement for documentation, few would care and most just do their thing anyway. FOSS licenses facilitate sharing by giving people an easy way to make their code legally usable by others.
There’s nothing that quite matches source code + compiled binary. There are permissively licensed datasets and models. I’ll call either open source. Neither is equivalent to source code but either can be a source.
Nice article. That’s exactly what I’ve been complaining about for the last year.
In the end it’s not super obvious how to apply the term “open source” to AI models. And not everyone uses the OSI definition anyways.
And after all the lawsuits started last year, it’s clear why they do it. To protect themselves. Maybe it’s going to get better once we update our legislation.
Ultimately, the community needs to decide what it’s trying to achieve
And I’d underline that this “community” consists of multi billion dollar big tech companies that have the millions to spare to train these models. It’s not “we” the people.