That is at least an improvement over including in its corpus the entire worldwide collection of copyrighted materials.
Comment on Public AI: Free and Ethical AI models with Social good in mind
Dekkia@this.doesnotcut.it 13 hours ago
This is the definition of ethically sourced data from the Apertus website:
[…] the training corpus builds only on data which is publicly available.
So they still train on websites, blogs and social media. Ethical my ass.
theherk@lemmy.world 13 hours ago
Tywele@lemmy.dbzer0.com 10 hours ago
And they respect robots.txt afaik
Artisian@lemmy.world 4 hours ago
Let’s include the whole paragraph at least.