Comment on Public AI: Free and Ethical AI models with Social good in mind

Dekkia@this.doesnotcut.it 5 months ago

This is the definition of ethically sourced data from the Apertus website:

[…] the training corpus builds only on data which is publicly available.

So they still train on websites, blogs and social media. Ethical my ass.

source

theherk@lemmy.world 5 months ago
That is at least an improvement over including in its corpus the entire worldwide collection of copyrighted materials.

source
- Tywele@lemmy.dbzer0.com 5 months ago
  And they respect robots.txt afaik
  
  source
- Dekkia@this.doesnotcut.it 5 months ago
  But that stuff is copyrighted as well most of the time.
  
  Just because it’s free to look at doesn’t mean it’s free to download, modify or feed into an AI.
  
  source
  - theherk@lemmy.world 5 months ago
    Yeah, for sure. I’m not saying it is good at all, just that scraping some proportion of copyrighted material is an improvement over scraping all the copyrighted material.
    
    source
Artisian@lemmy.world 5 months ago
Let’s include the whole paragraph at least.

Apertus was developed with due consideration to Swiss data protection laws, Swiss copyright laws, and the transparency obligations under the EU AI Act. Particular attention has been paid to data integrity and ethical standards: the training corpus builds only on data which is publicly available. It is filtered to respect machine-readable opt-out requests from websites, even retroactively, and to remove personal data, and other undesired content before training begins.

source
- Dekkia@this.doesnotcut.it 5 months ago
  So it’s opt-out. Great.
  
  source
  - Artisian@lemmy.world 5 months ago
    As I read it, the data must be usable under Swiss copyright law, must not be personal, and must be available on the open web. Further, they retroactively respect opt-out requests.
    
    source
kalkulat@lemmy.world 5 months ago
Sounds like a ripping good way to keep corporate data (and government secrets) off the public radar.

That way we won’t find out whose hands public tax dollars (or publicly owned structures rented to corporations) wind up in.

source
- Artisian@lemmy.world 5 months ago
  ?? And those are improved by using ChatGPT how?
  
  source