Comment on Wikipedia has banned AI-generated text, with two exceptions

Wikipedia probably wants to sell LLM developers access to its content for training. That access is only valuable if Wikipedia remains a high-quality, slop-free source.

I think even AI zealots agree there should be silos of fully human-generated content to train on. Training slop on slop makes the slop even worse.

Grimy@lemmy.world 2 months ago
Sell licenses to what? It’s already all under Creative Commons, iirc.

- Zagorath@quokk.au 2 months ago
  The content is CC licensed, but they are trying to block AI scraping because it overloads their servers. They have a paid API that uses a lot less compute for both Wikipedia and the AI, as well as being a revenue source for Wikipedia.
  
  - ricecake@sh.itjust.works 2 months ago
    Yes, but…
    
    en.wikipedia.org/…/Wikipedia%3ADatabase_download
    
    That’s because viewing the page uses server resources, as does API access. If you want the data, you can download the database directly.
    
SuspciousCarrot78@lemmy.world 2 months ago
AI already trains on Wikipedia.

commoncrawl.org

MountingSuspicion@reddthat.com 2 months ago
This was only done because the editors pushed to minimize AI involvement. There’s a comment here already mentioning that: lemmy.world/comment/22826863
