AI already trains on Wikipedia.
Comment on Wikipedia has banned AI-generated text, with two exceptions
errer@lemmy.world 3 weeks ago
Wikipedia probably wants to sell access to LLMs to train. It’s only valuable if Wikipedia remains a high-quality, slop-free source.
I think even AI zealots think there should be silos of content to train from that are fully human generated. Training slop on slop makes the slop even worse.
SuspciousCarrot78@lemmy.world 3 weeks ago
MountingSuspicion@reddthat.com 3 weeks ago
This was only done because the editors pushed to minimize AI involvement. There’s a comment here already mentioning that: lemmy.world/comment/22826863
Grimy@lemmy.world 3 weeks ago
Sell licenses of what? It’s already all under Creative Commons, iirc.
Zagorath@quokk.au 3 weeks ago
The content is CC licensed, but they are trying to block AI scraping because it overloads their servers. They have a paid API that uses a lot less compute for both Wikipedia and the AI, as well as being a revenue source for Wikipedia.
ricecake@sh.itjust.works 3 weeks ago
Yes, but…
en.wikipedia.org/…/Wikipedia%3ADatabase_download
That’s because viewing the page uses server resources, as does API access. If you want the data you can download the database directly.