Submitted 4 months ago by BrikoX@lemmy.zip to technology@lemmy.zip
Wikipedia, the online nonprofit encyclopedia, laid out a simple plan to ensure its website continues to be supported in the AI era, despite its declining traffic.
Can’t you just download the entire thing for free?
I imagine this would be discouraged for corporate entities. Corps shouldn’t freeload.
I don’t get it though… Why would any company use this when Wikimedia also offers a download of the entirety of Wikipedia, for free?
Maybe it’s just that if the AI companies don’t know about the free download, Wikimedia can hopefully get a little money from them?
You think AI companies care what they scrape? Their system is set up to scrape anything it can get.
Oh I know, I was just thinking that if the AI companies were willing to make an exception for Wikipedia (by paying), like the Wikimedia people think, they could also just download the complete thing for free. But yeah, they probably won’t do any of that, so this was kinda useless I think.
They can scrape an ongoing log of interactions between editors about the articles themselves, which is probably fairly worthwhile content honestly. There’s probably more content there than in the articles, too.
From skimming that linked page, I think this download perhaps doesn’t include recent pages? In the section about enterprise stuff, it mentions the paid API for recent articles.
It seems you’re right, I’m just dumb and didn’t read the article I linked.
Honestly, this would only work if the AI companies were actually ethical, which… they’re not known to be.
Paid API…
Wikipedia = Reddit
Wikipedia (or the Wikimedia Foundation) is mostly driven by donations and volunteers, unlike Reddit…
Also, scraping every page on Wikipedia is incredibly heavy, especially compared to things like downloading a compressed copy of the entire site through torrents.
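For anyone curious what the bulk route looks like in practice, here is a minimal Python sketch. It assumes the usual bz2-compressed MediaWiki XML export format used for the `pages-articles` dumps. The function name `iter_titles` and the sample data are made up for illustration. It streams page titles out of a compressed dump without loading the whole thing into memory, so it's one local file read instead of millions of HTTP requests.

```python
import bz2
import io
import xml.etree.ElementTree as ET


def iter_titles(fileobj):
    """Yield page titles from a bz2-compressed MediaWiki XML dump stream."""
    with bz2.open(fileobj, "rb") as stream:
        for _event, elem in ET.iterparse(stream, events=("end",)):
            # Dump elements carry an XML namespace, so compare the local name only.
            local = elem.tag.rsplit("}", 1)[-1]
            if local == "title":
                yield elem.text
            elif local == "page":
                elem.clear()  # drop the contents of pages already processed


# Tiny in-memory stand-in for a real dump file (hypothetical content).
SAMPLE = bz2.compress(
    b'<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.10/">'
    b"<page><title>Alpha</title></page>"
    b"<page><title>Beta</title></page>"
    b"</mediawiki>"
)

titles = list(iter_titles(io.BytesIO(SAMPLE)))
print(titles)
```

With a real dump you would pass an opened file instead of the sample, for example `open("enwiki-latest-pages-articles.xml.bz2", "rb")`, assuming you have already downloaded that file.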
In the age of AI slop that you can’t trust, Wikipedia use is going down??
Doubtful
Kind of funny: When Wikipedia was new, people often said that you couldn’t trust information on it because anyone could have written it, even if they were unqualified, biased, or deliberately deceptive. I guess that’s still true today, but with the advent of automated misinformation generators, the Wiki almost seems authoritative in comparison.
Can confirm, I’ve been a Wikipedia zealot the entire time and people really do seem to have accepted it. If you ignore what else makes them cheer, it’s a huge victory.
Yeah, when I was at school in the early 00s we were specifically told not to use Wikipedia as a research source.
People think they can trust the slop, is the thing. If they even think so far ahead, they probably think that an answer that exists on wikipedia will just be provided by the AI, saving them the time to search for it themselves. I’ve heard more than one horror story of ChatGPT use in particular backfiring on someone who somehow legitimately thought it was just another form of search engine, and didn’t verify the information provided.
sgtlion@hexbear.net 4 months ago
If AI paid fairly for their training data, they’d be making the biggest losses in human history.
It’s almost like all successful capitalist business is based on theft and exploitation.