Recent reporting by Nieman Lab describes how some major news organizations—including The Guardian, The New York Times, and Reddit—are limiting or blocking access to their content in the Internet Archive’s Wayback Machine. As stated in the article, these organizations are blocking access largely out of concern that generative AI companies are using the Wayback Machine as a backdoor for large-scale scraping.
These concerns are understandable, but unfounded. The Wayback Machine is not intended to be a backdoor for large-scale commercial scraping and, like others on the web today, we expend significant time and effort working to prevent such abuse. Whatever legitimate concerns people may have about generative AI, libraries are not the problem, and blocking access to web archives is not the solution; doing so risks serious harm to the public record.
daychilde@lemmy.world 3 days ago
Knowledge rot is already a problem and has been for years – where you try to follow some links only to find they’re dead, or people deleted their content. Then there are the anecdotes of finding a thread about some old problem where someone just said “I figured it out”. Sure, archival won’t fix that specific example, but the principle is there - we lose so much information.
It would be nice if we had a government that worked for We the People and made information archival mandatory, like the Library of Congress already does with printed materials.
grue@lemmy.world 2 days ago
xkcd.com/979/
daychilde@lemmy.world 2 days ago
Precisely that one, yes :)
vacuumflower@lemmy.sdf.org 2 days ago
Yes, so there was a time when I was dreaming day and night about something like those LLMs, but for archiving knowledge. That is, archiving existing statements with subjects and objects and relations, a bit more high-level and less generalized than LLMs. Syllogisms, semantic relationships, distances in application. Sort of what holocrons are in Star Wars.
daychilde@lemmy.world 2 days ago
So kinda like an ethical LLM^[But I get your distinctions and I’m on board with that. It’d be nice!]. I’d be on board with that.
I know it’s unpopular to say, but I’ve found the latest version of Gemini to be pretty useful. But you have to know what LLMs are good for and what they’re not. General knowledge? Generally pretty decent. But you have to ask for sources and check those sources, and don’t tell it what you think; ask it what it knows and to admit when it doesn’t know things. I wouldn’t put my life on the line, but for looking up random stuff, it’s pretty decent.
I know LLMs will get worse and shittier, which I think is a bummer, because they could be so damned useful.