Reddit will block the Internet Archive
Submitted 2 weeks ago by General_Effort@lemmy.world to technology@lemmy.world
https://www.theverge.com/news/757538/reddit-internet-archive-wayback-machine-block-limit
Comments
tal@lemmy.today 2 weeks ago
Given that the Internet Archive is the de facto standard way to cite material as seen on a given date (they’re a trustworthy party that will probably persist for a long time), that’s going to make it harder to cite content on Reddit.
Deceptichum@quokk.au 2 weeks ago
Damn, guess if you want reddit data to train your AI that you’ll need to pay Spez for access.
tal@lemmy.today 2 weeks ago
It’s important for people writing papers and such who need to cite material.
I wonder if there’s some way to use the TLS certificate to bootstrap a cryptographically-signed copy of a webpage with timestamp that someone could later validate as having been downloaded on that date. I don’t know if existing TLS libraries are capable of that. Like, Web browser menu option “Store cryptographically-signed webpage”. Absent a later certificate compromise, I’d think that that’d at least provide people a way to credibly say “this is really what was on that webpage on August 15th, 2026”.
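A minimal sketch of the weaker, hash-only half of that idea, not the TLS-signed version. It assumes you already have the raw bytes of the page and the time you fetched them; the function names and record layout are made up for illustration. It only binds URL, time, and content together into a digest that some trusted third party (a timestamping service, say) would still have to sign, because in TLS the page data itself is protected with symmetric keys that the client also holds, so a saved session alone does not prove to anyone else what the server sent.

```python
import hashlib
import json
from datetime import datetime, timezone


def make_capture_record(url: str, body: bytes, fetched_at: datetime) -> dict:
    """Bind a URL, a fetch time, and the exact bytes received into one record.

    fetched_at should be timezone-aware; it is normalized to UTC.
    """
    return {
        "url": url,
        "fetched_at": fetched_at.astimezone(timezone.utc).isoformat(),
        "sha256": hashlib.sha256(body).hexdigest(),
    }


def record_digest(record: dict) -> str:
    """Hash a canonical JSON form of the record.

    This is the value a third party would timestamp or sign (hypothetical step,
    not performed here).
    """
    canonical = json.dumps(record, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def body_matches(record: dict, body: bytes) -> bool:
    """Check that a stored copy of the page is the one the record describes."""
    return hashlib.sha256(body).hexdigest() == record["sha256"]


if __name__ == "__main__":
    page = b"<html>placeholder page content</html>"  # stand-in for a real download
    when = datetime(2025, 8, 15, 12, 0, tzinfo=timezone.utc)
    rec = make_capture_record("https://example.com/", page, when)
    print(rec)
    print(record_digest(rec))
    print(body_matches(rec, page))
```

Anyone holding the saved page and the record can re-run `body_matches`; the credibility of the date still rests entirely on whoever signs `record_digest`.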
misteloct@lemmy.dbzer0.com 2 weeks ago
Don’t forget, Reddit is legally allowed to train on your content, but not the other way around. It’s consistent with US law, where corporate tax is half of income tax.
conorab@lemmy.conorab.com 2 weeks ago
As somebody who often ends up using Reddit like Stack Overflow, and in some cases needing the Internet Archive (IA) to find the original post after it’s been deleted or garbled, I think this is a wake-up call for those who go to Reddit both to get technical help and to post it. More than ever, Reddit is becoming an unreliable place to find answers for old, obscure issues, and if they are going to lock out places like the IA then I think it’s time people stopped contributing their solutions to Reddit.
cashsky@sh.itjust.works 2 weeks ago
Searching anywhere in general is getting shittier and shittier by the day. Web searches are riddled with hallucinated, AI-generated garbage pages. Finding the right answer for difficult problems is getting worse and worse. We are sliding rapidly into Idiocracy.
dizzy@lemmy.ml 2 weeks ago
Not to mention so many projects putting their support in walled garden chat services like Discord that you can’t even search via search engine. Even if you can figure out who asked the right question and when, you have to trawl through a sea of inane garbled chat to get to the developer/expert response.
Specialised topic forums really need to make a resurgence but I doubt they will.
baggachipz@sh.itjust.works 2 weeks ago
We are sliding rapidly into Idiocracy.
Buddy, we are already there. “Ow, my balls!” would be high-brow TV these days.
mojofrododojo@lemmy.world 2 weeks ago
yup. continuing to feed them traffic after their repeated attacks on the userbase is just sad. stop using them. yeah it sucks the info is gone, but acting like they’ll wake up and change is absurd.
NauticalNoodle@lemmy.ml 2 weeks ago
When I joined Lemmy I decided it was unwise to trust anything on Reddit less than a year old. Now it’s anything under two years old.
Sxan@piefed.zip 2 weeks ago
Every instance where I've needed to use TIA for someþing on Reddit (because Reddit blocks some of my VPN exit nodes), it's been for some old post. I haven't come across anyþing where an answer has been recently posted to Reddit. Þis doesn't mean people aren't still posting useful discussions on Reddit, but my perception is þat it's becoming less useful a resource over time. Maybe because þe knowledgeable people have mostly migrated off?
Ofttimes what I've looked up in TIA for Reddit was already cached. Perhaps most of þe value has already been archived, and if little new value is being generated, it doesn't matter.
Þe upshot is, I'm not sure how much effect þis will actually have.
mrgoosmoos@lemmy.ca 2 weeks ago
exact same here. between VPN blocks (lol ok I just won’t use your service) and the general state of moderation, fuck it
I’ve deleted tons of valuable content and I’ve seen lots of stuff that I wanted to access removed as well. it’s annoying, but oh well. other forums will remain
mazzilius_marsti@lemmy.world 2 weeks ago
most of my technical questions about Linux are not even answered lol. So difficult to get good answers on reddit.
Blackmist@feddit.uk 2 weeks ago
It’s another move to protect against AI scraping that isn’t paying them for access.
sqgl@sh.itjust.works 2 weeks ago
Wasn’t Reddit complaining a couple of years ago that too many AI bot crawls were stressing their servers?
Doesn’t the internet archive relieve that stress?
supersquirrel@sopuli.xyz 2 weeks ago
Doesn’t the internet archive relieve that stress?
I think that was probably the real reason for the block. The Internet Archive is too functional, scalable, and accessible a service: as long as it could archive reddit, reddit’s lame excuses about needing to gatekeep access to the community-created content on their website made reddit look totally stupid, so they had to come up with an excuse to block the Internet Archive.
Keyboard@lemmy.world 2 weeks ago
I gave up on Reddit a long time ago. Deleted everything.
Truscape@lemmy.blahaj.zone 2 weeks ago
When RIF died, Voyager became the new forum app for me.
boonhet@sopuli.xyz 2 weeks ago
Apollo and Voyager for me so I straight-up retained the same UI.
Keyboard@lemmy.world 2 weeks ago
Maybe I should try voyager too
Keyboard@lemmy.world 2 weeks ago
Thanks for sharing. I will check it out
jjlinux@lemmy.zip 2 weeks ago
Yup, same here.
mojofrododojo@lemmy.world 2 weeks ago
this is the way.
captainastronaut@seattlelunarsociety.org 2 weeks ago
As long as the previous collections of archives are still intact. We probably don’t need all of their new spam posts in the wayback machine anyway
hamFoilHat@lemmy.world 2 weeks ago
It is my understanding that if you block the Wayback Machine from indexing your site, it will also delist the history.
Jason2357@lemmy.ca 2 weeks ago
They do archive sites against the owners’ wishes when they consider it an important site for public archiving, like some news sites. They are under no obligation to delete the archives, and I hope they don’t.
Natanael@infosec.pub 2 weeks ago
The ability to block crawling is separate from the ability to delist old pages. The latter usually happens after domains change owners
Sxan@piefed.zip 2 weeks ago
LOL, I should have scrolled down first. You said what I said, with fewer words, first.
DFX4509B_2@lemmy.org 2 weeks ago
Just more vindication for my ditching that trash heap of a platform.
Someonelol@lemmy.dbzer0.com 2 weeks ago
YouTube’s already throttling users in their mobile site. They have these massive channel cards in their feeds and the video titles/thumbnails disappear after a few offerings, leaving you with the ability to blindly click on a video.
DFX4509B_2@lemmy.org 2 weeks ago
I’ve declared my YT channel to be dormant starting on the 13th due to this AI age-gating crap.
wanchutri@jlai.lu 2 weeks ago
Time to use peertube
DFX4509B_2@lemmy.org 2 weeks ago
And Invidious while that’s still an option, but I have both a PeerTube and Odysee set up already.
Njos2SQEZtPVRhH@piefed.social 2 weeks ago
People who posted on Reddit (speaking in the past tense, because who would continue to do so now that we have better things?) never intended for it to be of limited access. Reddit was a publicly accessible place, and people shared their thoughts and comments on it because it was the front page of the internet, so the place of choice to share things with the world. That being scraped should not be a problem. But clearly Reddit didn't want to give you a platform to share your thoughts with the world; they wanted you to donate your thoughts so they could take them as their property and capitalize on them.
General_Effort@lemmy.world 2 weeks ago
I don’t know… I mean, I agree. But I’m seeing a lot of demands that instances should prevent scraping. Ok, it could be astroturf; a campaign by Reddit/data brokers to neutralize the free competition. But you have seen all those deleted posts on Reddit. Those are some special little minds.
Njos2SQEZtPVRhH@piefed.social 2 weeks ago
You're right, there are probably some anti-AI/anti-scraping folks on there as well as here. Personally, I most definitely hate intellectual property more than I do generative AI. But you're right, different people on there will feel differently. The point still stands, though, that for those who thought they were sharing their thoughts with the world, the ideas they donated were taken from them.
bigbabybilly@lemmy.world 2 weeks ago
That place is becoming more and more of a shithole. Bots, Ads, trolls, garbage mods… deleted the app last month.
espentan@lemmy.world 2 weeks ago
I quit reddit, cold turkey, the day they shut off free API access for 3rd parties. Except for a couple of fairly niche subs I haven’t missed it at all.
AstralPath@lemmy.ca 2 weeks ago
Same here. I’ve been better off ever since.
User79185@discuss.tchncs.de 2 weeks ago
This is a huge blow to archivism, thanks to corporate greed and the enshittification of reddit. Worst MBA-filled POS.
MangioneDontMiss@lemmy.ca 2 weeks ago
reddit can go fuck itself.
Eh_I@lemmy.world 2 weeks ago
That’s the kind of talk that can get you banned from Reddit. 😜
MangioneDontMiss@lemmy.ca 2 weeks ago
I imagine almost my entire Post history can get me banned on Reddit.
JakenVeina@midwest.social 2 weeks ago
The company says that AI companies have scraped data from the Wayback Machine, so it’s going to limit what the Wayback Machine can access.
Yeah, wouldn’t want those AI companies to get all that data for free. Gotta make 'em pay for it.
brygphilomena@lemmy.dbzer0.com 2 weeks ago
Instead of regulating tech, they are going the fuck-over-everyone route.
SocialMediaRefugee@lemmy.world 2 weeks ago
So reddit will become even less valuable
HexesofVexes@lemmy.world 2 weeks ago
Oh no, someone might not be paying them for their user generated content (!)
To be fair, it’s probably best that history forgets this period of the web…
ulterno@programming.dev 2 weeks ago
that history forgets this period
and thus it repeats
WhyJiffie@sh.itjust.works 2 weeks ago
don’t forget, we easily repeat what we “learned” anyway
MadMadBunny@lemmy.ca 2 weeks ago
Damn you Spez.
ozoned@piefed.social 2 weeks ago
Good plan. Keep locking down your big tech platforms, and we'll all be over here letting folks know where they can find freedom.
aquovie@lemmy.cafe 2 weeks ago
Careful. Lemmy is too small to draw the attention of sophisticated, persistent abuse. As a company, Reddit has struggled with revenue and we’ve all seen those struggles quite publicly. Lemmy instances with those same challenges would probably just fold and close up.
Federated networks give you freedom but the potential for abuse is proportional to that freedom while at the same time, federation is far more expensive taken as a whole.
bytesonbike@discuss.online 2 weeks ago
Lemmy instances with those same challenges would probably just fold and close up.
Can confirm. I set up a pixelfed instance for my city with the goal of moving people from Insta to this version. After about three months, user accounts went from 1-10 signups a week to a hundred a week.
No way did that many business owners sign up. And yep, all spam.
After a while, my random weekend project in Spring became a full time job. I closed it last month.
girsaysdoom@sh.itjust.works 2 weeks ago
I’m sure it would persist even after an event of malicious activity. It may just turn out like email with servers needing to be added to an allowlist at worst and more moderation. I think scalability might be the limiting factor at some point though and as a result we could end up with several disconnected islands of server clusters instead of globally meshed servers.
yarr@feddit.nl 2 weeks ago
Or… let them stay on Reddit. I like lemmy much better, and it’s possibly due to the people that are not present and the lack of commercial interest.
ozoned@piefed.social 2 weeks ago
No harm in that. To each their own. :-) Everyone gets to decide at least.
Jason2357@lemmy.ca 2 weeks ago
I think if the fediverse was ever to become more mainstream, it would naturally splinter. For example, the corporate stuff would be big, and those people who value the small-instance experience we have now would probably de-federate from it. There would always be small fediverses, even if the big fediverses got REALLY big.
ZombieMantis@lemmy.world 2 weeks ago
Just make your own invite-only server if you’re so worried about it. Digital freedom should be for everyone, not just a few antisocial nerds.
Capybara_mdp@reddthat.com 2 weeks ago
Does anyone have any good tech-related forums on Lemmy? I’m still digging around, as I find a lot of interesting but “quiet” ones.
bytesonbike@discuss.online 2 weeks ago
In the tech world, we call that a honeytrap.
Bloomcole@lemmy.world 2 weeks ago
‘freedom’ as long as the mod agrees with you.
phantomwise@lemmy.ml 2 weeks ago
Nice of them to protect their (users’) content from AI scraping. So that they can charge AI companies for it instead.
muusemuuse@sh.itjust.works 2 weeks ago
They aren’t doing that. They are protecting content from being scraped for free. Reddit is perfectly happy to charge for AI access to user-generated content.
ebolapie@lemmy.world 2 weeks ago
No, that’s not what’s happening. They’re preventing scrapers from accessing the content at no charge. They’re totally willing to make deals for access to their content in exchange for money.
sturmblast@lemmy.world 2 weeks ago
Fuck Reddit
FalseTautology@lemmy.zip 2 weeks ago
I am new to Lemmy, is there a fuckreddit sub?
morto@piefed.social 2 weeks ago
In a way, the entire lemmy community is the fuckreddit sub
frongt@lemmy.zip 2 weeks ago
Why would you want to spend more time thinking about a dead site?
FalseTautology@lemmy.zip 2 weeks ago
I just like to laugh at things I dislike. And I also like to see how bad it’s getting. I was in the undelete sub and it was amazing.
lka1988@lemmy.dbzer0.com 2 weeks ago
Yes.
Hi welcome to Lemmy, we hate reddit here.
MedicPigBabySaver@lemmy.world 2 weeks ago
Fuck Reddit and Fuck Spez.
kokesh@lemmy.world 2 weeks ago
They can keep their shit for themselves, stopped caring a long time ago.
thisbenzingring@lemmy.sdf.org 2 weeks ago
fucking reddit…
Peculiaris@lemmy.zip 2 weeks ago
In the lead-up to an IPO, u/spez has actively destroyed everything that made Reddit good! Gatekeeping the API, thinking it’ll help with making some bigshot LLM some day lol
Cornpop@lemmy.world 2 weeks ago
Time to just ignore them and scrape it anyways
adespoton@lemmy.ca 2 weeks ago
OK, I stopped posting on Reddit but left my account and comments in place because I considered them part of the public record. If Reddit is taking that record private, it’s time for me to start removing my content from the platform.
Does anyone know if historical Reddit content will remain in IA? If not, I’m going to have to back up years of content somewhere else.
MehBlah@lemmy.world 2 weeks ago
Once reddit has mutated a few more times, they’ll start erasing stuff themselves. It will be lost to time, and that fills me with hope.
Eh_I@lemmy.world 2 weeks ago
Fuck Spez
MonkderVierte@lemmy.zip 2 weeks ago
The company limited search crawlers to Google, why are you surprised?
buddascrayon@lemmy.world 2 weeks ago
Not that reddit isn’t hot garbage right now, and has been for a while actually, but there are a lot of people here who have glossed over the reason why reddit instituted this policy.
AI companies are scraping the Wayback Machine. This is something that should concern all of us.
NigelFrobisher@aussie.zone 2 weeks ago
Is that even possible?
RustyShackleford@literature.cafe 2 weeks ago
Since spez dislikes this picture
thisbenzingring@lemmy.sdf.org 2 weeks ago
lol i think that might be the worst/best thing I have seen in a long time
rhythmisaprancer@piefed.social 2 weeks ago
Unrelated but is your username a play on benzene?
lka1988@lemmy.dbzer0.com 2 weeks ago
Image
Lawnman23@lemmy.world 2 weeks ago
fuck spez
YiddishMcSquidish@lemmy.today 2 weeks ago
Cuck boy getting pegged by post top op Garfield is definitely not something I had jotted down in my day-at-a-glance.
phutatorius@lemmy.zip 2 weeks ago
I would have at least expected him to ask Spez to put some lasagna on his bumhole as lube.
mesamunefire@piefed.social 2 weeks ago
Art.
finix_the_psyker@sopuli.xyz 2 weeks ago
What a terrible day to have eyes.