
Reddit will block the Internet Archive

⁨645⁩ ⁨likes⁩

Submitted ⁨⁨17⁩ ⁨hours⁩ ago⁩ by ⁨General_Effort@lemmy.world⁩ to ⁨technology@lemmy.world⁩

https://www.theverge.com/news/757538/reddit-internet-archive-wayback-machine-block-limit


Comments

  • kokesh@lemmy.world ⁨1⁩ ⁨hour⁩ ago

    They can keep their shit for themselves, stopped caring a long time ago.

  • MehBlah@lemmy.world ⁨1⁩ ⁨hour⁩ ago

    When reddit has mutated a few more times, they'll start erasing stuff themselves. It will be lost to time and that fills me with hope.

  • phantomwise@lemmy.ml ⁨1⁩ ⁨hour⁩ ago

    Nice of them to protect their (users’) content from AI scraping. So that they can charge AI companies for it instead.

    • muusemuuse@sh.itjust.works ⁨51⁩ ⁨minutes⁩ ago

      They aren’t doing that. They are protecting content from being scraped for free. Reddit is perfectly happy to charge for AI access to user-generated content.

  • conorab@lemmy.conorab.com ⁨12⁩ ⁨hours⁩ ago

    As somebody who often ends up using Reddit like Stackoverflow and in some cases needing the Internet Archive (IA) to find the original post after it’s been deleted or garbled, I think this is a wakeup call for those who go to Reddit both to get technical help and to post it. More than ever, Reddit is becoming an unreliable place to find answers for old obscure issues, and if they are going to lock out places like the IA then I think it’s time people stopped contributing their solutions to Reddit.

    • cashsky@sh.itjust.works ⁨9⁩ ⁨hours⁩ ago

      Searching anywhere in general is getting shittier and shittier by the day. Web searches are riddled with hallucinated AI-generated garbage pages. Finding the right answer for difficult problems is getting harder and harder. We are sliding rapidly into Idiocracy.

      • dizzy@lemmy.ml ⁨5⁩ ⁨hours⁩ ago

        Not to mention so many projects putting their support in walled garden chat services like Discord that you can’t even search via search engine. Even if you can figure out who asked the right question and when, you have to trawl through a sea of inane garbled chat to get to the developer/expert response.

        Specialised topic forums really need to make a resurgence but I doubt they will.

      • baggachipz@sh.itjust.works ⁨3⁩ ⁨hours⁩ ago

        We are sliding rapidly into Idiocracy.

        Buddy, we are already there. “Ow, my balls!” would be high-brow TV these days.

    • mojofrododojo@lemmy.world ⁨8⁩ ⁨hours⁩ ago

      yup. continuing to feed them traffic after their repeated attacks on the userbase is just sad. stop using them. yeah it sucks the info is gone, but acting like they’ll wake up and change is absurd.

    • NauticalNoodle@lemmy.ml ⁨9⁩ ⁨hours⁩ ago

      When I joined Lemmy I decided it was unwise to trust anything on Reddit less than a year old. Now it’s anything under two years old.

    • Sxan@piefed.zip ⁨5⁩ ⁨hours⁩ ago

      Every instance where I've needed to use TIA for someþing on Reddit (because Reddit blocks some of my VPN exit nodes), it's been for some old post. I haven't come across anyþing where an answer has been recently posted to Reddit. Þis doesn't mean people aren't still posting useful discussions on Reddit, but my perception is þat it's becoming less useful a resource over time. Maybe because þe knowledgeable people have mostly migrated off?

      Ofttimes what I've looked up in TIA for Reddit was already cached. Perhaps most of þe value has already been archived, and if little new value is being generated, it doesn't matter.

      Þe upshot is, I'm not sure how much effect þis will actually have.

      • mrgoosmoos@lemmy.ca ⁨3⁩ ⁨hours⁩ ago

        exact same here. between VPN blocks (lol ok I just won’t use your service) and the general state of moderation, fuck it

        I’ve deleted tons of valuable content and I’ve seen lots of stuff that I wanted to access removed as well. it’s annoying, but oh well. other forums will remain

    • mazzilius_marsti@lemmy.world ⁨7⁩ ⁨hours⁩ ago

      most of my technical questions about Linux are not even answered lol. So difficult to get good answers on reddit.

  • RustyShackleford@literature.cafe ⁨16⁩ ⁨hours⁩ ago

    Since spez dislikes this picture

    • thisbenzingring@lemmy.sdf.org ⁨16⁩ ⁨hours⁩ ago

      lol i think that might be the worst/best thing I have seen in a long time

      • rhythmisaprancer@piefed.social ⁨12⁩ ⁨hours⁩ ago

        Unrelated but is your username a play on benzene?

    • finix_the_psyker@sopuli.xyz ⁨6⁩ ⁨hours⁩ ago

      What a terrible day to have eyes.

    • lka1988@lemmy.dbzer0.com ⁨15⁩ ⁨hours⁩ ago

      Image

    • YiddishMcSquidish@lemmy.today ⁨13⁩ ⁨hours⁩ ago

      Cuck boy getting pegged by post top op Garfield is definitely not something I had jotted down in my day-at-a-glance.

      • phutatorius@lemmy.zip ⁨7⁩ ⁨hours⁩ ago

        I would have at least expected him to ask Spez to put some lasagna on his bumhole as lube.

    • Lawnman23@lemmy.world ⁨15⁩ ⁨hours⁩ ago

      fuck spez

    • mesamunefire@piefed.social ⁨15⁩ ⁨hours⁩ ago

      Art.

  • MedicPigBabySaver@lemmy.world ⁨4⁩ ⁨hours⁩ ago

    Fuck Reddit and Fuck Spez.

  • tal@lemmy.today ⁨17⁩ ⁨hours⁩ ago

    Given that the Internet Archive is the de facto standard way to cite material as seen on a given date — they’re a trustworthy party that will probably persist for a long time — that’s going to make it harder to cite content on Reddit.

    • Deceptichum@quokk.au ⁨13⁩ ⁨hours⁩ ago

      Damn, guess if you want reddit data to train your AI, you’ll need to pay Spez for access.

      • tal@lemmy.today ⁨10⁩ ⁨hours⁩ ago

        It’s important for people writing papers and such who need to cite material.

        I wonder if there’s some way to use the TLS certificate to bootstrap a cryptographically-signed copy of a webpage with timestamp that someone could later validate as having been downloaded on that date. I don’t know if existing TLS libraries are capable of that. Like, Web browser menu option “Store cryptographically-signed webpage”. Absent a later certificate compromise, I’d think that that’d at least provide people a way to credibly say “this is really what was on that webpage on August 15th, 2026”.
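
One building block of the idea above can be sketched in a few lines: a record that binds a URL, a timestamp, and a hash of the downloaded page, so a later copy can be checked against it. This is only a sketch under stated assumptions. The function names `make_capture_record` and `matches_record` are hypothetical, and the record is not signed by the site. In TLS the page data is authenticated with symmetric keys that both ends hold, so a saved session on its own is not a server signature over the content, and the signing step the comment imagines is not shown here.

```python
import hashlib


def make_capture_record(url: str, body: bytes, fetched_at: str) -> dict:
    """Bind a URL, a caller-supplied timestamp string, and a SHA-256 digest of the body.

    Hypothetical helper: the record is unsigned, so it shows what was saved,
    not who served it.
    """
    return {
        "url": url,
        "fetched_at": fetched_at,
        "sha256": hashlib.sha256(body).hexdigest(),
    }


def matches_record(record: dict, body: bytes) -> bool:
    """Return True if the given bytes hash to the digest stored in the record."""
    return hashlib.sha256(body).hexdigest() == record["sha256"]


if __name__ == "__main__":
    # Example values are placeholders, not a real capture.
    page = b"<html>example page</html>"
    record = make_capture_record("https://example.com/", page, "2025-08-15T00:00:00Z")
    print(matches_record(record, page))          # same bytes
    print(matches_record(record, page + b"!"))   # altered bytes
```

For the record to carry any weight with a third party, someone trusted would still have to sign or timestamp it, which is the role the Internet Archive plays in practice.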

      • misteloct@lemmy.dbzer0.com ⁨7⁩ ⁨hours⁩ ago

        Don’t forget, Reddit is legally allowed to train on your content, but not the other way around. It’s consistent with US law, where corporate tax is half of income tax.

  • Blackmist@feddit.uk ⁨9⁩ ⁨hours⁩ ago

    It’s another move to protect against AI scraping that isn’t paying them for access.

  • NigelFrobisher@aussie.zone ⁨3⁩ ⁨hours⁩ ago

    Is that even possible?

    • General_Effort@lemmy.world ⁨2⁩ ⁨hours⁩ ago

      Technologically, no. Reddit sends out the data to tens of millions of users as part of their normal operations. They need to try to block those who collect that data for the IA. Reddit has the very short end of the stick.

      The problem is that evading such counter-measures may be criminal in the US. Obviously, EU laws are much harsher.

  • forkDestroyer@infosec.pub ⁨3⁩ ⁨hours⁩ ago

    AI can scrape books and journals for info, but can’t scrape Reddit?

    • General_Effort@lemmy.world ⁨2⁩ ⁨hours⁩ ago

      Reddit can be scraped just as much as online books and journals.

    • hunnybubny@discuss.tchncs.de ⁨2⁩ ⁨hours⁩ ago

      Yes. Rules for thee.

  • ozoned@piefed.social ⁨4⁩ ⁨hours⁩ ago

    Good plan. Keep locking down your big tech platforms, and we'll all be over here letting folks know where they can find freedom.

    • aquovie@lemmy.cafe ⁨1⁩ ⁨hour⁩ ago

      Careful. Lemmy is too small to draw the attention of sophisticated, persistent abuse. As a company, Reddit has struggled with revenue and we’ve all seen those struggles quite publicly. Lemmy instances with those same challenges would probably just fold and close up.

      Federated networks give you freedom but the potential for abuse is proportional to that freedom while at the same time, federation is far more expensive taken as a whole.

    • yarr@feddit.nl ⁨2⁩ ⁨hours⁩ ago

      Or… let them stay on Reddit. I like lemmy much better, and it’s possibly due to the people that are not present and the lack of commercial interest.

      • ozoned@piefed.social ⁨1⁩ ⁨hour⁩ ago

        No harm in that. To each their own. :-) Everyone gets to decide at least.

  • Keyboard@lemmy.world ⁨12⁩ ⁨hours⁩ ago

    I already gave up on Reddit a long time ago. Deleted everything.

    • Truscape@lemmy.blahaj.zone ⁨7⁩ ⁨hours⁩ ago

      When RIF died, Voyager became the new forum app for me.

      • boonhet@sopuli.xyz ⁨7⁩ ⁨hours⁩ ago

        Apollo and Voyager for me so I straight-up retained the same UI.

      • Keyboard@lemmy.world ⁨3⁩ ⁨hours⁩ ago

        Maybe I should try voyager too

    • jjlinux@lemmy.zip ⁨11⁩ ⁨hours⁩ ago

      Yup, same here.

      • mojofrododojo@lemmy.world ⁨8⁩ ⁨hours⁩ ago

        this is the way.

  • BD89@lemmy.sdf.org ⁨3⁩ ⁨hours⁩ ago

    And I will block reddit.

  • Jhex@lemmy.world ⁨2⁩ ⁨hours⁩ ago

    what’s a reddit?

  • MonkderVierte@lemmy.zip ⁨5⁩ ⁨hours⁩ ago

    The company limited search crawlers to Google, why are you surprised?
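
A limit like this is typically expressed in a site's robots.txt file. The sketch below uses Python's standard `urllib.robotparser` to show how an allow-one-crawler policy is evaluated. The policy text, the `ExampleArchiveBot` user agent, and the `allowed` helper are illustrative assumptions, not Reddit's actual rules. robots.txt is advisory, so it only affects crawlers that choose to honor it.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical policy: one named crawler may fetch everything, all others nothing.
ROBOTS_TXT = """\
User-agent: Googlebot
Allow: /

User-agent: *
Disallow: /
"""


def allowed(user_agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    url = "https://example.com/r/linux/"
    for agent in ("Googlebot", "ExampleArchiveBot"):
        print(agent, allowed(agent, url))
```

A well-behaved crawler runs a check like this before each request. Blocking a crawler that ignores the file takes server-side measures instead.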

  • DFX4509B_2@lemmy.org ⁨14⁩ ⁨hours⁩ ago

    Just more vindication for my ditching that trash heap of a platform.

    • Someonelol@lemmy.dbzer0.com ⁨13⁩ ⁨hours⁩ ago

      YouTube’s already throttling users on their mobile site. They have these massive channel cards in their feeds and the video titles/thumbnails disappear after a few offerings, leaving you with the ability to blindly click on a video.

      • DFX4509B_2@lemmy.org ⁨11⁩ ⁨hours⁩ ago

        I’ve declared my YT channel to be dormant starting on the 13th due to this AI age-gating crap.

    • wanchutri@jlai.lu ⁨9⁩ ⁨hours⁩ ago

      Time to use peertube

      • DFX4509B_2@lemmy.org ⁨8⁩ ⁨hours⁩ ago

        And Invidious while that’s still an option, but I have both a PeerTube and Odysee set up already.

  • JakenVeina@midwest.social ⁨12⁩ ⁨hours⁩ ago

    The company says that AI companies have scraped data from the Wayback Machine, so it’s going to limit what the Wayback Machine can access.

    Yeah, wouldn’t want those AI companies to get all that data for free. Gotta make 'em pay for it.

    • brygphilomena@lemmy.dbzer0.com ⁨3⁩ ⁨hours⁩ ago

      Instead of regulating tech, they are going the fuck-over-everyone route.

  • captainastronaut@seattlelunarsociety.org ⁨17⁩ ⁨hours⁩ ago

    As long as the previous collections of archives are still intact. We probably don’t need all of their new spam posts in the Wayback Machine anyway.

    • hamFoilHat@lemmy.world ⁨16⁩ ⁨hours⁩ ago

      It is my understanding that if you block the Wayback Machine from indexing your site, it will delist the history as well.

      • Jason2357@lemmy.ca ⁨15⁩ ⁨hours⁩ ago

        They do archive sites against the owner’s wishes when they consider it an important site for public archiving, like some news sites. They are under no obligation to delete the archives, and I hope they don’t.

      • Natanael@infosec.pub ⁨11⁩ ⁨hours⁩ ago

        The ability to block crawling is separate from the ability to delist old pages. The latter usually happens after domains change owners

    • Sxan@piefed.zip ⁨5⁩ ⁨hours⁩ ago

      LOL, I should have scrolled down first. You said what I said, with fewer words, first.

  • HexesofVexes@lemmy.world ⁨15⁩ ⁨hours⁩ ago

    Oh no, someone might not be paying them for their user generated content (!)

    To be fair, it’s probably best that history forgets this period of the web…

    • ulterno@programming.dev ⁨15⁩ ⁨hours⁩ ago

      that history forgets this period

      and thus it repeats

      • WhyJiffie@sh.itjust.works ⁨50⁩ ⁨minutes⁩ ago

        don’t forget, we easily repeat what we “learned” anyway

  • MadMadBunny@lemmy.ca ⁨17⁩ ⁨hours⁩ ago

    Damn you Spez.

  • FalseTautology@lemmy.zip ⁨15⁩ ⁨hours⁩ ago

    I am new to Lemmy, is there a fuckreddit sub?

    • morto@piefed.social ⁨14⁩ ⁨hours⁩ ago

      In a way, the entire lemmy community is the fuckreddit sub

    • frongt@lemmy.zip ⁨15⁩ ⁨hours⁩ ago

      Why would you want to spend more time thinking about a dead site?

      • FalseTautology@lemmy.zip ⁨14⁩ ⁨hours⁩ ago

        I just like to laugh at things I dislike. And I also like to see how bad it’s getting. I was in the undelete sub and it was amazing.

    • lka1988@lemmy.dbzer0.com ⁨15⁩ ⁨hours⁩ ago

      Yes.

      Hi welcome to Lemmy, we hate reddit here.

    • simplejack@lemmy.world ⁨15⁩ ⁨hours⁩ ago

      !reddit@lemmy.world

    • Auth@lemmy.world ⁨15⁩ ⁨hours⁩ ago

      lemmyverse.net

      This is a great site to search for communities. Doesn’t seem like there is one.

  • Cornpop@lemmy.world ⁨14⁩ ⁨hours⁩ ago

    Time to just ignore them and scrape it anyways

  • thisbenzingring@lemmy.sdf.org ⁨16⁩ ⁨hours⁩ ago

    fucking reddit…

  • adespoton@lemmy.ca ⁨14⁩ ⁨hours⁩ ago

    OK, I stopped posting on Reddit but left my account and comments in place because I considered them part of the public record. If Reddit is taking that record private, it’s time for me to start removing my content from the platform.

    Does anyone know if historical Reddit content will remain in IA? If not, I’m going to have to back up years of content somewhere else.

  • bathing_in_bismuth@sh.itjust.works ⁨5⁩ ⁨hours⁩ ago

    That means big news is coming, and the media doesn’t want to fuck up the reporting that is coming. Reddit is preparing for mass submission of articles.
