Perplexity AI is complaining their plagiarism bot machine cannot bypass Cloudflare's firewall

876 likes

Submitted 3 weeks ago by Davriellelouna@lemmy.world to technology@lemmy.world

https://www.searchenginejournal.com/perplexity-says-cloudflare-is-blocking-legitimate-ai-assistants/552927/

Comments

  • Wispy2891@lemmy.world 2 weeks ago

    Here comes the ridiculous offer to buy Google Chrome with money they don’t have: easy scraping straight from the user’s browser.

  • fossilesque@mander.xyz 3 weeks ago

    I hate that these bots ruin my read it later app. :(

  • drmoose@lemmy.world 2 weeks ago

    It’s insane that anyone would side with Cloudflare here. To this day I can’t visit many websites, like nexusmods, just because I run Firefox on Linux. The Cloudflare Turnstile just refreshes infinitely, and has been doing so for months now.

    Cloudflare is the biggest cancer on the web, fucking burn it.

    • Dremor@lemmy.world 2 weeks ago

      Linux and Firefox here. No problem at all with Cloudflare, despite running about as many privacy-preserving add-ons as possible. I even spoof my user agent to the latest Firefox ESR on Linux.

      Something must be wrong with your setup.

      • drmoose@lemmy.world 2 weeks ago

        That’s not how it works. CF uses thousands of variables to estimate a trust score and block people, so just because it works for you doesn’t mean it works for everyone.

      • COASTER1921@lemmy.ml 2 weeks ago

        I suspect a lot of it comes down to your ISP. Like the original commenter, I also frequently can’t pass the Cloudflare Turnstile on Wi-Fi, although refreshing the page a few times usually gets me through. Worst case, I switch to my phone’s hotspot, where I pass much more consistently. It’s super annoying, and combined with their recent DNS outage it has totally ruined any respect I had for Cloudflare.

        Interesting video on the subject: youtu.be/SasXJwyKkMI

    • dodos@lemmy.world 2 weeks ago

      I’m on Linux with Firefox and have never had that issue before (particularly nexusmods which I use regularly). Something else is probably wrong with your setup.

      • jaemo@sh.itjust.works 2 weeks ago

        Thirded. All three (Linux, FF, nexus)

        ZERO ISSUES.

      • drmoose@lemmy.world 2 weeks ago

        “Wrong with my setup”: that’s not how the internet works.

        I’m based in Southeast Asia and often work on the road, so IP reputation is probably the deciding factor in my fingerprint score.

        Either way, this should in no way be acceptable.

      • Yeller_king@reddthat.com 2 weeks ago

        In my case, it’s usually the VPN.

    • CatDogL0ver@lemmy.world 2 weeks ago

      This used to happen to me until I did a Google search: it was my VPN’s web protection, which was too “overprotective”.

      Check your security settings, antivirus, and VPN.

    • baronofclubs@lemmy.world 2 weeks ago

      omg ur a hacker

      Did you mean Edge on Windows? 'Cause if so, welcome in!

  • poopkins@lemmy.world 2 weeks ago

    I’ve developed my own agent for assisting me with researching a topic I’m passionate about, and I ran into the exact same barrier: Cloudflare intercepts my request and is clearly checking if I’m a human using a web browser.

    So I use that as a signal that the website doesn’t want automated tools scraping their data. That’s fine with me: my agent just tells me that there might be interesting content on the site and gives me a deep link. I can extract the data and carry on my research on my own.
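
    A minimal sketch of treating a bot challenge as a "no" and handing the link back to the human, assuming a plain `urllib` fetcher. Cloudflare documents a `cf-mitigated: challenge` response header on challenge pages; the status-code fallback, the User-Agent string, and the returned dict shape are assumptions for illustration, not how any particular agent actually works.

```python
import urllib.error
import urllib.request
from typing import Mapping


def looks_like_bot_challenge(status: int, headers: Mapping[str, str]) -> bool:
    """Heuristic check for an interstitial bot challenge instead of real content."""
    lowered = {k.lower(): v for k, v in headers.items()}
    # Cloudflare documents a `cf-mitigated: challenge` header on challenge responses.
    if lowered.get("cf-mitigated", "").strip().lower() == "challenge":
        return True
    # Fallback heuristic (an assumption, not a documented contract):
    # a blocking status code served from Cloudflare's edge.
    return status in (403, 429, 503) and "cloudflare" in lowered.get("server", "").lower()


def fetch_or_defer(url: str, timeout: float = 10.0) -> dict:
    """Fetch a page for the agent, or return the deep link if the site challenges bots."""
    # Hypothetical User-Agent; an agent should identify itself honestly.
    req = urllib.request.Request(url, headers={"User-Agent": "my-research-agent/0.1"})
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            status, headers, body = resp.status, resp.headers, resp.read()
    except urllib.error.HTTPError as err:
        # 4xx/5xx responses raise HTTPError but still carry status and headers.
        status, headers, body = err.code, err.headers, b""
    if looks_like_bot_challenge(status, headers):
        return {"url": url, "fetched": False,
                "note": "Site challenges automated clients; open the link yourself."}
    return {"url": url, "fetched": True, "status": status, "body": body}
```

    Calling `fetch_or_defer("https://example.com/article")` returns either the page body or a note plus the URL, so the human can follow up manually instead of the agent trying to get past the challenge.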

    • IphtashuFitz@lemmy.world 2 weeks ago

      I hate to break it to you, but not only does Cloudflare do this sort of thing, but so do Akamai, AWS, and virtually every other CDN provider out there. And far from being awful, it’s actually protecting the web.

      We use Akamai where I work, and they inform us in real time when a request comes from a bot, and they further classify it as one of a dozen or so bot types (search engine crawlers, analytics bots, advertising bots, social networks, AI bots, etc.). They also tell us if it’s somebody impersonating a well-known bot like Google’s. So we can easily allow search engines to crawl our site while blocking AI bots, bots impersonating Google, and so on.
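
      Akamai’s classification runs at its edge and its API isn’t described here, so this is only a sketch of the policy side, assuming the CDN has already attached a category label (the label names below are hypothetical). For impersonation it swaps in Google’s publicly documented reverse-then-forward DNS check for Googlebot rather than anything Akamai-specific; the resolvers are injectable so the logic can be exercised without network access.

```python
import socket
from typing import Callable

# Hypothetical policy keyed by bot category; these are placeholder labels,
# not Akamai's actual category names.
POLICY = {
    "search_engine": "allow",
    "ai_crawler": "block",
    "impersonator": "block",
}


def is_genuine_googlebot(ip: str,
                         reverse: Callable = socket.gethostbyaddr,
                         forward: Callable = socket.gethostbyname_ex) -> bool:
    """Reverse DNS must land in Google's domains, and forward DNS must map back to ip."""
    try:
        host = reverse(ip)[0]
    except OSError:  # covers socket.herror / socket.gaierror
        return False
    if not host.endswith((".googlebot.com", ".google.com")):
        return False
    try:
        # gethostbyname_ex only returns IPv4 addresses; IPv6 would need getaddrinfo.
        return ip in forward(host)[2]
    except OSError:
        return False


def decide(user_agent: str, ip: str, category: str, **resolvers) -> str:
    """Return "allow" or "block" for a request the CDN has already classified."""
    # A request claiming to be Googlebot that fails DNS verification is an impersonator.
    if "googlebot" in user_agent.lower() and not is_genuine_googlebot(ip, **resolvers):
        category = "impersonator"
    # Unknown categories default to "allow" here; a stricter site might default to "block".
    return POLICY.get(category, "allow")
```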

      • poopkins@lemmy.world 2 weeks ago

        By “things like this are awful for the web,” I meant that automation through AI is awful for the web. It takes from the original content creators without any attribution and hurts their bottom line.

        My story was supposed to be one about responsible AI, but somehow I screwed that up in my summary.

  • ordnance_qf_17_pounder@reddthat.com 3 weeks ago

    Oh no!

  • EncryptKeeper@lemmy.world 3 weeks ago

    I can’t get over their CEO looking like a nine-year-old. Not sure what it is about him.

    • DarrinBrunner@lemmy.world 3 weeks ago

      I think he grew the beard to look older, but then he put on weight, and let his hair get longer. The choice of glasses style isn’t helping either. He’s not a bad looking guy, he’s just made a string of poor choices, I think.

    • Darkenfolk@sh.itjust.works 3 weeks ago

      I think it’s the beard, it makes his cheeks look puffed up a bit. His whole expression kinda looks like a grouchy toddler.

  • interdimensionalmeme@lemmy.ml 3 weeks ago

    Just buy Cloudflare, duh.

    • _cryptagion@lemmy.dbzer0.com 3 weeks ago

      The anti-AI shield and bot-fight mode are free, you don’t need to pay anything to use them.

      • interdimensionalmeme@lemmy.ml 3 weeks ago

        No, I’m telling Perplexity that they can just buy their obstacle.

        People who use the things you described for free are themselves the product being sold; this is implied in the price.

  • kokesh@lemmy.world 3 weeks ago

    Is there some easily deployable PHP honeytrap for AI crawlers?

    • blargh513@sh.itjust.works 2 weeks ago

      I used to make tarpits with reverse proxies: accept the connection, then stall the response until a few seconds before the default TCP timeout. It doesn’t eat many resources as long as you have enough TCP connections and can reuse them effectively.
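
      A minimal sketch of the stalling idea, written as a standalone Python `asyncio` server rather than as reverse-proxy configuration (a swapped-in setup). The decoy page, the per-byte `delay`, and the listen address are assumptions. It trickles the reply out one byte at a time, so each crawler connection stays open for a long time while costing the server little more than an idle coroutine.

```python
import asyncio

# Hypothetical decoy page; any small payload works.
BODY = b"<html><body>Nothing to see here.</body></html>"
RESPONSE = (
    b"HTTP/1.1 200 OK\r\n"
    b"Content-Type: text/html\r\n"
    b"Content-Length: " + str(len(BODY)).encode() + b"\r\n"
    b"Connection: close\r\n"
    b"\r\n" + BODY
)


async def tarpit(reader: asyncio.StreamReader, writer: asyncio.StreamWriter,
                 delay: float = 10.0) -> None:
    """Hold a client open by sending the reply one byte every `delay` seconds.

    `delay` is an assumed value: keep it below the client's read timeout so the
    connection doesn't drop, but long enough that each request ties up the bot.
    """
    try:
        # Wait for the end of the request headers before replying.
        await reader.readuntil(b"\r\n\r\n")
        for i in range(len(RESPONSE)):
            await asyncio.sleep(delay)
            writer.write(RESPONSE[i:i + 1])
            await writer.drain()
    except (asyncio.IncompleteReadError, asyncio.LimitOverrunError, ConnectionError):
        pass  # client gave up or sent garbage; just drop it
    finally:
        writer.close()


async def serve(host: str = "127.0.0.1", port: int = 8080) -> None:
    # Placeholder address; point the reverse proxy's crawler route here.
    server = await asyncio.start_server(tarpit, host, port)
    async with server:
        await server.serve_forever()
```

      Run it with `asyncio.run(serve())` and have the reverse proxy route suspected crawlers to that port; how crawlers are identified is left to the proxy.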

    • ubergeek@lemmy.today 2 weeks ago

      You could probably route all of their requests to your site back at themselves, so they DDoS themselves, and on top of that, it costs them more because their endpoint has to process everything through their LLM.

  • dzajew@piefed.social 3 weeks ago

    Cry me a river

  • Jimmycrackcrack@lemmy.ml 2 weeks ago

    Gee that’s a real removed it ain’t it perplexity?

  • Electricd@lemmybefree.net 2 weeks ago

    They do have a point, though. It would be great to let per-prompt searches go through, but not mass scraping.

    • threeganzi@sh.itjust.works 2 weeks ago

      Does it not need to be scraped to be indexed, assuming it’s semi-typical RAG stuff?

      • Electricd@lemmybefree.net 2 weeks ago

        I assume their script does some search-engine stuff, like querying Google or Bing, and then “scrapes” the links it lands on.

        Some Selenium stuff.

  • josefo@leminal.space 3 weeks ago

    I really hope Cloudflare doesn’t eventually turn into a shitty-ass company. So far I like them very much, and this whole massive L for AI only improves my opinion of them.

  • starchylemming@lemmy.world 3 weeks ago

    next step: cloudflare sends hit squads to blow up the source of these slimy data grabber attacks

  • xxce2AAb@feddit.dk 3 weeks ago

    Ooh, that’s tough, sweetheart. If the owners of those servers want you to visit, they’ll just choose a WAF other than CF’s.

    All zero of them.

  • tarknassus@lemmy.world 2 weeks ago

    I don’t see a problem here. Maybe Perplexity should consider the reasons WHY Cloudflare have a firewall…?
