I hate that these bots ruin my read it later app. :(
Perplexity AI is complaining their plagiarism bot machine cannot bypass Cloudflare's firewall
Submitted 3 weeks ago by Davriellelouna@lemmy.world to technology@lemmy.world
Comments
fossilesque@mander.xyz 3 weeks ago
drmoose@lemmy.world 2 weeks ago
It’s insane that anyone would side with Cloudflare here. To this day I can’t visit many websites like nexusmods just because I run Firefox on Linux. The Cloudflare Turnstile just refreshes infinitely and has been doing so for months now.
Cloudflare is the biggest cancer on the web, fucking burn it.
Dremor@lemmy.world 2 weeks ago
Linux and Firefox here. No problem at all with Cloudflare, despite having more or less as many privacy-preserving add-ons as possible. I even spoof my user agent to the latest Firefox ESR on Linux.
Something must be wrong with your setup.
drmoose@lemmy.world 2 weeks ago
That’s not how it works. CF uses thousands of variables to estimate a trust score and block people, so just because it works for you doesn’t mean it works for everyone.
COASTER1921@lemmy.ml 2 weeks ago
I suspect a lot of it comes down to your ISP. Like the original commenter, I also frequently can’t pass the Cloudflare Turnstile check when on Wi-Fi, although refreshing the page a few times usually gets me through. Worst case, I switch to my phone’s hotspot, where I pass much more consistently. It’s super annoying and, combined with their recent DNS outage, has totally ruined any respect I had for Cloudflare.
Interesting video on the subject: youtu.be/SasXJwyKkMI
dodos@lemmy.world 2 weeks ago
I’m on Linux with Firefox and have never had that issue before (particularly nexusmods which I use regularly). Something else is probably wrong with your setup.
jaemo@sh.itjust.works 2 weeks ago
Thirded. All three (Linux, FF, nexus)
ZERO ISSUES.
drmoose@lemmy.world 2 weeks ago
“Wrong with my setup” - that’s not how the internet works.
I’m based in Southeast Asia and often work on the road, so IP reputation is probably the final straw in my fingerprint score.
Either way, this should in no way be acceptable.
Yeller_king@reddthat.com 2 weeks ago
In my case, it’s usually the VPN.
CatDogL0ver@lemmy.world 2 weeks ago
It happened to me before, until I did a Google search. It was my VPN’s web protection. It was too “overprotective”.
Check your security settings, antivirus and VPN
baronofclubs@lemmy.world 2 weeks ago
omg ur a hacker
Did you mean Edge on Windows? 'Cause if so, welcome in!
poopkins@lemmy.world 2 weeks ago
I’ve developed my own agent for assisting me with researching a topic I’m passionate about, and I ran into the exact same barrier: Cloudflare intercepts my request and is clearly checking if I’m a human using a web browser.
So I use that as a signal that the website doesn’t want automated tools scraping their data. That’s fine with me: my agent just tells me that there might be interesting content on the site and gives me a deep link. I can extract the data and carry on my research on my own.
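A minimal sketch of that "back off and hand over a deep link" behaviour, assuming the agent already has the HTTP status and response headers in hand. The names `looks_like_challenge`, `decide`, and `FetchDecision` are hypothetical, and the check is only a heuristic: it looks for the `cf-mitigated: challenge` response header that Cloudflare sends with challenge pages, with a looser fallback on the status code and `Server` header.

```python
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass
class FetchDecision:
    scrape: bool              # True if the agent may process the page itself
    deep_link: Optional[str]  # link handed to the human when scraping is skipped
    reason: str


def looks_like_challenge(status: int, headers: Mapping[str, str]) -> bool:
    """Heuristic: does this response look like a Cloudflare challenge page?"""
    lowered = {k.lower(): v.lower() for k, v in headers.items()}
    # Cloudflare marks challenge responses with this header.
    if lowered.get("cf-mitigated") == "challenge":
        return True
    # Fallback (assumption): a blocking status served by Cloudflare itself.
    return status in (403, 429, 503) and "cloudflare" in lowered.get("server", "")


def decide(url: str, status: int, headers: Mapping[str, str]) -> FetchDecision:
    """Treat a challenge as 'no automated tools, please' and defer to the user."""
    if looks_like_challenge(status, headers):
        return FetchDecision(False, url, "site appears to challenge automated clients")
    return FetchDecision(True, None, "no challenge detected")
```

The point of the design is that the agent never tries to solve or bypass the challenge; it only reports the link so a person can visit the site in a normal browser.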
IphtashuFitz@lemmy.world 2 weeks ago
I hate to break it to you, but not only does Cloudflare do this sort of thing, so do Akamai, AWS, and virtually every other CDN provider out there. And far from being awful, it’s actually protecting the web.
We use Akamai where I work, and they inform us in real time when a request comes from a bot, and they further classify it as one of a dozen or so bot types (search engine crawlers, analytics bots, advertising bots, social networks, AI bots, etc). It also informs us if it’s somebody impersonating a well-known bot like Google, etc. So we can easily allow search engines to crawl our site while blocking AI bots, bots impersonating Google, and so on.
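For anyone without a CDN doing this for them, here is a rough sketch of the same idea. It is not Akamai’s API; `policy_for`, `is_genuine_googlebot`, and the `AI_BOT_TOKENS` list are illustrative assumptions. The impersonation check uses the reverse-then-forward DNS lookup that Google documents for verifying Googlebot, and the resolvers are passed in as parameters so the logic can be tried without network access.

```python
import socket
from typing import Callable, List

# Illustrative, incomplete list of AI crawler user-agent tokens (assumption).
AI_BOT_TOKENS = ("gptbot", "perplexitybot", "claudebot")


def default_reverse(ip: str) -> str:
    """Reverse DNS: IP address -> primary hostname."""
    return socket.gethostbyaddr(ip)[0]


def default_forward(host: str) -> List[str]:
    """Forward DNS: hostname -> list of IPv4 addresses."""
    return socket.gethostbyname_ex(host)[2]


def is_genuine_googlebot(
    ip: str,
    reverse: Callable[[str], str] = default_reverse,
    forward: Callable[[str], List[str]] = default_forward,
) -> bool:
    """Reverse-resolve the IP, check the domain, then confirm with a forward lookup."""
    try:
        host = reverse(ip).rstrip(".").lower()
    except OSError:  # covers socket.herror and socket.gaierror
        return False
    if not host.endswith((".googlebot.com", ".google.com")):
        return False
    try:
        return ip in forward(host)
    except OSError:
        return False


def policy_for(
    user_agent: str,
    ip: str,
    reverse: Callable[[str], str] = default_reverse,
    forward: Callable[[str], List[str]] = default_forward,
) -> str:
    """Return 'allow' or 'block' for a request, based on who it claims to be."""
    ua = user_agent.lower()
    if "googlebot" in ua:
        # Claims to be Google: allow only if DNS backs up the claim.
        return "allow" if is_genuine_googlebot(ip, reverse, forward) else "block"
    if any(token in ua for token in AI_BOT_TOKENS):
        return "block"
    return "allow"
```

A commercial bot manager uses far more signals than the user agent and DNS, so this only covers the two cases mentioned above: self-identified AI crawlers and fake Googlebots.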
poopkins@lemmy.world 2 weeks ago
By “things like this are awful for the web,” I meant that automation through AI is awful for the web. It takes away from the original content creators without any attribution and hits their bottom line.
My story was supposed to be one about responsible AI, but somehow I screwed that up in my summary.
ordnance_qf_17_pounder@reddthat.com 3 weeks ago
EncryptKeeper@lemmy.world 3 weeks ago
I can’t get over their CEO that looks like a nine year old. Not sure what it is about him
DarrinBrunner@lemmy.world 3 weeks ago
I think he grew the beard to look older, but then he put on weight, and let his hair get longer. The choice of glasses style isn’t helping either. He’s not a bad looking guy, he’s just made a string of poor choices, I think.
Darkenfolk@sh.itjust.works 3 weeks ago
I think it’s the beard, it makes his cheeks look puffed up a bit. His whole expression kinda looks like a grouchy toddler.
interdimensionalmeme@lemmy.ml 3 weeks ago
Just buy Cloudflare, duh
_cryptagion@lemmy.dbzer0.com 3 weeks ago
The anti-AI shield and bot-fight mode are free, you don’t need to pay anything to use them.
interdimensionalmeme@lemmy.ml 3 weeks ago
No, I’m telling Perplexity: they can just buy their obstacle.
People who use the things you have described for free are themselves the products being sold;
this is implied in the price.
kokesh@lemmy.world 3 weeks ago
Is there some simply deployable PHP honeytrap for AI crawlers?
blargh513@sh.itjust.works 2 weeks ago
Used to make tarpits with reverse proxies. Accept the connection and then delay the responses until a few seconds before the default TCP timeout. Doesn’t eat many resources as long as you have enough TCP connections and can reuse them effectively.
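A minimal sketch of that tarpit idea, written in Python with asyncio rather than as a reverse-proxy config. The `tarpit` and `serve` names, the 10-second delay, and the chunk count are all assumptions; in practice the delay would be tuned to stay just under whatever timeout the crawler uses. It accepts the connection, sends headers, then drips one byte at a time with long pauses.

```python
import asyncio


async def tarpit(
    reader: asyncio.StreamReader,
    writer: asyncio.StreamWriter,
    delay: float = 10.0,  # seconds between bytes (assumed value)
    chunks: int = 3,      # how many bytes to drip before closing
) -> None:
    """Hold a client open by sending a response very slowly."""
    try:
        writer.write(b"HTTP/1.1 200 OK\r\nContent-Type: text/plain\r\n\r\n")
        await writer.drain()
        for _ in range(chunks):
            # Sleeping costs almost nothing: the coroutine is parked, not busy.
            await asyncio.sleep(delay)
            writer.write(b".")
            await writer.drain()
    except ConnectionError:
        pass  # client gave up early, which is fine
    finally:
        writer.close()


async def serve(host: str = "127.0.0.1", port: int = 8080) -> None:
    """Run the tarpit until cancelled, e.g. via asyncio.run(serve())."""
    server = await asyncio.start_server(tarpit, host, port)
    async with server:
        await server.serve_forever()
```

Because each stalled client is only a parked coroutine and an open socket, the cost stays low, which matches the point above about it not eating many resources.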
ubergeek@lemmy.today 2 weeks ago
You could probably route all requests to your site from them back at themselves, so they DDoS themselves, and on top of it, cost them more because their endpoint needs to process things via their LLM.
dzajew@piefed.social 3 weeks ago
Cry me a river
Jimmycrackcrack@lemmy.ml 2 weeks ago
Gee that’s a real removed it ain’t it perplexity?
Electricd@lemmybefree.net 2 weeks ago
They do have a point though. It would be great to let per-prompt searches go through, but not mass scraping.
threeganzi@sh.itjust.works 2 weeks ago
Does it not need to be scraped to be indexed, assuming it’s semi-typical RAG stuff?
Electricd@lemmybefree.net 2 weeks ago
I assume their script does some search engine stuff like querying Google or Bing and then “scrapes” the links it visits
Some selenium stuff
josefo@leminal.space 3 weeks ago
I really hope Cloudflare doesn’t eventually evolve into a shitty ass company, so far I like them very much, and all this massive L for AI only improves my opinion on them.
starchylemming@lemmy.world 3 weeks ago
next step: cloudflare sends hit squads to blow up the source of these slimy data grabber attacks
xxce2AAb@feddit.dk 3 weeks ago
Ooh, that’s tough, sweetheart. If the owners of those servers want you to visit, they’ll just choose another WAF than CF’s.
All zero of them.
tarknassus@lemmy.world 2 weeks ago
I don’t see a problem here. Maybe Perplexity should consider the reasons WHY Cloudflare have a firewall…?
Wispy2891@lemmy.world 2 weeks ago
Here comes the ridiculous offer to buy Google Chrome with money they don’t have: easy scraping directly from the user source