I hate that these bots ruin my read it later app. :(
Perplexity AI is complaining their plagiarism bot machine cannot bypass Cloudflare's firewall
Submitted 2 months ago by Davriellelouna@lemmy.world to technology@lemmy.world
Comments
fossilesque@mander.xyz 2 months ago
drmoose@lemmy.world 2 months ago
It’s insane that anyone would side with Cloudflare here. To this day I can’t visit many websites like nexusmods just because I run Firefox on Linux. The Cloudflare turnstile just refreshes infinitely and has been doing so for months now.
Cloudflare is the biggest cancer on the web, fucking burn it.
Dremor@lemmy.world 2 months ago
Linux and Firefox here. No problem at all with Cloudflare, despite having more or less as many privacy-preserving add-ons as possible. I even spoof my user agent to the latest Firefox ESR on Linux.
Something must be wrong with your setup.
drmoose@lemmy.world 2 months ago
That’s not how it works. CF uses thousands of variables to estimate a trust score and block people, so just because it works for you doesn’t mean it works for everyone.
COASTER1921@lemmy.ml 2 months ago
I suspect a lot of it comes down to your ISP. Like the original commenter, I also frequently can’t pass the Cloudflare turnstile when on Wi-Fi, although refreshing the page a few times usually gets me through. Worst case, I can pass much more consistently on my phone’s hotspot. It’s super annoying and, combined with their recent DNS outage, has totally ruined any respect I had for Cloudflare.
Interesting video on the subject: youtu.be/SasXJwyKkMI
dodos@lemmy.world 2 months ago
I’m on Linux with Firefox and have never had that issue before (particularly nexusmods which I use regularly). Something else is probably wrong with your setup.
jaemo@sh.itjust.works 2 months ago
Thirded. All three (Linux, FF, nexus)
ZERO ISSUES.
drmoose@lemmy.world 2 months ago
“Wrong with my setup” - that’s not how the internet works.
I’m based in Southeast Asia and often work on the road, so IP rating probably is the final crutch in my fingerprint score.
Either way, this should in no way be acceptable.
Yeller_king@reddthat.com 2 months ago
In my case, it’s usually the VPN.
CatDogL0ver@lemmy.world 2 months ago
It happened to me before, until I did a Google search. It was my VPN web protection. It was too "overprotective".
Check your security settings, antivirus and VPN
baronofclubs@lemmy.world 2 months ago
omg ur a hacker
Did you mean Edge on Windows? 'Cause if so, welcome in!
poopkins@lemmy.world 2 months ago
I’ve developed my own agent for assisting me with researching a topic I’m passionate about, and I ran into the exact same barrier: Cloudflare intercepts my request and is clearly checking if I’m a human using a web browser.
So I use that as a signal that the website doesn’t want automated tools scraping their data. That’s fine with me: my agent just tells me that there might be interesting content on the site and gives me a deep link. I can extract the data and carry on my research on my own.
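A rough sketch of that back-off logic, not the actual agent: the function names and the returned dict are made up for illustration. It assumes a challenge page can be recognised by a `cf-mitigated: challenge` response header, with a coarser fallback of a 403/429/503 status from a `Server: cloudflare` response. That fallback can also catch ordinary errors passed through Cloudflare, which errs on the side of backing off.

```python
from typing import Mapping


def looks_like_cloudflare_challenge(status: int, headers: Mapping[str, str]) -> bool:
    """Heuristic only: guess whether a response is a Cloudflare challenge."""
    # Header names are case-insensitive, so normalise before comparing.
    normalized = {k.lower(): v.lower() for k, v in headers.items()}
    if normalized.get("cf-mitigated") == "challenge":
        return True
    # Assumed fallback: a refusal status served by Cloudflare.
    return status in (403, 429, 503) and "cloudflare" in normalized.get("server", "")


def handle_fetch(url: str, status: int, headers: Mapping[str, str], body: str) -> dict:
    """Hypothetical agent step: stop and hand a link to the human if challenged."""
    if looks_like_cloudflare_challenge(status, headers):
        # Treat the challenge as "no automated access wanted"; do not retry or evade.
        return {"action": "defer_to_human", "link": url}
    return {"action": "extract", "link": url, "text": body}
```

The point of the design is that the challenge is treated as a stop sign rather than an obstacle: the agent never retries, it only surfaces the link.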
IphtashuFitz@lemmy.world 2 months ago
I hate to break it to you, but not only does Cloudflare do this sort of thing, so do Akamai, AWS, and virtually every other CDN provider out there. And far from being awful, it’s actually protecting the web.
We use Akamai where I work, and they inform us in real time when a request comes from a bot, and they further classify it as one of a dozen or so bots (search engine crawlers, analytics bots, advertising bots, social networks, AI bots, etc). It also informs us if it’s somebody impersonating a well known bot like Google, etc. So we can easily allow search engines to crawl our site while blocking AI bots, bots impersonating Google, and so on.
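To make the allow/block idea concrete, here is a toy policy function. It is not Akamai's API: `BotVerdict`, the category strings, and the `"monitor"` default for categories the comment does not mention are all assumptions for illustration.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical category labels; a real CDN will use its own names.
ALLOWED_CATEGORIES = {"search_engine"}
BLOCKED_CATEGORIES = {"ai_bot"}


@dataclass(frozen=True)
class BotVerdict:
    """Assumed shape of the per-request classification a CDN might hand over."""
    is_bot: bool
    category: Optional[str] = None
    impersonating: bool = False  # e.g. claims to be Googlebot but is not


def decide(verdict: BotVerdict) -> str:
    """Return "allow", "block", or "monitor" for one request."""
    if not verdict.is_bot:
        return "allow"
    if verdict.impersonating:
        # Impersonators are blocked regardless of the category they claim.
        return "block"
    if verdict.category in BLOCKED_CATEGORIES:
        return "block"
    if verdict.category in ALLOWED_CATEGORIES:
        return "allow"
    # Assumption: anything else is let through but logged for review.
    return "monitor"
```

Checking impersonation before the category matters: a fake Googlebot would otherwise ride in on the search-engine allowlist.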
poopkins@lemmy.world 2 months ago
What I meant by “things like this are awful for the web” was that automation through AI is awful for the web. It takes away from the original content creators without any attribution and hits their bottom line.
My story was supposed to be one about responsible AI, but somehow I screwed that up in my summary.
ordnance_qf_17_pounder@reddthat.com 2 months ago
EncryptKeeper@lemmy.world 2 months ago
I can’t get over their CEO that looks like a nine year old. Not sure what it is about him
DarrinBrunner@lemmy.world 2 months ago
I think he grew the beard to look older, but then he put on weight, and let his hair get longer. The choice of glasses style isn’t helping either. He’s not a bad looking guy, he’s just made a string of poor choices, I think.
Darkenfolk@sh.itjust.works 2 months ago
I think it’s the beard, it makes his cheeks look puffed up a bit. His whole expression kinda looks like a grouchy toddler.
interdimensionalmeme@lemmy.ml 2 months ago
Just buy cloudflare duh
_cryptagion@lemmy.dbzer0.com 2 months ago
The anti-AI shield and bot-fight mode are free, you don’t need to pay anything to use them.
interdimensionalmeme@lemmy.ml 2 months ago
No I’m telling Perplexity, they can just buy their obstacle
People who use the things you have described for free are themselves the products being sold
this is implied in the price
kokesh@lemmy.world 2 months ago
Is there some simply deployable PHP honeytrap for AI crawlers?
blargh513@sh.itjust.works 2 months ago
Used to make tarpits with reverse proxies. Accept the connection and then hold the response until a few seconds before the default TCP timeout. Doesn’t eat many resources as long as you have enough TCP connections and can reuse them effectively.
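A minimal sketch of that tarpit idea as a standalone asyncio server rather than a reverse proxy config. The 25-second hold, the port, and the empty 200 reply are assumptions; pick a hold just under whatever timeout the crawler actually uses.

```python
import asyncio

HOLD_SECONDS = 25.0  # assumed value, chosen to sit below a typical client timeout

EMPTY_REPLY = (
    b"HTTP/1.1 200 OK\r\n"
    b"Content-Length: 0\r\n"
    b"Connection: close\r\n\r\n"
)


async def tarpit(reader, writer, hold_seconds: float = HOLD_SECONDS) -> None:
    """Accept the connection, sit on it, then answer with nothing useful."""
    try:
        # Sleeping costs almost nothing; the client keeps a socket open and waits.
        await asyncio.sleep(hold_seconds)
        writer.write(EMPTY_REPLY)
        await writer.drain()
    except ConnectionError:
        pass  # the client gave up first, which is fine
    finally:
        writer.close()


async def main(host: str = "127.0.0.1", port: int = 8080) -> None:
    """Serve the tarpit until cancelled."""
    server = await asyncio.start_server(tarpit, host, port)
    async with server:
        await server.serve_forever()
```

Start it with `asyncio.run(main())` and point only the trap URLs at it, so regular visitors never land there.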
ubergeek@lemmy.today 2 months ago
You could probably route all requests to your site from them back at themselves, so they DDoS themselves, and on top of it, cost them more because their endpoint needs to process things via their LLM.
dzajew@piefed.social 2 months ago
Cry me a river
Jimmycrackcrack@lemmy.ml 2 months ago
Gee that’s a real removed it ain’t it perplexity?
Electricd@lemmybefree.net 2 months ago
They do have a point though. It would be great to let per-prompt searches go through, but not mass scraping
threeganzi@sh.itjust.works 2 months ago
Does it not need to be scraped to be indexed, assuming it’s semi-typical RAG stuff?
Electricd@lemmybefree.net 2 months ago
I assume their script does some search engine stuff like querying Google or Bing and then “scrapes” the links it goes to
Some Selenium stuff
josefo@leminal.space 2 months ago
I really hope Cloudflare doesn’t eventually evolve into a shitty ass company, so far I like them very much, and all this massive L for AI only improves my opinion on them.
starchylemming@lemmy.world 2 months ago
next step: cloudflare sends hit squads to blow up the source of these slimy data grabber attacks
xxce2AAb@feddit.dk 2 months ago
Ooh, that’s tough, sweetheart. If the owners of those servers want you to visit, they’ll just choose another WAF than CF’s.
All zero of them.
tarknassus@lemmy.world 2 months ago
I don’t see a problem here. Maybe Perplexity should consider the reasons WHY Cloudflare have a firewall…?
Wispy2891@lemmy.world 2 months ago
Here comes the ridiculous offer to buy Google Chrome with money they don’t have: easy scraping directly from the user source