Comment on Perplexity AI is complaining their plagiarism bot machine cannot bypass Cloudflare's firewall
FauxLiving@lemmy.world 3 weeks ago
I, too, can make any argument sound silly if I want to argue in bad faith.
A user cannot physically generate as much traffic as a bot.
Likewise, a glass of water cannot possibly contain as much water as a swimming pool; pretending the two are equal is ignorant in both cases.
spankmonkey@lemmy.world 3 weeks ago
You are so close to getting it!
FauxLiving@lemmy.world 3 weeks ago
And you’re not even close.
spankmonkey@lemmy.world 3 weeks ago
The AI doesn’t just do a web search and display a page; it grabs the search results and scrapes multiple pages far faster than a person could, which is bot behavior.
It doesn’t matter whether a human initiated it when the load on the website is far, far higher and more intrusive in a shorter period of time with AI compared to a human doing a web search and reading the content themselves.
FauxLiving@lemmy.world 3 weeks ago
It creates web requests faster than a human could. It does not create web requests as fast as possible like a crawler does.
Websites can handle a lot of human user traffic, even if some human users are making 5x the requests of other users due to using automation tools (like LLM summarization).
A website cannot handle a single bot that can, by itself, generate tens of millions of times as much traffic as a human.
Cloudflare’s method of detecting bots is to attempt to fingerprint the browser and user behavior to detect automation, which is usually run in environments that can’t render the content. They did this because, until now, users did not use automation tools, so detecting and blocking automation tools was a way to get most of the bots.
Now users do use automation tools, so this method of classification is dated and misclassifies human-generated traffic.
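To make the volume argument concrete, here is a minimal sketch of classifying a client by request rate rather than by whether it looks automated. This is not how Cloudflare works; the class name `SlidingWindowCounter`, the 10-second window, and the limit of 5 requests are all made-up values for illustration.

```python
from collections import deque


class SlidingWindowCounter:
    """Hypothetical per-client limiter: allow at most max_requests per window."""

    def __init__(self, window_seconds: float, max_requests: int) -> None:
        self.window_seconds = window_seconds
        self.max_requests = max_requests
        self._stamps = deque()  # timestamps of requests that were allowed

    def allow(self, now: float) -> bool:
        # Forget requests that have aged out of the window.
        while self._stamps and now - self._stamps[0] >= self.window_seconds:
            self._stamps.popleft()
        # Reject if the client has already used up its budget.
        if len(self._stamps) >= self.max_requests:
            return False
        self._stamps.append(now)
        return True


# A user-driven tool making one request per second stays within budget.
assisted = SlidingWindowCounter(window_seconds=10.0, max_requests=5)
assisted_allowed = sum(assisted.allow(float(t)) for t in range(5))

# A crawler firing 100 requests within one second is mostly rejected.
crawler = SlidingWindowCounter(window_seconds=10.0, max_requests=5)
crawler_allowed = sum(crawler.allow(i * 0.01) for i in range(100))
```

Under these assumed numbers, the tool making a handful of requests on a user's behalf is never blocked, while the crawler is cut off almost immediately, regardless of whether either one can render the page.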