Comment on What steps can be taken to prevent AI training and scraping of my public facing website?

fuckwit_mcbumcrumble@lemmy.dbzer0.com 3 days ago

How well does Anubis actually work, though? I have no trouble getting past it with Puppeteer. But I’m also just dicking around at home, not crawling an entire website.

Cloudflare for sure doesn’t work very well at blocking Puppeteer or anything else that runs a full browser. It’ll stop things that only rip the raw web page, but if you’re running JS and even halfway trying, it’s not hard to get past. And let’s be real: do you want a crawler ripping 300 KB of text, or 400 MB of page + images + videos + whatever other unnecessary garbage is on modern web pages?
