
Comment on What steps can be taken to prevent AI training and scraping of my public facing website?

fuckwit_mcbumcrumble@lemmy.dbzer0.com ⁨5⁩ ⁨months⁩ ago

How well does Anubis actually work, though? I have no trouble getting past it with Puppeteer. But I’m also just dicking around at home, not crawling an entire website.

Cloudflare for sure doesn’t work very well at blocking Puppeteer or anything else that runs a full browser. It’ll stop things that only rip the raw web page, but if you’re running JS and even halfway trying, it’s not hard to get past. And let’s be real: do you want a crawler ripping 300 KB of text, or 400 MB of page + images + videos + whatever other unnecessary garbage is on modern web pages?



Dekkia@this.doesnotcut.it ⁨5⁩ ⁨months⁩ ago
The idea behind Anubis is that a browser has to deliver a proof of work before it can access a website.

If you’re doing it one-off with Puppeteer, your “browser” will happily do just that.

But if you’re scraping millions of websites, short challenges like this add up quickly and you’ll end up wasting lots of compute on them. As long as scrapers decide that those websites aren’t worth it, Anubis works.
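
To make the "adds up quickly" part concrete, here is a rough sketch of a hash-based proof-of-work challenge in Python. This is not Anubis's actual code: the function names, the challenge string, and the "leading zero hex digits" difficulty rule are all assumptions picked for illustration. The point is only that the server checks with one hash, while the client has to grind through many.

```python
import hashlib
import itertools


def _digest(challenge: str, nonce: int) -> str:
    # Hash the challenge string concatenated with the candidate nonce.
    return hashlib.sha256(f"{challenge}{nonce}".encode()).hexdigest()


def solve(challenge: str, difficulty: int) -> int:
    """Client side (hypothetical): brute-force the first nonce whose
    SHA-256 hex digest starts with `difficulty` zero digits."""
    prefix = "0" * difficulty
    for nonce in itertools.count():
        if _digest(challenge, nonce).startswith(prefix):
            return nonce


def verify(challenge: str, nonce: int, difficulty: int) -> bool:
    """Server side (hypothetical): a single hash checks the answer."""
    return _digest(challenge, nonce).startswith("0" * difficulty)


def expected_hashes(difficulty: int, pages: int) -> int:
    """Average number of hashes needed to solve `pages` challenges.
    Each hex digit is zero with probability 1/16, so one challenge
    takes about 16**difficulty attempts on average."""
    return (16 ** difficulty) * pages


if __name__ == "__main__":
    nonce = solve("example-challenge", 3)  # assumed challenge string
    print("nonce:", nonce, "valid:", verify("example-challenge", nonce, 3))
    print("hashes for 1M pages at difficulty 4:", expected_hashes(4, 1_000_000))
```

One visitor pays for roughly `16**difficulty` hashes once and never notices. A crawler pays that for every challenge it hits, so the cost scales with the number of pages and sites it wants.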

- snoons@lemmy.ca ⁨5⁩ ⁨months⁩ ago
  The only stable Invidious instance I know of is now a heck of a lot more stable thanks to it, too.
  