I’m not entirely sure, but if you look here: github.com/TecharoHQ/anubis/tree/main/data/bots
They have separate configs for each bot: github.com/TecharoHQ/anubis/…/botPolicies.json
Comment on: Anubis is awesome! Stopping (AI) crawl bots
danielquinn@lemmy.ca 3 weeks ago
I’ve been thinking about setting up Anubis to protect my blog from AI scrapers, but I’m not clear on whether this would also block search engines. It would, wouldn’t it?
sailorzoop@lemmy.librebun.com 3 weeks ago
RedBauble@sh.itjust.works 3 weeks ago
You can set up the policies to allow search engines through; the default policy linked in the docs does that.
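For illustration, here is a hypothetical sketch of what such a policy entry might look like. The field names (`bots`, `user_agent_regex`, `action`) and action values are assumptions modeled on the default `botPolicies.json` linked above, not a verified schema, so check the real file before copying:

```json
{
  "bots": [
    {
      "name": "googlebot",
      "user_agent_regex": "Googlebot",
      "action": "ALLOW"
    },
    {
      "name": "generic-browser",
      "user_agent_regex": "Mozilla",
      "action": "CHALLENGE"
    }
  ]
}
```

The idea is simply that named search engine crawlers get waved through while everything else has to pass the proof-of-work challenge.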
danielquinn@lemmy.ca 3 weeks ago
This all appears to be based on the user agent, so wouldn’t that mean that bad-faith scrapers could just declare themselves to be a typical search engine user agent?
SheeEttin@lemmy.zip 3 weeks ago
Yes. There’s no real way to differentiate.
SorteKanin@feddit.dk 3 weeks ago
Actually, I think most search engine bots publish a list of verified IP addresses they crawl from, so you could check the IP of a search bot against that list to know.
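Besides published IP lists, Google and Bing document a reverse-DNS handshake for verifying their crawlers: reverse-resolve the connecting IP, check that the hostname falls under the crawler’s domain, then forward-resolve that hostname and confirm it maps back to the same IP. A minimal Python sketch of that check (the googlebot.com/google.com suffixes are Google’s documented crawl domains; other engines use their own):

```python
import socket

# Google's documented crawler domains; swap these per search engine.
GOOGLE_SUFFIXES = (".googlebot.com", ".google.com")

def hostname_is_google(hostname: str) -> bool:
    """Pure check: does a reverse-DNS hostname fall under Google's crawl domains?"""
    return hostname.rstrip(".").endswith(GOOGLE_SUFFIXES)

def is_verified_googlebot(ip: str) -> bool:
    """Reverse lookup, suffix check, then forward-confirm the hostname."""
    try:
        hostname, _, _ = socket.gethostbyaddr(ip)  # reverse DNS (PTR)
    except OSError:
        return False
    if not hostname_is_google(hostname):
        return False
    try:
        # Forward lookup must yield the original IP, or the PTR is spoofed.
        return ip in socket.gethostbyname_ex(hostname)[2]
    except OSError:
        return False
```

The forward-confirm step matters because anyone who controls their own reverse DNS can make an IP claim to be `crawl-x.googlebot.com`; only Google can make that hostname resolve back to the same address.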