I’m not entirely sure, but if you look here: github.com/TecharoHQ/anubis/tree/main/data/bots
They have separate configs for each bot: github.com/TecharoHQ/anubis/…/botPolicies.json
Comment on: Anubis is awesome! Stopping (AI) crawl bots
danielquinn@lemmy.ca 3 weeks ago
I’ve been thinking about setting up Anubis to protect my blog from AI scrapers, but I’m not clear on whether this would also block search engines. It would, wouldn’t it?
sailorzoop@lemmy.librebun.com 3 weeks ago
RedBauble@sh.itjust.works 3 weeks ago
You can set up the policies to allow search engines through; the default policy linked in the docs does that.
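For illustration, here is a hypothetical sketch of what such a policy entry might look like. The field names (`bots`, `user_agent_regex`, `action`) and action values are assumptions modeled on the default `botPolicies.json` linked above, not a verified schema, so check the real file before copying:

```json
{
  "bots": [
    {
      "name": "googlebot",
      "user_agent_regex": "Googlebot",
      "action": "ALLOW"
    },
    {
      "name": "generic-browser",
      "user_agent_regex": "Mozilla",
      "action": "CHALLENGE"
    }
  ]
}
```

The idea is simply that named search engine crawlers get waved through while everything else has to pass the proof-of-work challenge.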
danielquinn@lemmy.ca 3 weeks ago
This all appears to be based on the user agent, so wouldn’t that mean that bad-faith scrapers could just declare themselves to be a typical search engine user agent?
SheeEttin@lemmy.zip 3 weeks ago
Yes. There’s no real way to differentiate.
SorteKanin@feddit.dk 3 weeks ago
Actually, I think most search engine bots publish a list of verified IP addresses they crawl from, so you could check the IP of a search bot against that list to know.
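Besides published IP lists, Google and Bing document a reverse-DNS handshake for verifying their crawlers: reverse-resolve the connecting IP, check that the hostname falls under the crawler’s domain, then forward-resolve that hostname and confirm it maps back to the same IP. A minimal Python sketch of that check (the googlebot.com/google.com suffixes are Google’s documented crawl domains; other engines use their own):

```python
import socket

# Google's documented crawler domains; swap these per search engine.
GOOGLE_SUFFIXES = (".googlebot.com", ".google.com")

def hostname_is_google(hostname: str) -> bool:
    """Pure check: does a reverse-DNS hostname fall under Google's crawl domains?"""
    return hostname.rstrip(".").endswith(GOOGLE_SUFFIXES)

def is_verified_googlebot(ip: str) -> bool:
    """Reverse lookup, suffix check, then forward-confirm the hostname."""
    try:
        hostname, _, _ = socket.gethostbyaddr(ip)  # reverse DNS (PTR)
    except OSError:
        return False
    if not hostname_is_google(hostname):
        return False
    try:
        # Forward lookup must yield the original IP, or the PTR is spoofed.
        return ip in socket.gethostbyname_ex(hostname)[2]
    except OSError:
        return False
```

The forward-confirm step matters because anyone who controls their own reverse DNS can make an IP claim to be `crawl-x.googlebot.com`; only Google can make that hostname resolve back to the same address.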