Comment on Anubis is awesome! Stopping (AI) crawlbots

danielquinn@lemmy.ca 1 month ago
I’ve been thinking about setting up Anubis to protect my blog from AI scrapers, but I’m not clear on whether this would also block search engines. It would, wouldn’t it?

sailorzoop@lemmy.librebun.com 1 month ago
I’m not entirely sure, but if you look here: github.com/TecharoHQ/anubis/tree/main/data/bots
They have separate configs for each bot: github.com/TecharoHQ/anubis/…/botPolicies.json
RedBauble@sh.itjust.works 1 month ago
You can set up the policies to allow search engines through; the default policy linked in the docs does that.
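As a rough sketch of what such a policy entry looks like: the snippet below allows a known search crawler through while challenging everything else. The field names are based on my reading of Anubis's botPolicies.json format and may not match the current schema exactly; the file linked above is the authoritative reference.

```json
{
  "bots": [
    {
      "name": "googlebot",
      "user_agent_regex": "Googlebot",
      "action": "ALLOW"
    },
    {
      "name": "everyone-else",
      "user_agent_regex": ".*",
      "action": "CHALLENGE"
    }
  ]
}
```

Rules are matched in order, so the explicit allow for the crawler has to come before the catch-all challenge rule.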
danielquinn@lemmy.ca 1 month ago
This all appears to be based on the user agent, so wouldn’t that mean that bad-faith scrapers could just declare themselves to be a typical search engine by spoofing its user agent?
SheeEttin@lemmy.zip 1 month ago
Yes. There’s no real way to differentiate.
SorteKanin@feddit.dk 1 month ago
Actually, I think most search engine bots publish a list of verified IP addresses they crawl from, so you could check a bot’s source IP against that list to confirm it’s genuine.
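A minimal sketch of that check in Python, assuming you've already fetched the crawler's published IP ranges (e.g. Google publishes a machine-readable list of Googlebot ranges). The CIDR block below is a placeholder for illustration, not the authoritative list:

```python
import ipaddress

# Placeholder range for illustration only; in practice, load the current
# ranges from the search engine's published list (e.g. Google's
# googlebot.json) rather than hardcoding them.
GOOGLEBOT_RANGES = [
    ipaddress.ip_network("66.249.64.0/19"),
]

def is_verified_crawler(client_ip: str, ranges=GOOGLEBOT_RANGES) -> bool:
    """Return True if the client IP falls inside the crawler's published ranges."""
    ip = ipaddress.ip_address(client_ip)
    return any(ip in net for net in ranges)

# A request whose user agent claims to be Googlebot but whose IP is not
# in the published ranges fails the check:
print(is_verified_crawler("66.249.66.1"))   # inside the placeholder range
print(is_verified_crawler("203.0.113.7"))   # outside it
```

Some engines also support reverse-DNS verification (resolve the IP to a hostname, then forward-resolve that hostname and check it maps back), which works even without a published range list.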