Anubis is awesome! Stopping (AI) crawlbots

danielquinn@lemmy.ca 2 months ago
I’ve been thinking about setting up Anubis to protect my blog from AI scrapers, but I’m not clear on whether this would also block search engines. It would, wouldn’t it?

sailorzoop@lemmy.librebun.com 2 months ago
I’m not entirely sure, but if you look here: github.com/TecharoHQ/anubis/tree/main/data/bots
They have separate configs for each bot: github.com/TecharoHQ/anubis/…/botPolicies.json
RedBauble@sh.itjust.works 2 months ago
You can set up the policies to allow search engines through; the default policy linked in the docs does that.
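For a rough idea of what such a rule looks like: Anubis policy files match bots by name and user-agent pattern and assign an action. The snippet below is a sketch in the style of the default botPolicies.json linked above; the exact field names and action values should be checked against the repo.

```json
{
  "bots": [
    {
      "name": "googlebot",
      "user_agent_regex": "Googlebot",
      "action": "ALLOW"
    },
    {
      "name": "generic-browser",
      "user_agent_regex": "Mozilla",
      "action": "CHALLENGE"
    }
  ]
}
```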
danielquinn@lemmy.ca 2 months ago
This all appears to be based on the user agent, so wouldn’t that mean that bad-faith scrapers could just declare themselves with a typical search engine user agent?
SheeEttin@lemmy.zip 2 months ago
Yes. There’s no real way to differentiate.
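To illustrate the point: the user agent is just a request header the client chooses, so any scraper can claim to be Googlebot. A minimal Python sketch (the UA string is Googlebot's published one, but nothing in the request proves the claim):

```python
import urllib.request

# Googlebot's advertised user-agent string; any client can set it.
GOOGLEBOT_UA = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"

# Build a request that presents itself as Googlebot. The server sees only
# this header; it carries no proof of who actually sent the request.
req = urllib.request.Request(
    "https://example.com/",
    headers={"User-Agent": GOOGLEBOT_UA},
)
print(req.get_header("User-agent"))
```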
SorteKanin@feddit.dk 2 months ago
Actually, I think most search engine bots publish a list of verified IP addresses they crawl from, so you could check a search bot’s IP against that list to know.
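A sketch of that check in Python, using the stdlib `ipaddress` module. The two ranges below are illustrative examples in the style of Google's published googlebot.json list; a real check should fetch the current list rather than hardcode ranges.

```python
import ipaddress

# Illustrative sample ranges only; fetch the crawler operator's published
# list (e.g. Google's googlebot.json) for real verification.
GOOGLEBOT_RANGES = [
    ipaddress.ip_network("66.249.64.0/27"),
    ipaddress.ip_network("66.249.66.0/27"),
]

def is_verified_googlebot(ip: str) -> bool:
    """Return True if the address falls inside a published crawler range."""
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in GOOGLEBOT_RANGES)

print(is_verified_googlebot("66.249.66.1"))   # inside a sample range
print(is_verified_googlebot("203.0.113.9"))   # not a crawler address
```

Google also documents a reverse-DNS check (resolve the IP to a hostname, confirm it ends in googlebot.com or google.com, then forward-resolve it back), which works without tracking the IP list yourself.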