Comment on Anubis is awesome! Stopping (AI)crawlbots
danielquinn@lemmy.ca 5 days ago
I’ve been thinking about setting up Anubis to protect my blog from AI scrapers, but I’m not clear on whether this would also block search engines. It would, wouldn’t it?
sailorzoop@lemmy.librebun.com 5 days ago
I’m not entirely sure, but if you look here: github.com/TecharoHQ/anubis/tree/main/data/bots
They have separate configs for each bot. github.com/TecharoHQ/anubis/…/botPolicies.json
RedBauble@sh.itjust.works 5 days ago
You can set up the policies to allow search engines through; the default policy linked in the docs does that.
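For context, a policy rule in Anubis's botPolicies.json pairs a user-agent (or path) regex with an action. This is a minimal sketch of the kind of allow rule being described; the exact field names and regexes here are illustrative and may differ from the current version in the repo, so check the linked defaults:

```json
{
  "bots": [
    {
      "name": "googlebot",
      "user_agent_regex": "Googlebot",
      "action": "ALLOW"
    },
    {
      "name": "generic-browser",
      "user_agent_regex": "Mozilla",
      "action": "CHALLENGE"
    }
  ]
}
```

Rules are matched in order, so an early ALLOW for a known crawler exempts it from the proof-of-work challenge that everything else gets.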
danielquinn@lemmy.ca 5 days ago
This all appears to be based on the user agent, so wouldn’t that mean that bad-faith scrapers could just declare themselves to be a typical search engine’s user agent?
SheeEttin@lemmy.zip 5 days ago
Yes. There’s no real way to differentiate.
SorteKanin@feddit.dk 4 days ago
Actually I think most search engine bots publish a list of verified IP addresses they crawl from, so you could check a claimed search bot’s IP against that list to verify it.
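Google, for example, documents a verification procedure beyond published IP lists: reverse-resolve the requester's IP, check the PTR hostname falls under googlebot.com or google.com, then forward-resolve that hostname and confirm it maps back to the same IP. A minimal sketch of that check (function names are my own; the domain suffixes are the ones Google documents for its crawlers):

```python
import socket

# Domains Google documents for its crawler PTR records.
GOOGLE_SUFFIXES = (".googlebot.com", ".google.com")

def is_google_host(hostname: str) -> bool:
    # Pure string check: does the PTR hostname fall under a Google crawler domain?
    return hostname.rstrip(".").endswith(GOOGLE_SUFFIXES)

def verify_googlebot(ip: str) -> bool:
    # Reverse lookup, then forward-confirm, so a scraper can't just
    # spoof the user agent or fake its own PTR record.
    try:
        hostname, _, _ = socket.gethostbyaddr(ip)
    except (socket.herror, socket.gaierror):
        return False
    if not is_google_host(hostname):
        return False
    try:
        # Forward resolution must return the original IP.
        return ip in socket.gethostbyname_ex(hostname)[2]
    except socket.gaierror:
        return False
```

The forward-confirmation step matters: anyone controlling reverse DNS for their own IP block can set a PTR record ending in googlebot.com, but they can't make Google's forward DNS point that name back at their IP.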