Comment on "Anubis is awesome! Stopping (AI) crawlbots"

Mora@pawb.social 3 days ago
Besides that point: why tf do they even crawl Lemmy? They could just as well create a "read-only" instance with an account that subscribes to all communities … and the other instances would send it their data. Oh, right, AI has to be as unethical as possible for most companies for some reason.

ZombiFrancis@sh.itjust.works 3 days ago
See, your brain went immediately to a solution based on knowing how something works. That's not in the AI wheelhouse.
dan@upvote.au 3 days ago
They’re likely not intentionally crawling Lemmy. They’re probably just crawling all sites they can find.
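For what it's worth, a crawler that wanted to be even minimally well-behaved could check robots.txt before fetching anything, which Python's standard library supports directly. A sketch (the GPTBot rule here is just an example of a common opt-out policy, not any site's actual file):

```python
from urllib.robotparser import RobotFileParser

# Example robots.txt policy: block one AI crawler, allow everyone else.
robots_txt = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""

rp = RobotFileParser()
rp.parse(robots_txt.splitlines())

print(rp.can_fetch("GPTBot", "https://example.social/post/1"))       # False
print(rp.can_fetch("SomeBrowser", "https://example.social/post/1"))  # True
```

The complaint in this thread, of course, is that many of these bots skip exactly this step.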
AmbitiousProcess@piefed.social 3 days ago
Because the easiest solution for them is a simple web scraper. If they don't give a shit about ethics, then something that just crawls every page it can find is loads easier to set up than a custom implementation for each source: torrent downloads for Wikipedia, running lemmy/mastodon/pixelfed instances to federate with the fediverse, using RSS feeds and checking whether they carry full or only partial articles, implementing proper checks to prevent downloading the same content twice (or more), etc.
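Of the shortcuts listed above, deduplication is the one that is cheap to get right even in a naive scraper. A minimal content-hash check might look like this (names and structure are illustrative, not any real crawler's code):

```python
import hashlib

# Illustrative only: a real crawler would persist this set, not keep it in memory.
seen_hashes: set[str] = set()

def should_download(content: bytes) -> bool:
    """Return True the first time a given payload is seen, False on repeats."""
    digest = hashlib.sha256(content).hexdigest()
    if digest in seen_hashes:
        return False
    seen_hashes.add(digest)
    return True

page = b"<html>the same article, syndicated on two instances</html>"
print(should_download(page))  # True
print(should_download(page))  # False
```

With federated platforms like Lemmy, the same post exists on every instance that federates the community, so a crawler without a check like this fetches it once per instance.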
wizardbeard@lemmy.dbzer0.com 3 days ago
They crawl Wikipedia too, adding significant extra load to its servers, even though Wikipedia publishes a regularly updated torrent of its entire content for download.
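To illustrate what skipping the crawl buys you: the official dumps (dumps.wikimedia.org) are bz2-compressed XML of roughly the shape below, so one streamed download replaces millions of individual page fetches. A toy parse of an inline fragment, assuming a simplified schema (real dumps carry an XML namespace and much more per-page metadata):

```python
import xml.etree.ElementTree as ET

# A tiny made-up fragment in the general shape of a pages-articles dump.
dump_fragment = """\
<mediawiki>
  <page>
    <title>Anubis</title>
    <revision><text>Ancient Egyptian god of funerary rites...</text></revision>
  </page>
  <page>
    <title>Web crawler</title>
    <revision><text>A bot that systematically browses the web...</text></revision>
  </page>
</mediawiki>
"""

root = ET.fromstring(dump_fragment)
titles = [page.findtext("title") for page in root.iter("page")]
print(titles)  # ['Anubis', 'Web crawler']
```

For a real dump you would stream-parse with `ET.iterparse` rather than load the whole file, since the full English Wikipedia dump is tens of gigabytes even compressed.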