Comment on "Anubis is awesome! Stopping (AI) crawlbots"

Mora@pawb.social 3 days ago
Besides that point: why tf do they even crawl Lemmy? They could just as well create a "read-only" instance with an account that subscribes to all communities … and the other instances would send it their data. Oh, right, AI has to be as unethical as possible for most companies for some reason.

ZombiFrancis@sh.itjust.works 3 days ago
See, your brain went immediately to a solution based on knowing how something works. That's not in the AI wheelhouse.
dan@upvote.au 3 days ago
They’re likely not intentionally crawling Lemmy. They’re probably just crawling all sites they can find.
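For what it's worth, a crawler that wanted to be even minimally well-behaved could check robots.txt before fetching anything, which Python's standard library supports directly. A sketch (the GPTBot rule here is just an example of a common opt-out policy, not any site's actual file):

```python
from urllib.robotparser import RobotFileParser

# Example robots.txt policy: block one AI crawler, allow everyone else.
robots_txt = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""

rp = RobotFileParser()
rp.parse(robots_txt.splitlines())

print(rp.can_fetch("GPTBot", "https://example.social/post/1"))       # False
print(rp.can_fetch("SomeBrowser", "https://example.social/post/1"))  # True
```

The complaint in this thread, of course, is that many of these bots skip exactly this step.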
AmbitiousProcess@piefed.social 3 days ago
Because the easiest solution for them is a simple web scraper. If they don't give a shit about ethics, then something that just crawls every page it can find is loads easier to set up than a custom implementation for each source: torrent downloads for Wikipedia, running lemmy/mastodon/pixelfed instances to federate with the fediverse, using RSS feeds and checking whether they carry full or only partial articles, implementing proper checks to prevent downloading the same content twice (or more), etc.
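Of the shortcuts listed above, deduplication is the one that is cheap to get right even in a naive scraper. A minimal content-hash check might look like this (names and structure are illustrative, not any real crawler's code):

```python
import hashlib

# Illustrative only: a real crawler would persist this set, not keep it in memory.
seen_hashes: set[str] = set()

def should_download(content: bytes) -> bool:
    """Return True the first time a given payload is seen, False on repeats."""
    digest = hashlib.sha256(content).hexdigest()
    if digest in seen_hashes:
        return False
    seen_hashes.add(digest)
    return True

page = b"<html>the same article, syndicated on two instances</html>"
print(should_download(page))  # True
print(should_download(page))  # False
```

With federated platforms like Lemmy, the same post exists on every instance that federates the community, so a crawler without a check like this fetches it once per instance.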
wizardbeard@lemmy.dbzer0.com 3 days ago
They crawl Wikipedia too, adding significant extra load to its servers, even though Wikipedia publishes a regularly updated torrent of its entire content for download.
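To illustrate what skipping the crawl buys you: the official dumps (dumps.wikimedia.org) are bz2-compressed XML of roughly the shape below, so one streamed download replaces millions of individual page fetches. A toy parse of an inline fragment, assuming a simplified schema (real dumps carry an XML namespace and much more per-page metadata):

```python
import xml.etree.ElementTree as ET

# A tiny made-up fragment in the general shape of a pages-articles dump.
dump_fragment = """\
<mediawiki>
  <page>
    <title>Anubis</title>
    <revision><text>Ancient Egyptian god of funerary rites...</text></revision>
  </page>
  <page>
    <title>Web crawler</title>
    <revision><text>A bot that systematically browses the web...</text></revision>
  </page>
</mediawiki>
"""

root = ET.fromstring(dump_fragment)
titles = [page.findtext("title") for page in root.iter("page")]
print(titles)  # ['Anubis', 'Web crawler']
```

For a real dump you would stream-parse with `ET.iterparse` rather than load the whole file, since the full English Wikipedia dump is tens of gigabytes even compressed.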