At least in the beginning, the scrapers just used curl with a different user agent. Forcing them to use a headless client is already a 100x increase in resources for them. That in itself is a small victory, and so far it is working beautifully.
Comment on "Anubis is awesome and I want to talk about it"
___qwertz___@feddit.org 4 months ago
Funnily enough, PoW was a hot topic in academia around the late 90s / early 2000s, and it's fairly clear that the author of Anubis has not read much of the discussion from back then.
There was a paper called "Proof of work does not work" (or similar, can't be bothered to look it up) that argued that PoW cannot work for spam protection, because you have to support low-powered consumer devices while still blocking spammers with heavy hardware. And that is a very valid concern. Then there was a paper arguing that PoW can still work, as long as you scale the difficulty such that a legit user (e.g. someone sending a single email) gets a low difficulty, while a spammer (sending thousands of emails) gets a high difficulty.
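The scaled-difficulty idea is easy to sketch. This is a hypothetical illustration, not anything from the papers or from Anubis: the tier thresholds and bit counts are made up, and the point is only that cost grows exponentially with the required number of leading zero bits, so a heavy sender pays vastly more per message than a light one.

```python
def difficulty_for(messages_sent: int) -> int:
    """Hypothetical tiers: difficulty = required leading zero bits in SHA-256."""
    if messages_sent <= 10:
        return 10       # light user: ~1000 hashes, imperceptible on any device
    if messages_sent <= 1000:
        return 16       # suspicious volume: ~65k hashes per message
    return 22           # bulk sender: ~4M hashes per message

def expected_hashes(bits: int) -> int:
    """Expected SHA-256 evaluations to find a nonce meeting the difficulty."""
    return 2 ** bits
```

Under this toy scheme a one-off sender does about 2^10 hashes while a bulk spammer does 2^22 per message, a 4096x cost gap, which is the asymmetry the second paper relied on.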
The idea of blocking known bad actors is actually used in email quite a lot, in the form of DNS block lists (DNSBLs) such as Spamhaus (this has nothing to do with PoW, but such a distributed list could be used to determine PoW difficulty).
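For context, a DNSBL lookup is just a DNS query: you reverse the client's IPv4 octets and prepend them to the list's zone, then resolve the resulting name (an answer in 127.0.0.0/8 means "listed", NXDOMAIN means "not listed"). A minimal sketch of building the query name, using Spamhaus's public `zen.spamhaus.org` zone as the example:

```python
def dnsbl_query_name(ip: str, zone: str = "zen.spamhaus.org") -> str:
    """Build the hostname to resolve for a DNSBL check of an IPv4 address.

    DNSBLs are keyed on the reversed octets of the address, so 203.0.113.7
    becomes 7.113.0.203.zen.spamhaus.org. Resolving that name (not done here)
    tells you whether the address is listed.
    """
    octets = ip.split(".")
    return ".".join(reversed(octets)) + "." + zone
```

A PoW scheme could plug such a lookup into the difficulty decision: listed addresses get a steep challenge, unknown ones a mild one.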
Anubis, on the other hand, does nothing like that, and a bot developed specifically to pass Anubis would do so trivially.
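To make the "trivially" concrete: an Anubis-style SHA-256 proof-of-work (details simplified here; this is a sketch of the general scheme, not Anubis's exact protocol) can be solved in a few lines of native code, with no headless browser involved, which is exactly what a purpose-built bot would do.

```python
import hashlib
import itertools

def solve_challenge(challenge: str, zero_bits: int) -> int:
    """Brute-force a nonce so SHA-256(challenge + nonce) starts with
    `zero_bits` zero bits -- the same work a browser does in JavaScript,
    but done natively, orders of magnitude cheaper per attempt."""
    target = 1 << (256 - zero_bits)
    for nonce in itertools.count():
        digest = hashlib.sha256(f"{challenge}{nonce}".encode()).digest()
        if int.from_bytes(digest, "big") < target:
            return nonce
```

At typical interactive difficulties this finishes in milliseconds, so for a dedicated scraper the PoW is a one-time engineering cost rather than an ongoing deterrent.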
Sorry for the long text.
Flipper@feddit.org 4 months ago
sudo@programming.dev 4 months ago
Well, in most cases it would be Python requests, not curl. But yes, forcing them to use a browser is the real cost, not just in CPU time but in programmer labor. PoW is overkill for that, though.
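To show how little effort the pre-browser era took: a requests-based scraper defeats a plain User-Agent check by changing one header. The URL and UA string below are illustrative, and the actual network call is left commented out.

```python
# Hypothetical scraper sketch: the only change from a default HTTP client
# is the User-Agent header, which `requests` (third-party) would send as-is.
BROWSER_UA = (
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
    "AppleWebKit/537.36 (KHTML, like Gecko) Chrome/124.0.0.0 Safari/537.36"
)

def spoofed_headers() -> dict:
    """Headers that make a script look like a desktop Chrome to naive filters."""
    return {"User-Agent": BROWSER_UA}

# import requests
# resp = requests.get("https://example.org/", headers=spoofed_headers())
```

That one-liner is why pushing scrapers up to a full browser stack is such a large relative cost increase.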
sudo@programming.dev 4 months ago
Telling a legit user from a fake user is the entire game. If you can do that, you just block the fake user. Professional bot blockers like Cloudflare or Akamai have machine learning systems that analyze trends in network traffic and serve JS challenges to suspicious clients. Last I checked, all Anubis uses is User-Agent filters, which is extremely behind the curve. Bots already go as far as faking TLS fingerprints and matching them to their User-Agents.