Submitted 7 months ago by tofu@lemmy.nocturnal.garden to selfhosted@lemmy.world
https://lock.cmpxchg8b.com/anubis.html
Some thoughts on how useful Anubis really is. Combined with comments I read elsewhere about scrapers starting to solve the challenges, I’m afraid Anubis will be outdated soon and we need something else.
because anime catgirls are the best
Anubis isn’t a challenge like a captcha. Anubis is a resource waster, forcing crawlers to solve a crypto challenge (basically like mining Bitcoin) before being allowed in. That’s how it defends so well against bots: they don’t want to waste their resources on needless computing, so they just cancel the page load before it even happens and go crawl elsewhere.
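For anyone who hasn’t looked under the hood, the challenge is a proof-of-work puzzle solved in the browser. Here’s a minimal sketch of that kind of SHA-256 scheme in Python; the names and the difficulty encoding are illustrative, not Anubis’s actual wire format:

```python
# Minimal sketch of an Anubis-style proof-of-work puzzle (illustrative,
# not the real wire format): find a nonce so that sha256(challenge+nonce)
# starts with `difficulty` zero hex digits.
import hashlib
import secrets

def solve_challenge(challenge: str, difficulty: int) -> int:
    target = "0" * difficulty
    nonce = 0
    while True:
        digest = hashlib.sha256(f"{challenge}{nonce}".encode()).hexdigest()
        if digest.startswith(target):
            return nonce  # expected work: ~16**difficulty hashes
        nonce += 1

challenge = secrets.token_hex(16)     # stand-in for a server-issued value
print(solve_challenge(challenge, 4))  # ~65,000 hashes on average
```

The server only needs one hash to verify the answer, while the client has to grind through thousands of attempts; that asymmetry is the whole trick.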
No, it works because the scraper bots don’t have it implemented yet. Of course the companies would rather not spend additional compute resources, but their pockets are deep, and some have already adapted and solve the challenges.
The point was never that Anubis challenges are something scrapers can’t get past. The point is it’s expensive to do so.
Whether they solve it or not doesn’t change the fact that they have to spend more resources to crawl, which is the objective here. And by contrast, the website sees a lot less load than before deploying Anubis. Either way, I see it as a win.
But despite that, it has its detractors, like any solution that becomes popular.
But let’s be honest, what are the arguments against it?
It takes a bit longer to access a site the first time? Sure, but it’s not like you have to click anything or type anything.
It executes foreign code on your machine? Literally 90% of the web does these days. Just disable JavaScript and see how many websites are still functional. I’d be surprised if even a handful are.
The only ones who benefit from a site not having Anubis are web crawlers, be they AI bots, indexing bots, or script kiddies trying to find a vulnerable target.
Sometimes I think: imagine if a company like Google or Facebook implemented something like Anubis, and suddenly most people’s browsers were constantly solving CPU-intensive cryptographic challenges. People would be outraged by the wasted energy. But somehow when a “cool small company” does it, it’s fine.
I do not think the Anubis approach is sustainable if everyone uses it; it’s just too wasteful energy-wise.
What alternatives do you propose?
Captcha.
It does everything Anubis does. If a scraper wants to solve it automatically, that’s compute-intensive, since they have to run AI inference, but for the user it’s just a little time-consuming.
With captchas you don’t run aggressive, unauthorized software on anyone’s computer.
Solutions did exist. But Anubis is “trendy”, and its makers are masters of PR within some specific circles of people who always want the latest, trendiest thing.
But a good old captcha would achieve the same result as Anubis, in a more sustainable way.
Anubis sucks
However, there are not many other options.
What sucks about Anubis?
The implementation
It runs JavaScript and the actual algorithm could use improvement.
Yeah but at least Anubis is cute.
I’ll take “sucks but cute” over a dead internet and endless swarms of zergling crawlers.
New developments: just a few hours before I posted this comment, The Register published an article about AI crawler traffic. www.theregister.com/2025/…/ai_crawler_traffic/
Anubis’ developer was interviewed and they posted the responses on their website: xeiaso.net/notes/2025/el-reg-responses/
In particular:
Fastly’s claims that 80% of bot traffic is now AI crawlers
In some cases for open source projects, we’ve seen upwards of 95% of traffic being AI crawlers. For one, deploying Anubis almost instantly caused server load to crater by so much that it made them think they accidentally took their site offline. One of my customers had their power bills drop by a significant fraction after deploying Anubis. It’s nuts.
So, yeah. If we believe Xe, OOP’s article is complete hogwash.
Cool article, thanks for linking! Not sure about that being a new development though, it’s just results, but we already knew it’s working. The question is, what’s going to work once the scrapers adapt?
There are some sites where Anubis just won’t let me through. Like, I just get immediately bounced.
So RIP dwarf fortress forums. I liked you.
I don’t get it, I thought it allows all browsers with JavaScript enabled.
I, too, get blocked by certain sites. I think it’s a configuration thing, where it doesn’t like my combination of uBlock/NoScript, even when I explicitly allow their scripts…
I love that domain name.
I’m constantly unable to access Anubis sites on my primary mobile browser and have to switch over to Fennec.
Have you tried accessing it by using Nyarch?
This… makes no sense to me. Almost by definition, an AI vendor will have a datacenter full of compute capacity.
Well it doesn’t fucking matter what “makes sense to you”, because it is working…
It’s being deployed by people who had their sites DDoS’d to shit by crawlers, and they are very happy with the results, so what even is the point of trying to argue here?
It’s working because it’s not widely used. It’s sort of a “pirate seagull” theory: as long as few people use it, it works, because scrapers don’t really count on Anubis, so they don’t implement systems to get past it.

If it were to become more common, it would be really easy to implement systems that defeat its purpose.

As of right now, sites are okay because scrapers just send HTTPS requests and expect a full response. If someone wants to bypass Anubis’ protection, they would need to take into account that they will receive a cryptographic challenge, and solve it.

The thing is that cryptographic challenges can be heavily optimized. They are designed to run in a very inefficient environment, the browser. But if someone took the challenge and solved it in a better environment, using CUDA or something like that, it would take a fraction of the energy, defeating the purpose of “being so costly that scraping isn’t worth it”. (See the sketch below.)

At this point it’s only a matter of time before we start seeing scrapers like that, especially if more and more sites start using Anubis.
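To make that concrete, here’s a toy illustration: the same puzzle as in the earlier sketch, but with the nonce search spread across CPU cores instead of running single-threaded in a browser tab. A determined scraper would go much further with native code or GPUs; the scheme and names are the same illustrative ones as above, not Anubis’s real format.

```python
# Toy illustration: the same SHA-256 puzzle, parallelised across CPU
# cores. The point is only that a browser tab is the least efficient
# place to do this work; native code or CUDA widens the gap further.
import hashlib
from multiprocessing import Pool

DIFFICULTY = 4
TARGET = "0" * DIFFICULTY

def search(args):
    # Each worker scans its own arithmetic progression of nonces.
    challenge, start, stride = args
    nonce = start
    while True:
        digest = hashlib.sha256(f"{challenge}{nonce}".encode()).hexdigest()
        if digest.startswith(TARGET):
            return nonce
        nonce += stride

def solve_parallel(challenge: str, workers: int = 8) -> int:
    with Pool(workers) as pool:  # pool is terminated on exit
        jobs = [(challenge, w, workers) for w in range(workers)]
        return next(pool.imap_unordered(search, jobs))  # first solution wins

if __name__ == "__main__":
    print(solve_parallel("server-issued-challenge"))
```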
Yeah, well-written stuff. I think Anubis will come and go. This beautifully demonstrates and, best of all, quantifies the negligence of Anubis.
It’s very interesting to try to think of what would work, even conceptually. Some sort of purely client-side captcha type of thing perhaps. I keep thinking about it in half-assed ways for minutes at a time.
Maybe something that scrambles the characters of the site according to some random “offset”, e.g. randomly selecting a modulus size and an offset to cycle them, or even just a good ol’ cipher. And the “captcha” consists of a slider that adjusts the offset. You as the viewer know it’s solved when the text becomes something sensical, so there’s no need for the client code to store a readable key that could be used to auto-undo the scrambling. You could maybe even have some values of the slider randomly chosen to produce English text in case the crawler got smart enough to check for legibility (not sure how to hide which slider positions would be these red herring ones, though), which could maybe be enough to trick the crawler into picking up junk text sometimes.
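To make the idea concrete, here’s a toy version of that scrambler in Python. Everything about it (the lowercase alphabet, the Caesar-style offset) is this comment’s hypothetical, not an existing scheme:

```python
# Toy version of the slider-captcha idea: shift every letter by an
# offset; the human drags a slider until the text reads as English,
# so no readable key ever needs to be stored client-side.
import string

ALPHABET = string.ascii_lowercase

def scramble(text: str, offset: int) -> str:
    shifted = ALPHABET[offset:] + ALPHABET[:offset]
    return text.lower().translate(str.maketrans(ALPHABET, shifted))

secret = scramble("the quick brown fox", offset=13)
print(secret)                     # "gur dhvpx oebja sbk"
print(scramble(secret, 26 - 13))  # slider at the right spot: legible again
```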
Anubis is more of an economic solution. It doesn’t stop bots, but it does make companies pay more to access content instead of having server operators foot the bill.
I’m sure you meant to sound more analytical than anything… but this really comes off as arrogant.
You make the claim that Anubis is negligent and will come and go, and then admit to only spending minutes at a time thinking of solutions yourself, which you then just sorta spout. It’s fun to think about solutions to this problem collectively, but can you honestly believe that Anubis is negligent when it’s so clearly working, and when the author has been so extremely clear about their own perception of its pitfalls and hasty development? (Go read their blog, it’s a fun time.)
By negligence, I meant that the cost is negligible to the companies running scrapers, not that the solution itself is negligent. I should have said “negligibility” of Anubis, sorry - that was poor clarity on my part.
But I do think that the cost of it is indeed negligible, as the article shows. It doesn’t really matter if the author is biased or not, their analysis of the costs seems reasonable. I would need a counter-argument against that to think they were wrong. Just because they’re biased isn’t enough to discount the quantification they attempted to bring to the debate.
Also, I don’t think there’s any hypocrisy in me saying I’ve only thought about other solutions here and there; I’m not maintaining an anti-scraping library. And there have already been indications that scrapers are just accepting the cost of Anubis on Codeberg, right? So I’m not trying to say I’m some sort of tech genius who has the right idea here, but from what Codeberg was saying, and from the numbers in this article, it sure looks like Anubis isn’t the right idea. I am indeed only having fun with my suggestions, not making whole libraries out of them and pronouncing them to be solutions. I personally haven’t seen evidence that Anubis is so clearly working.
That kind of captcha is trivial to bypass via frequency analysis. Text that looks like language, as opposed to random noise, is very statistically recognisable.
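For illustration, here’s roughly what that attack looks like against the toy scrambler sketched above: score all 26 offsets against English letter frequencies and keep the best. The frequency table is a rough approximation; on a sentence or two of real text it reliably picks out the right offset.

```python
# Sketch of the frequency-analysis attack: try every offset and keep
# the candidate whose letter distribution looks most like English.
# No key is stored anywhere; the statistics of English are the key.
import string
from collections import Counter

ALPHABET = string.ascii_lowercase

# Rough English letter frequencies (percent), approximate values.
ENGLISH_FREQ = {
    "e": 12.7, "t": 9.1, "a": 8.2, "o": 7.5, "i": 7.0, "n": 6.7,
    "s": 6.3, "h": 6.1, "r": 6.0, "d": 4.3, "l": 4.0, "u": 2.8,
}

def shift(text: str, offset: int) -> str:
    table = str.maketrans(ALPHABET, ALPHABET[offset:] + ALPHABET[:offset])
    return text.translate(table)

def english_score(text: str) -> float:
    counts = Counter(c for c in text if c in ALPHABET)
    total = sum(counts.values()) or 1
    return sum(ENGLISH_FREQ.get(c, 0.0) * n / total for c, n in counts.items())

def crack(scrambled: str) -> str:
    return max((shift(scrambled, off) for off in range(26)), key=english_score)

print(crack("gur dhvpx oebja sbk"))  # recovers "the quick brown fox"
```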
Not to mention it relies on security through obscurity.
It wouldn’t be that hard to figure out and bypass
The current version of Anubis was made as a quick “good enough” solution to an emergency. The article is very enthusiastic about explaining why it shouldn’t work, but completely glosses over the fact that it has worked, at least to an extent where deploying it and maybe inconveniencing some users is preferable to having the entire web server choked out by a flood of scraper requests.
Exactly my thoughts too. Lots of theory about why it won’t work, but no acknowledgement that if people use it, maybe it does work, and when it stops working, they will stop using it.
The article is very enthusiastic about explaining why it shouldn’t work, but completely glosses over the fact that it has worked
This post was originally written for ycombinator “Hacker” News, which is vehemently against people hacking things together for the greater good, and, more importantly, for free.
It’s more of a corporate PR release site and if you aren’t known by the “community”, calling out solutions they can’t profit off of brings all the tech-bros to the yard for engagement.
The problem is that the purpose of Anubis was to make crawling more computationally expensive, and that crawlers are apparently increasingly prepared to accept that additional cost. One option would be to pile some required cycles on top of what’s currently asked, but it’s a balancing act before it starts to really be an annoyance for the meat popsicle users (see the back-of-envelope sketch below).
That’s why the developer is working on a better detection mechanism. xeiaso.net/…/avoiding-becoming-peg-dependency/
And it was/is for sure the lesser evil compared to what most others did: put the site behind Cloudflare.
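On that balancing act: each extra leading zero hex digit of difficulty multiplies the expected work by 16, but it does so for everyone. A back-of-envelope sketch, with hash rates that are purely illustrative ballpark assumptions:

```python
# Back-of-envelope: how PoW difficulty scales for a browser vs. a GPU.
# Both hash rates are illustrative assumptions, not measurements.
BROWSER_HPS = 1e6  # assume ~1 MH/s for JavaScript in a browser
GPU_HPS = 5e9      # assume ~5 GH/s for a single modern GPU

for difficulty in range(4, 8):
    expected = 16 ** difficulty  # average hashes per solution
    print(f"difficulty {difficulty}: browser ~{expected / BROWSER_HPS:.2f}s, "
          f"GPU ~{expected / GPU_HPS * 1000:.3f}ms")
```

By the time the challenge costs a GPU farm anything noticeable, a phone in a browser is stuck for minutes, which is exactly the balancing act described above.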
I feel like people who complain about Anubis have never had their server overheat and shut down on an almost daily basis because of AI scrapers 🤦
Out of curiosity, what’s the issue with Cloudflare? Aside from the constant worry they may strong-arm you into their enterprise pricing if your site is too popular lol. I understand supporting open source, but why not let companies handle the expensive bits as long as they’re willing?
I guess I can answer my own question. If the point of the Fediverse is to remove a single point of failure, then I suppose Cloudflare could become a single point to take down the network. Still, we could always pivot away from those types of services later, right?
I still think captchas are a better solution.
In order to get past them, they have to run AI inference, which also comes with compute costs. But for legitimate users you don’t run unauthorized, intensive tasks on their hardware.
Is there a reason other than avoiding infrastructure centralization not to put a web server behind cloudflare?
Unless you have a dirty heatsink, no amount of hammering would make the server overheat
Yeah, I’m just wondering what’s going to follow.
Klear@quokk.au 7 months ago
Did the author only now discover cryptography? It's like cryptocurrency, just without the currency, what a concept!
SkaveRat@discuss.tchncs.de 7 months ago
It’s a perfectly valid way to explain it, though
If you try to show up with “cryptography” as an explanation, people will think of encrypting messages, not proof of work
“Cryptocurrency without the currency” really is the perfect single-sentence explanation
ChaoticEntropy@feddit.uk 7 months ago
It’s quite similar.