Submitted 14 hours ago by tofu@lemmy.nocturnal.garden to selfhosted@lemmy.world
https://lock.cmpxchg8b.com/anubis.html
Some thoughts on how useful Anubis really is. Combined with comments I read elsewhere about scrapers starting to solve the challenges, I’m afraid Anubis will be outdated soon and we need something else.
I love that domain name.
The current version of Anubis was made as a quick “good enough” solution to an emergency. The article is very enthusiastic about explaining why it shouldn’t work, but completely glosses over the fact that it has worked, at least to an extent where deploying it and maybe inconveniencing some users is preferable to having the entire web server choked out by a flood of scraper requests.
And it was/is for sure the lesser evil compared to what most others did: put the site behind Cloudflare.
I feel like people who complain about Anubis have never had their server overheat and shut down on an almost daily basis because of AI scrapers 🤦
Yeah, I’m just wondering what’s going to follow.
Is there a reason other than avoiding infrastructure centralization not to put a web server behind cloudflare?
Unless you have a dirty heatsink, no amount of hammering would make the server overheat
The problem is that the purpose of Anubis was to make crawling more computationally expensive and that crawlers are apparently increasingly prepared to accept that additional cost. One option would be to pile some required cycles on top of what’s currently asked, but it’s a balancing act before it starts to really be an annoyance for the meat popsicle users.
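To make the balancing act concrete, here is a minimal sketch of a generic hashcash-style proof-of-work challenge. It is not Anubis's actual code; the names `solve`, `verify`, and `expected_attempts`, and the choice of counting leading zero hex digits of a SHA-256 digest, are assumptions made for illustration.

```python
import hashlib
import itertools


def solve(challenge: str, difficulty: int) -> int:
    """Brute-force the smallest nonce whose digest has `difficulty` leading zero hex digits."""
    prefix = "0" * difficulty
    for nonce in itertools.count():
        digest = hashlib.sha256(f"{challenge}{nonce}".encode()).hexdigest()
        if digest.startswith(prefix):
            return nonce


def verify(challenge: str, nonce: int, difficulty: int) -> bool:
    """Server-side check: a single hash, no matter how long the client searched."""
    digest = hashlib.sha256(f"{challenge}{nonce}".encode()).hexdigest()
    return digest.startswith("0" * difficulty)


def expected_attempts(difficulty: int) -> int:
    """Average number of hashes a client needs: each hex digit multiplies the work by 16."""
    return 16 ** difficulty


if __name__ == "__main__":
    challenge = "example-challenge"  # hypothetical per-visitor token
    nonce = solve(challenge, 3)
    print(nonce, verify(challenge, nonce, 3))
    for d in range(3, 7):
        print(d, expected_attempts(d))
```

Every extra digit of difficulty costs humans and crawlers the same factor of 16, which is why piling on cycles hits the slowest phone long before it hurts anyone with spare compute.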
That’s why the developer is working on a better detection mechanism. xeiaso.net/…/avoiding-becoming-peg-dependency/
This… makes no sense to me. Almost by definition, an AI vendor will have a datacenter full of compute capacity.
Well, it doesn’t fucking matter what “makes sense to you” because it is working…
It’s being deployed by people who had their sites DDoS’d to shit by crawlers, and they are very happy with the results, so what even is the point of trying to argue here?
I’m constantly unable to access Anubis sites on my primary mobile browser and have to switch over to Fennec.
Have you tried accessing it by using Nyarch?
Yeah, well-written stuff. I think Anubis will come and go. This beautifully demonstrates and, best of all, quantifies how negligible the cost Anubis imposes is.
It’s very interesting to try to think of what would work, even conceptually. Some sort of purely client-side captcha type of thing perhaps. I keep thinking about it in half-assed ways for minutes at a time.
Maybe something that scrambles the characters of the site according to some random “offset”, e.g. by randomly selecting a modulus size and an offset to cycle them, or even just a good ol’ cipher. The “captcha” would then be a slider that adjusts the offset. You as the viewer know it’s solved when the text becomes readable, so there’s no need for the client code to store a readable key that could be used to auto-undo the scrambling. If the crawler got smart enough to check for legibility, you could maybe even have some randomly chosen slider values produce decoy English text (not sure how to hide which slider positions are the red herrings, though), which might be enough to trick the crawler into picking up junk text sometimes.
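A rough sketch of the simplest version of this idea, assuming the “good ol’ cipher” is a plain Caesar shift over ASCII letters. The function name `shift_text` and the secret offset of 7 are made up for illustration, and the decoy slider positions are not modelled.

```python
def shift_text(text: str, offset: int) -> str:
    """Rotate ASCII letters by `offset` positions; leave everything else untouched."""
    out = []
    for ch in text:
        if ch.isascii() and ch.isalpha():
            base = ord("a") if ch.islower() else ord("A")
            out.append(chr((ord(ch) - base + offset) % 26 + base))
        else:
            out.append(ch)
    return "".join(out)


if __name__ == "__main__":
    page = "Welcome to the forum, please read the rules."
    secret_offset = 7  # hypothetical value the server picks and never sends
    scrambled = shift_text(page, secret_offset)
    print(scrambled)
    # Each slider position applies one candidate offset; a human stops
    # at the position where the text reads as language.
    for slider in range(26):
        print(slider, shift_text(scrambled, -slider))
```

With only 26 slider positions the search space is tiny, so the scheme rests entirely on a crawler being unable to tell which position looks like language.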
That kind of captcha is trivial to bypass via frequency analysis. Text that looks like language, as opposed to random noise, is very statistically recognisable.
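For illustration, a minimal sketch of that attack against a plain letter-shift scramble. The scheme, the function names, and the approximate English letter frequencies in the table are assumptions; a fancier scramble would need a correspondingly fancier statistic, but the principle is the same.

```python
from collections import Counter

# Approximate relative frequencies of letters in English text, in percent.
ENGLISH_FREQ = {
    "a": 8.17, "b": 1.49, "c": 2.78, "d": 4.25, "e": 12.70, "f": 2.23,
    "g": 2.02, "h": 6.09, "i": 6.97, "j": 0.15, "k": 0.77, "l": 4.03,
    "m": 2.41, "n": 6.75, "o": 7.51, "p": 1.93, "q": 0.10, "r": 5.99,
    "s": 6.33, "t": 9.06, "u": 2.76, "v": 0.98, "w": 2.36, "x": 0.15,
    "y": 1.97, "z": 0.07,
}


def shift_text(text: str, offset: int) -> str:
    """Rotate ASCII letters by `offset` positions; leave everything else untouched."""
    out = []
    for ch in text:
        if ch.isascii() and ch.isalpha():
            base = ord("a") if ch.islower() else ord("A")
            out.append(chr((ord(ch) - base + offset) % 26 + base))
        else:
            out.append(ch)
    return "".join(out)


def chi_squared(text: str) -> float:
    """Distance between the text's letter counts and English; lower is more English-like."""
    letters = [c for c in text.lower() if c.isascii() and c.isalpha()]
    n = len(letters)
    if n == 0:
        return float("inf")
    counts = Counter(letters)
    return sum(
        (counts.get(letter, 0) - n * pct / 100) ** 2 / (n * pct / 100)
        for letter, pct in ENGLISH_FREQ.items()
    )


def crack(scrambled: str) -> tuple[int, str]:
    """Try all 26 offsets and keep the one whose output looks most like English."""
    best = min(range(26), key=lambda k: chi_squared(shift_text(scrambled, -k)))
    return best, shift_text(scrambled, -best)


if __name__ == "__main__":
    page = ("The current version of Anubis was made as a quick good enough "
            "solution to an emergency and it has worked well enough for many "
            "people who run small servers")
    print(crack(shift_text(page, 11)))
```

The whole search is 26 passes over the text, which is far cheaper than the page fetch itself.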
CrackedLinuxISO@lemmy.dbzer0.com 1 minute ago
There are some sites where Anubis just won’t let me through. Like, I just get immediately bounced.
So RIP dwarf fortress forums. I liked you.