
Comment on Why are anime catgirls blocking my access to the Linux kernel?

mfed1122@discuss.tchncs.de 4 months ago

Yeah, well-written stuff. I think Anubis will come and go. This beautifully demonstrates and, best of all, quantifies the negligence of Anubis.

It’s very interesting to try to think of what would work, even conceptually. Some sort of purely client-side captcha type of thing perhaps. I keep thinking about it in half-assed ways for minutes at a time.

Maybe something that scrambles the characters of the site according to some random “offset”, e.g., randomly selecting a modulus size and an offset to cycle them, or even just a good ol’ cipher. The “captcha” then consists of a slider that adjusts the offset. You as the viewer know it’s solved when the text becomes something sensical, so there’s no need for the client code to store a readable key that could be used to auto-undo the scrambling. You could maybe even have some slider values randomly chosen to produce English text, in case the crawler got smart enough to check for legibility (not sure how to hide which slider positions would be these red herring ones, though), which could maybe be enough to trick the crawler into picking up junk text sometimes.
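A rough sketch of the slider idea, assuming the simplest possible variant: a Caesar-style shift over lowercase letters only, with the modulus fixed at 26 rather than chosen at random. The function names (`scramble_page`, `slider_view`) are hypothetical and aren’t part of Anubis or any real library.

```python
import random
import string

# Assumption: only lowercase ASCII letters are scrambled; everything else passes through.
ALPHABET = string.ascii_lowercase


def shift_text(text: str, offset: int) -> str:
    """Rotate each lowercase letter by `offset` positions, wrapping modulo 26."""
    out = []
    for ch in text:
        if ch in ALPHABET:
            out.append(ALPHABET[(ALPHABET.index(ch) + offset) % len(ALPHABET)])
        else:
            out.append(ch)  # spaces and punctuation are left unchanged
    return "".join(out)


def scramble_page(text: str, rng: random.Random) -> tuple[str, int]:
    """Hypothetical server side: pick a secret non-zero offset and scramble the text.

    Only the scrambled text would be sent to the client; the offset stays on the server.
    """
    secret = rng.randrange(1, len(ALPHABET))
    return shift_text(text, secret), secret


def slider_view(scrambled: str, slider_position: int) -> str:
    """Hypothetical client side: render the text for the current slider position.

    The reader keeps sliding until the output reads as English; no key is stored here.
    """
    return shift_text(scrambled, slider_position)


if __name__ == "__main__":
    page, secret = scramble_page("the quick brown fox", random.Random(0))
    print(page)  # unreadable until the slider hits the right spot
    print(slider_view(page, len(ALPHABET) - secret))  # the quick brown fox
```

The slider position that undoes the scramble is `26 - secret`, but the client never needs to know that number; the human reader finds it by eye.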



JadedBlueEyes@programming.dev 4 months ago
That kind of captcha is trivial to bypass via frequency analysis. Text that looks like language, as opposed to random noise, is very statistically recognisable.
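To make the frequency-analysis point concrete, here’s a minimal sketch, assuming the scrambling is a single shift over lowercase letters as in the parent comment: a crawler scores all 26 slider positions with a chi-squared test against English letter frequencies and keeps the best one, no key needed. The frequency table uses commonly cited approximate values, and `crack_shift` is a hypothetical name.

```python
import string

ALPHABET = string.ascii_lowercase

# Approximate English letter frequencies in percent (commonly cited values; an assumption).
ENGLISH_FREQ = {
    "a": 8.2, "b": 1.5, "c": 2.8, "d": 4.3, "e": 12.7, "f": 2.2, "g": 2.0,
    "h": 6.1, "i": 7.0, "j": 0.15, "k": 0.77, "l": 4.0, "m": 2.4, "n": 6.7,
    "o": 7.5, "p": 1.9, "q": 0.095, "r": 6.0, "s": 6.3, "t": 9.1, "u": 2.8,
    "v": 0.98, "w": 2.4, "x": 0.15, "y": 2.0, "z": 0.074,
}


def shift_text(text: str, offset: int) -> str:
    """Rotate each lowercase letter by `offset` positions, wrapping modulo 26."""
    return "".join(
        ALPHABET[(ALPHABET.index(ch) + offset) % 26] if ch in ALPHABET else ch
        for ch in text
    )


def chi_squared(text: str) -> float:
    """Chi-squared distance from English letter frequencies; lower means more English-like."""
    letters = [ch for ch in text if ch in ALPHABET]
    if not letters:
        return float("inf")
    n = len(letters)
    score = 0.0
    for letter in ALPHABET:
        expected = n * ENGLISH_FREQ[letter] / 100
        observed = letters.count(letter)
        score += (observed - expected) ** 2 / expected
    return score


def crack_shift(scrambled: str) -> tuple[int, str]:
    """Hypothetical crawler: try every slider position and keep the most English-looking one."""
    best = min(range(26), key=lambda pos: chi_squared(shift_text(scrambled, pos)))
    return best, shift_text(scrambled, best)
```

On a sentence or two of ordinary English this tends to pick the right position, because a wrong shift usually pushes common letters onto rare ones like `j`, `q`, `x` or `z`, which blows up the score. Very short snippets can still fool it.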

- possiblylinux127@lemmy.zip 3 months ago
  Not to mention it relies on security through obscurity
  
  It wouldn’t be that hard to figure out and bypass
  
possiblylinux127@lemmy.zip 3 months ago
Anubis is more of an economic solution. It doesn’t stop bots, but it does make companies pay more to access content instead of having server operators foot the bill.

dabe@lemmy.zip 3 months ago
I’m sure you meant to sound more analytical than anything… but this really comes off as arrogant.

You claim that Anubis is negligent and will come and go, and then admit to only spending minutes at a time thinking of solutions yourself, which you then just sorta spout. It’s fun to think about solutions to this problem collectively, but can you honestly believe that Anubis is negligent when it’s so clearly working, and when the author has been so extremely clear about their own perception of its pitfalls and hasty development? (Go read their blog, it’s a fun time.)

- mfed1122@discuss.tchncs.de 3 months ago
  By negligence, I meant that the cost is negligible to the companies running scrapers, not that the solution itself is negligent. I should have said “negligibility” of Anubis, sorry - that was poor clarity on my part.
  
  But I do think that the cost of it is indeed negligible, as the article shows. It doesn’t really matter if the author is biased or not, their analysis of the costs seems reasonable. I would need a counter-argument against that to think they were wrong. Just because they’re biased isn’t enough to discount the quantification they attempted to bring to the debate.
  
  Also, I don’t think there’s any hypocrisy in me saying I’ve only thought about other solutions here and there - I’m not maintaining an anti-scraping library. And there have already been indications that scrapers are just accepting the cost of Anubis on Codeberg, right? So I’m not trying to say I’m some sort of tech genius who has the right idea here, but from what Codeberg was saying, and from the numbers in this article, it sure looks like Anubis isn’t the right idea. I am indeed only having fun with my suggestions, not making whole libraries out of them and pronouncing them to be solutions. I personally haven’t seen evidence that Anubis is so clearly working?
  
  - dabe@lemmy.zip 3 months ago
    Well I can agree on the fact that the arms race situation we’re in sucks. It’s an old problem, seen in malware attacks and defenses. I’m just glad we have people fighting on our side in their spare time :’)
    
    And it’s all good on the tone; thank you for your clarifications.
    