Comment

Comment on Anubis is awesome and I want to talk aout it

A_norny_mousse@feddit.org ⁨3⁩ ⁨months⁩ ago

At the time of commenting, this post is 8h old. I read all the top comments, many of them critical of Anubis.

I run a small website and don’t have problems with bots. Of course I know what a DDOS is - maybe that’s the only use case where something like Anubis would help, instead of the strictly server-side solution I deploy?

As a strictly server-side (and low computing) solution I use CrowdSec (it seems to work with caddy btw). It took a little setting up, but it does the job.

Am I missing something here? Why wouldn’t that be enough? Why do I need to heckle my visitors?

Despite all that I still had a problem with bots knocking on my ports

By the time Anubis gets to work, the knocking already happened so I don’t really understand this argument.

spamming my logs.

If spamming the logs is the only concern here, I rest my case.
Otherwise, if the system is set up to reject a certain type of requests, these are microsecond transactions of no (DDOS exception) harm.

source

Sort:hotnew top

poVoq@slrpnk.net ⁨3⁩ ⁨months⁩ ago
AI scraping is a massive issue for specific types of websites, such has git forges, wikis and to a lesser extend Lemmy etc, that rely on complex database operations that can not be easily cached. Unless you massively overprovision your infrastructure these web-applications come to a grinding halt by constantly maxing out the available CPU power.

The vast majority of the critical commenters here seem to talk from a point of total ignorance about this, or assume operators of such web applications have time for hyperviligance to constantly monitor and manually block AI scrapers (that do their best to circumvent more basic blocks). The realistic options for such operators are right now: Anubis (or similar), Cloudflare or shutting down their servers. Of these Anubis is clearly the least bad option.

source
- chunes@lemmy.world ⁨3⁩ ⁨months⁩ ago
  Sounds like maybe webapps are a bad idea then.
  
  If they need dynamism, how about releasing a desktop application?
  
  source
SmokeyDope@piefed.social ⁨3⁩ ⁨months⁩ ago
Something to understand is that computer nerds who frequent places like this or hacker fourms have predisposition to be loud mouth pessimist with an ego types. Any time theres a working but imperfect solution to a complex problem but its not mathematically perfect, you’ll then have someone loudly declare that that solution makes no sense, write a two paragraph argument on halfassed assumptions that are often misrepresenting the problem out of ignorance or intention, link to a blogspam article written by a hypernerd who they got their talking points from, then laud about wondering how their obviously better solution isnt clearer to the stupid ignorant cretins. Thats just how it is.

If crowdsec works for you thats great but also its a corporate product whos premium sub tier starts at 900$/month not exactly a pure self hosted solution.

I’m not a hypernerd, still figuring all this out among the myriad of possible solutions with different complexity and setup times. All the self hosters in my internet circle started adopting anubis so I wanted to try it. Anubis was relatively plug and play with prebuilt packages and great install guide documentation.

Allow me to expand on the problem I was having. It wasnt just that I was getting a knock or two, its that I was getting 40 knocks every few seconds scraping every page and searching for a bunch that didnt exist that would allow exploit points in unsecured production vps systems.

On a computational level the constant network activity of bytes from webpage, zip files and images downloaded from scrapers pollutes traffic. Anubis stops this by trapping them in a landing page that transmits very little information from the server side. By traping the bot in an Anubis page which spams that 40 times on a single open connection before it gives up, it reduces overall network activity/ data transfered which is often billed as a metered thing as well as the logs.

And this isnt all or nothing. You don’t have to pester all your visitors, only those with sketchy clients. Anubis uses a weighted priority which grades how legit a browser client is. Most regular connections get through without triggering, weird connections get various grades of checks by how sketchy they are. Some checks dont require proof of work or JavaScript.

On a psychological level it gives me a bit of relief knowing that the bots are getting properly sinkholed and I’m punishing/wasting the compute of some asshole trying to find exploits my system to expand their botnet. And a bit of pride knowing I did this myself on my own hardware without having to cop out to a corporate product.

Its nice that people of different skill levels and philosophies have options to work with. One tool can often complement another too. Anubis worked for what I wanted, filtering out bots from wasting network bandwith and giving me peace of mind where before I had no protection. All while not being noticeable for most people because I have the ability to configure it to not heckle every client every 5 minutes like some sites want to do.

source
- A_norny_mousse@feddit.org ⁨3⁩ ⁨months⁩ ago
  If crowdsec works for you thats great but also its a corporate product
  
  It’s also fully FLOSS with dozens of contributors (not to speak of the community-driven blocklists). If they make money with it (presumably for large firms), great.
  
  not exactly a pure self hosted solution.
  
  Why? I host it, I run it. It’s even in Debian repos, but I choose their own more up-to-date ones.
  
  All the self hosters in my internet circle started adopting anubis so I wanted to try it. Anubis was relatively plug and play with prebuilt packages
  
  Yeah…
  
  Allow me to expand on the problem I was having. It wasnt just that I was getting a knock or two, its that I was getting 40 knocks every few seconds scraping every page and searching for a bunch that didnt exist that would allow exploit points in unsecured production vps systems.
  
  Again, a properly set up WAF will deal with this pronto
  
  You should not have exploit points in unsecured production systems, full stop.
  
  On a computational level the constant network activity of bytes from webpage, zip files and images downloaded from scrapers pollutes traffic. Anubis stops this by trapping them in a landing page that transmits very little information from the server side.
  
  And instead you leave the computations to your clients. Which becomes a problem on slow hardware.
  
  Again, with a properly set up WAF there’s no “traffic pollution” or “downloading of zip files”.
  
  Anubis uses a weighted priority which grades how legit a browser client is.
  
  And apart from the user agent and a few other responses, all of which are easily spoofed, this means “do some javascript stuff on the local client” (there’s a link to an article here somewhere that explains this well) which is much less trivial than you make it sound.
  
  source
  - SmokeyDope@piefed.social ⁨3⁩ ⁨months⁩ ago
    
    why? I run it.
    
    Mmm how to say this. i suppose what I’m getting at is like a philosophy of development and known behaviors of corporate products.
    
    So, here’s what I understand about crowdsec. Its essentially like a centralized collection of continuously updated iptable rules and botscanning detectors that clients install locally.
    
    In a way its crowd sourcing is like a centralized mesh network each client is a scanner node which phones home threat data to the corporate home which updates that.
    
    Notice the optimal word, centralized. The company owns that central home and its their proprietary black box to do what they want with. And so you know what for profit companies like to do to their services over time? Enshittify them by
    
    adding subscription tier price models
    
    putting once free features behind paywalls,
    
    change data sharing requirements as a condition for free access
    
    restricting free api access tighter and tighter to encourage paid tiers,
    
    making paid tiers cost more to do less.
    
    Intentionally ruining features in one service to drive power users to use a different.
    
    They can and do use these tactics to drive up profit or reduce overhead once a critical mass has been reached. I do not expect alturism and respect for usersfrom corporations, I expect bean counters using alturism as a vehicle to attract users in the growing phase and then flip the switch in their tos to go full penny pinching once they’re too big to fail.
    
    At the end of the day its not the thousands of anonymous users contributing their logs or Foss voulenteers on git getting a quarterly payout. They’re the product and free compute + live action pen testing ginnea pigs, no matter what PR they spin saying how much they care about the security of the plebs using their network for free.
    
    Its always about maximizing the money with these people your security can get fucked if they dont get some use out of you. Expect at some point the tos will change so that anonymized data sharing is no longer an option for free tier.
    
    What happens if the company goes bankrupt? Does it just stop working when their central servers shut down? Does their open source security have the possibility of being forked and run from local servers?
    
    It doesnt have to be like this. Peer to peer Decentralized mesh networks like YaCy already show its possible for a crowdsourced network of users can all contribute to an open database. Something that can be completely run as a local Node which federates and updates the information in global node. Something like it that updates a global iptables is already a step in the right direction. In that theoretical system there is no central monopoly its like the fediverse everyone contributes to hosting the global network as a mesh which altruistic hobbyist can contribute free compute to on their own terms.
    
    https://github.com/yacy/yacy_search_server
    
    I"I dont see anything wrong with people getting paid” is something I see often on discussions. Theres nothing wrong with people who do work and make contributions getting paid. What’s wrong is it isnt the open source community on github or the users contributing their precious data getting paid, its a for profit centralized monopoly that controls access to the network which the open source community built for free out of alturism.
    
    The pattern is nearly always the same. The thing that once worked well and which you relied on gets slowly worse each ToS update, while their pricing inches just a dollar higher each quarter, and you get less and less control over how you get to use their product. Its pattern recognition.
    
    The only solution is to cut the head off the snake. If I can’t fully host all of the components, see the source code of the mechanisms at all layers, own a local copy of the global database, then its not really mine.
    
    Again, it’s a philosophy thing. Its very easy to look at all that, shrug, and go “whatever not my problem I’ll just switch If it becomes an issue”. But the problem festers the longer its ignored or enabled for convinence. The community needs to truly own the services they run on every level, it has to be open, and for profit bean counters can’t be part of the equation especially for hosting. There are homelab hobbyist out there who will happily eat cents on a electric bill to serve an open service to a community, get 10,000 of them on a truly open source decentralized mesh network and you can accomplish great things without fear of being the product.
    
    source
daniskarma@lemmy.dbzer0.com ⁨3⁩ ⁨months⁩ ago
You are right. For most self-hosting usecases anubis is not only irrelevant, but it actually works against you. False sense of security and making your devices do extra work for nothing.

Anubis is though for public facing services that may get ddos or AI scrapped by some not targeted bot (for a target bot it’s trivial to get over Anubis in order to scrap).

source
Pastime0293@discuss.tchncs.de ⁨3⁩ ⁨months⁩ ago
I also used CrowdSec for almost a year, but as AI scrapers became more aggressive, CrowdSec alone wasn’t enough. The scrapers used distributed IP ranges and spoofed user agents, making them hard to detect and costing my Forgejo instance a lot in expensive routes. I tried custom CrowdSec rules but hit its limits.

Then I discovered Anubis. It’s been an excellent complement to CrowdSec — I now run both. In my experience they work very well together, so the question isn’t “A or B?” but rather “How can I combine them, if needed?”

source
quick_snail@feddit.nl ⁨3⁩ ⁨months⁩ ago
With varnish and wazuh, I’ve never had a need for Anubis.

My first recommendation for anyone struggling with bots is to fix their cache.

source
- kalleboo@lemmy.world ⁨3⁩ ⁨months⁩ ago
  Anubis was originally created to protect git web interfaces since they have a lot of heavy-to-compute URLs that aren’t feasible to cache (revision diffs, zip downloads etc).
  
  After that I think it got adopted by a lot of people who didn’t actually need it, they just don’t like seeing AI scrapers in their logs.
  
  source
  - quick_snail@feddit.nl ⁨3⁩ ⁨months⁩ ago
    Yes!
    
    Also, another very simple solution is to authwall expensive pages that can’t be cached.
    
    source