Comment

Comment on AI crawlers cause Wikimedia Commons bandwidth demands to surge 50%.

Laws should be passed in all countries that AI crawlers should request permission before crawling whatever target site. I haver no pity to AI “thiefs” that get their models poisoned. F…ing plague, wasn’t enough the adware and spyware…

source

Sort:hotnew top

chrash0@lemmy.world ⁨8⁩ ⁨months⁩ ago
i doubt the recent uptick in traffic is from “stealing data” for training but rather from agents scraping them for context, eg Edge Copilot, Google’s AI search, SearchGPT, etc.

poisoning the data will likely not help in this situation since there’s a human on the other side that will just do the same search again given unsatisfactory results. like how retries and timeouts can cause huge outages for web scale companies, poisoning search results will likely cause this type of traffic to increase and further increase the chances of DoS and higher bandwidth usage.

source
- TheBlackLounge@lemm.ee ⁨8⁩ ⁨months⁩ ago
  So? Break context scrapers till they give up, on your site or completely.
  
  source
  - chrash0@lemmy.world ⁨8⁩ ⁨months⁩ ago
    easily said
    
    source
catloaf@lemm.ee ⁨8⁩ ⁨months⁩ ago
An HTTP request is a request. Servers are free to rate limit or deny access

source
- taladar@sh.itjust.works ⁨8⁩ ⁨months⁩ ago
  Rate limiting in itself requires resources that are not always available. For one thing you can only rate limit individuals you can identify so you need to keep data about past requests in memory and attach counters to them and even then that won’t help if the requests come from IPs that are easily changed.
  
  source
- grysbok@lemmy.sdf.org ⁨8⁩ ⁨months⁩ ago
  Bots lie about who they are, ignore robots.txt, and come from a gazillion different IPs.
  
  source
  - catloaf@lemm.ee ⁨8⁩ ⁨months⁩ ago
    That’s what ddos protection is for.
    
    source