Arcane2077@sh.itjust.works 3 days ago
Anyone else facing captcha loops whenever they try to view an archive.is link? I haven’t been able to read subscriber-only articles for months now.
mjr@infosec.pub 3 days ago
Not every time, but far too often. They don’t seem to care that they’re discriminating against people with AV impairment, plus locking out some secure browsers.
ilovepiracy@lemmy.dbzer0.com 2 days ago
Just a heads up: archive.is is not related to the Internet Archive, and I believe it is run by a solo dev with private funding.
mjr@infosec.pub 2 days ago
I looked into who runs it a bit and oh wow, it’s far far worse than that. If you get a captcha from archive.is / archive.ph / archive.today and allow it scripting permission, it seems to use your browser as part of a DDoS attack. See infosec.exchange/@iampytest1/115902693235671566 and linked pages.
Arcane2077@sh.itjust.works 3 days ago
Dang, yeah it’s probably my strict browser settings. Thanks for the confirmation of shared experience.
cecilkorik@piefed.ca 3 days ago
Sometimes I’m able to get around it by tweaking some ublock permissions, but once I was surprised to discover that changing my user-agent with user-agent switcher seemed to do the trick. It’s really strange. Cloudflare’s captcha loops are inscrutable.
FauxLiving@lemmy.world 3 days ago
LLM-driven web scraping is intense for some sites, so their bot detection software is tuned in a way that creates a lot of false positives.
Obscuring your browser fingerprint, or blocking javascript, or using an unusual user-agent string can trigger a captcha challenge.
If you’re not doing any of that and a site suddenly starts giving you captchas, it may be getting DDoS’d by scrapers and challenging all clients.
A site that archives content is especially vulnerable because it holds a lot of the data that is useful for AI training.
It is incredibly annoying, but until we have a robust way of proving identity that can’t be gamed by bad actors, we’re stuck with individual user challenges.
vikingtons@lemmy.world 3 days ago
No, but I do get about three or four challenges.
Axolotl_cpp@feddit.it 3 days ago
I don’t have this problem. You are probably using Tor or a VPN, and that triggered the captcha.
MadMadBunny@lemmy.ca 3 days ago
Nope
Pika@sh.itjust.works 3 days ago
I haven’t faced a captcha, but it just took a solid 2 minutes to resolve and load the article for me. Maybe they have something else happening behind the scenes impacting performance, so they are locking down certain routes?