Google “spider trap website” or something.
Comment on OpenAI and Anthropic are ignoring an established rule that prevents bots scraping online content
wagoner@infosec.pub 7 months ago
Novice web site owner/coder here: wondering if I can block them somehow via IP address in addition to robots.txt. Server firewall rule? Remember, I said I was a novice…
conciselyverbose@sh.itjust.works 7 months ago
IndustryStandard@lemmy.world 7 months ago
You can block an IP but first you would need to know which IPs are scrapers. And they could just use a VPN to bypass IP blocks.
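For what it's worth, here is a rough sketch of what that kind of check could look like in Python, using only the standard library. The IP ranges below are placeholder documentation ranges, not real scraper addresses, and `is_blocked` is a made-up helper name. The user-agent tokens are the names the OpenAI and Anthropic crawlers announce themselves with, but a scraper can send any user agent it likes, so treat this as a first filter only.

```python
import ipaddress

# ASSUMPTION: placeholder ranges reserved for documentation (RFC 5737).
# Replace with ranges you have actually identified as scrapers.
BLOCKED_NETWORKS = [
    ipaddress.ip_network("203.0.113.0/24"),
    ipaddress.ip_network("198.51.100.0/24"),
]

# Substrings matched case-insensitively against the User-Agent header.
BLOCKED_AGENT_TOKENS = ("gptbot", "claudebot")


def is_blocked(remote_ip: str, user_agent: str = "") -> bool:
    """Return True if the request matches the IP or user-agent blocklist."""
    try:
        addr = ipaddress.ip_address(remote_ip)
    except ValueError:
        addr = None  # unparseable address: fall back to the user-agent check

    if addr is not None and any(addr in net for net in BLOCKED_NETWORKS):
        return True

    agent = user_agent.lower()
    return any(token in agent for token in BLOCKED_AGENT_TOKENS)
```

You would call something like `is_blocked(request_ip, request_user_agent)` early in your request handling and return a 403 when it is true. A firewall rule does the same IP matching lower down the stack, and it has the same limitation: it only helps for addresses you already know about.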
balder1991@lemmy.world 7 months ago
IndustryStandard@lemmy.world 7 months ago
Yes, the less expensive VPNs especially have a lot of users using the same IP addresses.
You can get a VPN with private (dedicated) IPs, but this is more expensive. For a company of OpenAI’s size that would be a drop in the bucket though.