GitHub, acquired by Microsoft, is now forcing AI on its user base.
Comment on Alternative to github pages?
csm10495@sh.itjust.works 2 days ago
In what way? Anything on the public internet is likely being used for AI training. I guess by using free GitHub you can’t object to training.
Then again, anywhere you host you sort of run into the same problem. You can use robots.txt, but crawlers don’t have to listen to it.
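For example, a minimal robots.txt that asks some of the better-known AI crawlers to stay out (only well-behaved bots will honor it; nothing enforces it):

```
# robots.txt at the site root; only compliant crawlers will respect these rules
User-agent: GPTBot
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: Google-Extended
Disallow: /
```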
MadMadBunny@lemmy.ca 1 day ago
iveseenthat@reddthat.com 22 hours ago
That’s one of my main drivers to stay away from GH
jqubed@lemmy.world 1 day ago
If you’re self-hosting there are some ways to fight back, or, depending on your opinion of Cloudflare, it seems they’re fairly effective at blocking the AI crawlers.
AmbiguousProps@lemmy.today 1 day ago
Yep, on top of simply blocking, if you’re self-hosting or using cloudflare, you can enable AI tarpits.
iveseenthat@reddthat.com 22 hours ago
How do I do this? I don’t mind (and may even prefer) not hosting at home. My main concern with GH is that you become an AI snack whether you like it or not.
AmbiguousProps@lemmy.today 21 hours ago
Which part? If you want to use cloudflare pages, it’s relatively straightforward. You can follow this and get up and running pretty quickly: hongkiat.com/…/host-static-website-cloudflare-pag…
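If you’d rather do it from the command line than click through the dashboard flow in that guide, a deploy with Wrangler looks roughly like this (the project name and the ./public folder are just placeholders):

```sh
# install Cloudflare's CLI, authenticate, and push a folder of static files to Pages
npm install -g wrangler
wrangler login
wrangler pages deploy ./public --project-name=my-site
```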
If you’re asking about the tarpits, there’s two ways (generally) to accomplish that. Even if you don’t use cloudflare pages to host your site directly (if you use nginx on your server, for example), you can still enable AI tarpits for your entire domain, so long as you use cloudflare for your DNS provider: blog.cloudflare.com/ai-labyrinth/#how-to-use-ai-l…
If you want to do it all locally, you could instead set up iocaine or nepenthes, which are both self-hosted and can integrate with various webserver software. Obviously, cloudflare’s tarpits are stupid simple to set up compared to these, but these give you greater control over exactly how you’re poisoning the well and trapping crawlers.
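For a rough idea of the nginx side, something along these lines routes suspected AI crawlers to a locally running iocaine instance; the port and the user-agent list here are assumptions on my part, so check the iocaine docs for the real integration snippet:

```nginx
# sketch only: the iocaine listen address and the user-agent patterns are assumptions
map $http_user_agent $ai_crawler {
    default     0;
    ~*GPTBot    1;
    ~*CCBot     1;
    ~*ClaudeBot 1;
}

server {
    listen 80;
    server_name example.com;
    root /var/www/site;

    location / {
        # suspected AI crawlers get proxied into iocaine's generated garbage maze
        if ($ai_crawler) {
            proxy_pass http://127.0.0.1:42069;  # assumed iocaine listen address
        }
        # everyone else gets the normal static site
        try_files $uri $uri/ =404;
    }
}
```

As far as I know nepenthes is deployed much the same way: a separate daemon you point suspect traffic at from your webserver.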