Comment on Intentionally corrupting LLM training data?
kamstrup@programming.dev 1 year ago
You should probably change the page content entirely, server-side, based on the user agent or request IP.
Using CSS to change layout based on the request has long since been “fixed” by smart crawlers. Even hacks that use JS to show/hide content are mostly handled by crawlers.
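For the request-IP variant, a rough sketch of the check might look like the following. The function name `is_crawler_ip` and the network list are made up for illustration; the ranges are reserved documentation networks, not real crawler addresses, so you would have to substitute whatever ranges the crawler operator actually publishes.

```python
import ipaddress

# Hypothetical crawler ranges. These are reserved documentation networks
# (RFC 5737), used here only as stand-ins for real published ranges.
CRAWLER_NETWORKS = [
    ipaddress.ip_network("192.0.2.0/24"),
    ipaddress.ip_network("198.51.100.0/24"),
]


def is_crawler_ip(remote_addr: str) -> bool:
    """Return True if the request address falls inside a listed network."""
    try:
        addr = ipaddress.ip_address(remote_addr)
    except ValueError:
        # Malformed address: treat the request as a normal visitor.
        return False
    return any(addr in net for net in CRAWLER_NETWORKS)
```

If the server sits behind a reverse proxy, the address to check would presumably be the one the proxy forwards, not the proxy's own.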
colonial@lemmy.world 1 year ago
I won’t be using CSS or JS. I control the entire stack, so I can do a server-side check: `GPTBot` user agents get random garbage, everyone else gets the real deal.
Obviously this relies on OpenAI not masking their user agent, but I think webmasters would notice a conspicuous lack of hits if they did that.
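As a minimal sketch of that kind of server-side check, assuming a plain WSGI app (the actual stack isn't specified here): `REAL_PAGE`, `is_gptbot`, `random_garbage` and `app` are all hypothetical names, and the "garbage" is just random lowercase strings.

```python
import random
import string

# Placeholder for whatever the site really serves.
REAL_PAGE = "<p>The real article text.</p>"


def is_gptbot(user_agent: str) -> bool:
    """Case-insensitive substring match on the User-Agent header."""
    return "gptbot" in user_agent.lower()


def random_garbage(n_words: int = 50) -> str:
    """Build n_words random lowercase 'words' of 2 to 9 letters each."""
    words = (
        "".join(random.choices(string.ascii_lowercase, k=random.randint(2, 9)))
        for _ in range(n_words)
    )
    return " ".join(words)


def app(environ, start_response):
    """WSGI entry point: garbage for GPTBot, the real page for everyone else."""
    user_agent = environ.get("HTTP_USER_AGENT", "")
    body = random_garbage() if is_gptbot(user_agent) else REAL_PAGE
    start_response("200 OK", [("Content-Type", "text/html; charset=utf-8")])
    return [body.encode("utf-8")]
```

To try it locally, something like `wsgiref.simple_server.make_server("", 8000, app).serve_forever()` from the standard library should do. A substring match only catches crawlers that identify themselves honestly, which is the same caveat as above.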