Comment on I was wrong about robots.txt
thedruid@lemmy.world 1 day ago
So, if I can add something here for everyone’s benefit:
No search engine really obeys robots.txt.
Their publicly acknowledged crawlers do, but they have other crawlers that aren’t publicly known and that ignore the file.
Google knows every inch of your site, allowed or not.
See, just because a search engine says it doesn’t know about a page doesn’t mean it hasn’t crawled it. It just doesn’t display those results, based on your settings.
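For anyone wondering why that is even possible: robots.txt is only advisory, and the check happens on the crawler's side. Here is a minimal sketch using Python's standard urllib.robotparser. The rules, the "ExampleBot" user agent, and the example.com URLs are all made up for illustration and are not taken from any real site or crawler.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, for illustration only.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if a *compliant* crawler would consider the URL fetchable."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for url in ("https://example.com/public/page",
                "https://example.com/private/page"):
        # A polite crawler runs this check before fetching and skips
        # disallowed URLs. Nothing on the server enforces the answer.
        print(url, is_allowed(ROBOTS_TXT, "ExampleBot", url))
```

A crawler that never runs a check like this can still request /private/ like any other URL. Actually blocking it needs something server-side, such as authentication or firewall rules.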
ell1e@leminal.space 1 day ago
And allowing the public crawler might also have it feed their AI: arstechnica.com/…/cloudflare-wants-google-to-chan…