Set the date filter to something recent: "test site:reddit.com df:w" (results from the last week only) gives 0 hits.
Comment on Google Is the Only Search Engine That Works on Reddit Now Thanks to AI Deal
woelkchen@lemmy.world 6 months ago
"test site:reddit.com" works fine from DDG for me.
itslilith@lemmy.blahaj.zone 6 months ago
tal@lemmy.today 6 months ago
Robots.txt lets you ask specific user-agents not to crawl the site. My guess is that's how they restricted it. I don't know how those changes affect pages that are already indexed (I don't know if there's any standard for that), but it'll stop crawlers from examining new pages.
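To make the per-user-agent mechanism concrete, here is a minimal sketch using Python's standard-library urllib.robotparser. The robots.txt content is a hypothetical example of "allow one named crawler, disallow everyone else"; it is not Reddit's actual file, and the helper name `allowed`, the URL, and the bot names are only for illustration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: allow Googlebot, disallow every other crawler.
# This is NOT Reddit's real file; it only illustrates the mechanism.
ROBOTS_TXT = """\
User-agent: Googlebot
Allow: /

User-agent: *
Disallow: /
"""

def allowed(agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if `agent` may crawl `url` under the given robots.txt."""
    parser = RobotFileParser()
    # parse() takes the file's lines directly, so no network fetch is needed.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)

if __name__ == "__main__":
    url = "https://www.reddit.com/r/example/comments/abc123/"
    for agent in ("Googlebot", "Bingbot", "DuckDuckBot"):
        print(agent, allowed(agent, url))
```

In this sketch the agent is passed as the short product token (Googlebot), which is what the rules are written against. Only the Googlebot group matches it; every other crawler falls through to the `*` group and is refused.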
squidspinachfootball@lemm.ee 6 months ago
IIRC, isn't robots.txt more of a gentlemen's agreement? I vaguely recall that bots can crawl a site regardless; it's just that most devs respect robots.txt and don't crawl the disallowed pages. Could be wrong though, happy to be corrected.
tal@lemmy.today 6 months ago
Sure, you can write software that violates the spec. But I mean, that'd be true for anything Reddit can do on their end. Even if they block responses, software can always try hard to impersonate users and scrape websites. You could go through a VPN and pretend to be a browser following a link to a page.
But major search engines will follow the spec.
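On the gentlemen's-agreement point: the robots.txt check lives entirely in the crawler's own code. A minimal sketch, assuming a hypothetical helper `urls_to_request` (not any real search engine's code) and a hypothetical "block everyone" robots.txt, shows that the same file yields nothing for a client that honours it and everything for one that ignores it.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical "block everyone" robots.txt, for illustration only.
BLOCK_ALL = ["User-agent: *", "Disallow: /"]

def urls_to_request(urls, agent, robots_lines, honour_robots=True):
    """Return the URLs a crawler would actually request.

    The robots.txt check runs only inside the crawler. The server does not
    enforce it: a client that passes honour_robots=False (or never reads
    robots.txt at all) requests everything anyway.
    """
    if not honour_robots:
        return list(urls)
    parser = RobotFileParser()
    parser.parse(robots_lines)
    return [u for u in urls if parser.can_fetch(agent, u)]

if __name__ == "__main__":
    urls = ["https://example.com/a", "https://example.com/b"]
    print(urls_to_request(urls, "PoliteBot", BLOCK_ALL))
    print(urls_to_request(urls, "RudeBot", BLOCK_ALL, honour_robots=False))
```

Anything a site wants to actually enforce has to happen server-side (blocking or rate-limiting requests), which is the cat-and-mouse game described above.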
squidspinachfootball@lemm.ee 6 months ago
That's a good point. It's probably way less load and overhead if Reddit and Google just send info back and forth instead of scraping. It's also a good way for Google to keep its spot as the favoured search engine and beat the competition, since everything that comes up these days is articles full of SEO nonsense at best and AI-generated nonsense at worst. If nobody else can read the actual human responses, Google has a huge leg up. Also interesting to see that Google's honouring the robots.txt file even when nobody's holding them to it.
I had no idea Twitter's search updates its index immediately after a comment is posted, though. That's a lot of updates considering the number of posts they get daily.
eager_eagle@lemmy.world 6 months ago