Comment on Google Is the Only Search Engine That Works on Reddit Now Thanks to AI Deal
squidspinachfootball@lemm.ee 3 months ago
iirc, isn’t robots.txt more of a gentlemen’s agreement? I vaguely recall bots being able to crawl a site regardless, it’s just that most devs respect robots.txt and don’t crawl what it disallows. Could be wrong though, happy to be corrected.
tal@lemmy.today 3 months ago
Sure, you can write software that violates the spec. But I mean, that’d be true for anything that Reddit can do on their end. Even if they block responses, software can always try hard to impersonate users and scrape websites. You could go through a VPN, pretend to be a browser being linked to a page.
But major search engines will follow the spec.
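To make the "voluntary" part concrete, here is a minimal sketch using Python's standard `urllib.robotparser`. The robots.txt content, the bot names, and the `may_fetch` helper are all made up for illustration; this is not Reddit's actual file. The point is that the check runs inside the crawler itself, so nothing on the server side enforces it.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: one named crawler is allowed, everyone else is not.
# (Illustrative only; not the contents of any real site's file.)
ROBOTS_TXT = """\
User-agent: ExampleSearchBot
Allow: /

User-agent: *
Disallow: /
"""


def may_fetch(user_agent: str, url: str) -> bool:
    """Return True if the rules above permit user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, url)


# A well-behaved crawler asks before fetching and skips disallowed URLs.
allowed_bot = may_fetch("ExampleSearchBot", "https://example.com/r/some_thread")
other_bot = may_fetch("SomeOtherBot", "https://example.com/r/some_thread")
```

A crawler that never calls something like `may_fetch` just sends the request anyway, which is why the site has to fall back on blocking or rate limiting if it wants actual enforcement.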
squidspinachfootball@lemm.ee 3 months ago
That’s a good point, it’s probably way less load and overhead if Reddit and Google just send info back and forth instead of scraping. Good way for Google to keep their spot as the favoured search engine and beat the competition too, since all that comes up these days is articles full of SEO nonsense at best and AI-generated nonsense at worst. If nobody else can read the actual human responses, Google has a huge leg up. Also interesting to see that Google’s honouring the txt file even when nobody’s holding them to it.
I had no idea Twitter’s search updated its index immediately after a comment is posted though. That’s a lot of updates considering the number of posts they get daily.
tal@lemmy.today 3 months ago
While I never had a Twitter account, it’s the major reason that I used the service anonymously. In an unfolding event, like a natural disaster or something, it was absolutely unparalleled in its ability to rapidly comb through enormous amounts of information being plonked in by people around the world. I understand that Mastodon, unfortunately, doesn’t have a full-text search feature, just searching based on exact hashtags. Actually…hmm. I was just talking about Kagi’s search lens for the Threadiverse in another comment that I saw. I wonder if Kagi actually indexes Mastodon as well? That’d provide for similar functionality.
investigates
No, it looks like they only do the Reddit-alike Threadiverse (lemmy, kbin, mbin, etc), for which they use the term “Fediverse Forums”.
investigates further
It does look like they index in real time, though, or at least quickly. They are probably one of the parties running an instance that slurps up everything federated to it. I was able to find your comment on that search lens.
Yeah, I’m sure that however Twitter built it, they specifically designed it around permitting inexpensive index updates.
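As a rough illustration of why index updates can be cheap, here is a toy inverted index. It is only a sketch under my own assumptions, not how Twitter actually built theirs; the `InvertedIndex` class and its methods are hypothetical names. Adding a post only touches the posting sets for the words in that post, so nothing already indexed has to be rebuilt.

```python
from collections import defaultdict


class InvertedIndex:
    """Toy full-text index: maps each term to the set of post IDs containing it."""

    def __init__(self):
        self.postings = defaultdict(set)

    def add(self, post_id, text):
        # Cost is proportional to the number of distinct words in the new post,
        # not to the size of the existing index.
        for term in set(text.lower().split()):
            self.postings[term].add(post_id)

    def search(self, query):
        # Return IDs of posts containing every term in the query.
        terms = query.lower().split()
        if not terms:
            return set()
        result = set(self.postings.get(terms[0], set()))
        for term in terms[1:]:
            result &= self.postings.get(term, set())
        return result


index = InvertedIndex()
index.add(1, "flood warning downtown")
index.add(2, "Flood waters rising")
index.add(3, "downtown flood road closed")  # searchable as soon as add() returns
```

A real system would also have to deal with tokenization, ranking, deletes, and spreading the index across machines, none of which this sketch attempts.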