I think it’s just a new world for spam.
At some point, probably soon, AI-generated content will produce so much data that it becomes untenable to store everything that gets scraped.
We’ll also reach a point where it becomes much more costly to parse that data for AI spam, trustworthiness, and topics. If you need LLMs just to filter spam, that is a large step up in cost and infrastructure compared to current methods.
When that happens, what happens to search? Either the quality will have to degrade or the margins will drop off sharply.
ColeSloth@discuss.tchncs.de 10 months ago
They have already been trying to use AI to identify and combat AI-written papers in college and high school. So far it’s been severely ineffective. AI has gotten pretty good at writing out a sentence or two that looks like it’s real. If AI improves enough, I doubt there’ll be much of a way to identify it at all.
lloram239@feddit.de 10 months ago
It’s not about identifying AI or even spam, but about extracting useful information. Are the claims made in a source backed by other sources? Do they violate information from trusted sources? That’s all stuff that an AI can reason about and then discard the source as junk or condense it down to the useful information in it.
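To make that concrete, here is a minimal sketch of the kind of cross-checking I mean. Everything in it is hypothetical: the `Assertion` record, the `assess` function, the `min_support` threshold, and the source names are made up for illustration, and it assumes claims have already been extracted and normalized into comparable key/value pairs, which is the hard part a real system would need a model for.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Assertion:
    source: str  # hypothetical source identifier
    claim: str   # normalized claim key, e.g. "earth_shape"
    value: str   # what the source asserts about that claim


def assess(candidate, corpus, trusted, min_support=2):
    """Classify one assertion as 'junk', 'supported', or 'unverified'."""
    # Only look at what *other* sources say about the same claim.
    others = [
        a for a in corpus
        if a.claim == candidate.claim and a.source != candidate.source
    ]
    # Contradicting any trusted source discards the assertion outright.
    if any(a.source in trusted and a.value != candidate.value for a in others):
        return "junk"
    agreeing = {a.source for a in others if a.value == candidate.value}
    # Backed by a trusted source, or by enough independent sources.
    if agreeing & trusted or len(agreeing) >= min_support:
        return "supported"
    return "unverified"


# Made-up example data, purely for illustration.
corpus = [
    Assertion("encyclopedia", "earth_shape", "round"),
    Assertion("blog_a", "earth_shape", "flat"),
    Assertion("blog_b", "earth_shape", "flat"),
    Assertion("blog_c", "earth_shape", "round"),
    Assertion("blog_a", "release_year", "1999"),
]
trusted = {"encyclopedia"}
```

With this data, `assess(corpus[1], corpus, trusted)` discards the "flat" claim because it contradicts the trusted source, even though another blog repeats it. The `min_support` fallback is the weak spot: where no trusted source says anything, plain counting of agreeing sources is all that is left.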
Basically you completely skip browsing the Web yourself and just use the AI to find you what you want. Think of it like some IMDB or Wikipedia, but covering everything and written and curated by AI. When the AI doesn’t already know some fact, it goes crawling the Web and finding it out for you, expanding its knowledge base in the process.
At the moment there are still some technical hurdles; the AI systems we have are all still a little too stupid for this. But that seems to be the direction we are heading. Things like summarizer bots already do a pretty good job, and ChatGPT is reasonably good at answering basic questions. It’s only a matter of time until it gets good enough that you couldn’t do a better job yourself.
ColeSloth@discuss.tchncs.de 10 months ago
You’re looking at it in a flawed manner. AI has already been making up sources and names to state things as facts. If there are a hundred websites claiming the earth is flat and you ask an AI if the earth is flat, it may tell you it is flat and cite those websites. It’s already been happening. Then imagine things that are more a matter of opinion than hard, observable scientific facts. Imagine a government using AI to shape opinion and claim there was no form of insurrection on Jan 6th. Thousands of websites and comments could quickly be fabricated to confirm that it was all made up, burying the truth in obscurity.
lloram239@feddit.de 10 months ago
You have plenty of literature that can act as ground truth. This is not a terribly hard problem to solve; it just requires actually focusing on it, which so far simply hasn’t been done. ChatGPT is just the first “look, this can generate text”. It was never meant to do anything useful by itself or stick to the truth. That all still has to be developed. ChatGPT simply demonstrates that LLMs can process natural language really well. It’s the first step in this, not the last.