Beyond the copyright issues and energy issues, AI does some serious damage to your ability to do actual hard research. And I'm not just talking about "AI brain."
Let's say you're trying to solve a programming problem. If you use a search engine and look up the question or a string of keywords, what do you usually do? You look through each link that comes up and judge books by their covers (to an extent). "Do these look like reputable sites? Have I heard of any of them before?" You click through a bunch of them and read them. Now you evaluate their contents. "Have I already tried this info? Oh, this answer is from 15 years ago, it might be outdated." Then you pare your links down to a smaller number and try the solution each one provides, one at a time.
Now let's say you use an AI to do the same thing. You pray to the Oracle, and the Oracle responds with a single answer. It's a total soup of its training data. You can't tell where, specifically, it got any of this info. You just have to take it on faith. You try it; maybe it works, maybe it doesn't. If it doesn't, you have to write a new prayer and try again.
Even running a local model means you can't discern the source material from the output. This isn't Garbage In, Garbage Out, but Stew In, Soup Out. You can feed an AI a corpus of perfectly useful information, but it will churn everything into a single liquidy mass at the end. And because the process is destructive, you can't un-soup the output. You've robbed yourself of the ability to learn from the input, and put all your faith in the Oracle.
Mika@sopuli.xyz 7 months ago
You actually can, and you should. And the process is not destructive, since you can always undo in tools like Cursor, or discard in git.
Besides, you can steer a good coding LLM in the right direction. The better you understand what you are doing, the better.
HarkMahlberg@kbin.earth 7 months ago
You misunderstood. I wasn't saying you can't Ctrl+Z after using the output, but that the process of training an AI on a corpus yields a black box. That process can't be reverse engineered to see how it came up with its answers.
It can't tell you how much of one source it used over another. It can't tell you what its priorities are in evaluating data... not without the risk of hallucinating on you when you ask it.