Beyond the copyright issues and energy issues, AI does some serious damage to your ability to do actual hard research. And I'm not just talking about "AI brain."
Let's say you're looking to solve a programming problem. If you use a search engine and look up the question or a string of keywords, what do you usually do? You look through each link that comes up and judge books by their covers (to an extent). "Do these look like reputable sites? Have I heard of any of them before?" You scroll, click a bunch of them, and read through them. Now you evaluate their contents. "Have I already tried this info? Oh this answer is from 15 years ago, it might be outdated." Then you pare down your links to a smaller number and try the solution each one provides, one at a time.
Now let's say you use an AI to do the same thing. You pray to the Oracle, and the Oracle responds with a single answer. It's a total soup of its training data. You can't tell where specifically it got any of this info. You just have to take it on faith. You try it, maybe it works, maybe it doesn't. If it doesn't, you have to write a new prayer and try again.
Even running a local model means you can't discern the source material from the output. This isn't Garbage In Garbage Out, but Stew In Soup Out. You can feed an AI a corpus of perfectly useful information, but it will churn everything into a single liquidy mass at the end. And because the process is destructive, you can't un-soup the output. You've robbed yourself of the ability to learn from the input, and put all your faith into the Oracle.
very_well_lost@lemmy.world 1 day ago
lol
onslaught545@lemmy.zip 1 day ago
Not all LLMs are the same. You can absolutely take a neural network model and train it yourself on your own dataset that doesn’t violate copyright.
Mika@sopuli.xyz 1 day ago
I can almost guarantee that hundred-billion-param LLMs are not trained on that, and are trained on the whole web scraped to the furthest extent.
The only sane and ethical solution going forward is to force all LLMs to be open-sourced. Use the datasets generated by humanity - give back to humanity.
Skullgrid@lemmy.world 1 day ago
Jesus fucking christ. There are SO GODDAMN MANY open source LLMs, even from fucking scumbags like facebook. I get that there’s subtleties to the argument on the ProAI vs AntiAI side, but you guys just screech and scream.
github.com/eugeneyan/open-llms
Mika@sopuli.xyz 1 day ago
Besides, the article is about image gen AI, not LLMs.
riskable@programming.dev 1 day ago
Training an AI is orthogonal to copyright since the process of training doesn’t involve distribution.
You can train an AI with whatever TF you want without anyone’s consent. That’s perfectly legal fair use. It’s no different than if you copy a song from your PC to your phone.
Copyright really only comes into play when someone uses an AI to distribute a derivative of someone’s copyrighted work. Even then, it’s really only the end user who is capable of doing such a thing, by uploading the output of the AI somewhere.