Scraping the web to create a dataset isn’t plagiarism, nor is training a model on that scraped data, and calculating which words should come in what order isn’t plagiarism either. I agree that datasets should be ethically sourced, but web scraping is what allowed things like search engines to be created, which made the web a lot more useful. Was creating Google irresponsible?
You seem to be handwaving all concerns about the actual tech, but I think the fact that “training” is literally just plagiarism, together with the absolutely bonkers energy costs of doing it, squarely positions LLMs as doing more harm than good in most cases.
The innocent tech here is the concept of the neural net itself, but unless neural nets are trained on a constrained corpus of data and then used to analyze that or analogous data in a responsible and limited fashion, I think they sit somewhere on a spectrum between “irresponsible” and “actually evil”.
a_wild_mimic_appears@lemmy.dbzer0.com 3 days ago
verdigris@lemmy.ml 3 days ago
This is a wild take. You can get chatbots to vomit out entire paragraphs of published works verbatim. There is functionally no mechanism in a chatbot other than looking at a bunch of existing texts, picking one randomly, and copying the next word from it. There’s no internal processing or logic that you could call creative; it’s just sticking one Lego at a time onto a tower, and every Lego is someone’s unpaid intellectual property.
There is no definition of plagiarism or copyright that LLMs don’t violate extremely hard. They’re just getting away with it because of the billions of dollars of capital pushing the tech. I am hypothetically very much for the complete abolition of copyright and free usage of information, but a) that means everyone can copy stuff freely, instead of just AI companies, and b) it first requires an actually functional society that provides for the needs of its citizens, so they have the time to do things like create art without needing to make a livable profit from it.
a_wild_mimic_appears@lemmy.dbzer0.com 2 days ago
The energy costs are overblown. A response costs about 3 Wh, which is about 1 minute of runtime for a 200 W PC, or 10 seconds of a 1000 W microwave. See the calculations made here and below for the energy costs. If you want to save energy, go vegan and ditch your car; completely disbanding ChatGPT amounts to 0.0017% of the CO2 reduction during COVID in 2020 (this guy gave the numbers, but had an error in magnitude, which I fixed in my reply; calculator output is attached). It would help climate activists if they concentrated on something that is worthwhile to criticize.
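For what it’s worth, here’s a quick back-of-the-envelope check of those runtime comparisons. It just takes the ~3 Wh per response and the 200 W / 1000 W device draws from the comment above as given assumptions; the names and numbers are illustrative, not measurements:

```python
# Sanity check of the "3 Wh per response" comparison.
# The 3 Wh figure and the 200 W / 1000 W device draws are assumptions taken
# from the comment above, not measured values.

RESPONSE_WH = 3.0          # claimed energy per chatbot response, in watt-hours
PC_WATTS = 200.0           # assumed draw of a desktop PC
MICROWAVE_WATTS = 1000.0   # assumed draw of a microwave

def runtime_seconds(energy_wh: float, power_w: float) -> float:
    """Seconds a device drawing `power_w` watts runs on `energy_wh` watt-hours."""
    return energy_wh / power_w * 3600

print(f"200 W PC:         {runtime_seconds(RESPONSE_WH, PC_WATTS):.0f} s")         # ~54 s, roughly a minute
print(f"1000 W microwave: {runtime_seconds(RESPONSE_WH, MICROWAVE_WATTS):.0f} s")  # ~11 s, roughly ten seconds
```

So the per-response comparison in the comment checks out arithmetically, given those assumed figures.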
If I read a book and use phrases out of that book in my communication, it is covered under fair use - the same should apply to scraping the web, or else we can close the Internet Archive next. Since LLM output isn’t copyrightable, I see no issues with that - and copyright law in the US is an abomination that is only useful for big companies to use as a weapon; small artists don’t really profit from it.
verdigris@lemmy.ml 2 days ago
The costs for responses are overblown, but the costs for training are not.
SugarCatDestroyer@lemmy.world 3 days ago
If the world is ruled by psychopaths who seek absolute power for the sake of even more power, then the very existence of such technologies will lead to very sad consequences and, most likely, even to slavery. Have you heard of technofeudalism?
verdigris@lemmy.ml 3 days ago
Okay, sure, but in many cases the tech in question is actually useful for lots of other stuff besides repression. I don’t think that’s the case with LLMs. They have a tiny bit of actual usefulness that’s completely overshadowed by the insane skyscrapers of hype and lies that have been built up around their “capabilities”.
With “AI” I don’t see any reason to go through such gymnastics. The value in the tech is non-existent for anyone who isn’t either a researcher dealing with impractically large and unwieldy datasets, or of course a grifter looking to profit off of bigger idiots than themselves. It has never been and will never be a useful tool for the average person, so why defend it?
a_wild_mimic_appears@lemmy.dbzer0.com 3 days ago
I am an average person, and my GPU is running a chatbot that is currently giving me a course in regular expressions. My GPU also generates images for me from time to time when I need an image, because I am crappy at drawing. There are a lot of uses for the technology.
verdigris@lemmy.ml 3 days ago
Okay, so you could have just looked up one of the dozens of resources on regex. The images you “need” are likely bad copies of images that already exist, or weird collages of copied subject matter.
My point isn’t that there’s nothing they can do at all; it’s that nothing they can do is worth the energy cost. You’re spending tons of energy to effectively chew up information already on the web and have it vomited back to you in a slightly different form, when you could have just looked up the information directly. It doesn’t save time, because you have to double-check everything. The images are also plagiarized, and you could be paying an artist if they’re something important, or improving your own artistic abilities if they aren’t. I struggle to think of many cases where one of those options is unfeasible; it’s just the “easy” way out (because the energy costs are obfuscated) to have a machine crunch up some existing art to get an approximation of what you want.
SugarCatDestroyer@lemmy.world 3 days ago
There’s nothing to defend. Tell me, would you defend someone who is a threat to you and deprives you of the ability to create, making art unnecessary? No, you would go and kill him before this bastard has grown up. Well, what’s the point of defending a bullet that will kill you? Are you crazy?