Is the only imaginable system for AI to exist one in which every website operator, or musician, artist, writer, etc has no say in how their data is used? Is it possible to have a more consensual arrangement?
As far as the question about ethics, there is a lot of ground to cover on that. A lot of it is being discussed. I’ll basically reiterate what I said that pertains to data rights. I believe they are pretty fundamental to human rights, for a lot of reasons. AI is killing open source, and claiming the whole of human experience for its own training purposes. I find that unethical.
GunnarGrop@lemmy.ml 14 hours ago
Much of it might be freely available data, but there’s a huge difference between you accessing a website for data and an LLM doing the same thing. We’ve had bots scraping websites since the 90’s, it’s not a new thing. And since scraping bots have existed we’ve developed a standard on the web to deal with it, called “robots.txt”. A text file telling bots what they are allowed to do on websites and how they should behave.
LLM’s are notorious for disrespecting this, leading to situations where small companies and organisations will have their websites scraped so thoroughly and frequently that they can’t even stay online anymore, as well as skyrocketing their operational costs. In the last few years we’ve had to develop ways just to protect ourselves against this. See the “Anubis” project.
Hence, it’s much more important that LLM’s follow the rules than you and me doing so on an individual level.
It’s the difference between you killing a couple of bees in your home versus an industry specialising in exterminating bees at scale. The efficiency is a big factor.