OpenAI is definitely not the one arguing that they have stolen data to train their AIs, and Disney will be fine whether AI requires owning the rights to training materials or not. Small artists, the ones protesting the most against it, will not. They are already seeing jobs and commission opportunities declining due to it.
Being publicly available in some form is not permission to use and reproduce those works however you feel like. Only the real owner has the right to decide. We on the internet have always been a bit blasé about it, sometimes deservedly, but as we get to a point where we are driving away the very same artists that we enjoy and get inspired by, maybe we should be a bit more understanding about their position.
givesomefucks@lemmy.world 10 months ago
And using publicly available data to train gets you a shitty chatbot…
Hell, even using copyrighted data to train isn’t that great
webghost0101@sopuli.xyz 10 months ago
The point being made was that publicly available data includes a whole lot of copyrighted data to begin with, and it’s pretty much impossible to filter it out. Prime example: the Eiffel Tower in Paris is not copyright protected, but the lights on it are, so you can only use pictures of the Eiffel Tower taken during the day, and only if the picture itself isn’t copyright protected by the original photographer. Copyright law has all these complex caveats and exceptions that make it impossible to tell at a glance whether or not something is protected.
This in turn means that if AI cannot legally train on copyrighted materials it finds online without paying huge sums of money, then effectively only mega corporations that can pay copyright fines as a cost of business will be able to afford training decent AI.
The only other option to produce any AI of this type is a very narrow, curated set of known materials with a public use license, but that is not going to get you anything competent on its own.
be_excellent_to_each_other@kbin.social 10 months ago
So then we as a society aren't ready to untangle the mess of our infancy in the digital age. ChatGPT isn't something we must have at all costs, it's something we should have when we can deploy it while still respecting the rights of people who have made the content being used to train it.
assa123@lemmy.world 10 months ago
I would go even further and say that we shouldn’t have it until we can be sure it will respect others’ rights. All kinds of rights, not only copyright. Unlike Bing at the beginning, with all its bullying and menaces, or ChatGPT regurgitating private information gathered from God knows where.
The problem with waiting is the arms race with other governments. I feel it’s similar to fossil fuels, but all governments need to take the risk of being disadvantaged. Damned prisoner’s dilemma.
RainfallSonata@lemmy.world 10 months ago
I didn’t want any of this shit. IDGAF if we don’t have AI. I’m still not sure the internet actually improved anything, let alone what the benefits of AI are supposed to be.
RememberTheApollo@lemmy.world 10 months ago
It doesn’t matter what you want. What matters is if corporations can extract $ from you, gain an efficiency, or cut their workforce using it.
That’s what the drive for AI is all about.
myslsl@lemmy.world 10 months ago
Machine learning techniques are often thought of as fancy function approximation tools (i.e. for regression and classification problems). They are tools that receive a set of values and spit out some discrete or possibly continuous prediction value.
One use case is that there are a lot of really hard and important problems within CS that we can’t efficiently solve exactly (look up TSP, SOP, SAT and so on) but that we can solve using heuristics or approximations in reasonable time. Often the accuracy of the heuristic even determines the efficiency of our solution.
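As a rough illustration of the heuristic idea, here is a minimal Python sketch of the nearest-neighbor heuristic for TSP. Note that this is a classical greedy heuristic rather than a learned one, the function names are made up for this example, and the greedy rule gives no guarantee of finding the shortest tour; it just finishes in polynomial time where an exact search would not.

```python
import math


def tour_length(points, order):
    """Total length of the closed tour that visits points in the given order."""
    n = len(order)
    return sum(
        math.dist(points[order[i]], points[order[(i + 1) % n]])
        for i in range(n)
    )


def nearest_neighbor_tour(points, start=0):
    """Greedy heuristic: from the current point, always move to the closest
    unvisited point. Ties are broken by the lower index so the result is
    deterministic."""
    unvisited = set(range(len(points)))
    unvisited.remove(start)
    order = [start]
    while unvisited:
        here = points[order[-1]]
        nxt = min(unvisited, key=lambda j: (math.dist(here, points[j]), j))
        unvisited.remove(nxt)
        order.append(nxt)
    return order


# Hypothetical input: the four corners of a unit square.
square = [(0, 0), (0, 1), (1, 1), (1, 0)]
order = nearest_neighbor_tour(square)
length = tour_length(square, order)
```

On an input this small the greedy tour happens to be optimal; on larger inputs it usually is not, which is the trade-off between accuracy and running time mentioned above.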
Additionally, sometimes we want predictions for other reasons. For example, software that relies on user preference, that predicts home values, that predicts the safety of an engineering plan, that predicts the likelihood that a person has cancer, that predicts the likelihood that an object in a video frame is a human etc.
These tools have legitimate and important use cases; it’s just that a lot of the hype now is centered around the dumbest possible uses and a bunch of idiots trying to make money regardless of any associated ethical concerns or consequences.
webghost0101@sopuli.xyz 10 months ago
A perfectly valid stance to take.
Grimy@lemmy.world 10 months ago
You don’t have to use it. You can even disconnect from the internet completely.
What’s the benefit of stopping me from using it?
TwilightVulpine@lemmy.world 10 months ago
It’s not like all this data was randomly dumped at the AIs. For data sets to serve as good training materials they need contextual information so that the AI can discern patterns and replicate them when prompted.
We see this when you can literally prompt AIs with the name of the artist whose style you want them to emulate. Meaning that the data they were fed had such information.
Midjourney is facing extra backlash from artists after a spreadsheet was leaked containing a list of artist styles their AI was trained on. Meaning they can keep track of it and they trained the AI with those artists’ works deliberately. They simply pretend this is impossible to figure out so that they might not be liable to seek permission and compensate the artists whose works were used.
givesomefucks@lemmy.world 10 months ago
That’s insane logic…
Like you’re essentially saying I can copy/paste any article without a paywall to my own blog and sell adspace on it…
And you’re still saying OpenAI is trying to make AI companies pay?
Like, do you think AI runs off free cloud services? The hardware is insanely expensive.
And OpenAI is trying to argue the opposite, that AI companies shouldn’t have to pay to use copyrighted works.
You have zero idea what is going on, but you are really confident you do
webghost0101@sopuli.xyz 10 months ago
I clarified the comment above, which was misunderstood. Whether it makes a moral/sane argument is subjective and I am not covering that.
I am not sure why you think there is a claim that OpenAI is trying to make companies pay. On the contrary, the comment I was clarifying (so not my opinion/words) states that OpenAI is making an argument that anyone should be able to use copyrighted materials for free to train AI.
The cost of running an online service like ChatGPT is wildly beside the argument presented. You can run your own open source large language models at home about as well as you can run Bethesda’s Starfield on a similarly spec’d PC.
Those open source large language models are trained on the same collections of data, including copyrighted data.
The logic being used here is:
The Ethical dilemma as i understand it is:
tourist@lemmy.world 10 months ago
if someone said this to me I’d cry
Grimy@lemmy.world 10 months ago
If the data has to be paid for, OpenAI will gladly do it with a smile on their face. It guarantees them a monopoly and ownership of the economy.
Paying more but having no competition except Google is a good deal for them.
givesomefucks@lemmy.world 10 months ago
Eh, the issue is lots of people wouldn’t be willing to sell tho.
Like, you think an author wants the chatbot to read their collected works and use that? Regardless of if it’s quoting full texts or “creating” text in their style.
No author is going to want that.
And if it’s up to publishers, they likely won’t either. Why take one small payday if that could potentially lead to loss of sales a few years down the road?
It’s not like the people making the chatbots just need to buy a retail copy of the text to be in the legal clear.
Grimy@lemmy.world 10 months ago
The publishers will absolutely sell imo. They just publish; the book will be worth the same with or without the help of AI to write it.
I guess there is a possibility that people start replacing bought books with personalized book LLM outputs, but that strikes me as unlikely.
CIA_chatbot@lemmy.world 10 months ago
Hey man, that’s damn hurtful
dependencyinjection@discuss.tchncs.de 10 months ago
I’m not sure if someone else has brought this up, but I could see OpenAI and other early adopters pushing for tighter controls of training data as a means to be the only players in town. You can’t build your own competing AI because you won’t have the same amount of data as us and we’ll corner the market.
LWD@lemm.ee 10 months ago
Maybe Grimy does have concerns, but they’ve never used the words “open source” outside of talking about AI.
Grimy@lemmy.world 10 months ago
It’s current and it’s the only open source project that’s under direct threat? I am both a fan of open source and of generative AI, not sure what that changes in the validity of my arguments.
This isn’t a gotcha but pure rhetoric, which is par for the course with you. Attack my arguments, or just ignore me the moment it becomes clear you can’t insult your way out of a debate like you did last time.
I’m not even sure what exactly you are implying but I am not impressed.
LWD@lemm.ee 10 months ago
That your comment history makes you indistinguishable from a concern troll.