I work in generative AI, specifically curated training sets.
The issue is training on “licensed materials”. If that happened with all AI, no one would have an issue. But it’s disingenuous to suggest that’s how most AI is currently being trained. A lot of material has been scraped from the web, especially for image generation, meaning some portion of the training data was used without the author’s consent or, often, even their knowledge. It’s important to note that scraping training data in this way usually breaks a site’s TOS.
The number of people I’ve seen supporting AI usage in this context is staggering. One commenter even told me it was about the “greed” of the artists, whose work may be in a training set without consent, wanting royalties for slightly changing a parameter with their art (which is, of course, a strawman).
To me, the only issue here is handling the ethics of what goes into training data and what doesn’t. Authors should be able to choose not to have their materials used. Adobe understood this, which is why Firefly, being trained on explicitly licensed materials, is a different beast, as you allude to.
But it’s clear a lot of people don’t understand why using data without consent is a bad thing in this context, and for that reason, some other people will choose not to support companies using it until the issue is resolved. It seems quite reasonable to me.
goldenbug@kbin.social 1 year ago
If you claim that the person replying is ignorant, the custom should be to explain the concepts they do not understand.