Training data IS a massive industry already. You don’t see it because you probably don’t work in a field directly dealing with it. I work in medtech and millions and millions of dollars are spent acquiring training data every year. Should some new unique IP right be found on using otherwise legally rendered data to train AI, it is almost certainly going to be contracted away to hosting platforms via totally sound ToS and then further monetized such that only large and well-funded corporate entities can utilize it.
Comment on The Irony of 'You Wouldn't Download a Car' Making a Comeback in AI Debates
mriormro@lemmy.world 2 months ago
I love that the collectivist ideal of sharing all that we’ve created for the betterment of humanity is being twisted into this disgusting display of corporate greed and overreach. OpenAI doesn’t need shit. They don’t have an inherent right to exist but must constantly make the case for their existence.
The bottom line is that if corporations need data that they themselves cannot create in order to build and sell a service then they must pay for it. One way or another.
I see this all as parallels with how aquifers and water rights have been handled and I’d argue we’ve fucked that up as well.
FatCrab@lemmy.one 2 months ago
Eccitaze@yiffit.net 2 months ago
unique
“unique new IP right?” Bruh, you’re talking about basic fucking intellectual property law. Just because someone posts something publicly on the internet doesn’t mean that it can be used for whatever anybody likes. This is so well-established that every major art gallery and social media website has a clause in their terms of service stating that you are granting them a license to redistribute that content. And most websites also explicitly state that when you upload your work to their site, you still retain your copyright of that work.
For example (emphasis mine):
4.1 When you upload content to Fur Affinity via our services, you grant us a non-exclusive, worldwide, royalty-free, sublicensable, transferable right and license to use, host, store, cache, reproduce, publish, display (publicly or otherwise), perform (publicly or otherwise), distribute, transmit, modify, adapt, and create derivative works of, that content. These permissions are purely for the limited purposes of allowing us to provide our services in accordance with their functionality (hosting and display), improve them, and develop new services. These permissions do not transfer the rights of your content or allow us to create any deviations of that content outside the aforementioned purposes.
Posting Content
You keep copyright of any content posted to Inkbunny. For us to provide these services to you, you grant Inkbunny non-exclusive, royalty-free license to use and archive your artwork in accordance with this agreement.
When you submit artwork or other content to Inkbunny, you represent and warrant that:
* you own copyright to the content, or that you have permission to use the content, and that you have the right to display, reproduce and sell the content. You license Inkbunny to use the content in accordance with this agreement;
Copyright in Your Content
DeviantArt does not claim ownership rights in Your Content. For the sole purpose of enabling us to make your Content available through the Service, you grant DeviantArt a non-exclusive, royalty-free license to reproduce, distribute, re-format, store, prepare derivative works based on, and publicly display and perform Your Content. Please note that when you upload Content, third parties will be able to copy, distribute and display your Content using readily available tools on their computers for this purpose although other than by linking to your Content on DeviantArt any use by a third party of your Content could violate paragraph 4 of these Terms and Conditions unless the third party receives permission from you by license.
When you upload content to e621 via our services, you grant us a non-exclusive, worldwide, royalty-free, sublicensable, transferable right and license to use, host, store, cache, reproduce, publish, display (publicly or otherwise), perform (publicly or otherwise), distribute, transmit, downsample, convert, adapt, and create derivative works of, that content. These permissions are purely for the limited purposes of allowing us to provide our services in accordance with their functionality (hosting and display), improve them, and develop new services. These permissions do not transfer the rights of your content or allow us to create any deviations of that content outside the aforementioned purposes.
Your Rights and Grant of Rights in the Content
You retain your rights to any Content you submit, post or display on or through the Services. What’s yours is yours — you own your Content (and your incorporated audio, photos and videos are considered part of the Content).
By submitting, posting or displaying Content on or through the Services, you grant us a worldwide, non-exclusive, royalty-free license (with the right to sublicense) to use, copy, reproduce, process, adapt, modify, publish, transmit, display and distribute such Content in any and all media or distribution methods now known or later developed (for clarity, these rights include, for example, curating, transforming, and translating). This license authorizes us to make your Content available to the rest of the world and to let others do the same.
The permissions you give us
We need certain permissions from you to provide our services:
- Permission to use content you create and share: Some content that you share or upload, such as photos or videos, may be protected by intellectual property laws.
- You retain ownership of the intellectual property rights (things like copyright or trademarks) in any such content that you create and share on Facebook and other Meta Company Products you use. Nothing in these Terms takes away the rights you have to your own content. You are free to share your content with anyone else, wherever you want.
- However, to provide our services we need you to give us some legal permissions (known as a “license”) to use this content. This is solely for the purposes of providing and improving our Products and services as described in Section 1 above.
- Specifically, when you share, post, or upload content that is covered by intellectual property rights on or in connection with our Products, you grant us a non-exclusive, transferable, sub-licensable, royalty-free, and worldwide license to host, use, distribute, modify, run, copy, publicly perform or display, translate, and create derivative works of your content (consistent with your privacy and application settings). This means, for example, that if you share a photo on Facebook, you give us permission to store, copy, and share it with others (again, consistent with your settings) such as Meta Products or service providers that support those products and services. This license will end when your content is deleted from our systems.
I could go on, but I think I’ve made my point very clear: Every social media website and art gallery is built on the assumption that A) the person uploading art retains the copyright over the items they upload, B) other people and organizations have NO rights to copyrighted works unless explicitly stated otherwise, and C) 3rd parties accessing this material do not have any rights to uploaded works, since they never negotiated a license to use these works.
FatCrab@lemmy.one 2 months ago
You are misunderstanding what I’m getting at and, unfortunately, no, this isn’t just straightforwardly copyright law whatsoever. The training content does not need to be copied. It isn’t saved in a database somewhere (as part of the training…downloading pirated texts is a whole other issue completely removed from the inherent processes of training a model); relationships are extracted from the material, however it is presented. So the copyright extends to the right of displaying the material in the first place. If your initial display/access to the training content is non-infringing, the mere extraction of relationships between components is not itself making a copy nor is it making a derivative work in any way we haven’t historically considered it. Effectively, it’s the difference between looking at material and making intensive notes of how different parts of the material relate to each other, and looking at material and reproducing as much of it as possible for your own records.
Eccitaze@yiffit.net 2 months ago
FFS, the issue is not that the AI model “copies” the copyrighted works when it trains on them–I agree that after an AI model is trained, it does not meaningfully retain the copyrighted work. The problem is that the reproduction of the copyrighted work–i.e. downloading the work to the computer, and then using that reproduction as part of AI model training–is being done for a commercial purpose that infringes copyright.
If I went to DeviantArt and downloaded a random piece of art to my hard drive for my own personal enjoyment, that is a non-infringing reproduction. If I then took that same piece of art, and uploaded it to a service that prints it on a T-shirt, the act of uploading it to the T-shirt printing service’s server would be infringing, since the work is no longer being reproduced for personal enjoyment; it is an unlawful reproduction of copyrighted material for a commercial purpose. Similarly, if I downloaded a piece of art and used it to print my own T-shirts for sale, using all my own computers and equipment, that would also be infringing. This is straightforward, non-controversial copyright law.
The exact same logic applies to AI training. You can try to camouflage the infringement with flowery language like “mere extraction of relationships between components,” but the purpose and intent behind AI companies reproducing copyrighted works via web scraping and downloading copyrighted data to their servers is to build and provide a commercial, for-profit service that is designed to replace the people whose work is being infringed. Full stop.
VoterFrog@lemmy.world 2 months ago
They do, though. They purchase data sets from people with licenses, use open source data sets, and/or scrape publicly available data themselves. Worst case they could download pirated data sets, but that’s copyright infringement committed by the entity distributing the data without a license.
Beyond that, copyright doesn’t protect the work from being used to create something else, as long as you’re not distributing significant portions of it. Movie and book reviewers won that legal battle long ago.