Comment on "Did you realize that we live in a reality where SciHub is illegal, and OpenAI is not?"

Star@sopuli.xyz ⁨1⁩ ⁨year⁩ ago

It’s so ridiculous when corporations steal everyone’s work for their profit, no one bats an eye but when a group of individuals do the same to make education and knowledge free for everyone it’s somehow illegal, unethical, immoral and what not.

source

Grimy@lemmy.world ⁨1⁩ ⁨year⁩ ago
Using publicly available data to train isn’t stealing.

Daily reminder that the ones pushing this narrative are literally corporations like OpenAI. If you can’t use copyrighted materials freely to train on, it brings up the cost in such a way that only a handful of companies can afford the data.

They want to kill the open-source scene and are manipulating you to do so. Don’t build their moat for them.

source
- givesomefucks@lemmy.world ⁨1⁩ ⁨year⁩ ago
  And using publicly available data to train gets you a shitty chatbot…
  
  Hell, even using copyrighted data to train isn’t that great
  
  source
  - webghost0101@sopuli.xyz ⁨1⁩ ⁨year⁩ ago
    The point being made was that publicly available data includes a whole lot of copyrighted data to begin with, and it’s pretty much impossible to filter it out. Grand example: the Eiffel Tower in Paris is not copyright protected, but the lights on it are, so you can only use pictures of the Eiffel Tower taken during the day, and only if the picture itself isn’t copyright protected by the original photographer. Copyright law has all these complex caveats and exceptions that make it impossible to tell at a glance whether or not something is protected.
    
    This in turn means that if AI cannot legally train on copyrighted materials it finds online without paying huge sums of money, then effectively only mega corporations that can pay copyright fines as a cost of business will be able to afford training decent AI.
    
    The only other option to produce any ai of such type is a very narrow curated set of known materials with a public use license but that is not going to get you anything competent on its own.
    
    source
    be_excellent_to_each_other@kbin.social ⁨1⁩ ⁨year⁩ ago
    So then we as a society aren't ready to untangle the mess of our infancy in the digital age. ChatGPT isn't something we must have at all costs, it's something we should have when we can deploy it while still respecting the rights of people who have made the content being used to train it.
    
    source
    RainfallSonata@lemmy.world ⁨1⁩ ⁨year⁩ ago
    I didn’t want any of this shit. IDGAF if we don’t have AI. I’m still not sure the internet actually improved anything, let alone what the benefits of AI are supposed to be.
    
    source
    TwilightVulpine@lemmy.world ⁨1⁩ ⁨year⁩ ago
    It’s not like all this data was randomly dumped at the AIs. For data sets to serve as good training materials they need contextual information so that the AI can discern patterns and replicate them when prompted.
    
    We see this when you can literally prompt AIs with whose style you want it to emulate. Meaning that the data it was fed had such information.
    
    Midjourney is facing extra backlash from artists after a spreadsheet was leaked containing a list of artist styles their AI was trained on. Meaning they can keep track of it and they trained the AI with those artists’ works deliberately. They simply pretend this is impossible to figure out so that they might not be liable to seek permission and compensate the artists whose works were used.
    
    source
    givesomefucks@lemmy.world ⁨1⁩ ⁨year⁩ ago
    That’s insane logic…
    
    Like you’re essentially saying I can copy/paste any article without a paywall to my own blog and sell adspace on it…
    
    And you’re still saying OpenAI is trying to make AI companies pay?
    
    Like, do you think AI runs off free cloud services? The hardware is insanely expensive.
    
    And OpenAI is trying to argue the opposite, that AI companies shouldn’t have to pay to use copyrighted works.
    
    You have zero idea what is going on, but you are really confident you do
    
    source
  - tourist@lemmy.world ⁨1⁩ ⁨year⁩ ago
    
    Was this a meta joke and you had a chatbot write your comment?
    
    if someone said this to me I’d cry
    
    source
  - Grimy@lemmy.world ⁨1⁩ ⁨year⁩ ago
    If the data has to be paid for, openAI will gladly do it with a smile on their face. It guarantees them a monopoly and ownership of the economy.
    
    Paying more but having no competition except google is a good deal for them.
    
    source
    givesomefucks@lemmy.world ⁨1⁩ ⁨year⁩ ago
    Eh, the issue is lots of people wouldn’t be willing to sell tho.
    
    Like, you think an author wants the chatbot to read their collected works and use that? Regardless of if it’s quoting full texts or “creating” text in their style.
    
    No author is going to want that.
    
    And if it’s up to publishers, they likely won’t either. Why take one small payday if that could potentially lead to loss of sales a few years down the road?
    
    It’s not like the people making the chatbots just need to buy a retail copy of the text to be in the legal clear.
    
    source
  - CIA_chatbot@lemmy.world ⁨1⁩ ⁨year⁩ ago
    Hey man, that’s damn hurtful
    
    source
  - dependencyinjection@discuss.tchncs.de ⁨1⁩ ⁨year⁩ ago
    I’m not sure if someone else has brought this up, but I could see OpenAI and other early adopters pushing for tighter controls of training data as a means to be the only players in town. You can’t build your own competing AI because you won’t have the same amount of data as us and we’ll corner the market.
    
    source
  - LWD@lemm.ee ⁨1⁩ ⁨year⁩ ago
    Maybe Grimy does have concerns, but they’ve never used the words “open source” outside of talking about AI.
    
    source
    Grimy@lemmy.world ⁨1⁩ ⁨year⁩ ago
    It’s current and it’s the only open source project that’s under direct threat? I am both a fan of open source and of generative AI, not sure what that changes in the validity of my arguments.
    
    This isn’t a gotcha but pure rhetoric, which is on par with you. Attack my arguments, or just ignore me the moment it becomes clear you can’t insult yourself out of a debate like you did last time.
    
    I’m not even sure what exactly you are implying but I am not impressed.
    
    source
- TwilightVulpine@lemmy.world ⁨1⁩ ⁨year⁩ ago
  OpenAI is definitely not the one arguing that they have stolen data to train their AIs, and Disney will be fine whether AI requires owning the rights to training materials or not. Small artists, the ones protesting the most against it, will not. They are already seeing jobs and commission opportunities declining due to it.
  
  Being publicly available in some form is not permission to use and reproduce those works however you feel like. Only the real owner has the right to decide. We on the internet have always been a bit blasé about it, sometimes deservedly, but as we get to a point where we are driving away the very same artists that we enjoy and get inspired by, maybe we should be a bit more understanding about their position.
  
  source
  - Grimy@lemmy.world ⁨1⁩ ⁨year⁩ ago
    That’s basically my main point: Disney doesn’t need the data, and neither does Getty. AI isn’t going away and the jobs will be lost no matter what.
    
    Putting a price tag in the high millions for any kind of generative model only benefits the big players.
    
    I feel for the artists. It was already a very competitive domain that didn’t really pay well and it’s now much worse but if they aren’t a household name, they aren’t getting a dime out of any new laws.
    
    I’m not ready to give the economy to Microsoft, google, Getty and Adobe so GRRM can get a fat payday.
    
    source
    TwilightVulpine@lemmy.world ⁨1⁩ ⁨year⁩ ago
    If AI companies lose, small artists may have the recourse of seeking compensation for the use and imitation of their art too. Just feeling for them is not enough if they are going to be left to the wolves.
    
    There isn’t a scenario here in which big media companies lose so talking of it like it’s taking a stand against them doesn’t make much sense. What are we fighting for here? That we get to generate pictures of Goofy? The small AI user’s win here seems like such a silly novelty that I can’t see how it justifies just taking for granted that artists will have it much rougher than they already have.
    
    The reality here is that even if AI gets the free pass, large media and tech companies are still primed to profit from them far more than any small user. They will be the one making AI-assisted movies and integrating chat AI into their systems. They don’t lose in either situation.
    
    There are ways to train AI without relying on unauthorized copyrighted data. Even if OpenAI loses, it wouldn’t be the death of the technology. It may be more efficient and effective to train them with that data, but why is “efficiency” enough to justify this overreach?
    
    And is it even wise to be so callous about it? Because it’s not going to stop with artists. This technology has the potential to replace large swaths of service industries. If we don’t think of the human costs now, it will be even harder to make a case for everyone else.
    
    source
- winterayars@sh.itjust.works ⁨1⁩ ⁨year⁩ ago
  That depends on what your definition of “publicly available” is. If you’re scraping New York Times articles and pulling art off Tumblr then yeah, it’s exactly stealing in the same way scihub is. Only difference is, scihub isn’t boiling the oceans in an attempt to make rich people even richer.
  
  source
  - unionagainstdhmo@aussie.zone ⁨1⁩ ⁨year⁩ ago
    Also Sci-hub don’t make any money off the works
    
    source
- kibiz0r@lemmy.world ⁨1⁩ ⁨year⁩ ago
  We have a mechanism for people to make their work publicly visible while reserving certain rights for themselves.
  
  Are you saying that creators cannot (or ought not be able to) reserve the right to ML training for themselves? What if they want to selectively permit that right to FOSS or non-profits?
  
  source
  - BURN@lemmy.world ⁨1⁩ ⁨year⁩ ago
    That’s exactly what they’re saying. The AI proponents believe that copyright shouldn’t be respected and they should be able to ignore any licensing because “it’s hard to find data otherwise”
    
    source
  - Grimy@lemmy.world ⁨1⁩ ⁨year⁩ ago
    Essentially yes. There isn’t a happy solution where FOSS gets the best images and remains competitive. The amount of data needed is outside what can be donated. Any open source work will be so low in quality as to be unusable.
    
    It also won’t be up to them. The platforms where the images are posted will be selling and brokering. No individual is getting a call unless they are a household name.
    
    None of the artists are getting paid either way so yeah, I’m thinking of society in general first.
    
    source
    kibiz0r@lemmy.world ⁨1⁩ ⁨year⁩ ago
    The artists (and the people who want to see them continue to have a livelihood, a distinct voice, and a healthy engaged fanbase) live in that society.
    
    The platforms where the images are posted will be selling and brokering
    
    Isn’t this exactly the problem though?
    
    From books to radio to TV, movies, and the internet, there’s always:
    
    One group of people who create valuable works
    
    Another group of people who monopolize distribution of those works
    
    The distributors hijack ownership (or de facto ownership) of the work, through one means or another (either logistical superiority, financing requirements, or IP law fuckery) and exploit their position to make themselves the only channel for creators to reach their audience and vice-versa.
    
    That’s the precise pattern that OpenAI is following, and they’re doing it at a massive scale.
    
    It’s not new. Youtube, Reddit, Facebook, MySpace, all of these companies started with a public pitch about democratizing access to content. But a private pitch emerged, of becoming the main way that people access content. When it became feasible for them to turn against their users and liquidate them, they did.
    
    The difference is that they all had to wait for users to add the content over time. Imagine if Google knew they could’ve just seeded Google Video with every movie, episode, and clip ever aired or uploaded anywhere. Just say, “Mon Dieu! It’s impossible for us to run our service without including copyrighted materials! Woe is us!” and all is forgiven.
    
    But honestly, whichever way the courts decide, the legality of it doesn’t matter to me. It’s clearly a “Whose Line Is It?” situation where the rules are made up and ownership doesn’t matter. So I’m looking at “Does this consolidate power, or distribute it?” And OpenAI is pulling perhaps the biggest power grab that we’ve seen.
    
    –
    
    Unrelated: I love that there’s a very distinct echo of something we saw with the previous era of tech grift, crypto. The grifters would always say, after they were confronted, “Well, there’s no way to undo it now! It’s on the blockchain!” There’s always this back-up argument of “it’s inevitable so you might as well let me do it”.
    
    source
- grue@lemmy.world ⁨1⁩ ⁨year⁩ ago
  
  They want to kill the open-source scene
  
  Yeah, by using the argument you just gave as an excuse to “launder” copyleft code in the training data into permissively-licensed output.
  
  source
  - Grimy@lemmy.world ⁨1⁩ ⁨year⁩ ago
    100% agree, making all outputs copyleft is a great solution. We get to keep the economic and cultural boom that AI brings while keeping the big companies in check.
    
    source
- deweydecibel@lemmy.world ⁨1⁩ ⁨year⁩ ago
  The point is that the entire concept of AI training off people’s work to make profit for others is wrong without the permission of, and compensation for, the creator, regardless of whether it’s corporate or open source.
  
  source
  - Angry_Maple@sh.itjust.works ⁨1⁩ ⁨year⁩ ago
    I think I’ve decided to not publish anything that I want to keep ownership of, just in case. There’s an entire planet’s worth of countries, which will all have their own sets of laws. It takes waay too long to polish something, only to just give it away for free haha. Someone else is free to do that work if it is that easy. No skin off my back.
    
    I think it’s similar to many other hand-made crafts/items. Most people will buy their clothes from stores, but there are definitely still people who make beautiful clothing from hand better than machines could.
    
    Don’t even get me started on stuff like knitting. It already costs the creator a crap ton of money just for the materials. It takes a crap ton of time to make those, too. Despite the costs, many people just expect those knitted pieces for practically free. The people who expect that pricing are also free to go with machine-produced crafts/items instead.
    
    It comes down to what people want, and what they’re willing to pay, imo. Some people will find value in something physically being put together by another human, and other people will find value in having more for less. Neither is “wrong” necessarily, so long as no one is literally ripped off. (With over 8 billion people, it’s bound to happen at least once. I feel bad for whoever that is.)
    
    That being said, we’ll never be able to honestly say that the specific skills and techniques that are currently required are the exact same. It would be like calling a photographer amazing at realism painting because their photo looks like real life. Photographers and painters both have their place, but they are not the exact same.
    
    I think that’s also part of what’s frustrating so many artists. Coding AI is not the same as using the colour wheel, choosing materials, working fine motor control, etc. It’s not learning about shadows, contrast, focal points, etc. I can definitely understand people not wanting those aspects to be brushed off, especially since it usually takes most of a lifetime to achieve. A music generator and a violin may both make great music, but they are not the same, and they require different technical skills.
    
    I’ll never buy AI art if I have any say in the matter. I’ll support handmade stuff first, every time.
    
    source
    Grimy@lemmy.world ⁨1⁩ ⁨year⁩ ago
    There is definitely more value in hand made art. Even the fanciest prints on canvas can’t compare and I don’t think AI art will be evoking the same feelings a John Waterhouse exhibit does.
    
    On the subject of publishing, I’ve chosen to embrace it personally. My view is that even the hidden stuff on our computers ends up in a Chinese or US database anyway.
    
    source
  - Meowoem@sh.itjust.works ⁨1⁩ ⁨year⁩ ago
    I love that the people who push this kind of rhetoric often consider themselves left wing, it’s just so silly.
    
    ‘every word you ever utter must be considered private property and no other human may benefit from it without payments!’
    
    I mean yes I know you’re going to say socialism is about workers getting fair pay but come on, this is just pure rent seeking. We’re a global community of people, if this comment helps train an ai that can help other people better live their lives, better access medicine and education or other services then I think that’s a wonderful thing.
    
    And yes of course it should be open source and free to all people, that’s why these pushes to make sure only corporations can afford ai are so infuriating
    
    source
    General_Effort@lemmy.world ⁨1⁩ ⁨year⁩ ago
    So true.
    
    This talking point, too, is so infuriatingly silly:
    
    I mean yes I know you’re going to say socialism is about workers getting fair pay
    
    Workers, by definition, don’t own what they produce. Copyrights are intellectual property; business capital. Somehow, capitalists are workers in the minds of these people. This is your mind on trickle-down economics.
    
    source
- BURN@lemmy.world ⁨1⁩ ⁨year⁩ ago
  Too bad
  
  If you can’t afford to pay the authors of the data required for your project to work, then that sucks for you, but doesn’t give you the right to take anything you want and violate copyright.
  
  Making a data agnostic model and releasing the source is fine, but a released, trained model owes royalties to its training data.
  
  source
- Asafum@feddit.nl ⁨1⁩ ⁨year⁩ ago
  Scientific research papers are generally public too, in that you can always reach out to the researcher and they’ll provide the papers for free, it’s just the “corporate” journals that need their profit off of other peoples work…
  
  source
- SchizoDenji@lemm.ee ⁨1⁩ ⁨year⁩ ago
  All of the AI fear mongering is fuelled by mega corps who fear that AI in some form will eat into their profits.
  
  source
  - Meowoem@sh.itjust.works ⁨1⁩ ⁨year⁩ ago
    Yeah, just wait until they see the ai design tools that allow anyone to casually describe the spare part or upgrade they want and it’ll be designed and printed at home or local fab shop.
    
    Lot of once fairly safe monopolies are going to start looking very shaky, and then things like natural language cookery toolarms disrupting even more…
    
    We’ve only barely started to see what the tech we have now is able to do, yes a million shitty chat bots / img gen apps are cashing in on the hype but when we start seeing some killer apps emerge it’s when people won’t be able to ignore it any longer
    
    source
- General_Effort@lemmy.world ⁨1⁩ ⁨year⁩ ago
  True, Big Tech loves monopoly power. It’s hard to see how there can be an AI monopoly without expanding intellectual property rights.
  
  It would mean a nice windfall profit for intellectual property owners. I doubt they worry about open source or competition but only think as far as lobbying to be given free money. It’s weird how many people here, who are probably not all rich, support giving extra money to owners, merely for owning things. That’s how it goes when you grow up on Ayn Rand, I guess.
  
  source
- Coasting0942@reddthat.com ⁨1⁩ ⁨year⁩ ago
  This is the hardest thing to explain to people. Just convert it into a person with unlimited memory.
  
  OpenAI is sending said person to view every piece of human work, learn and make connections, then make art or reports based on what you tell/ask this person.
  
  Sci-Hub is doing the same thing but you can ask it for a specific book and they will write it down word for word for you, an exact copy.
  
  Both morally should be free to do so. But we have laws that say the sci-hub human is illegally selling the work of others. Whereas the open ai human has to be given so many specific instructions to reproduce a human work that it’s practically like handing it a book and it handing the book back to you.
  
  source
- Mango@lemmy.world ⁨1⁩ ⁨year⁩ ago
  What data is public?
  
  source
richieadler@lemmy.myserv.one ⁨1⁩ ⁨year⁩ ago
Cue the Max Headroom episode where the blanks (disconnected people) are chased by the censors because they steal cable so their children can watch the educational shows and learn to read, and they are forced to use clandestine printing presses to teach them.

source
- mPony@kbin.social ⁨1⁩ ⁨year⁩ ago
  what's this? an anti-corporate message that sneers at cable TV companies??? CANCEL THAT SHOW!!!
  
  that show was so amazingly prescient: the theme of the first episode was how advertising literally kills its viewers and the news covers things up. No wonder they didn't get renewed. ;)
  
  source
- grue@lemmy.world ⁨1⁩ ⁨year⁩ ago
  Reminds me of this: www.gnu.org/philosophy/right-to-read.html
  
  source
burliman@lemmy.world ⁨1⁩ ⁨year⁩ ago
[deleted]
source
- givesomefucks@lemmy.world ⁨1⁩ ⁨year⁩ ago
  Because it’s easy to get these chatbots to output direct copyrighted text…
  
  Even ones the company never paid for, not even just a subscription for a single human to view the articles they’re reproducing. Like, think of it as buying a movie, then burning a copy for anyone who asks.
  
  Which reproducing word for word for people who didn’t pay is still a whole nother issue. So this is more like torrenting a movie, then seeding it.
  
  source
  - burliman@lemmy.world ⁨1⁩ ⁨year⁩ ago
    It’s not that easy, don’t believe the articles being broadcast every day. They are heavily cherry picked.
    
    Also, if someone is creating copyrighted works, it is on that person to be responsible if they release or sell it, not the tool they used. Just because the tool can be good (learns well and responds well when asked to make a clone of something) doesn’t mean it is the only thing it does or must do. It is following instructions, which were to make a thing. The one giving the instructions is the issue, and the intent of that person when they distribute is the issue.
    
    If I draw a perfect clone of Donald Duck in the privacy of my home after looking at hundreds of Donald Duck images online, there is nothing wrong with that. If I go on Etsy and start selling them without a license, they will come after ME. Not because I drew it, but because I am selling it and violating a copyright. They won’t go after the pencil or ink manufacturer. And they won’t go after Adobe if I drew it on a computer with Photoshop.
    
    source
    givesomefucks@lemmy.world ⁨1⁩ ⁨year⁩ ago
    
    If I draw a perfect clone of Donald Duck in the privacy of my home after looking at hundreds of Donald Duck images online, there is nothing wrong with that
    
    In your picture example it would be an exact copy…
    
    But even if you started a business and, when people asked for a picture of Donald Duck, gave them a traced copy, that would still be copyright infringement…
    
    The worst thing about these chatbots is the people who think it’s amazing don’t understand what it’s doing. If you understood it, it wouldn’t be impressive.
    
    source
- TwilightVulpine@lemmy.world ⁨1⁩ ⁨year⁩ ago
  Because humans have more rights than tools. You are free to look at copyrighted text and pictures, memorize them and describe them to others. It doesn’t mean you can use a camera to take and share pictures of it.
  
  Acting like every right that AIs have must be identical to humans’, and if not that means the erosion of human rights, is a fundamentally flawed argument.
  
  source
- TrickDacy@lemmy.world ⁨1⁩ ⁨year⁩ ago
  Whoosh
  
  source