Not sure why I’m getting so many downvotes in this thread, aside from the fact that it may sound like I’m standing up for big tech, which I’m not. This article is more or less saying that open source is doomed as a result of big tech’s LLMs, and I’m saying it’s AI that is ultimately doomed and open source will be just fine. AI isn’t going to make it any easier to replicate the open source projects used to train it, for the same reason the commercial AI push is doomed to fail: AI is based on exaggerated claims. No, companies aren’t going to use AI to make their own Linux kernel not bound by GPL licensing terms. What’s going to happen is the commercial AI bubble is going to pop, perhaps leaving behind open source AI models that will be used for the modest value they bring to certain tasks.
Comment on AI’s Unpaid Debt: How LLM Scrapers Destroy the Social Contract of Open Source
melfie@lemy.lol 6 days ago
I don’t follow how LLMs destroy open source. For example, an LLM trained on the Linux kernel could probably be used to produce a closed-source kernel with a lot of human effort. Big tech companies already make a lot of money from Linux without ever contributing back. That doesn’t change the fact that we can all run Linux and not be trapped using proprietary garbage like Windows. Community contributions still create a rising tide that lifts all boats, and the fact that shitty big tech companies get their massive yachts lifted as well doesn’t really change that.
I hate big tech companies and the AI grift as much as anyone else here, but don’t really follow the article’s point.
neukenindekeuken@sh.itjust.works 6 days ago
Essentially all FOSS software is under an open source license of some sort, which allows anyone to reuse the code or software as long as whatever reuses it also remains free and open source, or at least carries a license at least as open and permissive as the original work’s.
LLM companies ignore that, hide the code behind a subscription, and use it to train models they sell to soulless corporate entities who will never allow their code to be part of the FOSS world, thus breaking the contract.
It’s not even an implicit contract, it’s explicit, and LLM companies are ignoring this and using their investment to squash any FOSS projects that want to challenge them in court on it.
melfie@lemy.lol 6 days ago
Given that LLMs increase productivity in the aggregate by 15-20%, and sticking with Linux as an example, an LLM trained on the Linux kernel could be used to make a similar kernel with a ton of human effort. That company could then make a proprietary OS and sell it. Other companies would then have the choice of using open source Linux, devoting a ton of their own resources to making a proprietary OS with a little help from AI, or licensing the other company’s proprietary OS. Everyone else can still use Linux and not care.
It’s possible I’m using the wrong example or overlooking something that would help me better understand this perspective.
neukenindekeuken@sh.itjust.works 6 days ago
There is absolutely no way you’re using an LLM to rewrite the Linux kernel in any way. That’s not what they do, and whatever they produced wouldn’t be a fraction as effective as the current kernel.
They’re text prediction machines. That’s it. Markov generators on steroids.
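To make the “Markov generators on steroids” analogy concrete, here is a minimal word-level Markov chain text generator in Python. This is only an illustration of the analogy: real LLMs learn weights over very long contexts rather than just counting which word followed which, but the core loop of “predict the next token from what came before” is the same shape.

```python
import random
from collections import defaultdict

def build_chain(text):
    """Map each word to the list of words observed directly after it."""
    words = text.split()
    chain = defaultdict(list)
    for current, following in zip(words, words[1:]):
        chain[current].append(following)
    return chain

def generate(chain, start, length=10, seed=0):
    """Repeatedly sample the next word from those seen after the current one."""
    rng = random.Random(seed)
    word, output = start, [start]
    for _ in range(length):
        followers = chain.get(word)
        if not followers:  # dead end: no word ever followed this one
            break
        word = rng.choice(followers)
        output.append(word)
    return " ".join(output)

# Tiny toy corpus; every generated word can only be one the model has seen.
corpus = "the kernel builds the kernel and the kernel runs"
chain = build_chain(corpus)
print(generate(chain, "the"))
```

The point of the toy: the generator can only ever recombine patterns from its training text, which is the heart of the argument above about derivative output.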
I’d also be curious where that 15-20% aggregate productivity increase comes from. That’s an extremely misleading statistic. The truth is there are no consensus data on any aggregate productivity improvements from LLMs today. Anything anyone has is made up. It also doesn’t take into account the additional bugs and issues caused by LLMs, which are significant, and which are not something you want happening on every PR touching kernel code, I promise.
Regardless of all of that, the companies with these LLMs are using free software to train their models to make money without making their models free and open source or providing a way for people to use it for free/open source projects, so this is a clear violation of every single FOSS license model I’m familiar with (most commonly used is the Apache one).
TL;DR: they are stealing code meant to be free and public along with any derivative works, profiting off it, and then refusing to honor the license of the code and projects they stole.
This is illegal. The only reason we’re not seeing much about it is that these FOSS projects generally have no money and aren’t going to sue and risk losing a substantial chunk of their negligible funds in court. That’s it. Otherwise, what these companies are doing is very illegal: the sort of thing any professional software development company’s legal team warns you about the second you start using an OSS project in a for-profit business application codebase.
LLMs get away with it because $$$$$$$$$$$$$$$$$. That’s it.
melfie@lemy.lol 6 days ago
This is from a Stanford study that is summarized here:
linkedin.com/…/does-ai-actually-boost-developer-p…
There are other studies with different conclusions, but this one aligns with my own experience. To your point about AI not being able to reproduce the Linux kernel, this study also finds that AI is significantly less effective, even going into the negative, on complex codebases, which agrees with what you said, since the Linux kernel certainly qualifies as a complex codebase.
I agree big tech is using open source unethically, but how much different is this situation from the other ways big tech profits from open source without contributing back?
phil@lymme.dynv6.net 6 days ago
As i understand, Linux is under a license (GPL) which explicitly prevents closing the reuse of its code. So it’s all about legal interpretation of “reuse” and so far the GPL stands up against abuses. I suppose that a company specifically targeting the source code for the intend of creating another OS might need to hire more lawyers than developers with far from certain results. But who knows, in a world where billions $ are free as soon as “AI” is mentioned in a business plan.