Comment on AI industry horrified to face largest copyright class action ever certified
JustARaccoon@lemmy.world 2 days ago
In theory sure, but in practice who has the resources to do large-scale model training on huge datasets other than large corporations?
FauxLiving@lemmy.world 2 days ago
Distributed computing projects, large non-profits, people in the near future with much more powerful and cheaper hardware, governments which are interested in providing public services to their citizens, etc.
Look at other large technology projects. The Human Genome Project spent $3 billion to sequence the first genome, but now you can have it done for around $500. This cost reduction is due to the massive combined effort of tens of thousands of independent scientists working on the same problem. It isn’t something that would have happened if Purdue Pharma owned the sequencing process and required every scientist to purchase a license from them in order to do research.
LLMs and diffusion models are trained on the works of everyone who’s ever been online (which are stored in the Common Crawl datasets). We should not be cheering for a world where it is illegal to use this dataset and we are instead forced to license massive datasets from publishing companies.
Progress on these types of models would immediately stop, and only 3-4 corporations could afford the licenses. They would have a de facto monopoly on LLMs and could enshittify them without worrying about competition.
JustARaccoon@lemmy.world 1 day ago
The world you’re envisioning would only have paid licenses. Who’s to say we can’t have a “free for non-commercial purposes” style of license for it all?