So no, no billion dollar company can make their own training data
This statement brought along with it the terrifying thought that there’s a dystopian alternative timeline where companies do make their own training data, by commissioning untold numbers of scientists, engineers, artists, researchers, and other specialists to undertake work that no one else has. But rather than furthering the sum of human knowledge, or even directly commercializing the fruits of that research, it’s all just fodder to throw into the LLM training set. A world where knowledge is not only gatekept the way Elsevier gatekeeps it, but isn’t even accessible to humans: only the LLM will get to read it and digest it for human consumption.
Written by humans, read by AI, spoonfed to humans. My god, what an awful world that would be.
witten@lemmy.world 1 week ago
We’re already living in it. Professional voice actors now have the choice between vying for the dwindling number of voice acting gigs or selling their voice to LLM companies as training data.