Comment on Are there any AI services that don't work on stolen data?
AmbitiousProcess@piefed.social 2 weeks ago
This is very true.
I was part of the OpenAssistant project, voluntarily submitting my personal writing to train open-source LLMs without having to steal data, in the hopes it would stop these companies from stealing people's work and make "AI" less of a black box.
After thousands of people submitted millions of prompt-response pairs, and after some researchers said it was the highest-quality natural language dataset they'd seen in a while, the base model was still almost always incoherent. You only got a functioning model if you used the data to fine-tune an existing larger model, which at the time was Llama.