Comment on OpenAI moves to allow “mature apps” on its platforms
brucethemoose@lemmy.world 1 day ago
Except these AI models need data to train on, they cannot improve without an industry to leech off of.
Not anymore.
The new trend in ML is training on synthetic data, alongside smaller sets of highly curated data.
FartMaster69@lemmy.dbzer0.com 1 day ago
Ah sweet model collapse.
brucethemoose@lemmy.world 1 day ago
That’s sort of a meme, and something I’ve observed myself training GANs. It’s definitely a problem for the stupid (like Tech Bros).
But it doesn’t happen the way you’d think, as long as the augmentations are clever and their scope is narrow. Hence the success of several recent distillations and ‘augmented’ models, and the failure of huge-dataset trains like Llama 4.
…And synthetic data generation/augmentation is getting clever, and is already being used in newer trains. See this, or newer papers if you search for them on arXiv: github.com/qychen2001/Awesome-Synthetic-Data
Or Nvidia’s HUGE focus on this, combining it with their work in computer graphics: www.nvidia.com/…/synthetic-data-physical-ai/
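For anyone who wants to see the collapse point in miniature, here is a toy sketch (not how any real LLM or GAN is trained). The "model" is just a Gaussian that gets refit each generation on samples drawn from the previous fit. The function name `run`, the sample size, the generation count, and the 50% real-data mix are all assumptions picked for illustration. With purely synthetic data the fitted spread should shrink toward zero, while keeping a fixed share of real samples in every generation should hold it in place.

```python
import random
import statistics

def run(generations, n, real_fraction, seed=0):
    """Refit a Gaussian on its own samples, optionally mixed with real data.

    Hypothetical toy setup: the "real" distribution is N(0, 1), and each
    generation trains on n points, of which n * real_fraction are real.
    Returns the fitted standard deviation after the last generation.
    """
    rng = random.Random(seed)
    n_real = int(n * real_fraction)
    mu, sigma = 0.0, 1.0  # generation 0 matches the real distribution
    for _ in range(generations):
        # Synthetic samples come from the previous generation's fit.
        synthetic = [rng.gauss(mu, sigma) for _ in range(n - n_real)]
        # Real samples always come from the true distribution.
        real = [rng.gauss(0.0, 1.0) for _ in range(n_real)]
        data = synthetic + real
        mu = statistics.fmean(data)
        sigma = statistics.pstdev(data)  # maximum-likelihood (population) std
    return sigma

collapsed = run(generations=500, n=20, real_fraction=0.0)
anchored = run(generations=500, n=20, real_fraction=0.5)
print(f"synthetic only: std = {collapsed:.3g}")
print(f"half real data: std = {anchored:.3g}")
```

The synthetic-only run loses variance because each refit slightly underestimates the spread and the errors compound. The mixed run is the crude version of "synthetic data alongside a smaller curated set": the real samples keep pulling the fit back toward the true distribution.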