AI can “learn” from and “read” a book in the same way a person can and does
This statement is the basis for your argument and it is simply not correct.
Training LLMs and similar AI models is much closer to a sophisticated lossy compression algorithm than it is to human learning. The processes are not at all similar given our current understanding of human learning.
AI doesn’t reproduce a work that it “learns” from, so why would it be illegal?
The current Disney lawsuit against Midjourney is illustrative (literally: it includes numerous side-by-side comparisons) of how AI models are capable of recreating iconic copyrighted works so closely that the output is indistinguishable from the original.
If a machine can replicate your writing style because it could identify certain patterns, words, sentence structure, etc., then as long as it’s not pretending to create things attributed to you, there’s no issue.
An AI doesn’t create works on its own. A human instructs AI to do so. Attribution is also irrelevant. If a human uses AI to recreate the exact tone, structure and other nuances of, say, some best-selling author, they harm the marketability of the original works, which weighs against fair use under the market-effect factor (at least in the US).
antonim@lemmy.dbzer0.com 9 months ago
If it’s in the same way, then why do you need the quotation marks? Even you understand that they’re not the same.
And either way, machine learning is different from human learning in so many ways it’s ridiculous to even discuss the topic.
That depends on the model and the amount of data it has been trained on. I remember the first public version of ChatGPT producing a sentence that was just one word different from what I found by googling the text (from some scientific article summary, so not a trivial sentence that could line up accidentally). More recently, there was a widely reported-on study of AI-generated poetry where the model was asked to produce a poem in the style of Chaucer, and it produced a letter-for-letter reproduction of the well-known opening of the Canterbury Tales. The model presumably hadn’t been trained on enough Middle English poetry to generate any of its own, so it defaulted to copying a text that probably occurred dozens of times in its training data.