Comment on In a paper, media mogul Tim O'Reilly and economist Ilan Strauss say OpenAI likely trained GPT-4o on paywalled O'Reilly Media books without a licensing agreement.

echodot@feddit.uk 1 day ago

The problem is that even if their books are in the data set, there’s no evidence that they were taken directly from the source. OpenAI scrapes websites, right, and O’Reilly books are often pirated because of their predatory business model (they change their textbooks every year, meaning you can’t use a previous year’s book). So it’s entirely possible, although unlikely, that the content got in there from scraping a pirate site.
