Comment

Comment on Ars Technica content is now available in OpenAI services

I want Ars content to be part of whatever training data is provided to the best models. How does that get done without it looking like Ars is being bought?

Even if their contract explicitly states that it is a data sharing agreement only, and that the products of the media organization (articles/investigations) are not grounds for breach or retaliation, people will now assume that future reporting has lost some impartiality.

So, for all media companies, the options seem to be:

Contribute to the greater good by openly permitting site scraping (for $0)
Allow data sharing to contracted parties only (for a fee)
Publicly or privately prohibit use of any data, and then seek damages down the road for theft/copyright infringement once the legal framework has been established.

Is there a GPL or other license structure that permits data sharing for LLM training in a way that keeps the data from being transformed into something evil?
