I’m honestly not sure what you’re trying to say here. If by “it must have access to information for reference” you mean while it is running, it doesn’t. Like I said that information is only available during training.
Comment on Google Researchers’ Attack Prompts ChatGPT to Reveal Its Training Data
NaibofTabr@infosec.pub 1 year agoChatGPT is a large language model. The model contains word relationships - a nebulous collection of rules for stringing word together. The model does not contain information. In order for ChatGPT to answer flexibly answer questions, it must have access to information for reference - information that it can index, tag and sort for keywords.
ayaya@lemdro.id 1 year ago
NaibofTabr@infosec.pub 1 year ago
Like I said that information is only available during training.
This is not correct. I understand how neural networks function, I also understand that the neural network is not a complete system in itself. In order to be useful, the model is connected to other things, including a source of reference information. For instance, earlier this year ChatGPT was connected to the internet so that it could respond to queries with more up-to-date information. At that point, the neural network was frozen. It was not being actively trained on the internet, it was just connected to it for the sake of completing search queries.
brianorca@lemmy.world 1 year ago
That is an optional feature, not required to make use of an LLM. And not even a feature of most LLMs.
TWeaK@lemm.ee 1 year ago
The dataset ChatGPT uses to train on contains data copied unlawfully. They’re not just reading the data at its source, they’re copying the data into a training database without sufficient license.
Whether ChatGPT itself contains all the works is debatable - is it just word relationships when the system can reproduce significant chunks of copyrighted data from those relationships? - but the process of training inherently requires unlicensed copying.
In terms of fair use, they could argue a research exemption, but this isn’t really research, it’s product development. The database isn’t available as part of scientific research, it’s protected as a trade secret. Even if it was considered research, it absolutely is commercial in nature.
In my opinion, there is a stronger argument that OpenAI have broken copyright for commercial gain than that they are legitimately performing fair use copying for the benefit of society.
Womble@lemmy.world 1 year ago
Every time you load a webpage you are making a local copy of it for your own use, if it is on the open web you are implicitly given permission to make a copy of it for your own use. You are not given permission to then distribute those copies which is where LLMs may get into trouble, but making a copy for the purpose of training is not a breach of copyright as far as I can understand or have heard.
TWeaK@lemm.ee 1 year ago
Yes, you do make a copy of a web page. And every time you load a video game you make a copy into RAM. However, this copying is all permitted under user license - you’re allowed to make minor copies as part of the process of running the software and playing the media.
Case in point, the UK courts ruled that playing pirated games was illegal, because when you load the game from a disc you copy it into RAM, and this copying is not licensed by the player.
OpenAI does not have any license for copying into its database. The terms and conditions of web pages say you’re allowed to view them, not allowed to take the data and use it for things. They don’t explicitly prohibit this (yet), but the lack of a prohibition does not mean a license is implied. OpenAI can only hope for a fair use exemption, and I don’t think they qualify because a) it isn’t really “research” but product development, and even if it is research b) it is purely for commercial gain.
Womble@lemmy.world 1 year ago
Could you point to the judgement on playing copied games was illegal in the UK? I can only find articles about specifically DS copy cartridges which are very obviously intended to make/use unlicensed copies of games to distribute.
Even so, that again hinges on right to distribute, not right to make a copy for personal use. If a game is made freely available on the web for you to play it is not illegal to download that game to play offline or study it.