Autocomplete is not a lossy encoding of a database either; it’s a product of a dataset, just as you are a product of your experiences, but it is not wholly representative of that dataset.
If LLMs don’t encode their training data, then why are they proving susceptible to data exfiltration techniques where they output the content of their training dataset verbatim? m.youtube.com/watch?v=L_1plTXF-FE
Not_mikey@lemmy.world 11 months ago
I’m not saying it doesn’t encode some of its training data, I’m saying it’s not just encoding its training data. It probably does “memorize” a bunch of trivial facts from its training data and regurgitate them when asked. I’m saying that’s not all they are, and that’s not what makes them intelligent; their ability to also answer questions outside their training data is.
knightly@pawb.social 11 months ago
But they don’t “answer questions”, they just respond to prompts. You can’t use them to learn anything without checking their responses against authoritative sources you should have used in the first place.
There’s no intelligence there, just a plagiarism laundromat and some rules for formatting text like a 7th grader.
Not_mikey@lemmy.world 11 months ago
It can answer questions as well as any person. Just because you may need to check with another source doesn’t mean it didn’t answer the question; it just means you can’t fully trust it. If I ask someone who’s the fourth U.S. president and they say Jefferson, they still answered the question, they just answered it wrong. And, just as when you ask a person a question, you don’t have to check with another source if the answer sounds right. If that person answered Madison and I faintly recalled it and thought it sounded right, I would probably not check their answer and take it as fact.
For example, I asked ChatGPT for a chocolate chip cookie recipe once. I make cookies pretty often, so I would know if the recipe seemed off, but the one it provided seemed good; I followed it and made some pretty good cookies. It answered the question correctly, as shown by the cookies. You could argue it plagiarized, but while the ingredients and steps were pretty close to some recipes I found later, none were a perfect match, which is about as good as you can get with recipes, since they tend to converge on the same thing. The only real difference between most of them is the dumb story they give at the beginning, which thankfully ChatGPT doesn’t do.
The 7th grader and plagiarism comments make me think you haven’t played with them much or really tested them. I have had it write contracts, one of which I had reviewed by a lawyer who only had some small comments, as well as other letters and documents I needed for my mortgage and buying a home. All of these were looked over by professionals and none of them realized it was a bot. None of them were plagiarized either, because the parameters I gave it and the output it created were way too unique to be in its training set.
knightly@pawb.social 11 months ago
Of course I have, my employer has me shoehorning ChatGPT into everything, and I agree with what the research says: Children can answer questions better than LLMs can.
techxplore.com/…/2023-12-artificial-intelligence-…
Stochastic plagiarism is still plagiarism.