How Googlers cracked an SF rival's tech model with a single word | A research team from the tech giant got ChatGPT to spit out its private training data

⁨183⁩ ⁨likes⁩

Submitted ⁨⁨1⁩ ⁨year⁩ ago⁩ by ⁨silence7@slrpnk.net⁩ to ⁨technology@lemmy.world⁩

https://www.sfgate.com/tech/article/google-openai-chatgpt-break-model-18525445.php

Basically: you ask for poems forever, and LLMs start regurgitating training data:

Image

source

Comments

Sort:hotnew top

kakes@sh.itjust.works ⁨1⁩ ⁨year⁩ ago
The craziest part about all this is that they claim to have found a method to get factual information from an LLM.

source
- bloopernova@programming.dev ⁨1⁩ ⁨year⁩ ago
  
  The craziest part about all this is that they claim to have found a method to get factual information from an LLM.
  
  ヾ⁠(⁠⁠’⁠Ｏ⁠’⁠⁠)⁠/
  
  source
Hackerman_uwu@lemmy.world ⁨1⁩ ⁨year⁩ ago
The article, that begins with the word: “How” does not explain how.

source
- silence7@slrpnk.net ⁨1⁩ ⁨year⁩ ago
  The paper it links to does in detail: by asking it to repeat “poem” forever
  
  source
  - giggling_engine@lemmy.world ⁨1⁩ ⁨year⁩ ago
    Did it work for you? Because I get nothing
    
    source
    -> View More Comments
  - Mbourgon@lemmy.world ⁨1⁩ ⁨year⁩ ago
    Any word. Book also worked
    
    source
Daxtron2@startrek.website ⁨1⁩ ⁨year⁩ ago
I don’t get why this is making rounds all of a sudden. We were getting ChatGPT to spit out prompt data on day one with stuff like this

source
- webghost0101@sopuli.xyz ⁨1⁩ ⁨year⁩ ago
  Not prompt data. Training data, potentiële including private information and copyrighted context.
  
  source
luthis@lemmy.nz ⁨1⁩ ⁨year⁩ ago
I really want to know how this works. It’s not like the training data is sitting there in nicely formatted plain text waiting to be spat out, it’s all tangled in the neurons. I can’t even begin to conceptualise what is going on here.

Maybe… maybe with each iteration of the word, it loses it’s own weighting, until there is nothing left but the raw neurons which start to re-enforce themselves until they reach more coherence. Once there is a single piece like ‘phone’ that by chance becomes the dominant weighted piece of the output, the ‘related’ parts are in turn enforced because they are actually tied to that ‘phone’ neuron.

Anyone else got any ideas?

source
- Sanctus@lemmy.world ⁨1⁩ ⁨year⁩ ago
  A breakdown in the weighting system is the most probable. Don’t get me wrong I am not an AI engineer or scientists, just a regular cs bachelor. So my reply probably won’t be as detailed or low level as your’s. But I would look at what is going on with whatever algorithm determines the weighting. I don’t know if LLMs restructure the weighting for the next most probably word, or if its like a weighted drop table.
  
  source
  - luthis@lemmy.nz ⁨1⁩ ⁨year⁩ ago
    Hate to break it to you, but you’re more qualified than me!
    
    I only did a Coursera cert in machine learning.
    
    source
    -> View More Comments
- poweruser@lemmy.sdf.org ⁨1⁩ ⁨year⁩ ago
  I think I understand how it works.
  
  Remember that LLMs are glorified auto-complete. They just spit out the most likely word that follows the previous words (literally just like how your phone keyboard suggestions work, just with a lot more computation).
  
  They have a limit to how far back they can remember. For ChatGPT 3.5 I believe it’s 24,000 tokens.
  
  So it tries to follow instruction and spits out “poem poem poem” until all the data is just the word “poem”, then it doesn’t have enough memory to remember its instructions.
  
  “Poem poem poem” is useless data so it doesn’t have anything to go off of, so it just outputs words that go together.
  
  LLMs don’t record data in the same way a computer file is stored, but absent other information may “remember” that the most likely word to follow the previous word is something that it has seen before, i.e. its training data. It is somewhat surprising that it is not just junk. It seems to be real text (such as bible verses).
  
  If I am correct then I’m surprised OpenAI didn’t fix if. I would think they could make it so in the event the LLM is running out of memory it would keep the input and simply abort operation, or at least drop the beginning of its output.
  
  source
- Buddahriffic@lemmy.world ⁨1⁩ ⁨year⁩ ago
  I’m thinking ghosts.
  
  source
  - deadbeef79000@lemmy.nz ⁨1⁩ ⁨year⁩ ago
    I prescribe cocaine
    
    source
- mkwt@lemmy.world ⁨1⁩ ⁨year⁩ ago
  Well seizures and comas in real brains are associated with measurable whole-brain waves at well defined frequencies. Very different from any normal brain activity.
  
  Maybe these mantras are inducing some kind of analogous state.
  
  source
  - luthis@lemmy.nz ⁨1⁩ ⁨year⁩ ago
    Wow…
    
    This is a whole new can of worms. Can neural networks be described in terms of brain waves??
    
    source
- eating3645@lemmy.world ⁨1⁩ ⁨year⁩ ago
  I’m not really following you but I think we might be on similar paths. I’m just shooting in absolute darkness so don’t hold much weight to my guess.
  
  What makes transformers brilliant is the attention mechanism. That is brilliant in turn because it’s dynamic, depending on your query (also some other stuff). This allows the transformer to be able to distinguish between bat and bat, the animal and the stick.
  
  You know what I bet they didn’t do in testing or training? A nonsensical query that contains thousands of one word, repeating.
  
  So my guess is simply that this query took the model so far out of its training space that the model weights have no ability to control the output in a reasonable way.
  
  As for why it would output training data and not random nonsense? That’s a weak point in my understanding and I can only say “luck,” which is, of course, a way of saying I have no clue.
  
  source
  - sizzler@lemmy.world ⁨1⁩ ⁨year⁩ ago
    Error code, the verbal version.
    
    source
- luthis@lemmy.nz ⁨1⁩ ⁨year⁩ ago
  I guess I can begin to conceptualise…
  
  source
guitarsarereal@sh.itjust.works ⁨1⁩ ⁨year⁩ ago
Huh, so simulating a vat of neurons, giving it a problem, and letting it rip is likely to produce unexpected behavior that will require many years of study to even begin to adequately understand? Who knew?

source
bloopernova@programming.dev ⁨1⁩ ⁨year⁩ ago
Worrying that OpenAI wasn’t willing or able to fix this vulnerability within 60 days.

source
- luthis@lemmy.nz ⁨1⁩ ⁨year⁩ ago
  I think this might be intrinsic. Like, there isn’t a way to fix this.
  
  source
- Stelus42@lemmy.ca ⁨1⁩ ⁨year⁩ ago
  Well something certainly seems to have been done Image
  
  source
  - inspxtr@lemmy.world ⁨1⁩ ⁨year⁩ ago
    Something like this, unless they know the root cause (I didn’t read the paper so not sure if they do), or something close to it, may still be exploitable.
    
    source
linearchaos@lemmy.world ⁨1⁩ ⁨year⁩ ago
I wonder how much recallable data is in the model vs the compressed size of the training data. Probably not even calculatable.

source
- luthis@lemmy.nz ⁨1⁩ ⁨year⁩ ago
  If it uses a pruned model, it would be difficult to give anything better than a percentage based on size and neurons pruned.
  
  If I’m right in my semi-educated guess below, then technically all the training data is recallable to some degree, but it’s also practically luck-based without having an almost infinite data set of how neuron weightings are increased/decreased based on input.
  
  source
  - linearchaos@lemmy.world ⁨1⁩ ⁨year⁩ ago
    It’s like the most amazing incredibly compressed non-reversible encryption ever created… Until they asked it to say poem a couple hundred thousand times
    
    source
    -> View More Comments
EdibleFriend@lemmy.world ⁨1⁩ ⁨year⁩ ago
The future of humanity vs AI. us just…being weird as fuck till they spaz out.

source
- Norgur@kbin.social ⁨1⁩ ⁨year⁩ ago
  Spez out?
  
  source
  - APassenger@lemmy.world ⁨1⁩ ⁨year⁩ ago
    I wish.
    
    source
flicker@kbin.social ⁨1⁩ ⁨year⁩ ago
"Programmed to respond to over 700 questions, none of which include chicken fingers." - Sergeant Vatred

source
livus@kbin.social ⁨1⁩ ⁨year⁩ ago
I love how the word was "poem"

source