Comment

Comment on Large-scale online deanonymization with LLMs

thinkercharmercoderfarmer@slrpnk.net ⁨2⁩ ⁨months⁩ ago

Why not? if LLMs are good at predicting mean outcomes for the next symbol in a string, and humans have idiosyncrasies that deviate from that mean in a predictable way, I don’t see why you couldn’t detect and correlate certain language features that map to a specific user. You could use things like word choice, punctuation, slang, common misspellings sentence structure… For example, I started with a contradicting question, I used “idiosyncrasies”, I wrote “LLMs” without an apostrophe, “language features” is a term of art, as is “map” as a verb, etc. None of these are indicative on their own, but unless people are taking exceptional care to either hyper-normalize their style, or explicitly spiking their language with confounding elements, I don’t see why an LLM wouldn’t be useful for this kind of espionage.

source

Sort:hotnew top

thedeadwalking4242@lemmy.world ⁨2⁩ ⁨months⁩ ago
It’s a language model not a classification model. People have already tried a similar experiment to have LLMs detect if a LLM wrote text or not and it couldn’t.

source
- thinkercharmercoderfarmer@slrpnk.net ⁨2⁩ ⁨months⁩ ago
  This is in some ways an easier problem than classifying LLM vs non-LLM authorship. That only has two possible outcomes, and it’s pretty noisy because LLMs are trained to emulate the average human. Here, you can generate an agreement score based on language features per comment, and cluster the comments by how they disagree with the model. Comments that disagree in particular ways (never uses semicolons, claims to live in Canada, calls interlocutors “buddy”, writes run-on sentences, etc.) would be clustered together more tightly. The more comments two profiles have in the same cluster(s), the more confident the match becomes. I’m not saying this attack is novel or couldn’t be accomplished without an LLM, but it seems like a good fit for what LLMs actually do.
  
  source