Comment on Large-scale online deanonymization with LLMs
thedeadwalking4242@lemmy.world 1 week ago
It’s a language model, not a classification model. People have already tried a similar experiment, asking LLMs to detect whether an LLM wrote a piece of text, and they couldn’t do it.
thinkercharmercoderfarmer@slrpnk.net 1 week ago
This is in some ways an easier problem than classifying LLM vs non-LLM authorship. That only has two possible outcomes, and it’s pretty noisy because LLMs are trained to emulate the average human. Here, you can generate an agreement score based on language features per comment, and cluster the comments by how they disagree with the model. Comments that disagree in particular ways (never uses semicolons, claims to live in Canada, calls interlocutors “buddy”, writes run-on sentences, etc.) would be clustered together more tightly. The more comments two profiles have in the same cluster(s), the more confident the match becomes. I’m not saying this attack is novel or couldn’t be accomplished without an LLM, but it seems like a good fit for what LLMs actually do.
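The clustering idea above can be sketched in a few lines. This is only a toy illustration of the approach described in the comment, not any real deanonymization tool: the feature set (semicolon use, saying “buddy”, run-on sentence length) is lifted from the comment’s examples, and the function names and thresholds are invented for the sketch. In practice the per-comment signature would come from an LLM’s agreement scores rather than hand-written rules.

```python
from collections import defaultdict

def style_features(text):
    """Toy stylometric signature for one comment (hypothetical features)."""
    sentences = [s for s in text.replace("!", ".").replace("?", ".").split(".") if s.strip()]
    avg_len = sum(len(s.split()) for s in sentences) / max(len(sentences), 1)
    return (
        ";" in text,              # uses semicolons
        "buddy" in text.lower(),  # calls interlocutors "buddy"
        avg_len > 25,             # tends toward run-on sentences
    )

def cluster_comments(comments):
    """Group (author, text) pairs whose signatures agree exactly."""
    clusters = defaultdict(set)
    for author, text in comments:
        clusters[style_features(text)].add(author)
    return clusters

def match_score(clusters, a, b):
    """Count clusters containing comments from both profiles; higher means a more confident match."""
    return sum(1 for members in clusters.values() if a in members and b in members)

comments = [
    ("alice",  "Nice point; I agree completely."),
    ("anon42", "Well said; couldn't agree more, honestly."),
    ("bob",    "Listen buddy, you are wrong."),
    ("anon99", "Hey buddy, read it again."),
]
clusters = cluster_comments(comments)
print(match_score(clusters, "alice", "anon42"))  # semicolon users land in one cluster
print(match_score(clusters, "bob", "anon99"))    # "buddy" users land in another
```

With enough comments per profile, the score becomes a crude version of the confidence-by-shared-clusters idea: two pseudonyms that keep landing in the same stylistic clusters are increasingly likely to be the same author.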