LLMs can unmask pseudonymous users at scale with surprising accuracy

⁨345⁩ ⁨likes⁩

Submitted ⁨⁨1⁩ ⁨month⁩ ago⁩ by ⁨return2ozma@lemmy.world⁩ to ⁨technology@lemmy.world⁩

https://arstechnica.com/security/2026/03/llms-can-unmask-pseudonymous-users-at-scale-with-surprising-accuracy/

source

Comments

Sort:hotnew top

jballs@sh.itjust.works ⁨1⁩ ⁨month⁩ ago
As a registered Republican woman from Texas with five children and two dogs, let me just say that I am astonished!

source
- pivot_root@lemmy.world ⁨1⁩ ⁨month⁩ ago
  Me too. I thought I was safe as a Ottoman Empire expatriate living in Arrakis! I don’t want LLMs to connect this account to my pseudonymous mommy blog where I write about my three children who might exist but could be delusions of my untreated schizophrenia.
  
  source
  - CheesyFingers@piefed.social ⁨1⁩ ⁨month⁩ ago
    It seems that i, the original Unidan, will unfortunately need to create even more alts to escape being found out. Blast!
    
    source
    -> View More Comments
  - ivanafterall@lemmy.world ⁨1⁩ ⁨month⁩ ago
    Oh, WE EXIST, mommy! Let me assure you, as one of said imaginary schizophrenia babies. Currently shacking up in Miami with my new wife I just met cranking my hog at Sturgis.
    
    source
  - Bigfishbest@lemmy.world ⁨1⁩ ⁨month⁩ ago
    I don’t believe this! As a fumgrian living as a would be dead camoose off Mt. Kabul, I am overjizzed that AI is reading all my pornhub comments.
    
    source
- Whostosay@sh.itjust.works ⁨1⁩ ⁨month⁩ ago
  You forgot to list your favorite brands
  
  source
  - Deceptichum@quokk.au ⁨1⁩ ⁨month⁩ ago
    Kleenex and Jergens
    
    source
  - HeyThisIsntTheYMCA@lemmy.world ⁨1⁩ ⁨month⁩ ago
    Kegel One
    
    source
- whaleross@lemmy.world ⁨1⁩ ⁨month⁩ ago
  As true as my name is Brenda and my last name is also Brenda. And so is my husband, Brenda. It is a hot day in Texas America today, I’m going to grill one of our dogs for dinner. It is a republican tradition, hence the name Hot Dogs and the playful name Wieners, named after wiener dogs.
  
  source
- goatinspace@feddit.org ⁨1⁩ ⁨month⁩ ago
  That was surprisingly accurate. Meep meep.
  
  source
FauxPseudo@lemmy.world ⁨1⁩ ⁨month⁩ ago
From a Facebook post I made on February 17th:

There are giant AI data firms that promise they can go through massive troves of data and pull out general and specific information from them. Information that is actionable and accurate. Give it 6 million data points and it’ll find all the links and organize them for you and unmask hidden details that aren’t visible to the naked eye.

Not one of those companies is stepping up to go through the publicly released Epstein files.

source
- Randomgal@lemmy.ca ⁨1⁩ ⁨month⁩ ago
  This is what I find crazy. Where are the AI bros chewing through the Epstein files?
  
  source
  - osaerisxero@kbin.melroy.org ⁨1⁩ ⁨month⁩ ago
    I would be shocked if someone hasn't shoved them into a local model somewhere, but all the big ones would filter them to death with content restrictions
    
    source
- General_Effort@lemmy.world ⁨1⁩ ⁨month⁩ ago
  There were reports of people trying to unredact the files almost immediately.
  
  source
  - FauxPseudo@lemmy.world ⁨1⁩ ⁨month⁩ ago
    But that’s not the same, is it?
    
    source
    -> View More Comments
- Spaniard@lemmy.world ⁨1⁩ ⁨month⁩ ago
  Today I asked AI to tell me which phone providers were available short by price and offers and it lied all the time, when I pointed it the AI corrected most of it but also removed some that were accurate for some reason.
  
  It would have been quicker if I did that myself instead of ask AI, oh also didn’t provide all companies.
  
  source
  - bleistift2@sopuli.xyz ⁨1⁩ ⁨month⁩ ago
    AI info is never up-to date. What where you expecting?
    
    source
    -> View More Comments
- Mubelotix@jlai.lu ⁨1⁩ ⁨month⁩ ago
  We wouldn’t want that tbh. Justice needs to be precise and backed up by tangible facts
  
  source
  - KeenFlame@feddit.nu ⁨1⁩ ⁨month⁩ ago
    Also don’t use dna tests or chemical analysis. It’s invisible hocus pocus and can be wrong! And woe if someone that fucks and tortures kids regularly is wrongly accused of raping kids and running their child minds no that would be awful
    
    source
  - FauxPseudo@lemmy.world ⁨1⁩ ⁨month⁩ ago
    You can use the results of the AI analysis to identify people and then use that to do a proper investigation. Right now none of that is happening. No speculation. No tangibles. No investigation. No indictment.
    
    Trying to unmask people is a step in the right direction.
    
    source
    -> View More Comments
FlashMobOfOne@lemmy.world ⁨1⁩ ⁨month⁩ ago
And it will falsely identify people at even greater scale, because it is an imprecise and buggy tool.

source
- RblScmNerfHerder@lemmy.world ⁨1⁩ ⁨month⁩ ago
  Yeah, but if it falsely identifies the right people, is it really buggy?
  
  source
- BlameTheAntifa@lemmy.world ⁨1⁩ ⁨month⁩ ago
  How dare you claim that the hallucination engine hallucinates. The Billionaires have declared this heresy.
  
  source
- Astertheprince@lemmy.dbzer0.com ⁨1⁩ ⁨month⁩ ago
  The ones doing the identifying will argue that they are right until they are blue in the face because that’s how we do things now. By lying harder until the ones you’re trying to convince believe you or give up because it’s not worth it to keep arguing. I’ve been accused of being other people or using AI to write, and those people argued or harassed me until I blocked them. I can probably expect more of this going forward.
  
  source
tal@lemmy.today ⁨1⁩ ⁨month⁩ ago

Of course, another option is for people to dramatically curb their use of social media, or at a minimum, regularly delete posts after a set time threshold.

Deletion won’t deal with someone seriously-interested in harvesting stuff, because they can log it as it becomes available. And curbing use isn’t ideal.

I mentioned before the possibility of poisoning data, like, sporadically adding some incorrect information about oneself into one’s comments. Ideally something that doesn’t impact the meaning of the comments, but would cause a computer to associate one with someone else.

There are some other issues. My guess is that it’s probably possible to fingerprint someone to a substantial degree by the phrasing that they use. One mole in the counterintelligence portion of the FBI, Robert Hanssen, was found because on two occasions he used the unusual phrase “the purple-pissing Japanese”.

FBI investigators later made progress during an operation where they paid disaffected Russian intelligence officers to deliver information on moles. They paid $7 million to KGB agent Aleksander Shcherbakov[48] who had access to a file on “B”. While it did not contain Hanssen’s name, among the information was an audiotape of a July 21, 1986, conversation between “B” and KGB agent Aleksander Fefelov.[49] FBI agent Michael Waguespack recognized the voice in the tape, but could not remember who it was from. Rifling through the rest of the files, they found notes of the mole using a quote from George S. Patton’s speech to the Third Army about “the purple-pissing Japanese”.[50] FBI analyst Bob King remembered Hanssen using that same quote. Waguespack listened to the tape again and recognized the voice as Hanssen’s. With the mole finally identified, locations, dates, and cases were matched with Hanssen’s activities during the period. Two fingerprints collected from a trash bag in the file were analyzed and proved to be Hanssen’s.[51][52][53]

That might be defeated by passing text through something like an LLM to rewrite it. So, for example, to take a snippet of my above comment:

Respond with the following text rephrased sentence by sentence, concisely written as a British computer scientist might write it:

Deletion won’t deal with someone seriously-interested in harvesting stuff, because they can log it as it becomes available. And curbing use isn’t ideal.

I mentioned before the possibility of poisoning data, like, sporadically adding some incorrect information about oneself into one’s comments. Ideally something that doesn’t impact the meaning of the comments, but would cause a computer to associate one with someone else.

I get:

The deletion of data alone will not prevent a determined party from gathering information, as they may simply record the information as it becomes available prior to its deletion. Moreover, restricting usage is not an ideal solution to the problem at hand.

I previously mentioned the possibility of introducing deliberate errors or misinformation into one’s own data, such as periodically inserting inaccurate details about oneself within comments. The goal would be to include information that does not significantly alter the meaning of the comment, but which would cause automated systems to incorrectly associate that individual with another person.

That might work. One would have to check the comment to make sure that it doesn’t mangle the thing to the point that it is incorrect, but it might defeat profiling based on phrasing peculiarities of a given person, especially if many users used a similar “profile” for comment re-writing.

A second problem is that one’s interests are probably something of a fingerprint. It might be possible to use separate accounts related to separate interests — for example, instead of having one account, having an account per community or similar. That does undermine the ability to use reputation generated elsewhere (“Oh, user X has been providing helpful information for five years over in community X, so they’re likely to also be doing so in community Y”), which kind of degrades online communities, but it’s better than just dropping pseudonymity and going 4chan-style fully anonymous and completely losing reputation.

source
- HyperfocusSurfer@lemmy.dbzer0.com ⁨1⁩ ⁨month⁩ ago
  Regarding the last point: it’s more of a bias, tho, so it may even be a good thing. E.g. asking Kent Overstreet’s opinion on your bcachefs setup is probably useful, while getting relationship advice from him is ill-advised.
  
  source
  - regenwetter@piefed.social ⁨1⁩ ⁨month⁩ ago
    Advice being right or wrong isn’t necessarily the big issue for online communities (unless most other users are also wrong). What really degrades them is users acting like assholes, and someone who acts like that in a tech community is fairly likely to also do that in a political or relationship community.
    
    source
- GamingChairModel@lemmy.world ⁨1⁩ ⁨month⁩ ago
  
  It might be possible to use separate accounts related to separate interests
  
  That’s what people should do. And the natural consequence is that there is code switching, where people subtly use different jargon and references and writing style when talking to different audiences.
  
  Nobody is gonna correlate my shitposts or joke comments to my work email, because the way I write in a professional environment is totally different from the way I write with my friends and family, or in casual contexts organized around different interests. Even between different friends, family, or colleagues, I have a sense of my audience, and my tone/style differs significantly for different people.
  
  So at that point, if I have a Linux/technology account and a separate account for the sports I like and a separate account for the local things happening in my city, who’s going to be able to link them by their very different textual styles?
  
  source
- Yliaster@lemmy.world ⁨1⁩ ⁨month⁩ ago
  Why is curbing use unideal?
  
  source
  - KeenFlame@feddit.nu ⁨1⁩ ⁨month⁩ ago
    We like internet
    
    source
- zerofk@lemmy.zip ⁨1⁩ ⁨month⁩ ago
  Your above average use of the word “one” and variations like “one’s” could be quite telling.
  
  As could my correction of “it’s” in the above sentence.
  
  source
  - frongt@lemmy.zip ⁨1⁩ ⁨month⁩ ago
    As well as your use of fancy quote marks.
    
    source
ne0phyte@feddit.org ⁨1⁩ ⁨month⁩ ago
I am so grateful for already having been paranoid about sharing anything identifying about me starting 15+ years ago.

I never uploaded a picture of myself. Never used my real name anywhere. I used different nicks for different branches of the Internet. A plethora of different email addresses etc.

People thought I was being overly careful and I probably missed a lot of things due to not using Whatsapp, Facebook, Instagram, Twitter, Snapchat but I can’t say I regretted it at any point.

source
- Scrollone@feddit.it ⁨1⁩ ⁨month⁩ ago
  It’s not enough. You should use a different writing style for each website you write on.
  
  source
- TankovayaDiviziya@lemmy.world ⁨1⁩ ⁨month⁩ ago
  Doing those is not unreasonable, but not even having a bank account is way too far. I know of someone, who was later diagnosed with autism and doesn’t have a job due to condition, initially didn’t want a bank account for fear of online snooping.
  
  Minimising digital footprint is perfectly fine, but trying to be off the grid and yet wants to participate in society and still engage in consumption is unreasonable. And this thinking isn’t just on one person, I saw many users in Reddit privacy stressing themselves out in trying to completely wipe off their digital footprints. Unless you participate in political activities, or really just wants to live completely isolated in a forest, being off the grid is totally unreasonable.
  
  source
DarkCloud@lemmy.world ⁨1⁩ ⁨month⁩ ago
Great, we’re at a point where “researchers” are helping tech bros hurt the public interest. Could they just NOT publish this shit? Stop giving helpful tips to tyrannical oligarchs!

Academics can be stupid idiots sometimes.

source
- ShotDonkey@lemmy.world ⁨1⁩ ⁨month⁩ ago
  Tbh I read the research article and it’s not rocket science that they were doing. Any 2nd rate FBI analyst would have come up with these ideas sooner or later to try and match anonymous profiles with veryfied ones using LLMs.
  
  source
- zerofk@lemmy.zip ⁨1⁩ ⁨month⁩ ago
  Researchers’ work has always been abused by others. The advancement and free distribution of knowledge should not be curtailed for fear of malicious parties.
  
  source
73ms@sopuli.xyz ⁨1⁩ ⁨month⁩ ago
Have they tried doing this for Satoshi Nakamoto yet?

source
maplesaga@lemmy.world ⁨1⁩ ⁨month⁩ ago
Average people download gamed and apps and their phone is loaded to the tilt with bloatware. You think they care?

source
- SupraMario@lemmy.world ⁨1⁩ ⁨month⁩ ago
  The average person puts their entire lives on Facebook or linkedin with their real names…they don’t give a shit.
  
  source
  - ThunderQueen@lemmy.world ⁨1⁩ ⁨month⁩ ago
    “WeLl I hAvE nOtHiNg To HiDe”
    
    source
    -> View More Comments
  - EndlessNightmare@reddthat.com ⁨1⁩ ⁨month⁩ ago
    LinkedIn, if used properly, should just be professional/career related content. If you put anything overly personal or controversial, you are using it wrong.
    
    I’m not saying that people don’t do that though.
    
    source
    -> View More Comments
ToTheGraveMyLove@sh.itjust.works ⁨1⁩ ⁨month⁩ ago
Who am I? No forreal, WHO AM I? Last I remember I was on a cruise around the Caribbean. I blacked out one night while at the casino and when I came to I was on a beach in the middle of nowhere with a toothless man who spoke a language I couldn’t comprehend. Thankfully he still has a dial up connection somehow in the year of our lord 2026, but I’ve been on this island for two years now. SOMEONE COME GET ME!

source
- FenrirIII@lemmy.world ⁨1⁩ ⁨month⁩ ago
  Your wife is much happier with me now and the children are already calling me dad. It’s time to move on.
  
  source
  - ToTheGraveMyLove@sh.itjust.works ⁨1⁩ ⁨month⁩ ago
    Fuck, I WAS MARRIED!? This toothless guy keeps trying to wrap a bit of twine around my finger and cuddle me when I’m passed out. If I know I’m single now, I might as well go for it. Wish me luck!
    
    source
- johsny@lemmy.world ⁨1⁩ ⁨month⁩ ago
  You think that is bad? 3 weeks ago I went camping while jacking off and I came across my family doctor’s grandma staring at a pile of leaves. As I got closer I noticed it wasn’t a pile of leaves at all but rather a man that I recognized from somewhere. I realized I had seen him while on a trip to the UK last year at an authentic British fish and chips place my wife and 6 uncles had lunch at. He was vinegar balls Edward, an old fisherman who comes to your table for you to squeeze malt vinegar out of his balls onto your fries for an authentic British experience.
  
  So here he is on my camping jack just laying there dead.
  
  I did what any smart person would do and I pulled out my Swiss Army knife and hacked off his sack. 2 weeks later I went back to the UK and sold his scrotum and balls to that restaurant, they were about to go out of business without malt vinegar so they were extremely appreciative when I brought them the vinegar balls. The mayor of the town named a street after me and gave me six packs of smokes. I smoked them all that day despite being a non smoker because I needed to show that I was thankful for the gift.
  
  In 3 months I’ll be going on another camping trip with my step grandpa, no jacking off allowed this time but maybe I’ll find a corpse that’Il haunt me forever. All it takes is 6 packs of smokes and a pocket full of belly buttons. That’s right, I’m totally a smoker now because smoking is the coolest fuckin thing anyone could ever do.
  
  source
vacuumflower@lemmy.sdf.org ⁨1⁩ ⁨month⁩ ago
The bright side - they can also be used to mask pseudonymous users. Guess how.

source
- dejected_warp_core@lemmy.world ⁨1⁩ ⁨month⁩ ago
  Yes, but don’t use a public service for this. Use a local LLM and maintain distinct profiles, one for each online account.
  
  source
Fizz@lemmy.nz ⁨1⁩ ⁨month⁩ ago
This seems like complete bullshit.

source
- Astertheprince@lemmy.dbzer0.com ⁨1⁩ ⁨month⁩ ago
  They didn’t show their work citing misuse as the reason, which leads me to believe it’s fearmongering and that it isn’t going to work so well for the people who are actually trying (AKA have good OPSEC).
  
  This is one of those examples of “It’s not possible to be private online so don’t bother trying” type propaganda posts.
  
  source
  - Fizz@lemmy.nz ⁨1⁩ ⁨month⁩ ago
    Its just a shit article lazily posted because the headline further radicalizes people. Low quality articles like this should be banned because they lower the IQ of everyone who reads it and misinform anyone who doesn’t.
    
    source
thedeadwalking4242@lemmy.world ⁨1⁩ ⁨month⁩ ago
I call BS we can’t even get AI models to determine if an AI write text. This as go to me some magic statistics

source
deadymouse@lemmy.world ⁨1⁩ ⁨month⁩ ago
For those who don’t know, we’ve been living in a dystopia since the 2000s.

source
ExLisper@lemmy.curiana.net ⁨1⁩ ⁨month⁩ ago
I think this will only work with people narrating their lives on social media.

“Got coffee from my favorite Granier at La Rambla! Ready of new day of work designing hats for dogs”

“Me and Bobby heading to Madrid to see my friend Concepcion. Do you like his new hat?”

“Just got nominated for ‘best business-casual hat’ at this year’s Barkies! So proud”

Because how are you going to de-anonymize some random ramblings about Linux and beans? Everyone likes Linux and beans.

source
- rebelsimile@sh.itjust.works ⁨1⁩ ⁨month⁩ ago
  This is good advice. Pick a person you know and drop hints that you’re them. Bonus points if that person is terminally online. Anyway, gotta get back to running X.
  
  source
  - ToTheGraveMyLove@sh.itjust.works ⁨1⁩ ⁨month⁩ ago
    Holy shit, guys! Its the Antichrist!
    
    source
- KeenFlame@feddit.nu ⁨1⁩ ⁨month⁩ ago
  Nope. It’s in special tiny ways we author text. I think.
  
  source
- vaultdweller013@sh.itjust.works ⁨1⁩ ⁨month⁩ ago
  Hell you could even pull a me and just lie about some shit. A story I heard from a YouTube video or documentary gets modified into an old bastard I knew for example. Introducing such variables would be easy for a human to eventually pick up that I’m lying but an AI may come to the conclusion that I’m Zack Hazard. Point is that I dirty up the info about myself, the psychotic larp I do semi-consistently probably also helps.
  
  source
ShotDonkey@lemmy.world ⁨1⁩ ⁨month⁩ ago
The results, especially the high numbers stated in the news article (68% recall, 90% accuracy) are overestimated as their verification method (i.e., whether the LLM detected really the right account) come from matching veryfied accounts with a test set of anonymous accounts of which they knew the real name. They knew the real name bcs the persons had a public link to their LinkedIn in their “anonymous” profile (which was removed for the sake of testing wheter the LLm can match the two acfounts. That being said: a user who uses a pseudonym but links his/her account publically to a, say, LinkedIn account doesn’t really care about anonymity and might hand out many more ‘breadcrumbs’ to follow than a truly anonymous account.

But I still think that also in the case of a fully anonymous account, people can be fingerprinted and matched with non-anonymous identities due to language, style etc. by a LLM.

source
- GamingChairModel@lemmy.world ⁨1⁩ ⁨month⁩ ago
  Reminds me of an AI tool that could identify authorship of articles with surprisingly high accuracy, and then they peeked under the hood and realized it was just looking for the author byline at the top of the article that says “By John Doe,” where it completely failed if the article didn’t explicitly say who the author was.
  
  source
  - XeroxCool@lemmy.world ⁨1⁩ ⁨month⁩ ago
    I can’t believe this product, modeled after humans, would lie and cheat like humans
    
    source
ComradePenguin@lemmy.ml ⁨1⁩ ⁨month⁩ ago
Is this the first step towards using local LLMs for anonymity? 🫠 Always rephrasing each sentence somewhat. Truly dystopian stuff

source
doesit@sh.itjust.works ⁨1⁩ ⁨month⁩ ago
Kind of obvious. If you’re a highschool teacher and you used to be a photographer. You also volunteer as a fireman. You live in France. You heve 2 daughters. In 2022 you asked about repairs on your honda civic.
All off this can be amassed from different posts on facebook or reddit. There’ll be just a few people that fit this profile.

source
merde@sh.itjust.works ⁨1⁩ ⁨month⁩ ago
somebody should inform EU that they no longer need chatControl

:/

source
cley_faye@lemmy.world ⁨1⁩ ⁨month⁩ ago
Yeah. I got a hunch of that a while ago, while trying some “old” scenarios of de-anonymization we used to do by hand. Just asking questions and posting pictures got surprisingly accurate results. A single picture with (to me) no significant landmark could lead to localizing a specific part of a city, and that was using a local LLM with a relatively small model, running on a 16GB VRAM 4060Ti.

It is now time to remember fondly the time where the younger people were warned by older people to not post all their stuff online, not over-share, be cautious about strangers, etc. I’m not sure when we lost that, but oh boy, it’s a festival.

source
- tal@lemmy.today ⁨1⁩ ⁨month⁩ ago
  
  I’m not sure when we lost that, but oh boy, it’s a festival.
  
  I remember when it was considered outrageous that Flash would phone home and report its version, because that would leak the fact that a given machine was running a given version of Flash.
  
  We sure don’t live in that world today.
  
  source
Fmstrat@lemmy.world ⁨1⁩ ⁨month⁩ ago
This seems like an invalid test.

One of them collected posts from Hacker News and LinkedIn profiles and then linked them by using cross-platform references that appeared in user profiles. They then stripped all identifying references from the posts and ran a large language model on them.

If I post something on LinkedIn, and then post the same thing on Hacker News, of course an LLM could match my accounts up.

Am I missing something?

source
nutsack@lemmy.dbzer0.com ⁨1⁩ ⁨month⁩ ago
I theorized about this a long time ago. pretty sure I’m basically fucked

source
workgood@lemmy.dbzer0.com ⁨1⁩ ⁨month⁩ ago
no it cant

source
- corsicanguppy@lemmy.ca ⁨1⁩ ⁨month⁩ ago
  I kinda think I want it to try. I make little effort to hide my location or identity, and I think I’d kike to see the results.
  
  …just without saying who I am before I get those results. And my desire to stay anonymous-ish and not give it a chance to cheat means I can’t satisfy I have the right to the identify of myself if it finds who I am.
  
  Quite seriously, I cannot prove I have the right to make it search for me, for myself, without giving it too much information or without risking the leak of private info to a so-far unidentified stranger if it finds anything.
  
  Catch-22
  
  source
scarabic@lemmy.world ⁨1⁩ ⁨month⁩ ago
Do y’all not write differently when you’re trying to be discreet on Blind?

source
Widdershins@lemmy.world ⁨1⁩ ⁨month⁩ ago
Image

source
anon_8675309@lemmy.world ⁨1⁩ ⁨month⁩ ago
Hmmm interesting. I’ve never used AI to try and find out stuff about myself. Maybe I’ll try. Just curious.

source
- scala@lemmy.ml ⁨1⁩ ⁨month⁩ ago
  That’s howb they get you
  
  source
DarkSideOfTheMoon@lemmy.world ⁨1⁩ ⁨month⁩ ago
Brazil has 200 million ppl, how they would find someone in Rio like me?

source