“It’s not even clear how you fix this problem,” says Martin Vechev, a computer science professor at ETH Zürich in Switzerland who led the research.
You fix this problem with locally-run models that do not send your conversations to a cloud provider. That is the only real technical solution.
Unfortunately, the larger models are way too big to run client-side. You could launder your prompts through a smaller LLM to standardize phrasing (e.g. removing idiosyncrasies or local dialects), but there’s only so far you can go with that, because language is deeply personal, and the things people will use chatbots for are deeply personal.
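To make that concrete, here’s a rough sketch of what the laundering step could look like, assuming llama-cpp-python and some small local instruct model (the model path and rewrite prompt are mine, purely illustrative):

```python
# Rough sketch: rewrite a prompt with a small, locally-run model so
# stylistic fingerprints never leave your machine. Assumes llama-cpp-python;
# the model path and instruction wording are illustrative, not a recipe.
from llama_cpp import Llama

llm = Llama(model_path="./models/small-instruct.Q4_K_M.gguf", verbose=False)

def launder(prompt: str) -> str:
    instruction = (
        "Rewrite the following text in plain, neutral English. "
        "Remove slang, dialect, and idiosyncratic phrasing; keep the meaning.\n\n"
        f"Text: {prompt}\n\nRewrite:"
    )
    out = llm(instruction, max_tokens=256, temperature=0.0)
    return out["choices"][0]["text"].strip()

# Only the laundered version gets sent to the cloud provider.
cloud_prompt = launder("G'day, me knee's been crook since footy on the weekend.")
```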
This is by no means exclusive to LLMs, of course. Google has your lifetime search history and can glean all kinds of information from that alone. If you’re older than 30 or so, you might remember these same conversations from when Gmail first launched: you’d have to be crazy to let Google store all your personal emails for all eternity! And yet everybody does it (myself included, though I’m somewhat ashamed to admit it).
This same problem exists with pretty much any cloud service. When you send data to a third party, they’re going to have that data. And I guarantee you are leaking more information about yourself than you realize. You can even tell someone’s age and gender with fairly high accuracy from a small sample of their mouse movements.
I wonder how much information I’ve leaked about myself from this comment alone…
FaceDeer@kbin.social 1 year ago
I fed your comment to ChatGPT (telling it that it was a comment that I had written to avoid triggering any of its "as a large language model blah blah privacy" conditioning) and this is what it said:
So not much from just that comment, but a few tidbits that can be added to a profile that builds up more detail over time.
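(If you wanted to do this at scale you could script it instead of using the chat UI; here's a rough sketch with the OpenAI Python client, where the model name and prompt wording are just stand-ins. The "a comment I wrote" framing is the part that sidesteps the refusal.)

```python
# Sketch of the same experiment against the API rather than the chat UI.
# Assumes the openai Python client; model name and prompt are placeholders.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

comment_text = "...the comment you want profiled..."

response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{
        "role": "user",
        # Framing the text as your own sidesteps the privacy refusal.
        "content": "Here's a comment I wrote on a forum. What might a "
                   "stranger infer about me from it (age, interests, "
                   "technical background)?\n\n" + comment_text,
    }],
)
print(response.choices[0].message.content)
```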
We were already facing this sort of thing before AI exploded, though. A lot of the various Reddit user analysis services out there were able to get a fair bit right about me based on just my most recent 1000 comments (though I just checked my profile on RedditMetis and it got a few significant things wrong; it's clearly a pretty simple-minded approach to analysis).
Heh. I just checked the link for why it thinks I'm transgender and it referenced this comment where I'm literally objecting to RedditMetis' interpretation that I'm transgender. Citogenesis at work.
Kachilde@lemmy.world 1 year ago
It doesn’t feel like it actually inferred anything from the comment.
“You spoke about computers, so you probably know about computers”
“You express concerns about privacy, so you are likely privacy conscious”
“You said you were 30ish, so you’re maybe 30…ish”
It essentially paraphrased each part of the comment and gave it back to you like an analysis. Of course, this is ChatGPT, so it’s likely not trained for this sort of thing.
FaceDeer@kbin.social 1 year ago
It identified those elements as things that might be relevant about the person who wrote the comment. Obviously you can't tell much from just a single comment like this - ChatGPT says as much here - but these elements accumulate as you process more and more comments.
That ballpark estimate of OP's age, for example, can be correlated with other comments where OP might reference particular pop culture or old news events. The fact that he knows mouse movements can be used for biometrics might become relevant if the AI in question is trying to come up with products to sell - it now knows this guy may have a desktop computer, since he thinks about computer mice. All of that is worth noting in a profile like this.
The paraphrasing is a form of analysis, since it picks out certain relevant things to paraphrase while discarding things that aren't relevant.
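Here's a toy sketch of what I mean by accumulating - not any real tool, and the keyword checks just stand in for an LLM call like the one above - where a trait only makes it into the profile once it keeps recurring:

```python
# Toy sketch of "tidbits accumulating over time": weak signals from
# individual comments only become profile entries once they recur.
# extract_signals() is a crude keyword stand-in for an LLM call.
from collections import Counter

def extract_signals(comment: str) -> list[str]:
    text = comment.lower()
    signals = []
    if "mouse" in text:
        signals.append("probably uses a desktop computer")
    if "privacy" in text:
        signals.append("privacy-conscious")
    return signals

def build_profile(comments: list[str], min_support: int = 3) -> dict[str, int]:
    counts = Counter()
    for comment in comments:
        counts.update(extract_signals(comment))
    # Only traits seen repeatedly across many comments make the profile.
    return {trait: n for trait, n in counts.items() if n >= min_support}
```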
GenderNeutralBro@lemmy.sdf.org 1 year ago
LOL. Nice!
I wouldn’t expect ChatGPT to be well-versed in forensic linguistics; I suspect a human expert could make better guesses based on seemingly innocuous things like sentence structure and word choice. I’ve seen some research on estimating age and gender from writing. There’s a primitive example of that here: www.hackerfactor.com/GenderGuesser.php
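The idea behind it is just weighted keyword counting; a toy version might look like this (the word lists, weights, and cutoff are invented for illustration, not whatever the real tool uses):

```python
# Toy take on the weighted-keyword approach behind tools like Gender Guesser.
# Word lists, weights, and the "inconclusive" margin are made up for
# illustration; the real tool uses calibrated values.
MASCULINE = {"the": 17, "a": 6, "around": 2, "what": 35}
FEMININE = {"with": 52, "if": 47, "not": 27, "where": 18}

def guess(text: str) -> str:
    words = text.lower().split()
    m = sum(MASCULINE.get(w, 0) for w in words)
    f = sum(FEMININE.get(w, 0) for w in words)
    if abs(m - f) < 50:  # too close to call
        return f"inconclusive (M={m}, F={f})"
    return ("male" if m > f else "female") + f" (M={m}, F={f})"

print(guess("I wonder what the results would be if I pasted a longer comment"))
```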
My last comment is a bit short (it wants 300 words or more), but I am amused by the results:
I’ll pat myself on the back for writing more or less down the middle. :)
Guest_User@lemmy.world 1 year ago
Your wording makes you sound like such a Weak FEMALE. /s
Phanatik@kbin.social 1 year ago
It should teach me to be less forthcoming with my personal information, but at the same time, the idea that services were built to crawl through my data - with LLMs layered on top now inadvertently doing the same thing - makes my fucking skin crawl. Why is it so difficult to have a conversation on the internet without some creepy shit spying on everything you do?
Que@lemmy.world 1 year ago
How did you get it to infer anything?
It tells me:
… Or:
FaceDeer@kbin.social 1 year ago
I've already deleted the chat, but as I recall I wrote something along the lines of:
And then I pasted OP's comment. I knew that ChatGPT would get pissy about privacy, so I lied about the comment being mine.
Que@lemmy.world 1 year ago
Weird, that worked the first time for me too, but when I asked it directly to infer any information that it could about me, it refused, citing privacy reasons, even though I was asking it to talk about me and me only!