Comment on OpenAI has built a text watermarking method to detect chatgpt written content

brucethemoose@lemmy.world ⁨2⁩ ⁨months⁩ ago

This has been known in the ML space forever. LLMs don’t actually output words, but “probabilities” for tokens. If you skew these probabilities in some arbitrary but consistent way, it leaves a statistical “signature” in the text that’s easy to measure. The sampler randomizes it a tiny bit, but that’s not a problem in long texts.
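To make that concrete, here is a minimal toy sketch of one publicly discussed approach, a "green list" bias. It is not OpenAI's method, which isn't public. The flat `model_logits` stand-in, the `SECRET_KEY`, and the `DELTA` value are all hypothetical and only there for illustration.

```python
import hashlib
import math
import random

VOCAB_SIZE = 1000        # toy vocabulary size (assumption)
GREEN_FRACTION = 0.5     # share of tokens marked "green" at each step
DELTA = 2.0              # bias added to green-token logits (hypothetical value)
SECRET_KEY = "demo-key"  # hypothetical secret; whoever holds it can run detection


def is_green(prev_token, token):
    """Pseudo-randomly put `token` on the green list, keyed on the previous token."""
    digest = hashlib.sha256(f"{SECRET_KEY}:{prev_token}:{token}".encode()).digest()
    return digest[0] / 256 < GREEN_FRACTION


def model_logits(prev_token):
    """Stand-in for a real model: flat logits, so every token is equally likely."""
    return [0.0] * VOCAB_SIZE


def sample(logits, rng):
    """Softmax sampling; this is the randomness the sampler adds."""
    top = max(logits)
    weights = [math.exp(value - top) for value in logits]
    return rng.choices(range(len(logits)), weights=weights, k=1)[0]


def generate(length, watermark, seed):
    """Generate `length` tokens, optionally nudging green tokens before sampling."""
    rng = random.Random(seed)
    tokens = [0]  # arbitrary start token
    for _ in range(length):
        prev = tokens[-1]
        logits = model_logits(prev)  # the model's output itself is untouched
        if watermark:
            logits = [
                value + DELTA if is_green(prev, tok) else value
                for tok, value in enumerate(logits)
            ]
        tokens.append(sample(logits, rng))
    return tokens


def z_score(tokens):
    """How far the green-token count sits above what unbiased text would give."""
    n = len(tokens) - 1
    hits = sum(is_green(prev, tok) for prev, tok in zip(tokens, tokens[1:]))
    expected = GREEN_FRACTION * n
    std = math.sqrt(n * GREEN_FRACTION * (1 - GREEN_FRACTION))
    return (hits - expected) / std


if __name__ == "__main__":
    print("watermarked z:", z_score(generate(200, True, seed=1)))
    print("plain z:      ", z_score(generate(200, False, seed=1)))
```

Unbiased text should score near zero, while the biased run should score far above it. The gap grows with length, which is why sampler noise stops mattering in long texts.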

It’s defeatable. I’m sure if you make enough OpenAI queries, you can find the bias. But this will likely stop the lazy abusers, aka 99% of abusers.


PenisDuckCuck9001@lemmynsfw.com ⁨2⁩ ⁨months⁩ ago
So if you’re cheating on homework, use self-hosted models only, then. Cool.

- brucethemoose@lemmy.world ⁨2⁩ ⁨months⁩ ago
  You have full control of your logit outputs with local LLMs, so theoretically you could “unscramble” them.
  
  OpenAI (IIRC) very notably stopped giving the logprobs of their models. They did this for many reasons, and most of them boil down to “profits” and “they are anticompetitive jerks.”
  
archomrade@midwest.social ⁨2⁩ ⁨months⁩ ago
It wouldn’t be surprising to me if they’ve had this implemented for a while.

There’s still some question about why their 3.5 model had an apparent sudden drop-off in quality about a year ago, and one plausible explanation is that they were fucking with their weights in order to watermark the outputs in exactly the way you’re mentioning. They were also fighting prompt-injection methods and censoring disapproved uses at the time, so who the fuck knows.

- brucethemoose@lemmy.world ⁨2⁩ ⁨months⁩ ago
  This doesn’t touch the weights at all; it’s just a change to the sampler.
  
  What lobotomizes their models is cost cutting and trying to make them “safe,” or at least that’s what I suspect.