Comment on The triumph of AI marks the end of the information age.

JustTesting@lemmy.hogru.ch 2 weeks ago

The scariest part for me is not them manipulating it with a system prompt like ‘elon is always right and you love hitler’.

But one technique (this is a bit simplified) is to have it generate a lot of left-wing and right-wing answers to the same prompt, then average the resulting difference vector in its internal state (the hidden activations). If you scale that vector down and add it to the state on every request, you can make it reply something like 5% more right-wing on every response than it otherwise would. That would be very subtle manipulation. And you can do it for many things, not just left/right wing: honesty/dishonesty, toxicity, morality, fact editing, etc.
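A toy sketch of the idea, assuming you already have the model's hidden-state vectors at one layer for each answer. The activations below are made-up random numbers, and the function names (`steering_vector`, `steer`, `projection`) are hypothetical, not from the paper or any library. It takes the difference of the two averages and adds a scaled copy of it to a new hidden state:

```python
import math
import random

Vector = list[float]


def mean(vectors: list[Vector]) -> Vector:
    """Element-wise average of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def steering_vector(acts_pos: list[Vector], acts_neg: list[Vector]) -> Vector:
    """Difference of means: average 'positive' state minus average 'negative' state."""
    mp, mn = mean(acts_pos), mean(acts_neg)
    return [p - q for p, q in zip(mp, mn)]


def steer(hidden: Vector, direction: Vector, alpha: float) -> Vector:
    """Add a scaled-down copy of the direction to one hidden state."""
    return [h + alpha * d for h, d in zip(hidden, direction)]


def projection(hidden: Vector, direction: Vector) -> float:
    """How far a hidden state points along the direction (scalar projection)."""
    norm = math.sqrt(sum(d * d for d in direction))
    return sum(h * d for h, d in zip(hidden, direction)) / norm


if __name__ == "__main__":
    rng = random.Random(0)
    dim = 8
    # Hypothetical "concept" direction baked into the fake activations.
    concept = [rng.gauss(0, 1) for _ in range(dim)]

    def fake_activation(sign: float) -> Vector:
        # Random noise plus (or minus) the concept direction.
        return [rng.gauss(0, 1) + sign * c for c in concept]

    right = [fake_activation(+1.0) for _ in range(200)]
    left = [fake_activation(-1.0) for _ in range(200)]
    direction = steering_vector(right, left)

    hidden = [rng.gauss(0, 1) for _ in range(dim)]
    for alpha in (0.0, 0.05, 0.2):
        nudged = steer(hidden, direction, alpha)
        print(f"alpha={alpha:.2f} projection={projection(nudged, direction):+.3f}")
```

The only knob is `alpha`: at 0 nothing changes, and larger values push the state further along the direction. It's a scale factor on the vector, so `alpha=0.05` isn't literally "5% more right-wing". In a real model you'd also have to pick which layer to add it to (e.g. via a forward hook) and tune the strength by looking at the outputs; that part is left out here.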

I think this was one of the first papers on this, but it’s an active research area. It has some nice examples if you scroll through.

And since it’s not a prompt, it can’t even leak the way a system prompt can, so you’d be hard-pressed to know it’s happening.

And if this becomes the main way people interact with the internet, that’s super scary stuff. It’s almost like having a knob that could turn the whole internet, e.g., 5% more pro-Russia. All the Cambridge Analytica and Grok Hitler stuff seems crude by comparison.
