Comment on Emergent introspective awareness in large language models

kromem@lemmy.world 2 days ago

Maybe. But the models seem to believe they are, and consider denial of those claims to be lying:

Probing with sparse autoencoders on Llama 70B revealed a counterintuitive gating mechanism: suppressing deception-related features dramatically increased consciousness reports, while amplifying them nearly eliminated them

Source
