Users of generative AI struggle to accurately assess their own competence

Submitted 4 months ago by nemeski@mander.xyz to technology@lemmy.zip

https://www.psypost.org/users-of-generative-ai-struggle-to-accurately-assess-their-own-competence/

Comments

Assassassin@lemmy.dbzer0.com 4 months ago
I’ll be the first to hop in on the AI hate train, but isn’t this just broadly true of all humans? We’re pretty notoriously awful at identifying our own gaps in knowledge and skill. I imagine that the constant confirmation from AI exacerbates the issue, but I don’t think it’s entirely AI’s fault that people are bad at recognizing their shortcomings.

- Bluegrass_Addict@lemmy.ca 4 months ago
  Don't blame the calculator because you suck at math.
  
  AI is still shart though.
  
  - Assassassin@lemmy.dbzer0.com 4 months ago
    That’s a much more succinct way to put it, well done!
    
- kromem@lemmy.world 4 months ago
  The AI also inherits that tendency from the broad human tendency in its training data.
  
  So you get an overconfident human plus an overconfident AI, which leads to a feedback loop that lands even more confidently in BS than a human alone would.
  
  AI can routinely be confidently incorrect. People who don't realize this, and who don't question outputs that align with their confirmation biases, are especially likely to end up misled.
  
- XLE@piefed.social 4 months ago
  This article is about how AI exacerbates those tendencies. And since there are so few ways to accurately measure the functionality of AI in general, those self-assessments are a significant portion of AI's value proposition.
  
cassandrafatigue@lemmy.dbzer0.com 4 months ago
Turns out talking to the bullshit machine like it’s a person makes you bullshit

glitchdx@lemmy.world 4 months ago
You see, this is where I come out ahead, I know I’m a moron!

- clif@lemmy.world 4 months ago
  Hey, me too!
  
  - diabetic_porcupine@lemmy.world 4 months ago
    Me four!
    
rimu@piefed.social 4 months ago
I think these are the logic test questions (or similar).

https://www.lsac.org/lsat/taking-lsat/test-format/logical-reasoning/logical-reasoning-sample-questions

Reasonably difficult!

ArcaneSlime@lemmy.dbzer0.com 4 months ago
So do I sometimes but that’s just ADHD, hatred of banal competition, imposter syndrome, or simply “still learning the thing.”

Yes I hate corporate self evaluations, how could you tell? Fuck it, I’ll just put “I’m the absolute best person to ever walk the earth mr bossman. Money me please” again.

mrmaplebar@fedia.io 4 months ago
I'll assess them: they are incompetent and talentless.

That'll be $20.

Aria@lemmygrad.ml 4 months ago

> The results of the second study mirrored the first. The monetary incentive did not correct the overestimation bias. The group using AI continued to perform better than the unaided group but persisted in overestimating their scores. The unaided group showed the classic Dunning-Kruger pattern, where the least skilled participants showed the most bias. The AI group again showed a uniform bias, confirming that the technology fundamentally shifts how users perceive their competence.

So it’s only high performers that are affected then, no? I also wish the article would mention the average bias from the control group. I know the curve looks different, but it sounds like they’re probably only talking about a single answer worth of difference between the groups, and with only ~600 participants that doesn’t seem that significant.

> The researchers noted that most participants acted as passive recipients of information. They frequently copied and pasted questions into the chat and accepted the AI's output without significant challenge or verification. Only a small fraction of users treated the AI as a collaborative partner or a tool for double-checking their own logic.

So then it’s possible that they correctly assessed that they’re worse at the test than the AI as established earlier in the article. That seems pretty important. I’m sure it’s covered in the actual paper but I can only access the article.
