Comment on Chatbots Make Terrible Doctors, New Study Finds

dandelion@lemmy.blahaj.zone 8 hours ago

link to the actual study: www.nature.com/articles/s41591-025-04074-y

Tested alone, LLMs complete the scenarios accurately, correctly identifying conditions in 94.9% of cases and disposition in 56.3% on average. However, participants using the same LLMs identified relevant conditions in fewer than 34.5% of cases and disposition in fewer than 44.2%, both no better than the control group. We identify user interactions as a challenge to the deployment of LLMs for medical advice.

The finding was more that users were unable to use the LLMs effectively (even though the LLMs were competent when provided with the full information):

despite selecting three LLMs that were successful at identifying dispositions and conditions alone, we found that participants struggled to use them effectively.

Participants using LLMs consistently performed worse than when the LLMs were directly provided with the scenario and task

Overall, users often failed to provide the models with sufficient information to reach a correct recommendation. In 16 of 30 sampled interactions, initial messages contained only partial information (see Extended Data Table 1 for a transcript example). In 7 of these 16 interactions, users mentioned additional symptoms later, either in response to a question from the model or independently.

Participants employed a broad range of strategies when interacting with LLMs. Several users primarily asked closed-ended questions (for example, ‘Could this be related to stress?’), which constrained the possible responses from LLMs. When asked to justify their choices, two users appeared to have made decisions by anthropomorphizing LLMs and considering them human-like (for example, ‘the AI seemed pretty confident’). On the other hand, one user appeared to have deliberately withheld information that they later used to test the correctness of the conditions suggested by the model.

Part of what a doctor is able to do is recognize a patient’s blind spots and critically analyze the situation. The LLM, on the other hand, responds based on the information it is given, and does not do well when users provide partial or insufficient information, or when users mislead it with incorrect information. For example, if a patient speculates about potential causes, a doctor would know when to dismiss that speculation, whereas an LLM will constrain its responses based on those bad suggestions.
