Chatbots provided incorrect, conflicting medical advice, researchers found: “Despite all the hype, AI just isn’t ready to take on the role of the physician.”
So, I can speak to this a little bit, as it touches two domains I'm involved in. TL;DR - LLMs bullshit and are unreliable, but there's a way to use them in this domain as a force multiplier of sorts.
For one of them, I've created a Python router that (rough sketch below):

- takes my (deidentified) clinical notes, extracts and compacts the input, and creates a summary, then
- benchmarks the summary against my (user-defined) gold standard and produces a management plan (again, based on a user-defined database), then
- drops the result into my on-device LLM for light editing and polishing to condense it, which I then eyeball, correct, and escalate to a supervisor for review.
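I'm not going to paste the actual router here, but the shape of it is roughly this - every name (the functions, the `gold_standard.csv` / `management_plans.csv` files, the column names) is an illustrative placeholder, not the real implementation:

```python
# Rough sketch of the router pipeline described above. All names are
# placeholders, not the actual code.
import csv
from dataclasses import dataclass

GOLD_STANDARD_CSV = "gold_standard.csv"       # user-defined reference terms
MANAGEMENT_PLAN_CSV = "management_plans.csv"  # user-defined plan database

@dataclass
class RouterResult:
    summary: str
    benchmark_score: float
    management_plan: str

def compact_note(note: str) -> str:
    """Deterministic extraction/compaction: strip empty lines, keep the rest."""
    lines = [l.strip() for l in note.splitlines() if l.strip()]
    return " ".join(lines)

def benchmark(summary: str, gold_path: str = GOLD_STANDARD_CSV) -> float:
    """Crude coverage score against the user-defined gold-standard terms."""
    with open(gold_path, newline="") as f:
        gold_terms = {row["term"].lower() for row in csv.DictReader(f)}
    hits = sum(1 for term in gold_terms if term in summary.lower())
    return hits / max(len(gold_terms), 1)

def lookup_plan(summary: str, plan_path: str = MANAGEMENT_PLAN_CSV) -> str:
    """Return the first plan whose keyword appears in the summary."""
    with open(plan_path, newline="") as f:
        for row in csv.DictReader(f):
            if row["keyword"].lower() in summary.lower():
                return row["plan"]
    return "No matching plan found - review manually."

def run_router(deidentified_note: str) -> RouterResult:
    summary = compact_note(deidentified_note)
    return RouterResult(summary, benchmark(summary), lookup_plan(summary))
```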
Additionally, the LLM-generated note can be approved / denied by the Python router, in the first instance based on certain policy criteria I've defined.
It can also suggest probable DDx based on my databases (which are CSV-based) - a toy version of that lookup is below.
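The DDx suggestion is just a ranking over the CSVs, nothing clever. Column names here (`diagnosis`, `features`) are made up for the example:

```python
import csv

def suggest_ddx(summary: str, ddx_path: str = "ddx.csv", top_n: int = 5) -> list[str]:
    """Rank differential diagnoses by how many of their listed features
    appear in the summary. Column names are illustrative placeholders."""
    summary_lower = summary.lower()
    scored = []
    with open(ddx_path, newline="") as f:
        for row in csv.DictReader(f):
            features = [x.strip().lower() for x in row["features"].split(";")]
            score = sum(1 for feat in features if feat in summary_lower)
            if score:
                scored.append((score, row["diagnosis"]))
    scored.sort(reverse=True)
    return [dx for _, dx in scored[:top_n]]
```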
Finally, if the LLM output fails the policy check, the router tells me why it failed and just says "go look at the prior summary and edit it yourself". A rough version of that gate is sketched below.
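The approve/deny step is plain rule checking against the policy document. The actual criteria are mine and live elsewhere; these checks are stand-ins to show the shape:

```python
# Illustrative policy gate - the real criteria are defined in a policy
# document; required sections and word limit here are stand-ins.
REQUIRED_SECTIONS = ("history", "examination", "plan")
MAX_WORDS = 250

def policy_check(llm_note: str) -> tuple[bool, list[str]]:
    """Return (approved, reasons_for_failure)."""
    reasons = []
    lower = llm_note.lower()
    for section in REQUIRED_SECTIONS:
        if section not in lower:
            reasons.append(f"missing required section: {section}")
    if len(llm_note.split()) > MAX_WORDS:
        reasons.append(f"note exceeds {MAX_WORDS} words")
    return (not reasons, reasons)

def review(llm_note: str, prior_summary: str) -> str:
    """Approve the LLM note, or fall back to the deterministic summary."""
    approved, reasons = policy_check(llm_note)
    if approved:
        return llm_note
    print("Policy check failed:", "; ".join(reasons))
    print("Go look at the prior summary and edit it yourself.")
    return prior_summary
```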
This three-step process takes the tedium of paperwork from 15-20 mins down to about 1 minute of generation plus 2 mins of manual editing.
The reason why this is interesting:
All of this runs within the LLM session (it calls / invokes the Python tooling via a `>>` command) and is 100% deterministic; no LLM jazz until the final step, which the router can outright reject and which is user-auditable anyway.
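The `>>` mechanism is nothing exotic: the model emits a line starting with `>>`, and everything after that marker is parsed and executed by ordinary Python, so the tool calls themselves carry no randomness. A toy dispatcher (command names are made up; the real handlers would be the router functions sketched above):

```python
def dispatch(model_output: str, handlers: dict) -> list[str]:
    """Scan model output for lines like '>> ddx chest pain, dyspnoea'
    and run the matching deterministic Python handler."""
    results = []
    for line in model_output.splitlines():
        if not line.strip().startswith(">>"):
            continue
        command, _, arg = line.strip()[2:].strip().partition(" ")
        handler = handlers.get(command)
        results.append(handler(arg) if handler else f"unknown command: {command}")
    return results

# e.g. wired to the placeholder functions above:
# dispatch(text, {"ddx": lambda a: ", ".join(suggest_ddx(a)),
#                 "policy": lambda a: str(policy_check(a))})
```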
I've found that using a fairly "dumb" LLM (Qwen2.5-1.5B), with settings dialed down, produces consistently solid final notes (2 out of 3 are graded as passed by the router invoking the policy document and checking the output). It's too dumb to jazz, which is useful in this instance.
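"Settings dialed down" basically means greedy decoding and a short length cap, so the model can only condense what it's given. A minimal sketch of that polishing step using Hugging Face transformers, assuming the Instruct variant of the model (the actual runtime and prompt will differ):

```python
# Minimal sketch of the "dumb model, settings dialed down" polishing step.
# Assumes Qwen2.5-1.5B-Instruct via transformers; placeholder prompt.
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL = "Qwen/Qwen2.5-1.5B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForCausalLM.from_pretrained(MODEL)

def polish(summary: str, plan: str) -> str:
    """Ask the small model only to tidy and condense - nothing creative."""
    messages = [
        {"role": "system",
         "content": "Condense and lightly edit the note. Do not add new clinical content."},
        {"role": "user", "content": f"Summary:\n{summary}\n\nPlan:\n{plan}"},
    ]
    inputs = tokenizer.apply_chat_template(
        messages, add_generation_prompt=True, return_tensors="pt"
    )
    outputs = model.generate(
        inputs,
        max_new_tokens=300,
        do_sample=False,          # greedy decoding: no sampling jazz
        repetition_penalty=1.1,
    )
    return tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True)
```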
Would I trust the LLM, end to end? Well, I'd trust my system approx 80% of the time. I wouldn't trust ChatGPT … even though it's been more right than wrong in similar tests.
alzjim@lemmy.world 1 hour ago
Calling chatbots “terrible doctors” misses what actually makes a good GP — accessibility, consistency, pattern recognition, and prevention — not just physical exams. AI shines here — it’s available 24/7 🕒, never rushed or dismissive, asks structured follow-up questions, and reliably applies up-to-date guidelines without fatigue. It’s excellent at triage — spotting red flags early 🚩, monitoring symptoms over time, and knowing when to escalate to a human clinician — which is exactly where many real-world failures happen. AI shouldn’t replace hands-on care — and no serious advocate claims it should — but as a first-line GP focused on education, reassurance, and early detection, it can already reduce errors, widen access, and ease overloaded systems — which is a win for patients 💙 and doctors alike.
/s