A weird phrase is plaguing scientific papers – and we traced it back to a glitch in AI training data

328 likes

Submitted 3 weeks ago by Fallstar@mander.xyz to technology@lemmy.world

https://theconversation.com/a-weird-phrase-is-plaguing-scientific-papers-and-we-traced-it-back-to-a-glitch-in-ai-training-data-254463


Comments

  • crystalmerchant@lemmy.world 3 weeks ago

    The phrase is “vegetative electron microscopy”

    • catloaf@lemm.ee 3 weeks ago

      And it looks more like a machine translation error than anything else. Per the article, the phrase first ended up in a dataset twice because of bad OCR. Then, more recently, the bogus phrase got tangled up with a near-typo: in Farsi, the words for “scanning” and “vegetative” are extremely similar. So when some Iranian authors used an LLM to translate their paper into English, it decided that since “vegetative electron microscopy” was apparently a valid term (it was included in its training data), that’s what they meant.

      It’s not that entire papers were being invented from nothing by ChatGPT.

      • wewbull@feddit.uk 3 weeks ago

        “It’s not that entire papers were being invented from nothing by ChatGPT.”

        Yes it is. The papers are the product of an LLM. Even if the user only thought it was translating, the translation hasn’t been reviewed and has errors. The causal link between what goes into an LLM and what comes out is not certain, so if nobody is checking the output, it could just be a technical-sounding lorem ipsum generator.

      • criitz@reddthat.com 3 weeks ago

        It’s been found in many papers though. Do they all have such excuses?

  • Telorand@reddthat.com 3 weeks ago

    The lede is buried deep in this one. Yeah, these dumb LLMs got bad training data that persists to this day, but more concerning is the fact that some scientists are relying upon LLMs to write their papers. This is literally the way scientists communicate their findings to other scientists, lawmakers, and the public, and they’re using fucking predictive text like it has cognition and knows anything.

    Sure, most (all?) of those papers got retracted, but those are just the ones that got caught. How many more are lurking out there with garbage claims fabricated by a chatbot?

    Thankfully, science will inevitably sus those papers out eventually, as it always does, but it’s shameful that any scientist would be so fatuous as to put out a paper written by a dumb bot. You’re the experts. Write your own goddamn papers.

    • adespoton@lemmy.ca 3 weeks ago

      In some cases, it’s people who’ve done the research and written the paper who then use an LLM to give it a final polish. Often, it’s people who are writing in a non-native language.

      Doesn’t make it good or right, but adds some context.

      • Telorand@reddthat.com 3 weeks ago

        Sure, and I’m sympathetic to the baffling difficulties of English, but use Google Translate and ask someone who’s more fluent for help with the final polish (as a single suggestion). Trusting your work, trusting science to an LLM is lunacy.

      • wewbull@feddit.uk 3 weeks ago

        Adding extra polish, like nonsense phrases. Nobody is supervising it, then.

    • BussyCat@lemmy.world 3 weeks ago

      They were translating them, not actually writing them. Obviously it should have been caught by reviewers, but that’s not nearly as bad.

      • wewbull@feddit.uk 3 weeks ago

        Translating them… otherwise known as rewriting the whole paper.

    • dgriffith@aussie.zone 3 weeks ago

      “Thankfully, science will inevitably sus those papers out eventually, as it always does”

      In the future, all search engines will have an option to ignore any results from 2022-20xx, the era of AI slop.

    • unexposedhazard@discuss.tchncs.de 3 weeks ago

      It’s the immediate takeaway I drew from the headline, so I don’t feel like it’s buried deep.

      • Telorand@reddthat.com 3 weeks ago

        It’s not mentioned at all in the article, so what you inferred from the headline is not what the author conveyed.

    • Ledericas@lemm.ee 3 weeks ago

      Oh yeah, not to mention a lot of papers tended to be low quality even before AI was used. I’ve been hearing that people write dozens of papers just to fluff up their résumé/CV; it was quantity over quality. I was at a presentation where the guy presenting his research had written 40+ papers just to get hired at a university somewhere.

  • yuki2501@lemmy.world 3 weeks ago

    The scientific community needs to gather and reach a consensus where AI is banned from writing their papers.

  • TachyonTele@lemm.ee 3 weeks ago

    Don’t use fucking AI to write scientific papers and the problem is solved. Wtf.

    • Cryophilia@lemmy.world 3 weeks ago

      The more salient takeaway is: don’t use an LLM to translate a scientific paper, because it can’t translate a scientific paper. It can only rewrite the entire paper in a different language, and it will introduce misunderstandings and hallucinations.

  • MuskyMelon@lemmy.world 3 weeks ago

    GIGO overcomes all

  • HailSeitan@lemmy.world 3 weeks ago

    Let’s delve into the issue

  • Archangel1313@lemm.ee 3 weeks ago

    So, all those research papers were written by AI? Huh.

    • angrystego@lemmy.world 3 weeks ago

      No, they were not. AI was probably used for translation.

      • wewbull@feddit.uk 3 weeks ago

        Translating is the process of rewriting the paper in another language. The paper has been written (in English) by an LLM.

  • Letsdothisok@lemmy.world 3 weeks ago

    Super interesting. But also, super boring.
