How AI and Wikipedia have sent vulnerable languages into a doom spiral

161 likes

Submitted 2 days ago by chobeat@lemmy.ml to technology@lemmy.world

https://www.technologyreview.com/2025/09/25/1124005/ai-wikipedia-vulnerable-languages-doom-spiral/


Comments

  • A_norny_mousse@feddit.org 2 days ago

    As soon as you leave the big languages, esp. English, Wikipedia can be very problematic for all sorts of reasons.
    Mostly because of a lack of eyeballs.
    And it doesn’t end with badly written or generated content; it extends to narrative manipulation that, unlike in the English version, remains unchallenged.

    • vacuumflower@lemmy.sdf.org 1 day ago

      Sorry, but English-speaking countries have basically invented “narrative manipulation”. For most of history it was normal for there to be many competing narratives from interested parties on any subject. But such sophistication in making one side’s narrative seem impartial, perpetually contested and self-healing had never been achieved before.

      It’s as if you paint a lake red: it’s expensive, and people may get used to it and even believe it’s kinda normal, but they can still see that it’s just one lake. If you paint the world’s oceans red, so that it rains red and the mist is red, that’s far more persuasive, and that’s what the “collective West” has achieved.

      To make a red-painted lake seem normal, you need to prevent most of your population from looking at other lakes. But once you’ve managed to paint the ocean red, you don’t need to restrict them at all. The fence and the punishment would hurt trust, but without them, you and the other people looking at the red oceans and rain will think you are also free.

      Despite being just one alliance of former and current colonizing powers on this planet.

      It’s very sad to live in an era of frustration, when we can see that it can’t reform itself in a humanist direction any further than it already had by about 1988.

      It’s sort of like Lenin’s “revolutionary situation” on a planetary scale, where the dominant powers can’t keep order the old way (that persuasion is still slowly dying), and the dominated can’t live the old way. But, as we know, Lenin’s revolutionary situations generally don’t lead to what one would hope for.

      • A_norny_mousse@feddit.org 1 day ago

        Sorry, but English-speaking countries have basically invented “narrative manipulation”.

        You have no idea how wrong you are. I could claim it was the Roman Catholic Church, and there’d probably still be older examples.

        Nothing against you personally, but this is not the edgy take you think it is.

        Oh, I forgot. The point is that it’s actually nice sometimes to have alternative pages in smaller languages on niche subjects, explained better to my own taste.

        No, the point is that there are countries where people speak these languages and they want to read things in their own language. Sheesh.

    • Truscape@lemmy.blahaj.zone 1 day ago

      I wonder if language and other cultural fields are the only areas where Linus’s law is impossible to apply safely. Programming seems quite easy by comparison.

      • squaresinger@lemmy.world 1 day ago

        Hmm, the law begins with “Given enough eyeballs”. So it’s explicitly not about small-language Wikipedia sites having too few editors.

        It also doesn’t talk about finding consensus. “All bugs are shallow” means that someone can see the solution. In software development, that’s most often quite easy, especially when it comes to bugfixes. It’s rarely difficult to verify whether the solution to a bug works or not. So in most cases if someone finds a solution and it works, that’s good enough for everyone.

        In cultural fields, that’s decidedly not the case.

        For most of society’s problems, there are hardly any new solutions. We have had the same basic problems for centuries and pretty much “all” the solutions have been proposed decades or centuries ago.

        How to make government fair? How to get rid of crime? How to make a good society?

        These things have literally been issues since the first humans learned to speak.

        That’s why Linus’ law doesn’t really apply here. We all want different things and there’s no fix that satisfies all requirements or preferences.

  • AnarchistArtificer@slrpnk.net 1 day ago

    As a society, we need to better value the labour that goes into our collective knowledge bases. Non-English Wikipedia is just one example of this, but it highlights the core of the problem: the system relies on a tremendous amount of skilled labour that cannot easily be done by just a few volunteers.

    Paying people to contribute would come with problems of its own (in a hypothetical world where Wikipedia permitted it, which I don’t believe it does at present), but it would be easier for people to contribute if the time they wanted to volunteer wasn’t competing with their need to keep their heads above water financially. Universal basic income, or something similar, seems like one of the more viable ways to ease this tension.

    However, a big component of the problem lies in the less concrete matter of how society values things. I’m a scientist in a field that is increasingly reliant on scientific databases, such as the Protein Data Bank (PDB), where experimentally determined protein structures are deposited and annotated, as well as countless databases on different genes and their functions. Active curation of these databases is what lets us research a gene in one model organism and then apply those insights to the equivalent gene in other organisms.

    For example, CG9536 is the name of a gene found in Drosophila melanogaster (fruit flies, a common model organism for genetic research due to the ease of working with them in a lab). Much of the research on this particular gene can be found on FlyBase, a database for D. melanogaster gene research. Although flies are super different from humans, many fruit fly genes have human equivalents, and CG9536 is no exception: in humans we call it TMEM115. The TL;DR answer to what this gene does is “we don’t know”. We have some knowledge of what it does, but the tricky part of this kind of research is figuring out how genes or proteins interact as part of a wider system. Even if we knew exactly what it does in a healthy person, for example, it’s much harder to understand what kinds of illnesses arise from a faulty version of a gene, or whether a gene or protein could be a target for developing novel drugs. I don’t know much about TMEM115 specifically, but I know someone who was exploring whether it could be relevant to understanding how certain kinds of brain tumours develop.

    Whilst the data that fill these databases are produced by experimental research attached to published papers, a tremendous amount of work goes into making all these resources talk to each other. That FlyBase link above links to the page on TMEM115, and I can use these resources to synthesise research across many fields that would previously have been siloed: the folks who work on flies will have a different research culture from those who work on human genes, or yeast, or plants etc. TMEM115 is also sometimes called TM115, and it would be a nightmare if a scientist reviewing the literature missed important existing research that referred to the gene under a slightly different name.
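
    To make the naming problem concrete, here’s a minimal Python sketch of what a curated alias table buys you. It assumes a tiny hand-written lookup: the names `SYNONYMS`, `ORTHOLOGS`, `canonical_human_gene` and `group_papers_by_gene` are hypothetical and not any real database’s API, and real resources like FlyBase keep far richer synonym and ortholog records. The point is only that once someone has curated the mapping, papers filed under TMEM115, TM115 or the fly gene CG9536 all end up in one place.

```python
# Hypothetical, hand-curated tables for illustration only.
# Real databases maintain much richer synonym and ortholog records.
SYNONYMS: dict[str, str] = {
    "TM115": "TMEM115",  # alternative name for the human gene
}
ORTHOLOGS: dict[str, str] = {
    "CG9536": "TMEM115",  # D. melanogaster gene -> human equivalent
}


def canonical_human_gene(name: str) -> str:
    """Resolve a gene name to one canonical human symbol.

    Uses exact-match lookups: fly-to-human orthologs first, then
    human synonyms. Names missing from both tables come back unchanged.
    """
    key = name.strip()
    key = ORTHOLOGS.get(key, key)
    return SYNONYMS.get(key, key)


def group_papers_by_gene(papers: list[tuple[str, str]]) -> dict[str, list[str]]:
    """Group (title, gene_name) pairs under their canonical gene symbol."""
    grouped: dict[str, list[str]] = {}
    for title, gene in papers:
        grouped.setdefault(canonical_human_gene(gene), []).append(title)
    return grouped
```

    For example, `group_papers_by_gene([("Paper A", "TMEM115"), ("Paper B", "TM115"), ("Paper C", "CG9536")])` puts all three titles under `"TMEM115"`. Without the table, a search for any one of those names would silently miss the other two papers.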

    Making these biological databases link up properly requires active curation, a process that the philosopher of science Sabine Leonelli refers to as “data packaging”, a challenging task that includes asking “who else might find this data useful?” ^[1]. The people doing the experiments that produce the data aren’t necessarily the best people to figure out how to package and label that data for others to use, because this inherently requires thinking in a way that spans many different research subfields. Crucially, though, this infrastructure work gives a scientist far fewer opportunities to publish new papers, which means this essential labour is devalued in our current system of doing science.

    It’s rather like how some of the people adding poor-quality articles to non-English Wikipedia feel like they’re contributing, because automated tools let them create more new articles than someone with actual specialist knowledge could. It’s the product of a culture of an ever-hungry “more” that fuels the production of slop, devalues the work of curators and is degrading our knowledge ecosystem. The financial incentives that drive this behaviour play a big role, but I see them as a symptom of a wider problem: society’s desire to easily quantify value causes important work that’s harder to quantify to be systematically devalued (a problem we also see in how reproductive labour, i.e. the labour involved in managing a family or household, has historically been dismissed).

    We need to start recognising how tenuous our existing knowledge is. The OP discusses languages with few native speakers, which likely won’t affect many who read the article, but we’re at risk of losing so much more if we don’t learn that lesson. The more we learn, the more we need to invest in expanding our systems of knowledge infrastructure, as well as in maintaining what we already have.


    [1]: Rather than the paper in which Sabine Leonelli coined the phrase “data packaging”, I’m citing her 2016 book “Data-Centric Biology: A Philosophical Study”. I don’t imagine many people will read this long comment of mine, but if you’ve made it this far, you might be interested in checking out her work. Though it’s not aimed at a general audience, it’s still fairly accessible if you’re the kind of nerd who enjoys discussing the messy problem of making a database usable by everyone.

    If your appetite for learning is larger than your wallet, then I’d suggest that Anna’s Archive or similar is a good shout. Some communities aren’t cool with directly linking to resources like this, so know that you can check the Wikipedia page of shadow library sites to find a reliable link: en.wikipedia.org/wiki/Anna's_Archive

    • AwesomeLowlander@sh.itjust.works 1 day ago

      This is the sort of comment that makes me wish I could do multiple upvotes

      • GreyEyedGhost@lemmy.ca 1 day ago

        Time to spin up some alts?

  • chloroken@lemmy.ml 1 day ago

    It’s profoundly chauvinistic to think that people who speak other languages don’t have the same depth of literary resources as English speakers just because their Wikipedia has fewer users.

    Books. They’re called books. Every nation speaking every language has them.

    • HereIAm@lemmy.world 1 day ago

      I understand you’re trying to be nice to minority languages, but if you write research papers, you either limit your audience to your own country or publish in English (I guess Spanish is pretty worldwide as well). If you set out to read a new paper in your field, I doubt you’d pick up something in Mongolian.

      Even in Sweden I would write a serious paper in English, so that more of the world could read it. Yes, we have textbooks in Swedish for our courses, but I doubt many books covering LLMs, for example, are currently being published in Swedish.

      • chloroken@lemmy.ml 1 day ago

        I’m not “trying to be nice to minority languages”, I’m directly pushing back against the chauvinistic idea that Wikipedia is so important that those without it are somehow inferior.

        As for scientific papers, it’s called translation. One can write academic literature in one’s native language and have it translated for wider reach. That isn’t the case with Wikipedia, which is constantly being edited.

  • Bloefz@lemmy.world 1 day ago

    Does it really matter? I think the extreme number of languages in the world right now is not helping us communicate. I don’t view language as a cultural heritage thing, just a communication protocol. And I have moved around the world a lot; it’s very difficult to constantly adapt to different languages.

    I think if we had a universal language (note that it wouldn’t have to be English), we would be able to understand each other better and have fewer wars.

    • RightEdofer@lemmy.ca 1 day ago

      This is the worst take I’ve ever seen.

      source
      • Bloefz@lemmy.world 1 day ago

        Yeah, I’m just not really wedded to any language. I guess that’s also because I’ve moved around so much. I’m from Holland, but I don’t consider myself a Dutch person, more a citizen of the world. I’ve become too different to fit in in my home country (also because it’s become an extreme-right cesspool lately 😢). I’ve spent about half my life elsewhere. And in the places I’ve lived where I spoke the language, I fared noticeably better.

        But I know a lot of people do view language as a cultural thing, it’s just my point of view.

    • theoriginalcows@lemmings.world 23 hours ago

      I don’t view language as a cultural heritage thing, just a communication protocol.

      Well, you’re wrong.

    • TankovayaDiviziya@lemmy.world 23 hours ago

      Languages have their own quirks and character; they represent the people’s cultural values and express ideas not even present in other cultures. As many languages as possible should be preserved.

    • zarkanian@sh.itjust.works 20 hours ago

      Esperanto still exists and there is a worldwide community of speakers.

  • Kissaki@feddit.org 23 hours ago

    Three popovers? Come on, man. I just wanted a peek at the article.

    • zarkanian@sh.itjust.works 21 hours ago

      Are you not using an adblocker?

      • Kissaki@feddit.org 20 hours ago

        I am. DNS + uBlock Origin with more than the default filters.
