A talk (in English) from the hacker conference 39C3 on how AI-generated content was identified via a simple ISBN checksum calculator.
He notes that LLM vendors have been training their models on Wikipedia content. But if that content contains incorrect information and citations, you get the sort of circular (incorrect) referencing that spreads misinformation.
One irony, he says, is that LLM vendors are now willing to pay for training data unpolluted by the hallucinated output their own products generate.
ChillCapybara@discuss.tchncs.de 20 hours ago
TL;DW:
He wrote a checksum verifier for ISBNs and used it to discover AI-generated content on Wikipedia with hallucinated sources. He used Claude to write the verifier, an irony not lost on him. He tracked down the people who submitted the fake articles and found that many did so out of a misplaced desire to help, without understanding the limitations and pitfalls of publishing LLM-generated content without verification.
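For the curious: the talk's actual tool isn't shown here, but the standard ISBN check-digit math it relies on is simple enough to sketch. The function names below are hypothetical, not the speaker's; the weighting rules (alternating 1/3 mod 10 for ISBN-13, descending 10..1 mod 11 for ISBN-10) are the published ones that let you flag an ISBN a model made up.

```python
def is_valid_isbn13(isbn: str) -> bool:
    """ISBN-13 check: digits weighted 1,3,1,3,... must sum to a multiple of 10."""
    digits = [c for c in isbn if c.isdigit()]
    if len(digits) != 13:
        return False
    total = sum(int(d) * (1 if i % 2 == 0 else 3) for i, d in enumerate(digits))
    return total % 10 == 0

def is_valid_isbn10(isbn: str) -> bool:
    """ISBN-10 check: weighted sum (weights 10 down to 1, 'X' = 10) divisible by 11."""
    chars = [c for c in isbn if c.isdigit() or c in "xX"]
    if len(chars) != 10:
        return False
    total = 0
    for i, c in enumerate(chars):
        value = 10 if c in "xX" else int(c)
        if value == 10 and i != 9:  # 'X' is only legal as the final check digit
            return False
        total += value * (10 - i)
    return total % 11 == 0
```

A hallucinated ISBN fails these checks roughly 90% of the time (the check digit only catches what it catches), so in practice a tool like this surfaces candidates for human review rather than proving anything on its own.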
Saapas@piefed.zip 19 hours ago
What’s the irony?
EncryptKeeper@lemmy.world 19 hours ago
He used AI to write the anti-AI tool