Comment on Google Researchers’ Attack Prompts ChatGPT to Reveal Its Training Data

How do we know the ChatGPT models weren’t trained on data crawled from publicly accessible breach forums, where private data is known to leak? I imagine the crawlers would have some kind of ‘follow webpage attachments, then crawl’ function. Surely they have crawled all sorts of leaked data online. This is a genuine question, though, because I haven’t done any previous research.

d3Xt3r@lemmy.nz 1 year ago
We don’t, but from what I’ve seen, those forums either require registration or payment to access the data, and/or some special means to download it (e.g., BitTorrent). A simple web crawler wouldn’t be able to access it.