Why confront the glaring issues with your “revolutionary” new toy when you could just suppress information instead
A Developer Accidentally Found CSAM in AI Data. Google Banned Him For It
Submitted 3 days ago by themachinestops@lemmy.dbzer0.com to technology@lemmy.world
https://www.404media.co/a-developer-accidentally-found-csam-in-ai-data-google-banned-him-for-it/
Comments
Kyrgizion@lemmy.world 3 days ago
This was about sending a message: “stfu or suffer the consequences.” Hence, people who encounter something similar later will think twice about reporting anything.
Devial@discuss.online 2 days ago
Did you even read the article? The dude reported it anonymously, to a child protection org, not Google, and his account was nuked as soon as he unzipped the data, because the content was automatically flagged.
Google didn’t even know he reported this.
cyberpunk007@lemmy.ca 1 day ago
Child sexual abuse material.
Is it just me or did anyone else know what “CSAM” was already?
chronicledmonocle@lemmy.world 1 day ago
I had no idea what the acronym was. Guess I’m just sheltered or something.
pipe01@programming.dev 1 day ago
Yeah it’s pretty common, unfortunately
TipsyMcGee@lemmy.dbzer0.com 1 day ago
I don’t quite get it. Is it a wider term than child pornography or a narrower one (e.g. does it exclude some material that could be considered porn but, strictly speaking, doesn’t depict abuse)? The abbreviation sounds like some kind of exotic surface-to-air missile lol
killea@lemmy.world 3 days ago
So in a just world, Google would be heavily penalized not only for allowing CSAM on their servers, but also for violating their own ToS with a customer?
shalafi@lemmy.world 2 days ago
We really don’t want that first part to be law.
Section 230 was enacted as part of the Communications Decency Act of 1996 and is a crucial piece of legislation that protects online service providers and users from being held liable for content created by third parties. It is often cited as a foundational law that has allowed the internet to flourish by enabling platforms to host user-generated content without the fear of legal repercussions for that content.
Though I’m not sure if that applies to scraping other servers’ content. But I wouldn’t say it’s fair to expect the scraper to review everything. If we don’t like that take, then we should outlaw scraping altogether, but I’m betting there are unwanted side effects to that.
mic_check_one_two@lemmy.dbzer0.com 2 days ago
While I agree with Section 230 in theory, in practice it is often only used to protect megacorps. For example, many Lemmy instances started getting spammed with CSAM after the Reddit API migration. It was very clearly some angry redditors trying to shut down instances to keep people on Reddit.
But individual server owners were legitimately concerned that they could be held liable for the CSAM existing on their servers, even if they were not the ones who uploaded it. The concern was that Section 230 would be thrown out the window if the instance owners were just lone devs and not massive megacorps.
Especially since federation caused content to be cached whenever a user scrolled past another instance’s posts. So even if they moderated their own server’s content heavily (which wasn’t even possible with the mod tools that existed at the time), there was still the risk that they’d end up caching CSAM from other instances. It led to a lot of instances moving from federation blacklists to whitelists instead: basically, default to not federating with an instance unless that instance owner takes the time to jump through some hoops and promises to moderate their own shit (roughly the difference sketched below).
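A minimal Python sketch of that denylist-versus-allowlist switch; the `should_federate` helper, the instance names, and the config shape are hypothetical illustrations, not Lemmy’s actual code or configuration:

```python
# Hypothetical sketch of denylist vs. allowlist federation, for illustration only.
DENYLIST = {"spam.example"}        # known-bad instances to block
ALLOWLIST = {"trusted.example"}    # instances explicitly approved by the admin


def should_federate(instance: str, mode: str = "allowlist") -> bool:
    """Decide whether to accept (and therefore cache) content from an instance."""
    if mode == "denylist":
        # Old default: federate with everyone except explicitly blocked instances.
        return instance not in DENYLIST
    # New default: federate with nobody unless explicitly approved.
    return instance in ALLOWLIST


if __name__ == "__main__":
    # An unknown instance is accepted under a denylist but rejected under an allowlist.
    print(should_federate("random.example", mode="denylist"))   # True
    print(should_federate("random.example", mode="allowlist"))  # False
```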
vimmiewimmie@slrpnk.net 2 days ago
Not to start an argument, which isn’t my intent, as certainly there may be a thought such as, “scraping as it stands is good because of the simplification and ‘benefit’”. Which, sure, it’s easiest to cast a wide net and absorb everything, to simplify the concept, at least as I’m understanding it.
Yet maybe it is the process of scraping, and of absorbing it into databases including AI, which is a worthwhile point of conversation. Maybe how we’ve been doing something isn’t the continued ‘best course’ for a situation.
Undeniably, more minutely monitoring what is scraped and stored creates questions and obstacles, large in number and in scope, but maybe having that conversation is where things should go.
Thoughts?
killea@lemmy.world 2 days ago
Oh my, yes, you are correct. That was sort of knee-jerk, as opposed to it somehow being the reporting party’s burden. I simply cannot understand the legal gymnastics needed to punish your customers for this sort of thing; I’m tired, but I feel like this is not exactly an uncommon occurrence. Anyway, let us all learn from my mistake: don’t be rash and curtail your own freedoms.
dev_null@lemmy.ml 2 days ago
They were not only not allowing it, they immediately blocked the user’s attempt to put it on their servers and banned the user for even trying. That’s as far from allowing it as possible.
abbiistabbii@lemmy.blahaj.zone 2 days ago
This. Literally the only reason I could guess is that it’s to teach AI to recognise child porn, but if that’s the case, why is Google doing it instead of, like, the FBI?
gustofwind@lemmy.world 2 days ago
Who do you think the FBI would contract to do the work anyway 😬
Maybe not Google, but it would sure be some private company. Our government almost never does stuff itself; it hires the private sector.
alias_qr_rainmaker@lemmy.world 2 days ago
i know it’s really fucked up, but the FBI needs to train an AI on CSAM if it is to be able to identify it.
i’m trying to help, i have a script that takes control of your computer and basically opens the folder where all your fucked up shit is downloaded. trying to use steganography to embed the applescript in the metadata of a png
forkDestroyer@infosec.pub 2 days ago
Google isn’t the only service checking for CSAM. Microsoft (and likely other file-hosting services) also have methods to do this. That doesn’t mean they host CSAM themselves in order to detect it. I believe their checks use hash values to determine whether a picture has already been flagged as being in that category.
This has existed since 2009 and provides good insight on the topic; it’s used for detecting all sorts of bad-category images (a simplified sketch of the hash-matching idea follows):
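As a rough illustration of hash-list matching, here is a minimal Python sketch. Production systems such as Microsoft’s PhotoDNA use robust perceptual hashes that survive resizing and re-encoding; the plain SHA-256 below is only a simplified stand-in, and `KNOWN_BAD_HASHES` is a hypothetical placeholder for a vetted hash list:

```python
import hashlib
from pathlib import Path

# Hypothetical placeholder for a vetted hash list distributed by a clearinghouse.
# Real services compare perceptual hashes (e.g. PhotoDNA), not plain SHA-256,
# so that re-encoded or resized copies still match.
KNOWN_BAD_HASHES: set[str] = set()


def file_hash(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def upload_is_flagged(path: Path) -> bool:
    """True if the uploaded file's hash matches a known-bad entry."""
    return file_hash(path) in KNOWN_BAD_HASHES
```

The point of this scheme is that the service only needs the hash list, not the images themselves, to flag a match.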
frongt@lemmy.zip 2 days ago
Google wants to be able to recognize and remove it. They don’t want the FBI all up in their business.
finitebanjo@lemmy.world 2 days ago
My dumb ass sitting here confused for a solid minute thinking CSAM was in reference to a type of artillery.
pigup@lemmy.world 2 days ago
Combined surface air munitions
llama@lemmy.zip 2 days ago
Right, I thought it was cyber security something-or-other, like API keys. Now DuckDuckGo probably thinks I’m a creep.
echodot@feddit.uk 1 day ago
This guy is into some really weird stuff, and not the normal hentai that everyone else is asking for.
Hozerkiller@lemmy.ca 2 days ago
I feel that. I assumed it was something like SCCM.
arararagi@ani.social 1 day ago
“stop noticing things” -Google
hummingbird@lemmy.world 3 days ago
It goes to show: developers should make sure they don’t make their livelihood dependent on access to Google services.
rizzothesmall@sh.itjust.works 3 days ago
Never heard that acronym before…
TheJesusaurus@sh.itjust.works 3 days ago
Not sure where it originates, but it’s the preferred term in UK policing, and therefore in most media reporting, for what might have been called “CP” on the interweb in the past. Probably because “porn” implies it’s art rather than a crime, and it’s also just a wider umbrella term.
Zikeji@programming.dev 3 days ago
It’s also more distinct. CP has many potential definitions; CSAM only has the one I’m aware of.
rizzothesmall@sh.itjust.works 2 days ago
Lol why tf people downvoting that? Sorry I learned a new fucking thing jfc.
Deceptichum@quokk.au 3 days ago
It’s basically the only one anyone uses?
DylanMc6@lemmy.ml 2 days ago
gill o’ teens
DylanMc6@lemmy.ml 2 days ago
time for guillotines
devolution@lemmy.world 3 days ago
Gemini likes twins…
…I’ll see myself out.
billwashere@lemmy.world 23 hours ago
I imagine most of these models have all kinds of nefarious things in them, sucking up all the info they could find indiscriminately.
b_tr3e@feddit.org 3 days ago
That’s what you get for criticising AI, and rightly so. I, for one, welcome our new electronic overlords!
j4k3@piefed.world 3 days ago
[deleted]
AngryishHumanoid@lemmynsfw.com 3 days ago
“Sign up for free access to this article.”
Devial@discuss.online 2 days ago
The article headline is wildly misleading, bordering on just a straight up lie.
Google didn’t ban the developer for reporting the material, they didn’t even know he reported it, because he did so anonymously, and to a child protection org, not Google.
Google’s automatic tools, correctly, flagged the CSAM when he unzipped the data and subsequently nuked his account.
Google’s only failure here was to not unban on his first or second appeal. And whilst that is absolutely a big failure on Google’s part, I find it very understandable that the appeals team generally speaking won’t accept “I didn’t know the folder I uploaded contained CSAM” as a valid ban appeal reason.
It’s also kind of insane how this article somehow makes a bigger deal out of this developer being temporarily banned by Google than it does of the fact that hundreds of CSAM images were freely available online, openly shareable by anyone and to anyone, for god knows how long.
forkDestroyer@infosec.pub 2 days ago
I’m being a bit extra but…
Your statement:
The article headline: “A Developer Accidentally Found CSAM in AI Data. Google Banned Him For It”
The general story in reference to the headline:
The article headline is accurate if you interpret it as
“A Developer Accidentally Found CSAM in AI Data. Google Banned Him For It” (“it” being “csam”).
The article headline is inaccurate if you interpret it as
“A Developer Accidentally Found CSAM in AI Data. Google Banned Him For It” (“it” being “reporting csam”).
I read it as the former, because the action of reporting isn’t listed in the headline at all.
^___^
Blubber28@lemmy.world 2 days ago
This is correct. However, many websites/newspapers/magazines/etc. love to get more clicks with sensational headlines that are technically true, but can be easily interpreted as something much more sinister/exciting. This headline is a great example of it. While you interpreted it correctly, or claim to at least, there will be many people that initially interpret it the second way you described. Me among them, admittedly. And the people deciding on the headlines are very much aware of that. Therefore, the headline can absolutely be deemed misleading, for while it is absolutely a correct statement, there are less ambiguous ways to phrase it.
WildPalmTree@lemmy.world 1 day ago
My interpretation would be that the inclusion of “found” indicates it is important to the action taken by Google.
MangoCats@feddit.it 2 days ago
My experience of Google’s unban process: it doesn’t exist, never works, and doesn’t even escalate to a human evaluator in a 3rd-world sweatshop; the algorithm simply ignores appeals, inscrutably.
ulterno@programming.dev 1 day ago
Another point is that the reason Google’s AI is able to identify CSAM is that it has that material in its training data, flagged as such.
In that case, it would have detected the training material as a ~100% match.
What I don’t get, though, is how it ended up being openly available; if it were properly tagged, they would probably have excluded it from the open-sourced data. And now I see it would also not be viable to have an open-source, openly scrutinisable AI deployment for CSAM detection, for the same reason.
And while some governmental body got a lot of backlash for trying to implement such an AI thing on chat services, Google gets to do so all it wants, because it’s E-Mail/GDrive, it’s all on their servers, and you can’t expect privacy.
arararagi@ani.social 1 day ago
You would think so, but none of these companies actually make their own datasets; they buy from third parties.
cupcakezealot@piefed.blahaj.zone 2 days ago
So they got mad because he reported it to an agency that actually fights CSAM instead of to them, where they could sweep it under the rug?
Devial@discuss.online 2 days ago
They didn’t get mad. Did you even read my comment?
Cybersteel@lemmy.world 2 days ago
We need to block access to the web for certain known actors and tie IP addresses to IDs, names, and passport numbers. For the children.
kylian0087@lemmy.dbzer0.com 2 days ago
Oh hell no. That’s a privacy nightmare waiting to be abused like hell.
Also, what you’re suggesting wouldn’t work at all.
tetris11@feddit.uk 2 days ago
Also, pay me exorbitant amounts of taxpayer money to ineffectually enforce this. For the children.
jjlinux@lemmy.zip 2 days ago
Fuck you, and everything you stand for.
NoForwardslashS@sopuli.xyz 2 days ago
No need to go that far. If we just require one valid photo ID for TikTok, the children will finally be safe.
bobzer@lemmy.zip 2 days ago
ATM machine
Goodlucksil@lemmy.dbzer0.com 2 days ago
The M in CSAM stands for “material”. Adding “image” specifies what kind of material it is.
Devial@discuss.online 2 days ago
Which of the letters in CSAM stands for “images”, then?
ayyy@sh.itjust.works 2 days ago
A 404Media headline? The place exclusively staffed by former BuzzFeed/Cracked employees? Noooo, couldn’t be.