Why confront the glaring issues with your “revolutionary” new toy when you could just suppress information instead
A Developer Accidentally Found CSAM in AI Data. Google Banned Him For It
Submitted 3 days ago by themachinestops@lemmy.dbzer0.com to technology@lemmy.world
https://www.404media.co/a-developer-accidentally-found-csam-in-ai-data-google-banned-him-for-it/
Comments
Kyrgizion@lemmy.world 3 days ago
This was about sending a message: “stfu or suffer the consequences.” Hence, people who encounter something similar later will think twice about reporting anything.
Devial@discuss.online 2 days ago
Did you even read the article? The dude reported it anonymously, to a child protection org, not Google, and his account was nuked as soon as he unzipped the data, because the content was automatically flagged.
Google didn’t even know he reported this.
cyberpunk007@lemmy.ca 1 day ago
Child sexual abuse material.
Is it just me or did anyone else know what “CSAM” was already?
chronicledmonocle@lemmy.world 1 day ago
I had no idea what the acronym was. Guess I’m just sheltered or something.
pipe01@programming.dev 1 day ago
Yeah it’s pretty common, unfortunately
TipsyMcGee@lemmy.dbzer0.com 1 day ago
I don’t quite get it. Is it a wider term than child pornography or a narrower one (e.g. does it exclude some material that could be considered porn but, strictly speaking, doesn’t depict abuse)? The abbreviation sounds like some kind of exotic surface-to-air missile lol
killea@lemmy.world 3 days ago
So in a just world, Google would be heavily penalized not only for allowing CSAM on their servers, but also for violating their own ToS with a customer?
shalafi@lemmy.world 2 days ago
We really don’t want that first part to be law.
Section 230 was enacted as part of the Communications Decency Act of 1996 and is a crucial piece of legislation that protects online service providers and users from being held liable for content created by third parties. It is often cited as a foundational law that has allowed the internet to flourish by enabling platforms to host user-generated content without the fear of legal repercussions for that content.
Though I’m not sure if that applies to scraping other servers’ content. But I wouldn’t say it’s fair to expect the scraper to review everything. If we don’t like that take, then we should outlaw scraping altogether, but I’m betting there are unwanted side effects to that.
mic_check_one_two@lemmy.dbzer0.com 2 days ago
While I agree with Section 230 in theory, in practice it is often only used to protect megacorps. For example, many Lemmy instances started getting spammed with CSAM after the Reddit API migration. It was very clearly some angry redditors trying to shut down instances to keep people on Reddit.
But individual server owners were legitimately concerned that they could be held liable for the CSAM existing on their servers, even if they were not the ones who uploaded it. The concern was that Section 230 would be thrown out the window if the instance owners were just lone devs and not massive megacorps.
Especially since federation caused content to be cached whenever a user scrolled past another instance’s posts. So even if they moderated their own server’s content heavily (which wasn’t even possible with the mod tools that existed at the time), there was still the risk that they’d end up caching CSAM from other instances. It led to a lot of instances moving from federation blacklists to whitelists instead: basically, default to not federating with an instance unless that instance owner takes the time to jump through some hoops and promises to moderate their own shit (roughly the difference sketched below).
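A minimal Python sketch of that denylist-versus-allowlist switch; the `should_federate` helper, the instance names, and the config shape are hypothetical illustrations, not Lemmy’s actual code or configuration:

```python
# Hypothetical sketch of denylist vs. allowlist federation, for illustration only.
DENYLIST = {"spam.example"}        # known-bad instances to block
ALLOWLIST = {"trusted.example"}    # instances explicitly approved by the admin


def should_federate(instance: str, mode: str = "allowlist") -> bool:
    """Decide whether to accept (and therefore cache) content from an instance."""
    if mode == "denylist":
        # Old default: federate with everyone except explicitly blocked instances.
        return instance not in DENYLIST
    # New default: federate with nobody unless explicitly approved.
    return instance in ALLOWLIST


if __name__ == "__main__":
    # An unknown instance is accepted under a denylist but rejected under an allowlist.
    print(should_federate("random.example", mode="denylist"))   # True
    print(should_federate("random.example", mode="allowlist"))  # False
```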
vimmiewimmie@slrpnk.net 2 days ago
Not to start an argument, which isn’t my intent, as certainly there may be a thought such as, “scraping as it stands is good because of the simplification and ‘benefit’”. Which, sure, it’s easiest to cast a wide net and absorb everything, to simplify the concept, at least as I’m understanding it.
Yet maybe it is the process of scraping, and of absorbing it into databases including AI, which is a worthwhile point of conversation. Maybe how we’ve been doing something isn’t the continued ‘best course’ for a situation.
Undeniably, more minutely monitoring what is scraped and stored creates questions and obstacles, large in number and in scope, but maybe having that conversation is where things should go.
Thoughts?
killea@lemmy.world 2 days ago
Oh my, yes, you are correct. That was sort of knee-jerk, as opposed to it somehow being the reporting party’s burden. I simply cannot understand the legal gymnastics needed to punish your customers for this sort of thing; I’m tired, but I feel like this is not exactly an uncommon occurrence. Anyway, let us all learn from my mistake: don’t be rash and curtail your own freedoms.
dev_null@lemmy.ml 2 days ago
They were not only not allowing it, they immediately blocked the user’s attempt to put it on their servers and banned the user for even trying. That’s as far from allowing it as possible.
abbiistabbii@lemmy.blahaj.zone 2 days ago
This. Literally the only reason I could guess is that it’s to teach AI to recognise child porn, but if that’s the case, why is Google doing it instead of, like, the FBI?
gustofwind@lemmy.world 2 days ago
Who do you think the FBI would contract to do the work anyway 😬
Maybe not Google, but it would sure be some private company. Our government almost never does stuff itself; it hires the private sector.
alias_qr_rainmaker@lemmy.world 2 days ago
i know it’s really fucked up, but the FBI needs to train an AI on CSAM if it is to be able to identify it.
i’m trying to help, i have a script that takes control of your computer and basically opens the folder where all your fucked up shit is downloaded. trying to use steganography to embed the applescript in the metadata of a png
forkDestroyer@infosec.pub 2 days ago
Google isn’t the only service checking for CSAM. Microsoft (and likely other file-hosting services) also have methods to do this. That doesn’t mean they host CSAM themselves in order to detect it. I believe their checks use hash values to determine whether a picture has already been flagged as being in that category.
This has existed since 2009 and provides good insight on the topic; it’s used for detecting all sorts of bad-category images (a simplified sketch of the hash-matching idea follows):
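As a rough illustration of hash-list matching, here is a minimal Python sketch. Production systems such as Microsoft’s PhotoDNA use robust perceptual hashes that survive resizing and re-encoding; the plain SHA-256 below is only a simplified stand-in, and `KNOWN_BAD_HASHES` is a hypothetical placeholder for a vetted hash list:

```python
import hashlib
from pathlib import Path

# Hypothetical placeholder for a vetted hash list distributed by a clearinghouse.
# Real services compare perceptual hashes (e.g. PhotoDNA), not plain SHA-256,
# so that re-encoded or resized copies still match.
KNOWN_BAD_HASHES: set[str] = set()


def file_hash(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def upload_is_flagged(path: Path) -> bool:
    """True if the uploaded file's hash matches a known-bad entry."""
    return file_hash(path) in KNOWN_BAD_HASHES
```

The point of this scheme is that the service only needs the hash list, not the images themselves, to flag a match.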
frongt@lemmy.zip 2 days ago
Google wants to be able to recognize and remove it. They don’t want the FBI all up in their business.
finitebanjo@lemmy.world 2 days ago
My dumb ass sitting here confused for a solid minute thinking CSAM was in reference to a type of artillery.
pigup@lemmy.world 2 days ago
Combined surface air munitions
llama@lemmy.zip 2 days ago
Right, I thought it was cyber security something-or-other, like API keys. Now DuckDuckGo probably thinks I’m a creep.
echodot@feddit.uk 1 day ago
This guy is into some really weird stuff, and not the normal hentai that everyone else is asking for.
Hozerkiller@lemmy.ca 2 days ago
I feel that. I assumed it was something like SCCM.
arararagi@ani.social 1 day ago
“stop noticing things” -Google
hummingbird@lemmy.world 3 days ago
It goes to show: developers should make sure they don’t make their livelihood dependent on access to Google services.
rizzothesmall@sh.itjust.works 3 days ago
Never heard that acronym before…
TheJesusaurus@sh.itjust.works 3 days ago
Not sure where it originates, but it’s the preferred term in UK policing, and therefore in most media reporting, for what might have been called “CP” on the interweb in the past. Probably because “porn” implies it’s art rather than a crime, and it’s also just a wider umbrella term.
Zikeji@programming.dev 3 days ago
It’s also more distinct. CP has many potential definitions; CSAM only has the one I’m aware of.
rizzothesmall@sh.itjust.works 2 days ago
Lol why tf people downvoting that? Sorry I learned a new fucking thing jfc.
Deceptichum@quokk.au 3 days ago
It’s basically the only one anyone uses?
DylanMc6@lemmy.ml 2 days ago
gill o’ teens
DylanMc6@lemmy.ml 2 days ago
time for guillotines
devolution@lemmy.world 3 days ago
Gemini likes twins…
…I’ll see myself out.
billwashere@lemmy.world 23 hours ago
I imagine most of these models have all kinds of nefarious things in them, sucking up all the info they could find indiscriminately.
b_tr3e@feddit.org 3 days ago
That’s what you get for criticising AI, and rightly so. I, for one, welcome our new electronic overlords!
j4k3@piefed.world 3 days ago
[deleted]
AngryishHumanoid@lemmynsfw.com 3 days ago
“Sign up for free access to this article.”
Devial@discuss.online 2 days ago
The article headline is wildly misleading, bordering on just a straight up lie.
Google didn’t ban the developer for reporting the material, they didn’t even know he reported it, because he did so anonymously, and to a child protection org, not Google.
Google’s automatic tools, correctly, flagged the CSAM when he unzipped the data and subsequently nuked his account.
Google’s only failure here was to not unban on his first or second appeal. And whilst that is absolutely a big failure on Google’s part, I find it very understandable that the appeals team generally speaking won’t accept “I didn’t know the folder I uploaded contained CSAM” as a valid ban appeal reason.
It’s also kind of insane how this article somehow makes a bigger deal out of this developer being temporarily banned by Google than it does of the fact that hundreds of CSAM images were freely available online, openly shareable by anyone and to anyone, for god knows how long.
forkDestroyer@infosec.pub 2 days ago
I’m being a bit extra but…
Your statement:
The article headline: “A Developer Accidentally Found CSAM in AI Data. Google Banned Him For It”
The general story in reference to the headline:
The article headline is accurate if you interpret it as
“A Developer Accidentally Found CSAM in AI Data. Google Banned Him For It” (“it” being “csam”).
The article headline is inaccurate if you interpret it as
“A Developer Accidentally Found CSAM in AI Data. Google Banned Him For It” (“it” being “reporting csam”).
I read it as the former, because the action of reporting isn’t listed in the headline at all.
^___^
Blubber28@lemmy.world 2 days ago
This is correct. However, many websites/newspapers/magazines/etc. love to get more clicks with sensational headlines that are technically true, but can be easily interpreted as something much more sinister/exciting. This headline is a great example of it. While you interpreted it correctly, or claim to at least, there will be many people that initially interpret it the second way you described. Me among them, admittedly. And the people deciding on the headlines are very much aware of that. Therefore, the headline can absolutely be deemed misleading, for while it is absolutely a correct statement, there are less ambiguous ways to phrase it.
WildPalmTree@lemmy.world 1 day ago
My interpretation would be that the inclusion of “found” indicates it is important to the action taken by Google.
MangoCats@feddit.it 2 days ago
My experience of Google’s unban process: it doesn’t exist, never works, and doesn’t even escalate to a human evaluator in a 3rd-world sweatshop; the algorithm simply ignores appeals, inscrutably.
ulterno@programming.dev 1 day ago
Another point is that the reason Google’s AI is able to identify CSAM is that it has that material in its training data, flagged as such.
In that case, it would have detected the training material as a ~100% match.
What I don’t get, though, is how it ended up being openly available; if it were properly tagged, they would probably have excluded it from the open-sourced data. And now I see it would also not be viable to have an open-source, openly scrutinisable AI deployment for CSAM detection, for the same reason.
And while some governmental body got a lot of backlash for trying to implement such an AI thing on chat services, Google gets to do so all it wants, because it’s E-Mail/GDrive, it’s all on their servers, and you can’t expect privacy.
arararagi@ani.social 1 day ago
You would think so, but none of these companies actually make their own datasets; they buy from third parties.
cupcakezealot@piefed.blahaj.zone 2 days ago
So they got mad because he reported it to an agency that actually fights CSAM instead of to them, where they could sweep it under the rug?
Devial@discuss.online 2 days ago
They didn’t get mad. Did you even read my comment?
Cybersteel@lemmy.world 2 days ago
We need to block access to the web for certain known actors and tie IP addresses to IDs, names, and passport numbers. For the children.
kylian0087@lemmy.dbzer0.com 2 days ago
Oh hell no. That’s a privacy nightmare waiting to be abused like hell.
Also, what you’re suggesting wouldn’t work at all.
tetris11@feddit.uk 2 days ago
Also, pay me exorbitant amounts of taxpayer money to ineffectually enforce this. For the children.
jjlinux@lemmy.zip 2 days ago
Fuck you, and everything you stand for.
NoForwardslashS@sopuli.xyz 2 days ago
No need to go that far. If we just require one valid photo ID for TikTok, the children will finally be safe.
bobzer@lemmy.zip 2 days ago
ATM machine
Goodlucksil@lemmy.dbzer0.com 2 days ago
The M in CSAM stands for “material”. Adding “image” specifies what kind of material it is.
Devial@discuss.online 2 days ago
Which of the letters in CSAM stands for “images”, then?
ayyy@sh.itjust.works 2 days ago
A 404Media headline? The place exclusively staffed by former BuzzFeed/Cracked employees? Noooo, couldn’t be.