Comment on OpenAI strikes Reddit deal to train its AI on your posts
db2@lemmy.world 8 months ago
Not my posts. Go ahead, look at what remains. The rest was edited and then deleted.
Fuck you, Steve. Right in the ass.
cyberpunk007@lemmy.ca 8 months ago
If only snapshots and backups were a thing…
CeeBee@lemmy.world 8 months ago
It’s theoretically possible, but anyone trying to do that would run into one issue: consistency.
How do you restore a database snapshot to recover deleted comments while also preserving comments newer than the snapshot date?
The answer is that it’s nearly impossible. Not strictly impossible, but not worth the monumental effort when you can just focus on the existing comments, which greatly outweigh any deleted ones.
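The consistency problem is easier to see with a naive row-level merge. This is a minimal sketch, assuming a hypothetical `Comment` record keyed by ID with a `modified_at` timestamp that updates on every edit or soft delete; it does not reflect how Reddit actually stores anything.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Comment:
    # Hypothetical schema for illustration only.
    comment_id: int
    body: str
    modified_at: datetime  # time of the last edit or soft delete


def merge_snapshot(current: dict[int, Comment],
                   snapshot: dict[int, Comment],
                   snapshot_time: datetime) -> dict[int, Comment]:
    """Naive merge: restore a comment from the snapshot only if the live copy
    vanished or was changed after the snapshot was taken."""
    merged = dict(current)  # comments created after the snapshot stay as-is
    for cid, old in snapshot.items():
        live = current.get(cid)
        if live is None:
            # Hard-deleted from the live table: bring the snapshot copy back.
            merged[cid] = old
        elif live.modified_at > snapshot_time:
            # Edited or soft-deleted after the snapshot: prefer the older body.
            merged[cid] = old
    return merged
```

Notice that this also reverts every legitimate edit made after the snapshot, and it ignores schema changes, foreign keys, votes, and replies. Those are exactly the edge cases that make the real thing expensive.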
skulblaka@startrek.website 8 months ago
Just collate them based on edit/deletion date… Each post will have a last-edited attribute that can be used for sorting. Even more so once the AI is bootstrapped enough to start recognizing the standard protest edit messages. At that point you hardly even need human oversight anymore, because the bot will be able to recognize “that’s a fuck spez edit, ignore that; this post looks good; that’s a Shreddit/PowerDelete edit, ignore that” and so on. You can even have it fetch the previous edit automatically when it comes across something like that, to the point where a comment removed by a PowerDelete tool is nothing more than a cover letter that states “there was once a real human-generated comment in this location”.
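The "skip the protest edit, fetch the previous one" idea might look roughly like this. It is a sketch that assumes a per-comment edit history is available; the regex list stands in for the learned classifier described above, and `PROTEST_PATTERNS`, `Revision`, and `best_revision` are all hypothetical names.

```python
from __future__ import annotations

import re
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass(frozen=True)
class Revision:
    body: str
    edited_at: datetime


# Hypothetical patterns for common protest/overwrite edits. A real filter
# would need a much larger (and probably learned) set.
PROTEST_PATTERNS = [
    re.compile(r"fuck\s+(you,?\s+)?(spez|steve)", re.IGNORECASE),
    re.compile(r"shreddit|power\s*delete", re.IGNORECASE),
    re.compile(r"^\s*\[(deleted|removed)\]\s*$", re.IGNORECASE),
]


def is_protest_edit(body: str) -> bool:
    return any(p.search(body) for p in PROTEST_PATTERNS)


def best_revision(history: list[Revision]) -> Optional[Revision]:
    """Return the newest revision that doesn't look like a protest edit."""
    for rev in sorted(history, key=lambda r: r.edited_at, reverse=True):
        if not is_protest_edit(rev.body):
            return rev
    # Every revision matched: only the "cover letter" remains.
    return None
```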
cyberpunk007@lemmy.ca 8 months ago
It’s a piece of cake. Some code along the lines of:
if ($user.modifyCommentRecentlyCount > 50) {
print "user is nuking comments"; $comment = $previousComment
}
Or some shit. It can be done quite easily, trust me.
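The threshold heuristic above could be sketched as a sliding-window count per user. Both `THRESHOLD = 50` (taken from the pseudocode) and the 24-hour `WINDOW` are assumptions for illustration, and `flag_nuking_users` is a hypothetical helper, not anyone's real pipeline.

```python
from __future__ import annotations

from collections import defaultdict
from datetime import datetime, timedelta

THRESHOLD = 50                 # from the pseudocode above
WINDOW = timedelta(hours=24)   # assumed window; not a tuned value


def flag_nuking_users(edits: list[tuple[str, datetime]]) -> set[str]:
    """Flag users who made more than THRESHOLD edits inside any WINDOW."""
    by_user: dict[str, list[datetime]] = defaultdict(list)
    for user, ts in edits:
        by_user[user].append(ts)

    flagged: set[str] = set()
    for user, stamps in by_user.items():
        stamps.sort()
        start = 0
        for end, ts in enumerate(stamps):
            # Advance the left edge until the window spans at most WINDOW.
            while ts - stamps[start] > WINDOW:
                start += 1
            if end - start + 1 > THRESHOLD:
                flagged.add(user)
                break
    return flagged
```

Even this toy version raises the questions from the reply below: moderators, bots, and people fixing formatting in bulk would all trip it.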
CeeBee@lemmy.world 8 months ago
The words of every junior dev right before I have to spend a weekend undoing their crap.
I’ve been there too many times.
There are always edge cases you need to account for, and you can’t account for them until you run tests and then verify the results.
And you’d be parsing billions upon billions of records. Not a trivial thing to do when running multiple tests to verify. And ultimately for what is a trivial payoff.
You don’t screw around with your business’s invaluable prod data without exhausting every possible way the modification could go wrong.
It hurts how often I’ve heard this and how often it’s followed by a massive screw up.
Todgerdickinson@lemmy.world 8 months ago
Yeah, that’s the problem, isn’t it? I had a great idea involving bullshit-efying my comments by slowly and repeatedly editing them with an LLM via a long-running script over months.
I realised that they probably don’t delete the original text on edit anyway, which, as you say, is probably buried in a backup someplace.
AceSLS@ani.social 8 months ago
I don’t think it’s only in backups. My guess is they store your full edit history for each comment/post/whatever. The newest one is shown on the frontend; the rest is for data vampires.
cyberpunk007@lemmy.ca 8 months ago
This is it exactly. To us, edits are “changes”. To the back end, each one is just another iteration while the rest still exist.
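Nobody outside Reddit knows how edits are actually stored, so this is a guess too. A minimal sketch of the append-only idea, assuming a hypothetical `CommentHistory` class where an edit adds a revision instead of overwriting one, and where a delete is modeled (also an assumption) as one more "[deleted]" revision:

```python
from __future__ import annotations

from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Optional


@dataclass
class CommentHistory:
    """Append-only revision log: edits add entries instead of overwriting."""
    comment_id: int
    revisions: list[tuple[datetime, str]] = field(default_factory=list)

    def edit(self, body: str, when: Optional[datetime] = None) -> None:
        stamp = when if when is not None else datetime.now(timezone.utc)
        self.revisions.append((stamp, body))

    def delete(self) -> None:
        # Assumed tombstone: the delete is just another revision.
        self.edit("[deleted]")

    def displayed(self) -> str:
        # The frontend only ever renders the newest revision...
        return self.revisions[-1][1]

    def all_versions(self) -> list[str]:
        # ...while every earlier body is still sitting in storage.
        return [body for _, body in self.revisions]
```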