Yeah, that’s the problem, isn’t it? I had a great idea: bullshit-ifying my comments by slowly editing them with an LLM, via a long-running script, repeatedly over months.
Then I realised they probably don’t delete the original text on edit anyway, and, as you say, it’s probably buried in a backup someplace.
CeeBee@lemmy.world 7 months ago
It’s theoretically possible, but the issue that anyone trying to do that would run into is consistency.
How do you restore the snapshots of a database to recover deleted comments but also preserve other comments newer than the snapshot date?
The answer is that it’s nearly impossible. Not impossible, but not worth the monumental effort when you can just focus on the existing comments, which greatly outnumber any deleted ones.
skulblaka@startrek.website 7 months ago
Just collate them based on edit/deletion date… Each post will have a last-edited attribute that can be used for sorting. Even more so once the AI is bootstrapped enough to start recognizing the standard protest edit messages. At that point you hardly even need human oversight anymore, because the bot will be able to recognize “that’s a fuck spez edit, ignore that; this post looks good; that’s a Shreddit/PowerDelete edit, ignore that” and so on. Can even have it fetch the previous edit automatically when it comes across something like that, to a point where a comment removed by a PowerDelete tool is nothing more than a cover letter that states “there was once a real human-generated comment in this location”.
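A rough sketch of the "fetch the previous edit" idea, in Python. Everything here is assumed for illustration: the `Revision` shape, the `PROTEST_PATTERNS` list and the function names are hypothetical, nobody outside Reddit knows how edit history is actually stored, and a few regexes stand in for the AI classifier.

```python
import re
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional

# Hypothetical patterns for well-known protest/overwrite edits (assumed, not exhaustive).
PROTEST_PATTERNS = [
    re.compile(r"fuck\s+spez", re.IGNORECASE),
    re.compile(r"shreddit", re.IGNORECASE),
    re.compile(r"power\s*delete", re.IGNORECASE),
]


@dataclass
class Revision:
    """One stored version of a comment (assumed schema)."""
    edited_at: datetime
    body: str


def is_protest_edit(body: str) -> bool:
    """True if the text matches any of the known protest patterns."""
    return any(pattern.search(body) for pattern in PROTEST_PATTERNS)


def latest_genuine_revision(revisions: List[Revision]) -> Optional[Revision]:
    """Walk back from the newest edit and return the first non-protest version."""
    for rev in sorted(revisions, key=lambda r: r.edited_at, reverse=True):
        if not is_protest_edit(rev.body):
            return rev
    return None  # every stored version looks like a protest edit


history = [
    Revision(datetime(2021, 3, 4, 9, 30), "ZFS snapshots are cheap because they are copy-on-write."),
    Revision(datetime(2021, 3, 4, 9, 45), "ZFS snapshots are cheap because they are copy-on-write. Edit: typo."),
    Revision(datetime(2023, 6, 12, 18, 0), "This comment was overwritten with Power Delete Suite."),
]
recovered = latest_genuine_revision(history)
```

With this made-up history, `recovered` is the 09:45 version from 2021, i.e. the last thing the user wrote before the overwrite tool ran.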
cyberpunk007@lemmy.ca 7 months ago
It’s a piece of cake. Some code along the lines of:
if ($user.modifyCommentRecentlyCount -gt 50) {
    Write-Output "user is nuking comments"
    $comment = $previousComment  # keep serving the pre-edit text
}
Or some shit. It can be done quite easily, trust me.
CeeBee@lemmy.world 7 months ago
The words of every junior dev right before I have to spend a weekend undoing their crap.
I’ve been there too many times.
There are always edge cases you need to account for, and you can’t account for them until you run tests and then verify the results.
And you’d be parsing billions upon billions of records. Not a trivial thing to do when running multiple tests to verify. And ultimately for what is a trivial payoff.
You don’t screw around with your business’s invaluable prod data without first exhausting every single way the modification could go wrong.
It hurts how often I’ve heard this and how often it’s followed by a massive screw up.
cyberpunk007@lemmy.ca 7 months ago
There are so many ways this can be done that I don’t think you’re considering. Say a user runs “shreddit” (or some other similar app) on their comments. They likely have thousands. On every comment edit, it’s quite easy to check when that user last edited one of their comments. All they need is a check on whether the last 10 consecutive edits were spread over hours or made within milliseconds/seconds of each other. After that, reddit could easily just tell the user it’s editing their comments when it’s not. Like a shadowban kind of method.
Another way would be at the data structure level. We don’t know what their databases and hardware are like, but I can speculate. What if each user-edited comment is not an update query on a database, but an add/insert? Then all you need to do is restore the live comments from the last version dated before the malicious edits started, where username=$username. Not to mention that when you start talking Nimble storage and stuff like that, the storage is extremely quick to respond. Hell, I would wager it hasn’t even hit storage yet; it’s probably still on some all-flash cache or in memory.
Another way could be at the filesystem level. Ever heard of ZFS? What if each user had their own dataset or something? It’s extremely easy and quick to roll back a snapshot, or to clone the previous snapshot. There are so many ways.
At the end of the day a user is triggering this action, so we don’t necessarily need to parse “billions” of records. Just the records for a single user.
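Here is a sketch of the insert-only idea scoped to one user, in Python. It assumes (purely as speculation, like the rest of this comment) that every edit is appended as a new row; `CommentRevision`, `restore_user_comments` and the in-memory list are hypothetical stand-ins for whatever the real tables look like.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Dict, List


@dataclass
class CommentRevision:
    """One appended row per edit (assumed schema)."""
    username: str
    comment_id: str
    edited_at: datetime
    body: str


def restore_user_comments(
    revisions: List[CommentRevision], username: str, cutoff: datetime
) -> Dict[str, str]:
    """For one user, pick each comment's newest version dated before `cutoff`."""
    newest_seen: Dict[str, datetime] = {}
    restored: Dict[str, str] = {}
    for rev in revisions:
        # Skip other users' rows and anything written at or after the mass edit began.
        if rev.username != username or rev.edited_at >= cutoff:
            continue
        if rev.comment_id not in newest_seen or rev.edited_at > newest_seen[rev.comment_id]:
            newest_seen[rev.comment_id] = rev.edited_at
            restored[rev.comment_id] = rev.body
    return restored


log = [
    CommentRevision("alice", "c1", datetime(2022, 1, 1), "original one"),
    CommentRevision("alice", "c2", datetime(2022, 2, 1), "original two"),
    CommentRevision("alice", "c2", datetime(2022, 3, 1), "original two, typo fixed"),
    CommentRevision("bob", "c3", datetime(2022, 4, 1), "bob's comment"),
    CommentRevision("alice", "c1", datetime(2023, 6, 12, 12, 0), "overwritten"),
    CommentRevision("alice", "c2", datetime(2023, 6, 12, 12, 0, 2), "overwritten"),
]
restored = restore_user_comments(log, "alice", cutoff=datetime(2023, 6, 12))
```

The loop here scans the whole list only because it is a toy; in a real database this would presumably be an indexed lookup on the username, which is the point about not needing to touch everyone else's records.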