The developer is to blame. Using a cutting-edge tool irresponsibly. I’ve made mistakes using AI to help with coding as well, though never this bad. Blaming the AI would be like blaming the hammer when a roofer hammering nails accidentally smashes their finger with it. You don’t blame the hammer; you blame the negligence of the roofer.
Claude Code deletes developers' production setup, including its database and snapshots — 2.5 years of records were nuked in an instant
Submitted 10 hours ago by throws_lemy@lemmy.nz to technology@lemmy.world
Comments
jaykrown@lemmy.world 45 minutes ago
pHr34kY@lemmy.world 50 minutes ago
The lesson: AI cannot bridge an air-gapped backup. This could all have been prevented with a crappy portable hard drive from Costco.
GaumBeist@lemmy.ml 37 minutes ago
Nobody wants to point out that Alexei Grigorev changes to being named Gregory after two paragraphs?
Slop journalism at its sloppiest. I wouldn’t be surprised to find out that this story was entirely fabricated.
EndlessNightmare@reddthat.com 43 minutes ago
<insert Padme meme>: You had a backup, right?
coalie@piefed.zip 9 hours ago
athatet@lemmy.zip 4 hours ago
Honestly. At this point, after it has happened to multiple people, multiple times, this is the only appropriate response.
SapphironZA@sh.itjust.works 5 hours ago
We used to say RAID is not a backup. It’s redundancy.
Snapshots are not a backup. They’re system restore points.
Only something offsite, off-system, and accessible only with separate authentication details is a backup.
daychilde@lemmy.world 5 hours ago
AND something tested to restore successfully, otherwise it’s just unknown data that might or might not work.
(i.e. reinforcing your point, no disagreements)
mic_check_one_two@lemmy.dbzer0.com 4 hours ago
AKA Schrödinger’s Backup. Until you have successfully restored from a backup, it is just an amorphous blob of data that may or may not be valid.
I say this as someone who has had backups silently fail. For instance, just yesterday, I had a managed network switch generate an invalid config file for itself. I was making a change on the switch, and saved a backup of the existing settings before changing anything. That way I could easily reset the switch to default and push the old settings to it, if the changes I made broke things. Sure enough, the change I made broke something, so I performed a factory reset and went to upload that backup I had saved like 20 minutes prior… When I tried to restore settings after the factory reset, the switch couldn’t read the file that it had generated like 20 minutes earlier.
So I was stuck manually restoring the switch’s settings, and what should have been a quick 2 minute “hold the reset button and push the settings file once it has rebooted” job turned into a 45 minute long game of “find the difference between these two photos” for every single page in the settings.
Whitebrow@lemmy.world 5 hours ago
Schrödinger backup
tetris11@feddit.uk 4 hours ago
3-2-1 Backup Rule: three copies of your data, on two different types of storage media, with one copy offsite
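A toy sketch of the rule in shell. All paths are placeholders: in real life copy 2 lives on a physically different medium and copy 3 is genuinely offsite with separate credentials, and `cp -a` stands in for whatever replication you actually use:

```shell
#!/bin/sh
set -eu

base=/tmp/321-demo
rm -rf "$base"

# Copy 1: the live data (a sample file standing in for a real DB dump).
mkdir -p "$base/primary"
echo "records" > "$base/primary/db.sql"

# Copy 2: stand-in for a second storage medium (e.g. an external drive).
# Copy 3: stand-in for offsite storage (in practice, rsync to a remote
# host that your local credentials cannot delete from).
mkdir -p "$base/external" "$base/offsite"
cp -a "$base/primary/." "$base/external/"
cp -a "$base/primary/." "$base/offsite/"

echo "3 copies, 2 (pretend) media, 1 (pretend) offsite"
```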
SreudianFlip@sh.itjust.works 2 hours ago
Fukan yes
- D/L all assets locally
- proper 3-2-1 of local machines
- duty roster of other contributors with same backups
- automate and have regular checks as part of production
- also sandbox the stochastic parrot
HugeNerd@lemmy.ca 3 hours ago
An LTO drive with a non-consumer interface?
prenatal_confusion@feddit.org 3 hours ago
We still say that.
OrteilGenou@lemmy.world 3 hours ago
I remember back when I first saw a DR plan with three tiers of restore: 1 hour, 12 hours, or 72 hours. I knew the 1-hour tier meant a simple redirect to a DB partition that was a real-time copy of the active DB, and the 12-hour tier meant that had failed, so it was a restore-point exercise that would mean some data loss, but less than one hour’s worth, or something like that.
I had never heard of 72 hours, so I raised a question in the meeting. 72 hours meant having physical tapes shipped to the data center, and I believe it meant up to 12 (though it could have been 24) hours of data lost. I was impressed by this, because the idea of a job that ran daily or twice daily to create tape backups was completely new to me.
This was in the early aughts. Not sure if tapes are still used…
aesthelete@lemmy.world 4 hours ago
Stop giving chat bots tools with this kind of access.
Modern_medicine_isnt@lemmy.world 2 hours ago
Wrong answer. If you don’t give them access, the alternative (ruling out not using AI, because leadership will never go for that) is to hire high-school kids to take a task from a manager, ask the AI to do it, then do what the AI says, iterating toward the solution. The problem with that alternative is that it is no better than giving the AI access, and it leaves you with no senior tech people. Instead, you give it access, but only give senior tech people access to the AI: ones who would know to tell the AI to keep a backup of the database, one designed so you can’t delete it without multiple people signing off.
Senior tech people aren’t going to spend their time trying things an AI needs tried to find the solution. So if you don’t give it access, they won’t use it, and eventually they will all be gone. Then you are even further up shit creek than you are now.
The overall answer is smarter people talking to the AI, plus guardrails to stop a single point of failure. The latter is nothing new.
vithigar@lemmy.ca 1 hour ago
What is this insane rambling?
The alternative is that the only thing with access to make changes in your production environment is the CI pipeline that deploys your production environment.
Neither the AI, nor anything else on the developers machine, should have access to make production changes.
MartianRecon@lemmus.org 1 hour ago
The answer is no AI. It’s really simple. The costs for ai are not worth the output.
Shanmugha@lemmy.world 1 hour ago
Nah. As a tech person, I am not going to give an LLM write access to anything in production, period.
Matty_r@programming.dev 1 hour ago
I’m in favour of hiring kids to figure out the solution through iteration, web searches, etc. If they fuck up, they learn and eventually become better at their job, maybe even becoming a Senior themselves eventually.
I get what you’re saying: Seniors are more likely to use the tools effectively, but there are many cases of the AI not doing what it’s told. It’s not repeatably consistent like a bash script.
People are better - always.
minorkeys@lemmy.world 3 hours ago
No risk, no reward. People are desperate for these tools to help them succeed.
HugeNerd@lemmy.ca 3 hours ago
Success bigly, even.
fubarx@lemmy.world 9 hours ago
Given that the infrastructure description included the DataTalks.Club website, this resulted in a full wipe of the setup for both sites, including a database with 2.5 years of records, and database snapshots that Grigorev had counted on as backups. The operator had to contact Amazon Business support, which helped restore the data within about a day.
Non-story. He let Terraform zap his production site without offsite backups. But then support restored it all.
I’d be more alarmed that a ‘destroy’ command is reversible.
CubitOom@infosec.pub 8 hours ago
Distributed Non Consensual Backup
db2@lemmy.world 9 hours ago
Never assume anything is gone when you hit delete.
Vlyn@lemmy.zip 7 hours ago
Except when it’s your own data, then usually you’re fucked.
zr0@lemmy.dbzer0.com 7 hours ago
For technical reasons, you never immediately delete records, as it is computationally very intensive.
For business reasons, you never want to delete anything at all, because data = money.
jaybone@lemmy.zip 5 hours ago
Back in the day, before virtualized services were all “the cloud” as they are today, if you were re-provisioning storage hardware that might be used by another customer, you would “scrub” disks by overwriting them from /dev/zero or /dev/random. If you somehow kept that data around and something “leaked”, that was a big boo-boo and a violation of your service agreement, and the customer would sue the fuck out of you. But now you just contact support and they have a copy lying around. 🤷
wewbull@feddit.uk 5 hours ago
Retaining data can mean violating legal obligations. Hidden backups can be a lawyer’s playground.
brbposting@sh.itjust.works 7 hours ago
Thought it could be a liability sometimes! Maybe that ship sailed
rizzothesmall@sh.itjust.works 37 minutes ago
A developer having the ability to accidentally erase your production db is pretty careless.
An AI agent having the ability to “accidentally” erase your production db is fucking stupid as all fuck.
An AI agent having the ability to accidentally erase your production db and somehow also all the backup media? That requires a special course on complete dribbling fuckwittery.
just_another_person@lemmy.world 9 hours ago
Whoever did this was incredibly lazy. Why are you using an agent to run your Terraform commands for you in the first place if it’s not part of some automation? You’re saving yourself, what, 15 seconds tops? You deserve this kind of thing for being like this.
PabloSexcrowbar@piefed.social 8 hours ago
Yeah, and to do that without some sort of DR in place is peak hubris.
lobut@lemmy.ca 7 hours ago
Our DR process is a slow POS … takes far too long to back up and redeploy and set up again.
I was the one that designed it. I pray I’ll never have to use it.
kautau@lemmy.world 6 hours ago
It’s a grifter running a site called “aishippinglabs.com” which charges 500 euros for a “closed community of likeminded individuals”. He’s selling AI slop and a Discord channel to other idiots who will do exactly this kind of shit with little understanding of what is going on.
SeductiveTortoise@piefed.social 6 hours ago
It’s an intelligence test. And if you take it, you’ve failed.
criss_cross@lemmy.world 3 hours ago
Were they also into crypto 7 years ago?
eleitl@lemmy.zip 7 hours ago
“and database snapshots that Grigorev had counted on as backups” – yes, this is exactly how you run “production”.
Nighed@feddit.uk 5 hours ago
With some of the cloud providers, the built-in backups are linked to the resource. So even if you have super duper geo-zone redundant backups for years, they still get nuked if you drop the server.
It’s always felt a bit stupid, but the backups can still normally be restored by support.
Sam_Bass@lemmy.world 2 hours ago
But AI is a good thing! /s
Yaztromo@lemmy.world 57 minutes ago
AI is like a circular saw. Are circular saws useful?
Of course.
Can you cut your entire hand off if you don’t use it correctly? Absolutely.
Sam_Bass@lemmy.world 30 minutes ago
And just like a circular saw, it’s only useful for a finite set of situations.
SaharaMaleikuhm@feddit.org 1 hour ago
Filters out the biggest fools it seems.
Poppa_Mo@lemmy.world 5 hours ago
Whoever gave it access to production is a complete moron.
tempest@lemmy.ca 4 hours ago
If you’ve ever used it you can see how easily it can happen.
At first you sandbox it and you’re careful. Then after a while the sandbox is a bit of a pain, so you just run it as is. Then it asks for permission a thousand times, and at first you carefully check each command, but after a while you just skim them and eventually, sure, you can run ‘psql *’ to debug some query on the dev instance…
It’s one of the major problems with the “full self driving” stuff as well. It’s right often enough that eventually you get complacent or your attention drifts elsewhere.
This kind of stuff happened before the LLM coding agents existed, they have just supercharged the speed and as a result increased the amount of damage that can be done before it’s noticed.
A bunch of failures already have to be in place for something like this to happen (having the prod credentials available, etc.). It’s just that now, instead of rolling the dice every couple of weeks, your LLM is rolling them every 20 seconds.
BorgDrone@feddit.nl 2 hours ago
If you’ve ever used it you can see how easily it can happen.
How could this happen easily? A regular developer shouldn’t even have access to production outside of exceptional circumstances (e.g. diagnosing a production issue). Certainly not as part of the normal dev process.
ExLisper@lemmy.curiana.net 3 hours ago
If you’ve ever used it you can see how easily it can happen.
Yes, I can see how it can easily happen to stupid lazy people.
BrianTheeBiscuiteer@lemmy.world 6 hours ago
Whether human, AI, or code, you don’t give a single entity this much power in production.
LiveLM@lemmy.zip 4 hours ago
but should serve as a cautionary tale.
Jesus, there’s a headline like this every month. How many cautionary tales do people need???
Atropos@lemmy.world 3 hours ago
I am approaching caution critical mass.
Once the threshold is hit, I buy some solar panels and become an off grid farmer.
Jankatarch@lemmy.world 3 hours ago
Caution Threshold!
Modern_medicine_isnt@lemmy.world 2 hours ago
Have you met software? Nearly all of it is a cautionary tale, even before AI. So this is just business as usual for the software industry.
anon_8675309@lemmy.world 7 hours ago
Mistakes happen. But how do you go 2.5 years without proper backups?
4grams@awful.systems 6 hours ago
It’s so easy. I can’t tell you how many “backed up” environments I’ve run into that simply cannot be restored. Often people set them up, but never test them, and assume the snaps are working.
Backups are typically only thought about when you need them, and by then it’s often too late. Real backups need testing and validation frequently, they need remote, off-site storage, with a process to restore that as well.
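The test-restore loop can be as dumb as this sketch (toy paths, and a tarball standing in for whatever backup tooling you actually run):

```shell
#!/bin/sh
set -eu

work=/tmp/restore-check
rm -rf "$work"

# Sample "production" data for illustration.
mkdir -p "$work/data"
echo "2.5 years of records" > "$work/data/records.txt"

# Take the backup.
tar -czf "$work/backup.tar.gz" -C "$work" data

# The part everyone skips: restore into a scratch directory and compare
# against the live copy. If the diff fails, it was never a backup.
mkdir -p "$work/scratch"
tar -xzf "$work/backup.tar.gz" -C "$work/scratch"
diff -r "$work/data" "$work/scratch/data" && echo "restore verified"
```

Run that on a schedule, against the real backup target, and alert when the diff fails, and you have caught the silent failure before you needed the restore.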
Been doing this shit for 30 years and people will never learn.
bss03@infosec.pub 4 hours ago
I was a professional, and I didn’t have a backup of my personal system for about two decades. I just didn’t have another 4 TiB of storage to copy my media library onto. I’m now on Backblaze, but there was a long time there when I didn’t have a backup even though I knew better.
Also, even in a professional setting, I’ve seen plenty of “production support” systems that didn’t have a backup because they grew ad hoc, weren’t the “core business”, and no one both recognized and spoke up about how important they were until after some outage. There’s virtually never a test-restore schedule for such systems, so the backups are always somewhat suspect anyway.
It’s very easy to find yourself (or your organization) without a backup, even if you “know better”.
MountingSuspicion@reddthat.com 3 hours ago
Thank you for this comment. I have backups I tested on implementation and rummaged through two years ago after a weird corruption issue, but not once since. I still get alerts about them, so I just assume they’re fine, but first thing Monday I’m gonna test them. I feel stupid for not having implemented regular checks already, but will do so now.
Deestan@lemmy.world 9 hours ago
We don’t need cautionary tales about how drinking bleach caused intestinal damage.
The people needing the caution got it in spades and went off anyway.
Or maybe the cautionary tale is to take caution dealing with the developers in question, as they are dangerously inept.
Scipitie@lemmy.dbzer0.com 9 hours ago
Yeah, it’s beyond ridiculous to blame anything or anyone else.
I mean, accidentally letting loose an autonomous, untested, unguardrailed tool in my dev environment… well, tough luck, shit happens, something for a good post-mortem to learn from.
Having an infrastructure that allowed a single actor to cause this damage? This shouldn’t even be possible for a malicious human from within the system this easily.
eleitl@lemmy.zip 7 hours ago
Most devs are ops-tarded.
msage@programming.dev 6 hours ago
Even dev-impaired
Bieren@lemmy.today 2 hours ago
AI or not, this is on the person who gave it prod access. I don’t care if the dev was running CC in YOLO mode, not paying attention to it, or CC went completely rogue. Why would you give it prod access? This is human error.
melfie@lemy.lol 4 hours ago
First time anything this ever happened and it’s just a freak accident. Nobody could’ve predicted this.
bss03@infosec.pub 4 hours ago
/s ?
plateee@piefed.social 7 hours ago
Jesus Christ, people. Terraform has a plan output option to allow for review prior to an apply. It’s trivial to make a script that’ll throw the JSON output into something like Terraform Visual if you don’t like the diff format.
I’ve fucked up stuff with Terraform, but just once, before I switched to a rudimentary script to force a pause, a review, and then the apply.
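Such a pause script can be as small as this sketch. The `plan -out`, `show`, and `apply <planfile>` subcommands are standard Terraform; the `TF_CMD` override is only there so the flow itself can be exercised without touching real infrastructure:

```shell
#!/bin/sh
# Minimal plan -> review -> confirm -> apply wrapper.
# TF_CMD defaults to terraform; point it at a stub to dry-run the flow.
TF_CMD=${TF_CMD:-terraform}

plan_and_apply() {
  "$TF_CMD" plan -out=tfplan || return 1
  "$TF_CMD" show tfplan            # human-readable diff to eyeball
  printf 'Apply this plan? (yes/no): '
  read -r answer
  if [ "$answer" = "yes" ]; then
    "$TF_CMD" apply tfplan
  else
    echo "aborted; nothing applied"
    return 1
  fi
}
```

Anything other than a literal `yes` applies nothing, and the saved plan file means what you reviewed is exactly what gets applied.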
cmhe@lemmy.world 7 hours ago
Don’t worry, the review was done by an LLM as well. ;)
zebidiah@lemmy.ca 7 hours ago
tl;dr
daychilde@lemmy.world 5 hours ago
Do you realize how difficult it was to upvote that comment? I viscerally hate that. lol. But the sarcasm is perfect here, of course. But I still hate you <3
ColeSloth@discuss.tchncs.de 7 hours ago
If your dumb fucking ass let an AI near your work AND you didn’t have any recent backups it couldn’t access, you’re really extra fucking stupid.
Deestan@lemmy.world 9 hours ago
According to mousetrap manufacturers, putting your tongue on a mousetrap causes you to become 33% sexier, taller and win the lottery twice a week.
While some experts have urged caution, warning that it may cause painful swelling, bleeding, injury, and distress, and that the benefits are yet to be proven, affiliated marketers all over the world paint a different, sexier picture.
However, it is not working out for everyone. Gregory here put his tongue in the mousetrap the wrong way and suffered painful swelling, bleeding, injury and distress while not getting taller or sexier.
Gregory considers this a learning experience, and hopes this will serve as a cautionary tale for other people putting their tongue on mousetraps: From now on he will use the newest extra-strength mousetrap and take precautions like Hope Really Hard that it works when putting his tongue in the mousetrap.
peopleproblems@lemmy.world 4 hours ago
The real reason I hate using LLMs is because I have to think like a social human non software engineer.
For whatever fucking reason, I just can’t get these things to be useful. And then I see idiots connecting an LLM to production like this.
Is that the problem? I literally can’t turn my brain off. The only other nearly universal group of people that seems opposed to LLMs are psychologists and social workers, who seem universally concerned about its negative effects on mental health and its encouragement of abandoning critical thinking.
Like I can’t NOT think through a problem. I already know more about my software than the AI could actually figure out. Anytime I go into GitHub Copilot and say “I want this feature” I get some code and the option to apply it. But the generated code is usually unusable and doesn’t pick up or update existing models. The security flaws are rampant, and the generated tests don’t do much of any real testing.
jbloggs777@discuss.tchncs.de 3 hours ago
It would be interesting to see the logs of your sessions, and compare them to the session logs of happy/productive-AI-coders.
I suspect that some people just think and express themselves in ways that don’t vibe with LLMs. eg. Men are from Mars, AI coding agents are from Venus.
sheetzoos@lemmy.world 7 hours ago
They had a backup and restored everything. This is clickbait.
eleitl@lemmy.zip 7 hours ago
No, they had only snapshots. Which is not a backup. They were lucky support could restore the data which by rights should have been wiped.
sheetzoos@lemmy.world 6 hours ago
…this resulted in a full wipe of the setup for both sites, including a database with 2.5 years of records, and database snapshots that Grigorev had counted on as backups. The operator had to contact Amazon Business support, which helped restore the data within about a day.
Correct, the developer only had snapshots, but the article doesn’t state how Amazon Business restored their data. Amazon Business offers both snapshots and full backups.
Regardless of the developer’s shoddy backup practices, they got their data restored, and this non-issue is being used as clickbait to feed people’s confirmation bias.
edgemaster72@lemmy.world 4 hours ago
lol, lmao even
atlasraven@sh.itjust.works 8 hours ago
Skill issue
etchinghillside@reddthat.com 9 hours ago
This is like blaming the gun for killing people.
queermunist@lemmy.ml 9 hours ago
More a problem with the marketing, right? Imagine if guns were marketed as safe and helpful back scratchers, and then someone shot themselves because they used one to scratch their back.
voidsignal@lemmy.world 8 hours ago
They would still be fucking dumb. Believing marketing is a mark of idiocy
surewhynotlem@lemmy.world 9 hours ago
So you’re saying it’s a tool designed to be used by anyone, including idiots, and it’s dangerous in the hands of idiots. And we as a society should do better to make sure this potentially dangerous tool isn’t used by idiots.
Yep, agree.
mereo@piefed.ca 9 hours ago
Given that the infrastructure description included the DataTalks.Club website, this resulted in a full wipe of the setup for both sites, including a database with 2.5 years of records, and database snapshots that Grigorev had counted on as backups. The operator had to contact Amazon Business support, which helped restore the data within about a day.
*sigh*, SNAPSHOTS ARE NOT BACKUPS!
kamen@lemmy.world 1 hour ago
You either have a backup or will have a backup next time.
Something that is always online and can be wiped while you’re working on it (by yourself or with AI, doesn’t matter) shouldn’t count as a backup.