I see some problems here.
An LLM providing “an opinion” is not a thing, as far as current tech goes. It’s just statistically right or wrong and puts that into words, which does not fit nicely with real use cases. Also, lots of tools already have autofix features that can (on demand) handle many of the minor issues you mention, without any LLM. Assuming static analysis is already in place and decent tooling is used, most of this would never have to reach a human or an AI agent or anything else before getting fixed, at very little cost.
As anecdotal evidence, we regularly look into those tools on the job. Granted, we don’t have billions of lines of code to check, but so far they’ve been useless at best. Another piece of anecdotal evidence is the recent outburst from the curl project (and others following suit) after getting buried under a mountain of bogus issues.
I have no doubt that there is a place for human-sounding review and advice, alongside more common uses like completion and documentation, but ultimately these systems are not able to think, by design. The work still has to be done, and the output can’t go much beyond platitudes. You ask how common the horrible cases are, but that might not be the right question. Horrific comments are easy to spot and filter out. The real issue is the perfectly decent-looking “minor fix” that is well worded, follows the guidelines, and passes all the checks, while quietly introducing an off-by-one error or swapping two parameters that happen to be compatible and make sense in context. Those, even if rare (empirically I’d say they are not that rare for now), are so much harder to spot without full human analysis, and that makes them a real threat.
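To make that concrete, here is a minimal, entirely hypothetical sketch (the function and values are made up, not from any real codebase): every line type-checks, reads naturally, and would sail through a quick skim, yet half of it is silently wrong.

```python
# Hypothetical illustration of a plausible-looking "minor fix" gone wrong.

def transfer(amount: float, fee: float) -> float:
    """Return the net amount after deducting the fee."""
    return amount - fee

payment = 100.0
service_fee = 2.5

net_ok = transfer(payment, service_fee)    # 97.5, intended
net_bad = transfer(service_fee, payment)   # -97.5, compatible types, wrong result

# Same idea with an off-by-one: an "equivalent" rewrite of a slice bound.
items = [1, 2, 3, 4, 5]
total_ok = sum(items[0:len(items)])        # 15, intended
total_bad = sum(items[0:len(items) - 1])   # 10, silently drops the last element

print(net_ok, net_bad, total_ok, total_bad)
```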
Yet another anecdote… yes, that’s a lot. Given the current hype, I can mostly only base my findings on personal experience. I use AI-based code completion, but only when the suggestion is short enough to check at a glance and the context is small enough that it’s hard to get wrong. At most two or three lines at a time. Even in that setting, while checking that the generated code matches what I was going to write, I’ve seen a handful of mistakes slip through over a few months. It makes me dread what could get through a PR system, where the codebase is not necessarily fresh in the reviewer’s mind.
This is not to say that none of this is useful, but for it to be, it would require an extremely high level of trust, far higher than what we grant current human intervention (which is also not great and a source of mistakes, I’m very aware of that). The goal should not be to emulate human mistakes, but to build something better.
spankmonkey@lemmy.world 3 days ago
The “AI agent” approach’s goal doesn’t include a human reviewer. As in the agent is independent, or is reviewed by other AI agents. Full automation.
They are selling those AI agents as working right now despite the obvious flaws.
mcv@lemm.ee 19 hours ago
From what I know, those agents can be absolutely fantastic as long as they run under the strict guidance of a senior developer who really knows how to use them. Fully autonomous agents sound like a terrible idea.
MangoCats@feddit.it 3 days ago
They’re also selling self-driving cars… the question is: when will self-driving cars kill fewer people per passenger-mile than the average human driver?
spankmonkey@lemmy.world 3 days ago
Right now they do, thanks to a combination of extra oversight, generally travelling at slow speeds, and being restricted in area. Kind of like how children are less likely to die in a swimming pool with lifeguards than in rivers and at beaches without lifeguards.
Once they are released into the wild I expect a number of high-profile deaths, but I also assume the fatality rate will be significantly lower than the human average because the cars will be tuned to be overly cautious. I do expect them to have a high rate of low-speed collisions when they encounter confusing or absent road markings in rural areas.
MangoCats@feddit.it 3 days ago
Not self-driving, but the “driver assist” on a rental we had recently would see skid marks on the road and swerve to follow them, every single time. That’s going to be a difference between automated systems and human drivers: humans do some horrifically negligent and terrible things, but… most humans tend not to repeat the same mistake too many times.
With “the algorithm” controlling thousands or millions of vehicles, once somebody finds a hack that causes one to crash, they’ve got a hack that will cause all similar ones to crash. And I doubt we’re anywhere near safe “learn from their mistakes” self-recoding on these systems yet; that has the potential for even worse and less predictable outcomes.
echodot@feddit.uk 2 days ago
There’s more to it than that; there’s also the cost of implementation.
If a self-driving system kills, on average, one fewer person than the average human driver does, but costs $100,000 to install in the car, then it still isn’t worth implementing.
Yes, I know that puts a price on human life, but that is how economics works.
MangoCats@feddit.it 2 days ago
$100K for a safer driver might be well worth it to a lot of people, particularly if it’s a one-time charge. If that $100K autopilot can serve for seven years, that’s way cheaper than paying a chauffeur.
Initiateofthevoid@lemmy.dbzer0.com 3 days ago
The issue will remain that liability will be completely transferred from individual humans to faceless corporations. I want self-driving cars to be a thing - computers can certainly be better than humans at driving - but I don’t want that technology to be profit-motivated.
They will inevitably cause some accidents that could have been prevented if not for the “move fast and break things” style of tech development. A negligent driver can go to jail; a negligent corporation gets a slap on the wrist in our society. And traffic collisions will mean facing powerful litigation teams when these companies inevitably refuse to pay for damages through automated AI denials, just like private health insurance companies.