Comment on AI Coding Is Massively Overhyped, Report Finds
Baguette@lemmy.blahaj.zone 3 weeks ago
I’d be inclined to try using it if it was smart enough to write my unit tests properly, but it’s great at double inserting the same mock and ending up with 0 working unit tests.
I might try using it to generate some javadoc though… then when my org inevitably starts polling how much ai I use I won’t be in the gutter lol
I’ve seen it generate working unit tests plenty. In the sense that they pass.
…they do not actually test the functionality.
Of course that function returns what you’re asserting - you overwrote its actual output and checked against that!
One of the guys at my old job submitted a PR with tests that basically just mocked everything, tested nothing. Like,
    with patch("something.whatever", return_value=True):
        assert whatever(0) is True
        assert whatever(1) is True
Except for a few dozen lines, with names that made it look like they were doing something useful.
He used AI to generate them, of course. Pretty useless.
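To make the failure mode concrete, here is a minimal sketch in Python. The `Parity` class and its deliberate bug are hypothetical, invented for illustration. The first check patches the very function it claims to test, so it passes no matter what the code does. The second calls the real code and catches the bug.

```python
from unittest.mock import patch


class Parity:
    """Hypothetical code under test, with a deliberate bug."""

    @staticmethod
    def is_even(n: int) -> bool:
        return n % 2 == 1  # bug: this is actually the odd check


def vacuous_test() -> bool:
    # Replaces the function under test, so the checks only exercise the mock.
    with patch.object(Parity, "is_even", return_value=True):
        return Parity.is_even(0) is True and Parity.is_even(2) is True


def meaningful_test() -> bool:
    # Calls the real code and compares against values decided in advance.
    return Parity.is_even(0) is True and Parity.is_even(3) is False
```

Here `vacuous_test()` reports success even though `is_even` is broken, while `meaningful_test()` fails as it should.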
We have had guys submit tests like that, long before AI was a thing.
At least in those situations, the person writing the tests knows they’re not testing anything…
Some do, some don’t, but more importantly: most just don’t care.
I had a tester wander into a set of edge cases which weren’t 100% properly handled and their first reaction was “gee, maybe I didn’t see that, it sounds like I’m going to have a lot more work because I did.”
True, I do feel mocked by this code.
sugar_in_your_tea@sh.itjust.works 3 weeks ago
I personally think unit tests are the worst application of AI. Tests are there to ensure the code is correct, so ideally the dev would write the tests to verify that the AI-generated code is correct.
I personally don’t use AI to write code, since writing code is the easiest and quickest part of my job. I instead use it to generate examples of using a new library, give me comparisons of different options, etc, and then I write the code after that. Basically, I use it as a replacement for a search engine/blog posts.
MangoCats@feddit.it 3 weeks ago
Ideally, there are requirements before anything, and some TDD types argue that the tests should come before the code as well.
Ideally, the customer is well represented during requirements development - ideally, not by the code developer.
Ideally, the code developer is not the same person that develops the unit tests.
Ideally, someone other than the test developer reviews the tests to ensure that they do in fact provide requirements coverage.
Ideally, the modules that come together to make the system function have similarly tight requirements and unit-tests and reviews, and the whole thing runs CI/CD to notify developers of any regressions/bugs within minutes of code check in.
In reality, some portion of that process (often, most of it) is short-cut for one or many reasons. Replacing the missing bits with AI is better than not having them at all.
sugar_in_your_tea@sh.itjust.works 3 weeks ago
Why? The developer is exactly the person I want writing the tests.
There should also be integration tests written by a separate QA, but unit tests should 100% be the responsibility of the dev making the change.
I disagree. A bad test is worse than no test, because it gives you a false sense of security. I can identify missing tests with coverage reports, I can’t easily identify bad tests. If I’m working in a codebase with poor coverage, I’ll be extra careful to check for any downstream impacts of my change because I know the test suite won’t help me. If I’m working in a codebase with poor tests but high coverage, I may assume a test pass indicates that I didn’t break anything else.
If a company is going to rely heavily on AI for codegen, I’d expect tests to be manually written and have very high test coverage.
MangoCats@feddit.it 3 weeks ago
True enough
Also agree, if your org has trimmed to the point that you’re just making tests to say you have tests, with no review as to their efficacy, they will be getting what they deserve soon enough.
If a company is going to rely heavily on AI for anything I’d expect a significant traditional human employee backstop to the AI until it has a track record. Not “buckle up, we’re gonna try somethin’” track record, more like two or three full business cycles before starting to divest of the human capital that built the business to where it is today. Though, if your business is on the ropes and likely to tank anyway… why not try something new?
Was a story about IBM letting thousands of workers go, replacing them with AI… then hiring even more workers in other areas with the money saved from the AI retooling. Apparently they let a bunch of HR and other admin staff go and beefed up on sales and product development. There are some jobs that you want more predictable algorithms in than potentially biased people, and HR seems like an area that could have a lot of that.
Nalivai@lemmy.world 3 weeks ago
It’s better if it’s a different developer: they don’t know the nuances of your implementation, so they test functionality only, which avoids some mistakes. You’re correct on all the other points.
Nalivai@lemmy.world 3 weeks ago
Nah, bullshit tests that pretend to be tests but are essentially “if true == true then pass” are significantly worse than no test at all.
MangoCats@feddit.it 3 weeks ago
Sure. But, unsupervised developers who: write the code, write their own tests, change companies every 18 months, are even more likely to pull BS like that than AI is.
You can actually get some test validity oversight out of AI review of the requirements and tests, not perfect, but better than self-supervised new hires.
themaninblack@lemmy.world 3 weeks ago
Saved this comment. No notes.
Baguette@lemmy.blahaj.zone 3 weeks ago
To preface I don’t actually use ai for anything at my job, which might be a bad metric but my workflow is 10x slower if i even try using ai
That said, I want AI to be able to do unit tests in the sense that I can write some starting ones, then it be able to infer what branches aren’t covered and help me fill the rest
Obviously it’s not smart enough, and honestly I highly doubt it will ever be because that’s the nature of LLMs, but my peeve with unit tests is that testing branches usually entails just copying the exact same test but changing one field to an invalid value, or making a dependency throw. It’s not hard, just tedious. Branch coverage is already enforced, so you should know when you forgot to test a case.
I also think you should treat ai code as a pull request and actually review what it writes. My coworkers that do use it don’t really proofread, so it ends up having some bad practices and code smells.
sugar_in_your_tea@sh.itjust.works 3 weeks ago
That’s what parameterization is for. In unit tests, most dependencies should be mocked, so expecting a dependency to throw shouldn’t really be a thing much of the time.
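As a rough sketch of what parameterization can look like, the example below uses `subTest` from Python's standard `unittest` module (pytest users would typically reach for `pytest.mark.parametrize` instead). The `validate_order` function and the `VALID_ORDER` fixture are hypothetical, made up for illustration. Each case starts from one valid order and changes a single field, so there is no copy-pasted test per branch.

```python
import unittest

# Hypothetical fixture: one known-good input that every case starts from.
VALID_ORDER = {"sku": "A-1", "quantity": 2, "email": "a@example.com"}


def validate_order(order: dict) -> None:
    """Hypothetical function under test: raises ValueError on bad input."""
    if not order.get("sku"):
        raise ValueError("sku is required")
    if not isinstance(order.get("quantity"), int) or order["quantity"] < 1:
        raise ValueError("quantity must be a positive integer")
    if "@" not in str(order.get("email", "")):
        raise ValueError("email looks invalid")


class ValidateOrderTest(unittest.TestCase):
    def test_valid_order_passes(self):
        validate_order(dict(VALID_ORDER))

    def test_each_invalid_field_is_rejected(self):
        # One row per branch: the field to break and the bad value to use.
        cases = [
            ("sku", ""),
            ("quantity", 0),
            ("quantity", "two"),
            ("email", "not-an-email"),
        ]
        for field, bad_value in cases:
            with self.subTest(field=field, bad_value=bad_value):
                order = {**VALID_ORDER, field: bad_value}
                with self.assertRaises(ValueError):
                    validate_order(order)
```

Adding a newly discovered branch then means adding one row to `cases` rather than another near-identical test function.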
You can get the first half with coverage tools. The second half should be fairly straightforward, assuming you wrote the code. If a branch is hard to hit (i.e. it happens if an OS or library function fails), either mock that part or don’t bother with the test. I ask my team to hit 70-80% code coverage because that last 20-30% tends to be extreme corner cases that are hard to hit.
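For the "mock that part" option, a minimal sketch might look like the following. The `delete_file` function is hypothetical. `patch` and its `side_effect` argument come from the standard `unittest.mock` module. The point is to force the rare OS failure rather than trying to provoke it for real.

```python
import os
from unittest.mock import patch


def delete_file(path: str) -> str:
    """Hypothetical function under test: reports how the deletion went."""
    try:
        os.remove(path)
    except FileNotFoundError:
        return "missing"
    except PermissionError:
        return "denied"
    return "deleted"


def run_with_os_failure(error: type) -> str:
    # Make os.remove raise the given error so the matching branch is reached.
    with patch("os.remove", side_effect=error):
        return delete_file("any/path.txt")
```

Only the OS call is mocked here. The branching logic in `delete_file` still runs for real, which is what keeps the test meaningful.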
And this is the problem. Reviewers only know so much about the overall context and often do a surface level review unless you’re touching something super important.
We can make conventions all we want, but people will be lazy and submit crap, especially when deadlines are close.
Baguette@lemmy.blahaj.zone 3 weeks ago
The issue with my org is the push to be ci/cd means 90% line and branch coverage, which ends up being you spend just as much time writing tests as actually developing the feature, which already is on an accelerated schedule because my org has made promises that end up becoming ridiculous deadlines, like a 2 month project becoming a 1 month deadline
Mocking is easy, almost everything in my team’s codebase is designed to be mockable. The only stuff I can think of that isn’t mocked are usually just clocks, which you could mock but I actually like using fixed clocks for unit testing most of the time. But mocking is also tedious. Lots of mocks end up being:
Chances are, if you wrote it you should already know what branches are there. It’s just translating that to actual unit tests that’s a pain. Branching logic should be easy to read as well. If I read a nested if statement chances are there’s something that can be redesigned better.
I also think that 90% of actual testing should be done through integ tests. To me, unit tests help validate what you expect to happen, but expectations don’t necessarily equate to real dependencies and inputs. But that’s a preference, mostly because our design philosophy revolves around dependency injection.
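The fixed-clock idea combined with dependency injection could be sketched like this in Python. All names here (`Clock`, `system_clock`, `fixed_clock`, `is_expired`) are hypothetical and not taken from any codebase mentioned in the thread. The code under test receives its clock as a parameter, so a test can pin time to a known instant without any mocking.

```python
from datetime import datetime, timedelta, timezone
from typing import Callable

# A clock is just a function that returns the current time.
Clock = Callable[[], datetime]


def system_clock() -> datetime:
    """Default clock used in production code."""
    return datetime.now(timezone.utc)


def fixed_clock(instant: datetime) -> Clock:
    """Clock for tests: always returns the same instant."""
    return lambda: instant


def is_expired(created_at: datetime, ttl: timedelta,
               clock: Clock = system_clock) -> bool:
    """Hypothetical function under test: has the time-to-live elapsed?"""
    return clock() >= created_at + ttl


# Usage in a test: pin "now" to noon and reason about offsets from it.
NOON = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
clock = fixed_clock(NOON)
expired = is_expired(NOON - timedelta(hours=2), timedelta(hours=1), clock)
still_valid = not is_expired(NOON - timedelta(minutes=30), timedelta(hours=1), clock)
```

With time pinned, both outcomes are deterministic, so the test cannot become flaky depending on when it runs.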
MangoCats@feddit.it 3 weeks ago
A software tester walks into a bar, he orders a beer.
He orders -1 beers.
He orders 0 beers.
He orders 843909245824 beers.
He orders duck beers.
AI can be trained to do that, but if you are in a not-well-trodden space, you’ll want to be defining your own edge cases in addition to whatever AI comes up with.
ganryuu@lemmy.ca 3 weeks ago
Way I heard this joke, it continues with:
A real customer enters.
He asks where the toilets are.
The bar explodes.
FishFace@lemmy.world 3 weeks ago
The reason tests are a good candidate is that there is a lot of boilerplate and no complicated business logic. It can be quite a time saver. You probably know some untested code in some project - you could get an llm to write some tests that would at least poke some key code paths, which is better than nothing. If the tests are wrong, it’s barely worse than having no tests.
theolodis@feddit.org 3 weeks ago
Wrong tests will make you feel safe. And in the worst case, the next developer that is going to port the code will think that somebody wrote those tests with intention, and potentially create broken code to make the test green.
sugar_in_your_tea@sh.itjust.works 3 weeks ago
Exactly! I’ve seen plenty of tests where the test code was confidently wrong and it was obvious the dev just copied the output into the assertion instead of asserting what they expect the output to be. In fact, when I joined my current org, most of the tests were snapshot tests, which automated that process. I’ve pushed to replace them with better tests, and we caught bugs in the process.
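A small sketch of the difference, using a hypothetical `total_with_tax` function with a deliberate bug: the first check uses a value pasted from a previous run, so it enshrines the bug. The second uses a value worked out by hand from the requirement, so it exposes it.

```python
def total_with_tax(net_cents: int, tax_percent: int) -> int:
    """Hypothetical function under test, with a deliberate bug."""
    tax = net_cents * tax_percent // 100
    return net_cents + tax + tax  # bug: the tax is added twice


def copied_output_check() -> bool:
    # 1200 was pasted from whatever the function printed, bug included.
    return total_with_tax(1000, 10) == 1200


def expected_value_check() -> bool:
    # 1000 plus 10% tax is 1100, worked out independently of the code.
    return total_with_tax(1000, 10) == 1100
```

The copied-output check stays green forever, and would even turn red if someone fixed the bug, which is how such tests push later developers toward preserving broken behavior.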
FishFace@lemmy.world 3 weeks ago
Then write comments in the tests that say they haven’t been checked.
That is indeed the absolute worst case though, and most of the tests that are so produced will be giving value because checking a test is easier than checking the code (this is kind of the point of tests) and so most will be correct.
The risk of regressions covered by the good tests is higher than someone writing code to the rare bad test that you’ve marked as suspicious because you (for whatever reason) are not confident in your ability to check it.
sugar_in_your_tea@sh.itjust.works 3 weeks ago
I disagree. I’d much rather have a lower coverage with high quality tests than high coverage with dubious tests.
If your tests are repetitive, you’re probably writing your tests wrong, or at least focusing on the wrong logic to test. Unit tests should prove the correctness of business logic and calculations. If there’s no significant business logic, there’s little priority for writing a test.
FishFace@lemmy.world 3 weeks ago
The actual risk of those tests being wrong is low because you’re checking them.
If your tests aren’t repetitive they’ve got no setup or mocking in so they don’t test very much.
Draces@lemmy.world 3 weeks ago
What model are you using? I’ve had such a radically different experience but I’ve only bothered with the latest models. The old ones weren’t even worth trying with
sugar_in_your_tea@sh.itjust.works 3 weeks ago
I’ll have to check, we have a few models hosted at our company and I forget the exact versions and whatnot. They’re relatively recent, but not the highest end since we need to host them locally.
But the issue here isn’t directly related to which model it is, but to the way LLMs work. They cannot reason, they can only give believable output. If the goal is code coverage, it’ll get coverage, but not necessarily be well designed.
If both the logic and the tests are automated, humans will be lazy and miss stuff. If only the logic is generated, humans can treat the code as a black box and write good tests that way. Humans will be lazy with whatever is automated, so if I have to pick one to be hand written, it’ll be the code that ensures the logic is correct.
wesley@yall.theatl.social 3 weeks ago
We’re mandated to use it at my work. For unit tests it can really go wild and it’ll write thousands of lines of tests to cover a single file/class for instance whereas a developer would probably only write a fourth as much. You have to be specific to get any decent output from them like “write a test for this function and use inputs x and y and the expected output is z”
Personally I like writing tests too and I think through what test cases I need based on what the code is supposed to do. Maybe if there are annoying mocks that I need to create I’ll let the AI do that part or something.