Comment on The Great Software Quality Collapse: How We Normalized Catastrophe
panda_abyss@lemmy.ca 14 hours ago
I’ve been thinking of having a small model, like a long-context Qwen 4B, run a quick code review to check for these issues, and then just correct the main model.
It feels like a secondary model that only exists to validate that a task was actually completed could work.
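Roughly what I have in mind, as a minimal sketch rather than a working setup. `generate` and `review` are hypothetical stand-ins for calls to the main model and the small reviewer model, and `max_rounds` is an arbitrary cap I made up:

```python
from typing import Callable, Optional, Tuple


def generate_with_review(
    task: str,
    generate: Callable[[str, Optional[str]], str],  # hypothetical: main model call
    review: Callable[[str, str], Optional[str]],    # hypothetical: small reviewer model call
    max_rounds: int = 3,                            # arbitrary retry cap (assumption)
) -> Tuple[str, bool]:
    """Generate, let the reviewer check the result, and retry with its feedback.

    `review` returns None when it considers the task completed, otherwise a
    short description of what is missing or wrong.
    Returns the last output and whether the reviewer accepted it.
    """
    feedback: Optional[str] = None
    output = ""
    for _ in range(max_rounds):
        output = generate(task, feedback)
        feedback = review(task, output)
        if feedback is None:
            return output, True
    # The reviewer never accepted the output; hand back the last attempt, flagged.
    return output, False
```

The reviewer only returns feedback; it never edits the output itself, so the main model stays responsible for the fix.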
FishFace@lemmy.world 13 hours ago
Yeah, it can work, because it’ll trigger the recall of different types of input data. But it’s not magic, and if the model you’re using has a 25% chance of hallucinating, you probably still end up with an 8.5% chance of getting bullshit after doing this.
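To put rough numbers on that, here is a back-of-the-envelope sketch. It assumes a simple model where bad output only gets through if the main model hallucinates and the reviewer misses it; the function name and the independence assumption are mine, and it ignores the reviewer wrongly rejecting good output:

```python
def residual_error_rate(p_hallucinate: float, p_reviewer_miss: float) -> float:
    """Chance that a bad answer is produced AND the reviewer fails to flag it."""
    return p_hallucinate * p_reviewer_miss


# If the reviewer failed independently at the same 25% rate, the residual
# would be 0.25 * 0.25 = 0.0625, i.e. 6.25%.
independent = residual_error_rate(0.25, 0.25)

# An 8.5% residual instead corresponds to the reviewer missing about a third
# of the bad outputs (0.085 / 0.25, roughly 0.34).
implied_miss = 0.085 / 0.25
```

A higher miss rate than the independent case is what you would expect if both models tend to fail on the same kinds of input.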