Even if they weren’t trained on the same data, they end up behaving similarly
Training a weaker model on a stronger model’s outputs (distillation) can narrow the gap between the two. It won’t be optimal by any means, and you might hurt the student’s future learning, but it works to an extent
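A minimal sketch of that idea, with a toy logistic “teacher” standing in for the stronger model and a fresh “student” fit to its soft outputs on unlabeled inputs. Everything here (the teacher’s weights, the data, the names) is made up for illustration, not any real model or library API:

```python
import math
import random

random.seed(0)

# Hypothetical "teacher": a fixed linear rule standing in for a stronger model.
def teacher_prob(x):
    # Soft label: probability that x belongs to class 1.
    z = 2.0 * x[0] - 1.5 * x[1] + 0.5
    return 1 / (1 + math.exp(-z))

# Unlabeled inputs -- the student never sees the teacher's training data.
data = [(random.uniform(-2, 2), random.uniform(-2, 2)) for _ in range(500)]

# Student: fresh logistic model trained with plain SGD on cross-entropy
# against the teacher's soft probabilities (the distillation targets).
w, b, lr = [0.0, 0.0], 0.0, 0.5
for epoch in range(200):
    for x in data:
        p_t = teacher_prob(x)                     # soft target from teacher
        z = w[0] * x[0] + w[1] * x[1] + b
        p_s = 1 / (1 + math.exp(-z))              # student's prediction
        g = p_s - p_t                             # cross-entropy gradient wrt z
        w[0] -= lr * g * x[0]
        w[1] -= lr * g * x[1]
        b -= lr * g

# Hard-label agreement between teacher and student on fresh inputs.
test = [(random.uniform(-2, 2), random.uniform(-2, 2)) for _ in range(200)]
agree = sum(
    (teacher_prob(x) > 0.5) == ((w[0] * x[0] + w[1] * x[1] + b) > 0)
    for x in test
)
print(f"student/teacher agreement: {agree / len(test):.0%}")
```

Since this toy student has the same functional form as the teacher, agreement climbs very high; with a real, smaller student the gap only narrows rather than closes, which is the “to an extent” part.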