I’d also guess that the large number of vowels in English has something to do with it. General American English has around 16 vowels (counting both monophthongs and diphthongs), and other varieties of English have similar numbers.
I feel that when there are that many vowels, the exact quality of each one matters less, so they can shift around more.