Interesting stuff. I used to work under an engineer who was obsessed with performance tuning. I remember him converting all of our foreach loops to for loops. He probably told me why, but I never knew (or never retained) that foreach could allocate on the heap. I had also always assumed reflection = slower.
Though, as he points out, .NET 10 does a similar kind of optimization for you. So it seems like a largely unnecessary optimization. I can only imagine a 40-byte allocation mattering in an extremely low-memory environment or at extreme scale.