My understanding is that this claim is almost entirely false. The researchers' tests had some glaring errors that, when corrected, show GPT-4 getting slightly better at math, if anything. See this video, which describes some of the issues: https://youtu.be/YSokS2ivf7U
TL;DR: The researchers gave the new GPT questions from two different pools, so it's no surprise they got worse answers.