I've found that LLMs tend to over-classify and nitpick a fair bit, often missing the broader context that explains why a flaw has been tolerated or gone undiscovered.
They're not wrong, but they have no context for triage and so give far too many results. It forces you to consider an LLM subscription yourself just to keep up with the other LLM users, which is starting to feel like some form of zero-sum Red Queen's race.
The tsunami of reports won't be receding for a while yet, and we can only hope the teams on the receiving end don't drown in it.