I think auto-generated descriptions are generally going to be terrible, even with the new GPT models. A good description needs too much context for you to just feed an image into some code and get something useful.
On the other hand, transcriptions should be doable more accurately, particularly with a bit of extra logic to recognise common formats like Twitter posts.
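As a rough sketch of what that extra logic could look like, here is a heuristic that guesses whether OCR output came from a screenshot of a Twitter post. Everything in it is an assumption for illustration: the function name `looks_like_tweet`, the three cues, and the "two out of three" threshold are made up and untested against real screenshots.

```python
import re

# Hypothetical cues, not a tested classifier: each pattern is a hint that
# OCR output came from a screenshot of a Twitter post.
HANDLE = re.compile(r"@\w{1,15}\b")
TIMESTAMP = re.compile(r"\b\d{1,2}:\d{2}\s?(?:AM|PM)\b", re.IGNORECASE)
ENGAGEMENT = re.compile(
    r"\b(?:Retweets?|Likes?|Quote Tweets?|Replies)\b", re.IGNORECASE
)


def looks_like_tweet(ocr_text: str) -> bool:
    """Return True when at least two of the three cues appear in the text."""
    cues = [HANDLE, TIMESTAMP, ENGAGEMENT]
    hits = sum(1 for pattern in cues if pattern.search(ocr_text))
    return hits >= 2
```

Requiring two cues rather than one is meant to avoid flagging any text that merely contains an email address or the word "likes". Once a screenshot is flagged, the transcription could be laid out as author, handle, body and timestamp instead of a flat dump of text.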
It might be possible to build a database of alt-texts by scraping alt-texts and transcriptions from the fediverse, Reddit, etc., but a quick search didn't turn up anything.
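For the scraping side, the core step is just pulling image/alt-text pairs out of whatever HTML you fetch. A minimal sketch using only Python's standard `html.parser` module follows; the names `AltTextCollector` and `collect_alt_texts` are made up here, and fetching pages, rate limiting and respecting each site's terms are left out entirely.

```python
from html.parser import HTMLParser


class AltTextCollector(HTMLParser):
    """Collect (src, alt) pairs from <img> tags that have non-empty alt text."""

    def __init__(self):
        super().__init__()
        self.pairs = []

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        attributes = dict(attrs)
        # A missing alt attribute, or one with no value, is treated as empty.
        alt = (attributes.get("alt") or "").strip()
        if alt:
            self.pairs.append((attributes.get("src"), alt))


def collect_alt_texts(html: str):
    """Hypothetical helper: return all (src, alt) pairs found in an HTML string."""
    collector = AltTextCollector()
    collector.feed(html)
    return collector.pairs
```

Images with an empty or missing `alt` are skipped, since they add nothing to a database of descriptions.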