If you read the article, you'll find this was a dataset from a nonprofit, available to anyone. The nonprofit used captions from a set of YouTube videos.
“Most of the Pile’s datasets are accessible and open for anyone on the internet with enough space and computing power to access them.”
That “anyone” included a lot of other big names in tech, not just Apple.
Also, I wasn’t aware that Apple had its own AI. I thought they were licensing stuff from others like OpenAI. I guess maybe this is research for an unannounced project?