Over-focus on the most popular artists. There is a long tail of music that only gets preserved when a single person cares enough to share it, and such files are often poorly seeded.
We primarily used Spotify’s “popularity” metric to prioritize tracks. View the top 10,000 most popular songs in this HTML file (13.8 MB gzipped).
For popularity>0, we got close to all tracks on the platform. The quality is the original OGG Vorbis at 160 kbit/s. Metadata was added without re-encoding the audio, and an archive of diff files is available to reconstruct the original files from Spotify, along with a metadata file containing the original hashes and checksums.
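Checking a reconstructed file against a recorded hash might look like the minimal sketch below. The post does not say which hash algorithm or file format the metadata uses, so SHA-256 and a plain hex-digest string are assumptions here, and the function names are hypothetical.

```python
import hashlib


def file_sha256(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in 1 MiB chunks.

    Assumption: the recorded hashes are SHA-256; the source doesn't say.
    """
    h = hashlib.sha256()
    with open(path, "rb") as f:
        # Read until f.read() returns b"" (end of file).
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def matches_recorded_hash(path, expected_hex):
    """Hypothetical helper: compare a file to a hex digest from the metadata file."""
    return file_sha256(path) == expected_hex.strip().lower()
```

Reading in chunks keeps memory use flat regardless of file size, which matters when verifying many audio files in bulk.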
For popularity=0, we got files representing about half the number of listens (either the original or a copy with the same ISRC). The audio is re-encoded to OGG Opus at 75 kbit/s, which sounds the same to most people but is noticeable to an expert.
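To put the two bitrates in perspective, here is a rough storage estimate per hour of audio. It treats both as constant bitrates and ignores container and metadata overhead, which is a simplification, since Vorbis and Opus are usually encoded at variable bitrate.

```python
def megabytes_per_hour(kbit_per_s: float) -> float:
    """Approximate audio payload for one hour at a nominal constant bitrate.

    Uses decimal units: 1 kbit = 1000 bits, 1 MB = 10**6 bytes.
    """
    bits = kbit_per_s * 1000 * 3600  # bits in one hour
    return bits / 8 / 1_000_000      # bits -> bytes -> MB


vorbis = megabytes_per_hour(160)  # 72.0 MB per hour
opus = megabytes_per_hour(75)     # 33.75 MB per hour
ratio = opus / vorbis             # 0.46875, i.e. a bit under half the space
```

So under these assumptions the Opus re-encodes take roughly 47% of the space the original Vorbis files would.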
Perhaps I'm reading this wrong, but is this not a little backwards? Since unpopular music is poorly preserved, shouldn't the focus be on getting the least popular music first?
The politics of preservation is definitely an interesting one. I suppose one argument in favor of preserving more popular music is that there are going to be fewer popular tracks than unpopular ones, and they're already at 300 TB, which is nothing to sneeze at, especially since that's a third of the size of their existing ebook library.