
Backing up Spotify

⁨441⁩ ⁨likes⁩

Submitted ⁨⁨2⁩ ⁨days⁩ ago⁩ by ⁨JensSpahnpasta@feddit.org⁩ to ⁨technology@lemmy.world⁩

https://annas-archive.li/blog/backing-up-spotify.html


Comments

  • kindred@lemmy.dbzer0.com ⁨2⁩ ⁨days⁩ ago

    This is by far the largest music metadata database that is publicly available. For comparison, we have 256 million tracks, while others have 50-150 million. Our data is well-annotated: MusicBrainz has 5 million unique ISRCs, while our database has 186 million.

    Does this mean the MusicBrainz database will soon go from 5 million to 186 million tracks?

    • xploit@lemmy.world ⁨2⁩ ⁨days⁩ ago

      Asking the real questions here…

    • exu@feditown.com ⁨2⁩ ⁨days⁩ ago

      Probably not worth it to store the AI tracks

    • zingo@sh.itjust.works ⁨1⁩ ⁨day⁩ ago

      That’s exactly what I was wondering too.

      Acquiring high quality music is already nontrivial in most cases.

      What I am interested in is the metadata. Accurate tagging of all my files is of high interest.

    • purplemonkeymad@programming.dev ⁨1⁩ ⁨day⁩ ago

      If I ran mb, I would be cautious importing the data directly. I’m sure Spotify would consider it trade information and go after anyone directly using it. However if a few million people added the tracks with individual edits then it probably won’t take too long.

      • Knock_Knock_Lemmy_In@lemmy.world ⁨1⁩ ⁨day⁩ ago

        I thought metadata couldn’t be copyrighted though?

  • massive_bereavement@fedia.io ⁨2⁩ ⁨days⁩ ago

    I'd strongly suggest they take out all the cheap AI-generated music from this "backup" and save themselves some space.

    • AnarchistArtificer@slrpnk.net ⁨2⁩ ⁨days⁩ ago

      I’m not sure how they would go about doing that at scale without also getting some false positives and removing human music too

      • cheesybuddha@lemmy.world ⁨1⁩ ⁨day⁩ ago

        You could cut off your search around the time AI tracks started to appear. Not sure when that was, maybe 2023. You’d miss a lot of recent stuff, but you’d filter out a lot of spam too

    • nibbler@discuss.tchncs.de ⁨2⁩ ⁨days⁩ ago

      do you have any numbers on the AI share? I doubt it’s more than 2%, so I assume you are just virtue signalling on a completely unrelated topic here :-)

      • FG_3479@lemmy.world ⁨2⁩ ⁨days⁩ ago

        AI slop can be made and distributed in ginormous numbers. I wouldn’t be surprised if at least 3/4 of uploads from the past 2 years are AI.

  • arcterus@piefed.blahaj.zone ⁨2⁩ ⁨days⁩ ago
    1. Over-focus on the most popular artists. There is a long tail of music which only gets preserved when a single person cares enough to share it. And such files are often poorly seeded.
    • We primarily used Spotify’s “popularity” metric to prioritize tracks. View the top 10,000 most popular songs in this HTML file (13.8MB gzipped).
    • For popularity>0, we got close to all tracks on the platform. The quality is the original OGG Vorbis at 160kbit/s. Metadata was added without reencoding the audio (and an archive of diff files is available to reconstruct the original files from Spotify, as well as a metadata file with original hashes and checksums).
    • For popularity=0, we got files representing about half the number of listens (either original or a copy with the same ISRC). The audio is reencoded to OGG Opus at 75kbit/s — sounding the same to most people, but noticeable to an expert.

    Perhaps I’m reading this wrong, but is this not a little backwards? Since unpopular music is poorly preserved, shouldn’t the focus be on getting the least popular music first?

    • WolfLink@sh.itjust.works ⁨1⁩ ⁨day⁩ ago

      Unfortunately, if you sort by least popular music on Spotify, you’ll get nothing but spam

    • Techlos@lemmy.dbzer0.com ⁨20⁩ ⁨hours⁩ ago

      If you want that long tail, bandcamp and soundcloud are better sources. The barrier to entry is low with those, and there’s a plethora of small, niche artists just doing their own thing.

      For a representative snapshot of music though, it’s pretty amazing. It shows what a massive percentage of the planet listens to, preserved hopefully across many seeds, and historians will love shit like this in the future.

    • JensSpahnpasta@feddit.org ⁨2⁩ ⁨days⁩ ago

      It depends on what your goal is: If you want to preserve the music that is important to most people or to the era, you should start with the most popular stuff. And Spotify has a big spam problem. Everybody who thinks he is a DJ wants his music to be on there and there is so much AI music flooding the scene. So it does make sense to backup what people are actually listening and not some AI-generated music spam nobody cares about.

      • mrdown@lemmy.world ⁨19⁩ ⁨hours⁩ ago

        I am pretty sure the major labels are already preserving the most mainstream artists. Maybe it should be sorted by the most popular independent artists

      • arcterus@piefed.blahaj.zone ⁨2⁩ ⁨days⁩ ago

        I mean, they say earlier that music is actually well-preserved, but it’s disproportionately popular music. If the goal is then to preserve everything, I’d expect them to go for stuff that isn’t likely to be in some random audiophile’s collection or whatever then.

    • UltraMagnus@startrek.website ⁨2⁩ ⁨days⁩ ago

      The politics of preservation is definitely an interesting one. I suppose one argument in favor of preserving more popular music is that there are going to be fewer popular tracks than unpopular tracks - and they’re already at 300TB, which is nothing to sneeze at, especially since it’s a third the size of their existing library of ebooks.

    • dustyData@lemmy.world ⁨2⁩ ⁨days⁩ ago

      If we were talking about the ethnic music of an extinct tribe that uses a language at risk of disappearing, sure, you would be right.

      But think about it for a bit longer. These unpopular tracks are just commercial productions that had no cultural impact on any population. They are still getting preserved in a format whose quality degradation is imperceptible to the human ear. That’s usually enough. Audiophiles are usually overzealous about fidelity preservation, but the efforts are often misguided, and discussions abound on technical topics that ultimately don’t matter.

    • thermal_shock@lemmy.world ⁨1⁩ ⁨day⁩ ago

      I agree. I seed torrents/files that took me a long time to finish.

  • helpImTrappedOnline@lemmy.world ⁨2⁩ ⁨days⁩ ago

    The data they compiled is really cool.

    If I’m reading the chart right, the genre with the most artists is opera.

    Even if they didn’t have the music files, the analysis on the metadata is insane.

    Publicly admitting they are the origin of the torrents is definitely a risky move. I don’t think they want Sony going after them, but also fuck Sony for locking art behind shitty contracts that force these kinds of projects to exist.

    • JensSpahnpasta@feddit.org ⁨2⁩ ⁨days⁩ ago

      Publicly admitting they are the origin of the torrents is definitely an insane move. I don’t think they want Sony going after them

      Let’s be honest: everybody is trying to go after Anna’s Archive. Every book publisher wants to get them, the US government too, and it really doesn’t matter if every music publisher wants them as well. I hope that they are based in a country where the Western systems can’t get them

      • Tangent5280@lemmy.world ⁨1⁩ ⁨day⁩ ago

        I hope (also assume since it hasn’t been taken down yet) it’s more of a decentralised deal with servers in many places and backups in every nation under the sun

    • douglasg14b@lemmy.world ⁨2⁩ ⁨days⁩ ago

      Yeah, it’s a wild move admitting that they are the source of pirated content for music here.

      We don’t need Anna’s Archive to go under as a result of Sony going after them because of this…

      • rainwall@piefed.social ⁨2⁩ ⁨days⁩ ago

        They have had to hop countries a half dozen times. They are already enemy #1, in piracy terms, so I expect they are okay leaning into it and doing more good for the world.

    • mrdown@lemmy.world ⁨1⁩ ⁨day⁩ ago

      All 3 major labels are equally predatory, not only Sony

  • lietuva@lemmy.world ⁨2⁩ ⁨days⁩ ago

    There’s definitely gonna be some crazy guy who will put this on their server and stream it to their phones lol

    • extremeboredom@lemmy.world ⁨2⁩ ⁨days⁩ ago

      Hi it’s me

    • thermal_shock@lemmy.world ⁨1⁩ ⁨day⁩ ago

      I stream mine through Plexamp. Up to almost 400k tracks.

    • Agility0971@lemmy.world ⁨2⁩ ⁨days⁩ ago

      Oh im thinking of it lol

      • JoeKrogan@lemmy.world ⁨23⁩ ⁨hours⁩ ago

        Please do if you can and keep seeding it if possible.

    • cheesybuddha@lemmy.world ⁨1⁩ ⁨day⁩ ago

      If I had an extra 300 tb I’d do it.

      • EpicFailGuy@lemmy.world ⁨23⁩ ⁨hours⁩ ago

        Tagging /datahoarded

    • Knock_Knock_Lemmy_In@lemmy.world ⁨1⁩ ⁨day⁩ ago

      Just a random question. What would the cost be?

      • IsoKiero@sopuli.xyz ⁨1⁩ ⁨day⁩ ago

        You can get refurbished hard drives for around $300 per 20 TB (a quickly searched estimate). So, 15 drives plus maybe another 5 for RAID redundancy sets you back $6k. Add a server to hold those drives for $1-2k (used), a UPS, an internet connection and other bits’n’bobs, and your total is very roughly around $8k (or €, as these estimates are a pretty rough ballpark).
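
To make the arithmetic above explicit, here is a rough sketch in Python. It only uses the numbers from this comment (about $300 per refurbished 20 TB drive, 300 TB of data, 5 extra drives for redundancy, $1-2k for a used server). The function and parameter names are made up for illustration, and the UPS, internet connection and other extras are left out because no prices were given for them.

```python
import math


def estimate_archive_cost(
    archive_tb=300,           # archive size mentioned in the thread
    drive_tb=20,              # capacity per refurbished drive (commenter's estimate)
    drive_price_usd=300,      # price per drive (commenter's estimate)
    redundancy_drives=5,      # extra drives for RAID redundancy (commenter's guess)
    server_usd=(1000, 2000),  # low/high price of a used server (commenter's estimate)
):
    """Return a rough hardware cost estimate; every input is a ballpark assumption."""
    # Drives needed just to hold the data, rounded up to whole drives.
    data_drives = math.ceil(archive_tb / drive_tb)
    total_drives = data_drives + redundancy_drives
    drives_usd = total_drives * drive_price_usd
    return {
        "data_drives": data_drives,
        "total_drives": total_drives,
        "drives_usd": drives_usd,
        "total_low_usd": drives_usd + server_usd[0],
        "total_high_usd": drives_usd + server_usd[1],
    }


estimate = estimate_archive_cost()
print(estimate)
```

With the default values this works out to 20 drives for $6,000, and $7,000 to $8,000 once the used server is included, which is in line with the rough $8k figure after adding a UPS and other extras.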

    • Barthosw@lemmy.world ⁨2⁩ ⁨days⁩ ago

      My first thought as well

  • jaschen306@sh.itjust.works ⁨1⁩ ⁨day⁩ ago

    I guess I gotta donate more to anna

  • Zachariah@lemmy.world ⁨2⁩ ⁨days⁩ ago

    This is the one thing on Spotify I can’t get elsewhere. Would be nice to have a non-transcoded copy.

    open.spotify.com/album/4emoC6C9fCDkWPdTuxN9an

    …Like Cologne (Spotify Exclusive)
    Queens of the Stone Age
    2013 • 3 songs • 14 min 5 sec

    • archonet@lemy.lol ⁨2⁩ ⁨days⁩ ago

      try OnTheSpot.

      • Zachariah@lemmy.world ⁨2⁩ ⁨days⁩ ago

        Does it circumvent the drm, or does it re-encode decompressed audio?

    • pulsewidth@lemmy.world ⁨2⁩ ⁨days⁩ ago

      Well, since this archive says it contains the original ogg @160kbps for all artists with a popularity >0, it’ll be in this collection. Your wait may be over soon.

      • Zachariah@lemmy.world ⁨2⁩ ⁨days⁩ ago

        sweet

    • thermal_shock@lemmy.world ⁨1⁩ ⁨day⁩ ago

      github.com/justin025/onthespot/releases/…/v0.7.2

  • JoeKrogan@lemmy.world ⁨1⁩ ⁨day⁩ ago

    Don’t have the space, but I love to see this. I hope people seed this for a long time

  • nymnympseudonym@piefed.social ⁨2⁩ ⁨days⁩ ago

    Spotify is why I set up a Funkwhale server

    • Valmond@lemmy.world ⁨2⁩ ⁨days⁩ ago

      Is funkwhale also a sort of soulseek?

      • Prunebutt@slrpnk.net ⁨2⁩ ⁨days⁩ ago

        AFAIK: Yes. But it’s supposedly a pain to set up, so I’ll never know the difference.

      • Wolf314159@startrek.website ⁨2⁩ ⁨days⁩ ago

        No. Soulseek is old school P2P. All you need to do is run the client software, set a local shared folder, and you are client and server in one. Funkwhale is more like running your own Lemmy instance and building a community. The difference between them is like the difference between using Airdrop or Syncthing to share files and hosting your own domain and server.

      • three@lemmy.zip ⁨2⁩ ⁨days⁩ ago

        Oh no, around here we mention esoteric software but we will never include any extra information in the post. If you know you know.

      • nymnympseudonym@piefed.social ⁨2⁩ ⁨days⁩ ago

        Soulseek afaict requires dedicated clients. The Subsonic standard is supported by more & more mobile/PC apps, I wish it was supported

  • Mihies@programming.dev ⁨2⁩ ⁨days⁩ ago

    So the artists get paid even less than from Spotify?

    • gravitywell@sh.itjust.works ⁨2⁩ ⁨days⁩ ago

      No, it’s so Sony, UMG, and all the other leeches can get paid even less.

      I don’t feel like editing the image, but imagine the guy with most of the cookies in this picture was UMG and the artists are the guy on the right.

      Image

      • Mihies@programming.dev ⁨2⁩ ⁨days⁩ ago

        Yes, sure, but if those don’t get paid, artists don’t get paid. And artists are not forced to pick a label, they are free to go solo, but they still prefer labels, so it’s not as black and white as “labels bad, artists good”

    • noodlejetski@piefed.social ⁨2⁩ ⁨days⁩ ago

      a few years ago, back when I was still using Spotify, I checked my Wrapped and apparently I was using Spotify more than 99.5% of users in my country, and when it came to my most listened artist, I was in top 0.05% listeners worldwide. doing some back-of-the-napkin math with the data I got online about Spotify’s payouts, it turned out the money the artist got during that year from me amounted to less than a dollar.

      if you want to support artists, use the money you’d pay for your music streaming subscription and buy their album or a piece of merch every two months.

      • AnarchistArtificer@slrpnk.net ⁨2⁩ ⁨days⁩ ago

        Yeah, I’ve been seeing an increasing number of artists who are pro piracy, who basically say “steal our music, save your money, and if you want to support us, come to a gig and buy some merch”.

        I’ve also seen more and more artists staying off Spotify entirely. One such artist is the wonderful folk artist Lucy & Hazel . This was the first time I actually bought music in years, and a big part of that was because I wanted to support their active choice to stay off Spotify.

        An unexpected side effect of this is that because I’m aware these guys are situated less optimally for algorithmic discoverability, I find myself actively recommending them to people. It feels nice compared to the more passive mode of algorithmic music discovery

      • HereIAm@lemmy.world ⁨2⁩ ⁨days⁩ ago

        I’ve had Spotify since it basically released. I fully switched to a self hosted music library about 5 months ago. I imagine I’ve supported artists more in those 5 months than I did during my 18-ish years of Spotify premium. I still use Soulseek for large artists or quite old albums, but most new releases and remix tracks I pay for.

      • Mihies@programming.dev ⁨2⁩ ⁨days⁩ ago

        How many buyers are there if the entire archive is available for free? 10? 20?

    • CoyoteFacts@piefed.ca ⁨2⁩ ⁨days⁩ ago

      I’m guessing this is more about preserving culture and art. I find it unlikely that this post would be someone’s first clue that they could listen to music for free, and listening to music out of this dump would be way harder than any other method.

      • Mihies@programming.dev ⁨2⁩ ⁨days⁩ ago

        You underestimate people and their motivation to listen for free no matter what.

    • Dyskolos@lemmy.zip ⁨2⁩ ⁨days⁩ ago

      Whose fault is it that there’s no fair system one could use (except maybe bandcamp)? Not mine at least, I don’t use Spotify at all. I would not sell my music there if I were an artist.

      • bridgeenjoyer@sh.itjust.works ⁨23⁩ ⁨hours⁩ ago

        Bandcamp is good. Bands still have websites and mailing lists too. There was never anything wrong with these but big tech wants to keep you in their walled garden and forget the TRUE internet still exists out there.

        Also, record stores bro.

      • Mihies@programming.dev ⁨2⁩ ⁨days⁩ ago

        I don’t use Spotify, either. And do what you want to do, nobody is forcing you to put your music on Spotify.

    • zingo@sh.itjust.works ⁨1⁩ ⁨day⁩ ago

      Well, we are talking pennies here so… /s

      • Mihies@programming.dev ⁨1⁩ ⁨day⁩ ago

        It’s not just Spotify: if content is free for all, then who is buying?

  • gtr@programming.dev ⁨2⁩ ⁨days⁩ ago

    Damn, boy!

  • exu@feditown.com ⁨2⁩ ⁨days⁩ ago

    Oo, I’ll have to check those when they release. I follow some artists that only upload to YouTube and Spotify, neither of which is ideal.
