Is there an open source package that the Internet Archive runs? What is it? I assume sites like archive.is run the same.
I believe they used Heritrix at one point. The important bit is that they use a special archive format which is a standard. There are several tools that support it, and it allows for capturing a website in a ‘working’ condition, with history or something. I’m a bit fuzzy on it since it’s been some time since I looked into it.
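For anyone curious what such a format looks like, here is a rough sketch, assuming the standard meant here is WARC (ISO 28500), which is the format Heritrix writes. It builds one simplified "resource" record by hand and parses it back. The helper names `build_warc_record` and `parse_warc_record` are made up for illustration, and real tools handle much more (request/response pairs, compression, digests), so treat it as a picture of the record layout and not a usable archiver.

```python
import uuid
from datetime import datetime, timezone

CRLF = b"\r\n"


def build_warc_record(target_uri: str, payload: bytes,
                      content_type: str = "text/html") -> bytes:
    """Serialize one simplified WARC/1.0 'resource' record (hypothetical helper)."""
    headers = [
        ("WARC-Type", "resource"),
        ("WARC-Record-ID", f"<urn:uuid:{uuid.uuid4()}>"),
        ("WARC-Date", datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")),
        ("WARC-Target-URI", target_uri),
        ("Content-Type", content_type),
        ("Content-Length", str(len(payload))),
    ]
    head = b"WARC/1.0" + CRLF
    for name, value in headers:
        head += f"{name}: {value}".encode("utf-8") + CRLF
    # A blank line ends the header block; two CRLFs close the record.
    return head + CRLF + payload + CRLF + CRLF


def parse_warc_record(record: bytes):
    """Split a single record back into (version, header fields, payload)."""
    head, _, rest = record.partition(CRLF + CRLF)
    lines = head.split(CRLF)
    version = lines[0].decode("utf-8")
    fields = dict(line.decode("utf-8").split(": ", 1) for line in lines[1:])
    # Content-Length says how many bytes of the remainder are the payload.
    length = int(fields["Content-Length"])
    return version, fields, rest[:length]


record = build_warc_record("https://example.com/", b"<html>hello</html>")
version, fields, payload = parse_warc_record(record)
print(version, fields["WARC-Target-URI"], len(payload))
```

Because every record carries its own URI and capture date, many captures of the same page can sit in one file, which is presumably where the "with history" part comes from.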
kittykittycatboys@lemmy.blahaj.zone 9 months ago
afaik, archive.org isn't open source. i'd recommend something like archivebox.io
possiblylinux127@lemmy.zip 9 months ago
ArchiveBox is a piece of software, and the Internet Archive is an organization that is focused on preserving the content on the internet.
The Internet Archive has PBs worth of data. I doubt any home user could manage that.
z00s@lemmy.world 9 months ago
kittykittycatboys@lemmy.blahaj.zone 9 months ago
i don't think op is looking to mirror archive.org, my take was that they wanted something like archive.org but selfhosted and for personal / small-scale use
avidamoeba@lemmy.ca 9 months ago
Oh yes, this looks like a winner. Thanks!
On a far side of the moon note, I wonder if ActivityPub could be used to federate multiple archiveboxes to create a more resilient Internet Archive. 🤔
density@kbin.social 9 months ago
a network between networks to make them more resilient... i think you've just invented the arpanet?
Dehydrated@lemmy.world 9 months ago
+1 for ArchiveBox