Is anyone aware of an existing project that can do something like this:
- Access an RSS feed.
- Parse the contents of the items in the feed, and fetch linked images.
- Take the new feed elements and add them to previously fetched elements.
- Store all of the content in a merged RSS/XML file, or something like a SQLite DB.
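For reference, the steps above can be sketched with nothing but the Python standard library. This is a rough sketch, not an existing project: the function names, the `items` table layout, and the choice to store image URLs as a JSON list are all assumptions made for illustration. It also assumes images appear as `enclosure` or Media RSS `media:content` elements, which may not hold for every feed.

```python
import json
import sqlite3
import urllib.request
import xml.etree.ElementTree as ET

# Media RSS namespace; assumed to be how the feed attaches images.
MEDIA_NS = "{http://search.yahoo.com/mrss/}"


def fetch_feed(url):
    """Download the raw feed XML (needs network access)."""
    with urllib.request.urlopen(url, timeout=30) as resp:
        return resp.read()


def parse_items(xml_data):
    """Turn RSS <item> elements into plain dicts, collecting image URLs."""
    root = ET.fromstring(xml_data)
    items = []
    for item in root.iter("item"):
        link = item.findtext("link", default="")
        guid = item.findtext("guid", default=link)
        if not guid:
            continue  # nothing stable to deduplicate on
        images = [e.get("url") for e in item.iter("enclosure") if e.get("url")]
        images += [m.get("url") for m in item.iter(MEDIA_NS + "content") if m.get("url")]
        items.append({
            "guid": guid,
            "link": link,
            "published": item.findtext("pubDate", default=""),
            "body": item.findtext("description", default=""),
            "images": images,
        })
    return items


def merge_items(conn, items):
    """Add unseen items to the archive; return how many were new."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS items ("
        "guid TEXT PRIMARY KEY, link TEXT, published TEXT, body TEXT, images TEXT)"
    )
    before = conn.total_changes
    for it in items:
        # INSERT OR IGNORE skips rows whose guid is already archived.
        conn.execute(
            "INSERT OR IGNORE INTO items VALUES (?, ?, ?, ?, ?)",
            (it["guid"], it["link"], it["published"], it["body"], json.dumps(it["images"])),
        )
    conn.commit()
    return conn.total_changes - before
```

A scheduled run would call `merge_items(sqlite3.connect("archive.db"), parse_items(fetch_feed(feed_url)))`, where `archive.db` and `feed_url` are placeholders, and then commit the database file. Downloading the images themselves is left out; the sketch only records their URLs.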
Context: I’d like to archive Mastodon posts of an account automatically. I’d prefer it to be a script/binary I could run on Linux as I’d likely throw it in a GitHub action and save the resulting output in the git repo.
I could probably whip something together but I’m lazy and I’d prefer to use something that already exists.
abhibeckert@lemmy.world 1 year ago
I don’t know of a project that does this, but if I were to tackle it, I would convert the RSS to the Activity Streams standard: www.w3.org/TR/activitystreams-core/.
Activity Streams is basically the new RSS, and it’s a lot better. You get a separate record for each item in the feed, and each one can be stored in a database row or as a separate file on disk. It also uses JSON instead of XML, which is easier to work with.
Mastodon is built on Activity Streams - so you shouldn’t even need to touch RSS at all. The RSS feed is an alternate version of the stream.
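As a rough sketch of what reading the stream directly might look like: the code below assumes the account's outbox is exposed as an Activity Streams `OrderedCollectionPage` whose `orderedItems` are `Create` activities wrapping `Note` objects, and that the server answers plain requests sent with an `Accept: application/activity+json` header (some instances may require signed requests). The function names are made up for illustration.

```python
import json
import urllib.request


def fetch_json(url):
    """GET a URL asking for the Activity Streams JSON representation."""
    req = urllib.request.Request(url, headers={"Accept": "application/activity+json"})
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.load(resp)


def extract_notes(page):
    """Pull the posts out of one outbox page (already decoded from JSON)."""
    notes = []
    for activity in page.get("orderedItems", []):
        obj = activity.get("object")
        # Boosts ("Announce") carry only a URL as their object; skip them here.
        if activity.get("type") != "Create" or not isinstance(obj, dict):
            continue
        attachments = obj.get("attachment") or []
        if isinstance(attachments, dict):
            attachments = [attachments]  # tolerate a single object instead of a list
        notes.append({
            "id": obj.get("id"),
            "published": obj.get("published"),
            "content": obj.get("content"),
            "attachments": [
                a["url"] for a in attachments if isinstance(a, dict) and a.get("url")
            ],
        })
    return notes
```

Each returned dict could then be written to its own JSON file or database row, keyed on `id`, which also makes deduplication between runs straightforward.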
bogo@sh.itjust.works 1 year ago
Yes, the “Request Archive” method may be the “don’t over-engineer this, stupid” option I go with.