We really need an effort to bulk upload valuable historical posts to Lemmy. If done right, we could significantly expand the number of subs and the amount of content, even if they are ghost towns initially with just the old posts from Reddit. Build it and they will migrate.
njordomir@lemmy.world 1 week ago
Is there anywhere I can find a complete scrape of Reddit threads and comments from before the 3rd party app apocalypse? There was a lot of useful info shared on there, but I don’t want anything to do with what that site has become. I’m happy just to CTRL+F a big dataset. It’ll probably still work better than either Reddit or Google does nowadays. Without media I imagine I could fit it somewhere.
Also, Spez is a greedy little pig boy.
Bonskreeskreeskree@lemmy.world 6 days ago
MeThisGuy@feddit.nl 6 days ago
are you going to use it to train your deepseek?
EngineerGaming@feddit.nl 6 days ago
Not everyone is perverted like you.
njordomir@lemmy.world 6 days ago
I never understood the desire to search in conversational language via AI. It’s gone too far for my taste. I just want to be able to scour a huge volume of info for my exact search terms, maybe with a few synonyms or misspellings included. Google and AI keep trying to assume they know what I’m looking for, but they’re always wrong (intentionally wrong based on their own motives).
The reason the dataset interests me is that search has gotten so bad that I can’t get any non-corporate information from search engines anymore, just more pig swill, chumbucket ads, and misinformation slop. Anything I search for would probably give better results if I just searched old reddit, Wikipedia, and a few other datasets locally in a simple way. Not sure what software is best to use for something like that, but I’d like to collect a few mostly pre-AI datasets now to get the ball rolling before you can’t find those online anymore either.
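For what it's worth, the "exact terms plus a few synonyms or misspellings" kind of search can be roughed out in a few lines of Python without any special software. This is only a sketch under assumptions: it supposes the dump has been unpacked into a text file with one JSON object per line and that the comment text lives in a field called "body". Both the layout and the field name are guesses on my part, and the function names are made up for illustration.

```python
import json
import re


def build_pattern(terms):
    """Compile one case-insensitive regex that matches any term as a whole word."""
    # Longest terms first so a longer variant is preferred over a shorter prefix.
    escaped = sorted((re.escape(t) for t in terms), key=len, reverse=True)
    return re.compile(r"\b(?:" + "|".join(escaped) + r")\b", re.IGNORECASE)


def search_records(lines, terms, field="body"):
    """Yield parsed records whose `field` contains any of the given terms.

    `lines` is any iterable of strings (an open file works), one JSON
    object per line. The "body" default is an assumed field name.
    """
    pattern = build_pattern(terms)
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip lines that are not valid JSON
        if not isinstance(record, dict):
            continue
        text = record.get(field)
        if isinstance(text, str) and pattern.search(text):
            yield record


# Tiny in-memory sample standing in for a real dump (made-up data).
sample = [
    '{"subreddit": "selfhosted", "body": "I self-host my RSS reader on a Pi."}',
    '{"subreddit": "cooking", "body": "Try a cast iron pan."}',
    '{"subreddit": "selfhosted", "body": "Selfhosting email is painful."}',
    "not json at all",
]

# The search term plus the spellings and variants you care about.
hits = list(search_records(sample, ["self-host", "selfhost", "selfhosting"]))
for record in hits:
    print(record["subreddit"], "|", record["body"])
```

For a real file you would pass the open file handle instead of the sample list, e.g. `with open("comments.jsonl", encoding="utf-8") as f:` and then loop over `search_records(f, [...])`, where the filename is hypothetical. It reads line by line, so memory use stays flat even on a very large dump, though every query is a full scan; an index would be the next step if that gets too slow.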
MunkysUnkEnz0@lemmy.world 6 days ago
Yes, there’s a torrent somewhere…