Comment on Smaug-72B-v0.1: The New Open-Source LLM Roaring to the Top of the Leaderboard
girsaysdoom@sh.itjust.works 9 months ago
I’m pretty sure you can load the model using RAM like another poster said. Here’s a used server under $600 that could theoretically run it: ebay.
brick@lemm.ee 9 months ago
You would want to look for an R730, which can be had for not too much more. The x20 series was the “end of an era” and the x30 series was the beginning of the next one. Most importantly for this application, x30-series servers use DDR4 whereas x20-series servers use DDR3.
RAM speed matters a lot for ML applications, and DDR4 offers roughly 2x the bandwidth of DDR3, which is the measurement that matters here.
If you’re going to offload any part of these models to CPU, which you 99.99% will have to do for a model of this size with this class of hardware, skip the 20s and go to the 30s.
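To see why bandwidth dominates, here’s a rough back-of-envelope sketch (my numbers, not from the thread): CPU-offloaded inference has to stream the offloaded weights from RAM for every generated token, so the token rate is capped at roughly bandwidth divided by model size. The quantization level and quad-channel bandwidth figures below are assumptions for illustration.

```python
def tokens_per_sec(bandwidth_gb_s: float, model_size_gb: float) -> float:
    """Upper-bound token rate when all weights are read from RAM once per token."""
    return bandwidth_gb_s / model_size_gb

# Assumption: 72B parameters at 4-bit quantization ~ 36 GB of weights.
model_gb = 72e9 * 0.5 / 1e9

# Assumed theoretical quad-channel peaks:
ddr3_1600 = 4 * 12.8   # ~51 GB/s (DDR3-1600, four channels)
ddr4_2400 = 4 * 19.2   # ~77 GB/s (DDR4-2400, four channels)

print(f"DDR3 ceiling: ~{tokens_per_sec(ddr3_1600, model_gb):.1f} tok/s")
print(f"DDR4 ceiling: ~{tokens_per_sec(ddr4_2400, model_gb):.1f} tok/s")
```

Either way you’re looking at low single-digit tokens per second at best, but the DDR4 box sits roughly 50% higher on this ceiling, which is why skipping the x20 generation makes sense.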