Comment

Comment on Nvidia loses $500 bn in value as Chinese AI firm jolts tech shares

UnderpantsWeevil@lemmy.world ⁨5⁩ ⁨months⁩ ago

No I dont have thousands of almost top of the line graphics cards to retain an LLM from scratch

Fortunately, you don’t need thousands of top of the line cards to train the DeepSeek model. That’s the innovation people are excited about. The model improves on the original LLM design to reduce time to train and time to retrieve information.

Contrary to common belief, an LLM isn’t just a fancy Wikipedia. Its a schema for building out a graph of individual pieces of data, attached to a translation tool that turns human-language inputs into graph-search parameters. If you put facts about Tianamen Square in 1989 into the model, you’ll get them back as results through the front-end.

You don’t need to be scared of technology just because the team that introduced the original training data didn’t configure this piece of open-source software the way you like it.

that’s still no excuse to sweep under the rug blatant censorship of topics the CCP dont want to be talked about.

Wow ok, you really dont know what you’re talking about huh?

source

Sort:hotnew top

Womble@lemmy.world ⁨5⁩ ⁨months⁩ ago
www.analyticsvidhya.com/blog/2024/…/deepseek-v3/

Huh I guess 6 million USD is not millions eh? The innovation is it’s comparatively cheap to train, compared to the billions OpenAI et al are spending. (and that is with acquiring thousands of H800s not included in the cost)

source
- UnderpantsWeevil@lemmy.world ⁨5⁩ ⁨months⁩ ago
  
  The innovation is it’s comparatively cheap to train, compared to the billions
  
  Smaller builds with less comprehensive datasets take less time and money. Again, this doesn’t have to be encyclopedic. You can train your model entirely on a small sample of historical events in and around Beijing in 1989 if you are exclusively fixated on getting results back about Tienanmen Square.
  
  source
  - Womble@lemmy.world ⁨5⁩ ⁨months⁩ ago
    Oh, by the way, as to your theory of “maybe it just doesnt know about Tienanmen, its not an encyclopedia”…
    
    Image
    
    source
    Dhs92@programming.dev ⁨5⁩ ⁨months⁩ ago
    I don’t think I’ve seen that internal dialog before with LLMs. Do you get that with most models when running using ollama?
    
    source
    -> View More Comments
  - Womble@lemmy.world ⁨5⁩ ⁨months⁩ ago
    Ok sure, as I said before I am grateful that they have done this and open sourced it. But it is still deliberately politically censored, and no “Just train your own bro” is not a reasonable reply to that.
    
    source
    Rai@lemmy.dbzer0.com ⁨5⁩ ⁨months⁩ ago
    They know less than I do about LLMs of that’s something they think you can just DO… and that’s saying a lot.
    
    source
MrTolkinghoen@lemmy.zip ⁨5⁩ ⁨months⁩ ago
Idk why you’re getting downvoted. This right here.

source
- UnderpantsWeevil@lemmy.world ⁨5⁩ ⁨months⁩ ago
  Just another normal day in the Lemmyverse
  
  Image
  
  source
  - MrTolkinghoen@lemmy.zip ⁨5⁩ ⁨months⁩ ago
    Lol well. When I saw this I knew the model would be censored to hell, and then the ccp abliteration training data repo made a lot more sense. That being said, the open source effort to reproduce it is far more appealing.
    
    source
- JasSmith@sh.itjust.works ⁨5⁩ ⁨months⁩ ago
  Because the parent comment by Womble is about using the Chinese hosted DeepSeek app, not hosting the model themselves. The user above who responded either didn’t read the original comment carefully enough, or provided a very snarky response. Neither is particularly endearing.
  
  source
  - MrTolkinghoen@lemmy.zip ⁨5⁩ ⁨months⁩ ago
    But yeah. Anyone who thinks the app / stock model isn’t going to be heavily censored…
    
    Image
    
    Why else would it be free? It’s absolutely state sponsored media. Or it’s a singularity and they’re just trying to get people to run it from within their networks, the former being far more plausible.
    
    source
  - Womble@lemmy.world ⁨5⁩ ⁨months⁩ ago
    No, that was me running the model on my own machine not using deepseek’s hosted one. What they were doing was justifying blatent politcal censorship by saying anyone could spend millions of dollars themselves to follow their method and make your own model.
    
    source