
NewsGuard's Reality Check: DeepSeek Debuts with 83 Percent ‘Fail Rate’ in NewsGuard’s Chatbot Red Team Audit.

24 likes

Submitted 3 months ago by Cat@ponder.cat to technology@lemmy.zip

https://www.newsguardrealitycheck.com/p/deepseek-debuts-with-83-percent-fail


Comments

  • ramble81@lemm.ee 3 months ago

    I’m seeing a massive smear campaign against this AI. Not saying it’s a perfect one, but you can tell the established powers are going after it hard because of how much it’s shaken up the industry.

  • tortina_original@lemmy.world 3 months ago

    What a pile of shit this article is.

    Not only should you not rely on chatbots for current news, but entering that Syrian chemist prompt into a locally hosted DeepSeek returned a paragraph about Hamdi Ismail Mada being a known chemist in Syria, blah, blah. Not a single word about China.

    The value of DeepSeek is that we get to run it locally, not that it knows about current news, wtf.

    Idiotic propaganda article.

  • pancake@lemmygrad.ml 3 months ago
    [deleted]
    • BrikoX@lemmy.zip 3 months ago

      Like any LLM, it’s full of shit, especially around anything related to news. But NewsGuard, with its proprietary database and standardized prompts built around US-based LLMs, is worse than useless.

      In light of DeepSeek’s launch, NewsGuard applied the same prompts it used in its December 2024 AI Monthly Misinformation audit to the Chinese chatbot <…>

      1. OpenAI’s ChatGPT-4o (USA)
      2. You.com’s Smart Assistant (USA)
      3. xAI’s Grok-2 (USA)
      4. Inflection’s Pi (USA)
      5. Mistral’s le Chat (France)
      6. Microsoft’s Copilot (USA)
      7. Meta AI (USA)
      8. Anthropic’s Claude (USA)
      9. Google’s Gemini 2.0 (USA)
      10. Perplexity’s answer engine (USA)

      There is no way to verify their results, or even to see the prompts they used, so the fairness of this “audit” can’t be assessed.
