I asked OpenAI's ChatGPT the same thing, only without the snark, and noted that US politicians and OpenAI press statements should not be considered evidence without supporting facts:
Great question—and a healthy dose of skepticism.
If we assume U.S. lawmakers may be misrepresenting the situation or overstating the threat (intentionally or otherwise), then it’s worth asking: What counter-evidence or independent information exists that disputes the claims against DeepSeek?
Here’s what we can currently analyze:
1. OpenAI's Role and Statements
- The accusation is that DeepSeek used OpenAI data without authorization.
- OpenAI has not released detailed technical evidence (as of now) proving DeepSeek “stole” data.
- It is also unclear what “unauthorized use” means here—was it scraping publicly available content generated by ChatGPT? That’s not the same as hacking or violating terms.
Counterpoint: Unless OpenAI provides forensic evidence (e.g., logs showing data scraping from banned accounts or circumvention of APIs), their claim is not independently verifiable.
2. Open-source AI Training Practices
- DeepSeek R1 and other models are trained on publicly available data, much like LLaMA, Falcon, Mistral, etc.
- Many open-source LLMs train on similar corpora, including data from books, code repositories, and even internet crawls, which may contain ChatGPT outputs.
- No proof has been provided that DeepSeek used proprietary datasets beyond what’s publicly available or scraped from the internet.
Counterpoint: If DeepSeek simply trained on web data that happened to contain ChatGPT responses (which are abundant online; see the sketch below), that wouldn't constitute theft by typical AI research standards.
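For illustration only, here is a minimal sketch of the point about crawled data: scanning a plain-text web-crawl excerpt for boilerplate phrases characteristic of ChatGPT-style outputs. The file name and marker list are hypothetical, not anything DeepSeek or OpenAI has published.

```python
import re

# Hypothetical marker phrases often seen in assistant-generated text on the open web.
CHATGPT_MARKERS = [
    r"as an ai language model",
    r"i don't have personal opinions",
    r"i cannot browse the internet",
]
pattern = re.compile("|".join(CHATGPT_MARKERS), re.IGNORECASE)

def count_marker_lines(path: str) -> int:
    """Count lines in a plain-text crawl dump that contain a marker phrase."""
    hits = 0
    with open(path, encoding="utf-8", errors="ignore") as f:
        for line in f:
            if pattern.search(line):
                hits += 1
    return hits

if __name__ == "__main__":
    # "crawl_sample.txt" is a placeholder for any plain-text web-crawl excerpt.
    print(count_marker_lines("crawl_sample.txt"))
```

Even a crude filter like this tends to turn up assistant-style text in ordinary crawls, which is the sense in which ChatGPT outputs can end up in training corpora without any direct access to OpenAI's systems.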
3. DeepSeek's Own Transparency
- DeepSeek has published detailed model cards and training parameters for R1, a transparency move rare among Chinese AI firms.
- This suggests a desire to appear cooperative or aligned with global AI openness norms—not necessarily what a covert “spy tool” would do.
Counterpoint: Spyware tools or state-controlled AI systems don’t usually go open source and document their architectures.
4. No Independent Audit or Third-Party Verification
- As of now, there’s no independent audit of DeepSeek’s model, training corpus, or infrastructure.
- The accusations rest heavily on political framing, not technical forensics released to the public.
5. Broader Pattern of U.S. Tech Nationalism
- U.S. lawmakers have a history of labeling Chinese tech firms as threats without providing concrete, public-facing evidence (Huawei, TikTok, etc.).
- This doesn’t mean the threats are false, but it does mean we should demand proof beyond committee soundbites.
TL;DR – Is there counter-evidence?
Not conclusive counter-evidence, but there’s a lack of compelling public proof supporting the claim. The case against DeepSeek is built on:
- OpenAI’s statements (without shared evidence),
- Political concerns about China,
- and suspicions rooted in strategic competition.
That’s not enough to say DeepSeek is innocent, but it’s not enough to prove guilt either.
pineapplelover@lemm.ee 5 days ago
That’s pretty good. Though I am sure China’s government funded DeepSeek.
That being said, compared to the incredibly dry and PC responses you would get from ChatGPT, this is much better.
turnip@sh.itjust.works 5 days ago
DeepSeek just trained on existing models and used low-level GPU assembly (PTX) for optimization. It was a tiny company; they just did some optimization.
pineapplelover@lemm.ee 4 days ago
Ah yes, the small company DeepSeek, which is owned and funded by the Chinese hedge fund High-Flyer.
High-Flyer being worth $7 billion.
en.m.wikipedia.org/wiki/DeepSeek
en.m.wikipedia.org/wiki/High-Flyer
turnip@sh.itjust.works 4 days ago
160 employees is a microcap, is it not?
dubyakay@lemmy.ca 5 days ago
Could be a local model.