DeepSeek
Updated: 12/10/2025, 4:53:32 PM Wikipedia source
Hangzhou DeepSeek Artificial Intelligence Basic Technology Research Co., Ltd., doing business as DeepSeek, is a Chinese artificial intelligence (AI) company that develops large language models (LLMs). Based in Hangzhou, Zhejiang, DeepSeek is owned and funded by the Chinese hedge fund High-Flyer. DeepSeek was founded in July 2023 by Liang Wenfeng, the co-founder of High-Flyer, who also serves as CEO of both companies. The company launched an eponymous chatbot alongside its DeepSeek-R1 model in January 2025.
Released under the MIT License, DeepSeek-R1 provides responses comparable to those of other contemporary large language models, such as OpenAI's GPT-4 and o1. Its training cost was reported to be significantly lower than that of other LLMs. The company claims that it trained its V3 model for US$6 million, far less than the US$100 million that OpenAI's GPT-4 cost in 2023, and with approximately one-tenth of the computing power consumed by Meta's comparable model, Llama 3.1. DeepSeek's success against larger and more established rivals has been described as "upending AI".
DeepSeek's models are described as "open weight", meaning the exact parameters are openly shared, although certain usage conditions differ from those of typical open-source software. The company reportedly recruits AI researchers from top Chinese universities and also hires from outside traditional computer science fields to broaden its models' knowledge and capabilities. DeepSeek significantly reduced training expenses for its R1 model by incorporating techniques such as mixture of experts (MoE) layers. The company also trained its models during ongoing trade restrictions on AI chip exports to China, using weaker AI chips intended for export and employing fewer units overall.
Observers said the breakthrough sent "shock waves" through the industry, and it was described as a "Sputnik moment" for the US in artificial intelligence, particularly because DeepSeek's AI models were open-source, cost-effective, and high-performing. The release threatened established AI hardware leaders such as Nvidia, whose share price dropped sharply: Nvidia lost US$600 billion in market value, the largest single-company decline in U.S. stock market history.
Tables
| Major versions | Release date | Status | Major variants | Remarks |
| DeepSeek Coder | November 2, 2023 | Discontinued | Base (pretrained); Instruct (instruction-finetuned) | The architecture is essentially the same as Llama. |
| DeepSeek-LLM | November 29, 2023 | Discontinued | Base; Chat (with SFT) | |
| DeepSeek-MoE | January 9, 2024 | Discontinued | Base; Chat | Developed a variant of mixture of experts (MoE). |
| DeepSeek-Math | April 2024 | Discontinued | Base | Initialized with DS-Coder-Base-v1.5. |
| | | | Instruct (with SFT) | |
| | | | RL (using a process reward model) | Developed Group Relative Policy Optimization (GRPO), a variant of Proximal Policy Optimization (PPO). |
| DeepSeek V2 | May 2024 | Discontinued | DeepSeek-V2; DeepSeek-V2-Chat; DeepSeek-V2-Lite; DeepSeek-V2-Lite-Chat; DeepSeek-Coder-V2; DeepSeek-V2.5 | Developed multi-head latent attention (MLA). Also used mixture of experts (MoE). Implemented KV caching. |
| DeepSeek V3 | December 2024 | Active | DeepSeek-V3-Base; DeepSeek-V3 (a chat model) | The architecture is essentially the same as V2. Updated on 2025-03-24. |
| DeepSeek-Prover-V2 | May 1, 2025 | Active | DeepSeek-Prover-V2-671B; DeepSeek-Prover-V2-7B | |
| DeepSeek VL2 | December 13, 2024 | Active | | |
| DeepSeek R1 | November 20, 2024 | Active | DeepSeek-R1-Lite-Preview | Accessible only through the API and a chat interface. |
| | January 20, 2025 | Active | DeepSeek-R1; DeepSeek-R1-Zero | Initialized from DeepSeek-V3-Base and shares the V3 architecture. |
| | | | Distilled models | Initialized from other models, such as Llama and Qwen; distilled from data synthesized by R1 and R1-Zero. |
| | May 28, 2025 | Active | DeepSeek-R1-0528 | |
| DeepSeek V3.1 | August 21, 2025 | Active | DeepSeek-V3.1-Base; DeepSeek-V3.1 (a chat model) | Hybrid architecture (thinking and non-thinking modes available). Trained on over 800B additional tokens on top of V3. |
| | September 22, 2025 | Active | DeepSeek-V3.1-Terminus | Reduced instances of mixed Chinese-English text and occasional abnormal characters relative to V3.1. |
| DeepSeekMath-V2 | November 27, 2025 | Active | | |
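GRPO, listed above as the RL technique developed with DeepSeek-Math, differs from PPO mainly in how it estimates advantages: instead of training a separate value (critic) network, it samples a group of responses to the same prompt and scores each response relative to the rest of its group. The sketch below shows only that advantage step, assuming one scalar reward per response (outcome supervision; with a process reward model, as in the table, rewards are assigned per reasoning step instead) and using the population standard deviation, an assumption since implementations differ. The helper name `group_relative_advantages` is hypothetical, and the clipped policy-ratio objective and KL penalty of the full method are omitted.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards: list[float], eps: float = 1e-8) -> list[float]:
    """Normalize each reward against its sampling group (hypothetical helper).

    Every response sampled for the same prompt receives the advantage
    (r_i - mean(r)) / std(r), so no learned value function is needed.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population standard deviation over the group
    return [(r - mu) / (sigma + eps) for r in rewards]


# Four sampled answers to one prompt: two judged correct (1.0), two wrong (0.0).
advantages = group_relative_advantages([1.0, 0.0, 1.0, 0.0])
print([round(a, 3) for a in advantages])  # → [1.0, -1.0, 1.0, -1.0]
```

Because the baseline is the group mean, a response is only reinforced if it beats its siblings; if every response in a group earns the same reward, all advantages are zero and that group contributes no learning signal.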
DeepSeek Coder model configurations:
| Params. | # Layers | Model dim. | Intermediate dim. | # Heads | # KV heads |
| 1.3B | 24 | 2048 | 5504 | 16 | 16 |
| 5.7B | 32 | 4096 | 11008 | 32 | 1 |
| 6.7B | 32 | 4096 | 11008 | 32 | 32 |
| 33B | 62 | 7168 | 19200 | 56 | 7 |
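In the DeepSeek Coder configurations, the 33B model's 56 query heads share 7 KV heads, and the 5.7B model uses a single KV head (the multi-query special case). The sketch below illustrates what that sharing means, assuming the common grouped-query layout in which each KV head serves a contiguous block of query heads and the head dimension equals model dim divided by the number of heads. It shows which KV head a query head reads and how the per-token KV cache compares with giving every query head its own keys and values. The helper names are hypothetical.

```python
def kv_head_for(query_head: int, n_heads: int, n_kv_heads: int) -> int:
    """Index of the KV head that a query head reads under grouped-query attention."""
    if n_heads % n_kv_heads != 0:
        raise ValueError("n_heads must be a multiple of n_kv_heads")
    group_size = n_heads // n_kv_heads  # consecutive query heads sharing one KV head
    return query_head // group_size


def kv_cache_elements_per_token(n_layers: int, n_kv_heads: int, head_dim: int) -> int:
    """Number of cached key and value elements per token, summed over all layers."""
    return 2 * n_layers * n_kv_heads * head_dim


# (params, layers, model dim, heads, KV heads) from the DeepSeek Coder table.
CONFIGS = [
    ("1.3B", 24, 2048, 16, 16),
    ("5.7B", 32, 4096, 32, 1),
    ("6.7B", 32, 4096, 32, 32),
    ("33B", 62, 7168, 56, 7),
]

for name, layers, dim, heads, kv_heads in CONFIGS:
    head_dim = dim // heads  # assumes model dim = heads * head dim
    cached = kv_cache_elements_per_token(layers, kv_heads, head_dim)
    unshared = kv_cache_elements_per_token(layers, heads, head_dim)
    print(f"{name}: {cached:,} cached elements/token, {unshared // cached}x fewer than plain MHA")
```

The reduction factor is simply heads divided by KV heads: 8x for the 33B model and 32x for the 5.7B model, with no saving for the 1.3B and 6.7B models, which keep one KV head per query head.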
DeepSeek-LLM model configurations:
| Params. | # Layers | Model dim. | Intermediate dim. | # Heads | # KV heads |
| 7B | 30 | 4096 | 11008 | 32 | 32 |
| 67B | 95 | 8192 | 22016 | 64 | 8 |
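As a rough consistency check, the DeepSeek-LLM parameter counts can be approximately reproduced from the listed dimensions. The table gives no formula, so the sketch assumes a Llama-style decoder: per layer, query and output projections of size d×d, key and value projections of size d×(KV heads × head dim), and a SwiGLU feed-forward block with three d×intermediate matrices, plus untied input-embedding and output matrices over an assumed vocabulary of 102,400 tokens. Normalization weights are ignored, so the result is an estimate only; `estimate_params` is a hypothetical helper.

```python
def estimate_params(layers: int, d: int, ffn: int, heads: int, kv_heads: int,
                    vocab: int = 102_400) -> int:
    """Approximate parameter count of a Llama-style decoder (assumed layout)."""
    head_dim = d // heads
    attn = 2 * d * d + 2 * d * kv_heads * head_dim  # Q and O, plus (shared) K and V
    mlp = 3 * d * ffn  # SwiGLU: gate, up and down projections
    embeddings = 2 * vocab * d  # untied input embedding and output head
    return layers * (attn + mlp) + embeddings


# (layers, model dim, intermediate dim, heads, KV heads) from the DeepSeek-LLM table.
for name, cfg in {"7B": (30, 4096, 11008, 32, 32), "67B": (95, 8192, 22016, 64, 8)}.items():
    print(name, f"{estimate_params(*cfg) / 1e9:.2f}B")
# prints "7B 6.91B", then "67B 67.42B"
```

Both estimates land close to the nominal sizes, which suggests the table's dimensions are mutually consistent under these assumptions.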
DeepSeek V2 model configurations:
| Name | Params. | Active params | # Layers | Context length | # Shared experts | # Routed experts |
| V2-Lite | 15.7B | 2.4B | 27 | 32K | 2 | 64 |
| V2 | 236B | 21B | 60 | 128K | 2 | 160 |
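MLA, listed in the version table as a DeepSeek V2 development, shrinks the KV cache by storing one compressed latent vector per token and layer instead of full per-head keys and values, together with a small decoupled rotary-position (RoPE) key shared by all heads. The sketch compares per-token cache sizes for the 60-layer V2 model against a hypothetical standard multi-head attention (MHA) model with the same head count. The attention dimensions are not in the table; they are assumed here to match the published V2 configuration: 128 heads of dimension 128, a 512-dimensional KV latent, and a 64-dimensional RoPE key.

```python
def mha_cache_per_token(layers: int, heads: int, head_dim: int) -> int:
    """Standard multi-head attention: one key and one value vector per head per layer."""
    return 2 * heads * head_dim * layers


def mla_cache_per_token(layers: int, latent_dim: int, rope_dim: int) -> int:
    """MLA: one shared KV latent plus one decoupled RoPE key per layer."""
    return (latent_dim + rope_dim) * layers


# Assumed DeepSeek-V2 attention settings (only the layer count appears in the table).
LAYERS, HEADS, HEAD_DIM, LATENT_DIM, ROPE_DIM = 60, 128, 128, 512, 64

mha = mha_cache_per_token(LAYERS, HEADS, HEAD_DIM)
mla = mla_cache_per_token(LAYERS, LATENT_DIM, ROPE_DIM)
print(f"MHA: {mha:,} elements/token, MLA: {mla:,} elements/token, ratio {mha / mla:.1f}x")
```

Under these assumptions MLA caches 576 elements per token per layer instead of 32,768, roughly a 57x reduction, which matters most at the 128K context length listed for V2.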
DeepSeek V3 model configuration:
| Name | Params. | Active params | # Layers | Context length | # Shared experts | # Routed experts |
| V3 | 671B | 37B | 61 | 128K | 1 | 256 |
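The V2 and V3 tables distinguish shared experts, which process every token, from routed experts, of which only a few are selected per token; this is why V3 activates only 37B of its 671B parameters (about 5.5%) for each token. The sketch below is a simplified scalar illustration of that layout, not DeepSeek's implementation: the number of routed experts selected per token (`top_k`), softmax gating renormalized over the selected experts, and the toy experts are all assumptions, and load balancing is omitted.

```python
import math
from typing import Callable

Expert = Callable[[float], float]


def moe_layer(x: float, shared: list[Expert], routed: list[Expert],
              router_logits: list[float], top_k: int) -> tuple[float, list[int]]:
    """Apply all shared experts plus the top_k routed experts chosen by the router."""
    # Softmax over the router's affinity logits for every routed expert.
    m = max(router_logits)
    exps = [math.exp(z - m) for z in router_logits]
    total = sum(exps)
    probs = [e / total for e in exps]

    # Keep only the top_k routed experts and renormalize their gate weights.
    chosen = sorted(range(len(routed)), key=lambda i: probs[i], reverse=True)[:top_k]
    gate_sum = sum(probs[i] for i in chosen)

    out = x  # residual connection
    out += sum(expert(x) for expert in shared)  # shared experts see every token
    out += sum(probs[i] / gate_sum * routed[i](x) for i in chosen)
    return out, sorted(chosen)


# Toy setup: 1 shared expert and 8 routed experts, each scaling its input.
shared_experts = [lambda v: 0.5 * v]
routed_experts = [lambda v, s=s: s * v for s in range(8)]
logits = [0.1, 2.0, -1.0, 0.3, 1.5, -0.5, 0.0, 0.2]

y, used = moe_layer(1.0, shared_experts, routed_experts, logits, top_k=2)
print(used)  # → [1, 4]
```

Only the selected routed experts are evaluated, so compute per token scales with the shared experts plus `top_k`, not with the full expert count.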
DeepSeek V3 training cost:
| Stage | Cost (thousand GPU hours) | Cost (million US$) |
| Pre-training | 2,664 | 5.328 |
| Context extension | 119 | 0.238 |
| Fine-tuning | 5 | 0.01 |
| Total | 2,788 | 5.576 |
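The dollar column follows from the GPU-hour column at a flat rental rate: 2,664 thousand GPU hours correspond to US$5.328 million, which implies US$2 per GPU hour. The short check below reproduces the cost figures from that implied rate (inferred from the table, not stated in it) and confirms that the stages sum to the totals.

```python
# Implied rental price: US$5.328M / 2,664K GPU hours = US$2 per GPU hour.
RATE_USD_PER_GPU_HOUR = 2.0

stages_k_gpu_hours = {
    "Pre-training": 2_664,
    "Context extension": 119,
    "Fine-tuning": 5,
}

total_k_hours = sum(stages_k_gpu_hours.values())
for stage, k_hours in stages_k_gpu_hours.items():
    # thousand hours * US$/hour / 1000 = millions of US$
    print(f"{stage}: {k_hours * RATE_USD_PER_GPU_HOUR / 1000:.3f}M US$")
print(f"Total: {total_k_hours:,}K GPU hours, "
      f"{total_k_hours * RATE_USD_PER_GPU_HOUR / 1000:.3f}M US$")
```

The figure covers GPU rental for the final training run only; it does not include hardware purchases, staff, or earlier experiments.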
References
- Chinese: 杭州深度求索人工智能基础技术研究有限公司. Sometimes simply referred to in English as Hangzhou DeepSeek Artificial Intelligence.
- Chinese: 深度求索; pinyin: Shēndù Qiúsuǒ
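- Chinese: 宁波程信柔兆企业管理咨询合伙企业(有限合伙) (Ningbo Chengxin Rouzhao Enterprise Management Consulting Partnership, limited partnership) and 宁波程恩企业管理咨询合伙企业(有限合伙) (Ningbo Cheng'en Enterprise Management Consulting Partnership, limited partnership)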
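- The number of heads does not equal the number of KV heads due to grouped-query attention (GQA); with a single KV head, the 5.7B model is the multi-query attention (MQA) special case.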
- Inexplicably, the model named DeepSeek-Coder-V2 Chat in the paper was released as DeepSeek-Coder-V2-Instruct in HuggingF
- At that time, the R1-Lite-Preview required selecting "Deep Think enabled", and every user could use it only 50 times a d