DeepSeek, a Chinese AI startup, has made waves in the technology sector at home and abroad. Founded by Liang Wenfeng and staffed by a team of just 139 engineers and researchers, it has captured attention with its innovative approach and impressive technical results. By comparison, larger labs such as OpenAI and Anthropic employ roughly 1,200 and more than 500 researchers, respectively.
Despite its smaller size, DeepSeek stands out for traits such as taking no external funding, being bankrolled by a founder wealthy enough to finance it himself, and hiring almost exclusively graduates of top Chinese universities such as Tsinghua and Peking University. The company has emerged as a notable player in the AI startup scene.
In December 2024, DeepSeek unveiled its latest open-source model, V3, which has performed exceptionally well in evaluations: it not only surpasses leading open-source models such as Alibaba’s Qwen2.5-72B and Meta’s Llama 3.1 405B, but also competes favorably with top closed-source models such as GPT-4o and Claude 3.5 Sonnet.
One striking aspect of DeepSeek V3 is that its weights were fully open-sourced on release, and its training cost was far lower than its competitors’. According to data from SemiAnalysis, training OpenAI’s GPT-4 cost around $63 million, while DeepSeek V3 cost less than one-tenth of that amount.
Moreover, DeepSeek trained its V3 model on just 2,000 NVIDIA H800 GPUs, a stark contrast to the hundreds of thousands of GPUs deployed by large Silicon Valley firms. This lowers a long-standing barrier for domestic large-model development, showing that high-quality data and superior algorithms can yield high-performance models even with limited computational resources.
Andrej Karpathy, a founding member of OpenAI, praised DeepSeek’s results, noting that the V3 model outperforms the strongest Llama 3 variant while using only a fraction of the resources. He remarked, “In the future, we might not need massive GPU clusters anymore.”
Meta scientist Tian Yuandong was equally impressed, highlighting the model’s capabilities, saying, “FP8 pre-training, MoE, powerful performance on a very limited budget, extracting guidance from CoT… wow! This is great work!”
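The Mixture-of-Experts (MoE) architecture Tian mentions routes each token to a small subset of specialist sub-networks rather than through one monolithic network, which is part of how a large model can be trained and served cheaply. Below is a minimal NumPy sketch of top-k expert routing; the dimensions, expert count, and expert functions are made up for illustration and are not DeepSeek's actual implementation.

```python
# Illustrative top-k MoE routing: a gate scores experts per token,
# and each token's output is a weighted mix of its top-k experts.
# All shapes and values here are hypothetical.
import numpy as np

rng = np.random.default_rng(0)

def moe_forward(x, gate_w, experts, k=2):
    """Route each token in x to its top-k experts and mix their outputs."""
    logits = x @ gate_w                         # (tokens, n_experts) gate scores
    top = np.argsort(logits, axis=-1)[:, -k:]   # indices of the k best experts per token
    sel = np.take_along_axis(logits, top, axis=-1)
    # softmax over only the selected experts' logits
    w = np.exp(sel - sel.max(axis=-1, keepdims=True))
    w /= w.sum(axis=-1, keepdims=True)
    out = np.zeros_like(x)
    for t in range(x.shape[0]):                 # only k experts run per token
        for j in range(k):
            out[t] += w[t, j] * experts[top[t, j]](x[t])
    return out

d, n_experts = 8, 4
# toy experts: each is a fixed random projection followed by tanh
experts = [
    (lambda W: (lambda v: np.tanh(W @ v)))(rng.standard_normal((d, d)))
    for _ in range(n_experts)
]
gate_w = rng.standard_normal((d, n_experts))
tokens = rng.standard_normal((3, d))
y = moe_forward(tokens, gate_w, experts)
print(y.shape)  # (3, 8)
```

The key property is that compute per token scales with k, not with the total number of experts, so parameter count can grow far faster than training cost.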
Liang Wenfeng, the founder of DeepSeek, is not surprised by the accolades, attributing them to the company’s role as an innovative contributor in the global AI competition. He emphasizes that China must gradually become a contributor to technology rather than solely relying on Western innovation.
He also highlighted the importance of establishing a robust technological ecosystem for AI development in China, similar to how Western technology communities fostered touchstones like Moore’s Law and scaling laws. Liang noted that many domestic chip efforts struggle for lack of a supporting technical community and access to cutting-edge technology.
DeepSeek grew out of High-Flyer (Huanfang) Quantitative, a quantitative fund, and shares its hiring practices that favor local talent. Before officially launching its AI products, DeepSeek invested significant time in internal development, even hiring liberal arts graduates to help build the knowledge its models rely on. This unusual path positions DeepSeek as a standout in China’s AI innovation landscape, reflecting its commitment to pushing the boundaries of technology.