Chinese startup unveils an “open” AI model with record-breaking capabilities
Chinese AI firm DeepSeek has launched what may be one of the most powerful openly available AI models to date. The model, DeepSeek V3, was released under a permissive license that lets developers download, modify, and use it for most applications, including commercial ones.
A new benchmark in AI performance
DeepSeek V3 performs well across a range of text-based tasks, such as coding, translation, and content generation. According to DeepSeek's internal benchmarks, the model outperforms leading AI models, including Meta's Llama 3.1, OpenAI's GPT-4o, and Alibaba's Qwen 2.5.
In particular, on programming-contest problems from Codeforces, a prominent platform for competitive programming, DeepSeek V3 outperformed the other models tested. It also scored highest on Aider Polyglot, a benchmark that measures how well a model can write new code that integrates into existing code.
Massive scale and efficient development
DeepSeek V3 has 671 billion parameters (685 billion as hosted on Hugging Face, a figure that also includes the weights of its Multi-Token Prediction module), making it one of the largest AI models released to date. For comparison, that is about 1.66 times the 405 billion parameters of Meta's largest Llama 3.1 model. DeepSeek V3 was trained on 14.8 trillion tokens, or roughly 11 trillion words using the common rule of thumb that 1 million tokens correspond to about 750,000 words, an unusually large training set.
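The size comparison and the tokens-to-words conversion above are simple arithmetic, and a short sketch makes the steps explicit. It assumes the 750,000-words-per-million-tokens rule of thumb (0.75 words per token); that ratio is an approximation for English text and varies with the tokenizer and language, so the word count is a rough estimate rather than a figure reported by DeepSeek.

```python
# Back-of-envelope checks for the parameter and training-data figures.
# Assumption: ~0.75 words per token, a common rule of thumb for English
# text; the real ratio depends on the tokenizer and the language.

DEEPSEEK_V3_PARAMS = 671e9   # parameters reported by DeepSeek
LLAMA_31_PARAMS = 405e9      # largest Llama 3.1 model
TRAINING_TOKENS = 14.8e12    # tokens in DeepSeek V3's training set
WORDS_PER_TOKEN = 0.75       # assumed rule-of-thumb conversion


def size_ratio(a: float, b: float) -> float:
    """Return how many times larger model a is than model b."""
    return a / b


def tokens_to_words(tokens: float, words_per_token: float = WORDS_PER_TOKEN) -> float:
    """Return a rough word count for a given number of tokens."""
    return tokens * words_per_token


if __name__ == "__main__":
    ratio = size_ratio(DEEPSEEK_V3_PARAMS, LLAMA_31_PARAMS)
    words = tokens_to_words(TRAINING_TOKENS)
    print(f"Size ratio: {ratio:.2f}x")                  # Size ratio: 1.66x
    print(f"Approx. words: {words / 1e12:.1f} trillion")  # Approx. words: 11.1 trillion
```

Rounded to one decimal place, the ratio of 671 to 405 is about 1.7, so "roughly 1.6 times" understates the gap slightly.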
Despite the model's scale, DeepSeek says it trained DeepSeek V3 on Nvidia H800 GPUs in about two months, at a cost of roughly $5.5 million. That is a fraction of what models like OpenAI's GPT-4 are widely reported to have cost. The efficiency is especially notable because U.S. export restrictions limit Chinese companies' access to Nvidia's most powerful GPUs.
Strengths and limitations
While DeepSeek V3 posts strong benchmark results, it has limitations tied to its regulatory environment. As a Chinese-developed model, it must comply with China's internet regulations, which require its responses to align with “core socialist values.” For instance, it avoids controversial topics such as the Tiananmen Square protests and declines to critique the Chinese government.
This compliance allows the model to be offered within China, but it may be a drawback for developers who want a model whose outputs are not filtered in this way.
A milestone in “open” AI
DeepSeek V3 is a significant achievement in the global AI race and pushes the boundaries of what “open” models can do. Its permissive license and strong performance make it an attractive choice for developers, although running a model of this size requires high-end GPUs, which may make it impractical for some users.