deepseek-ai / DeepSeek-V2
DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model
Repository Overview (README excerpt)
Introduction

Today, we’re introducing DeepSeek-V2, a strong Mixture-of-Experts (MoE) language model characterized by economical training and efficient inference. It comprises 236B total parameters, of which 21B are activated for each token. Compared with DeepSeek 67B, DeepSeek-V2 achieves stronger performance while saving 42.5% of training costs, reducing the KV cache by 93.3%, and boosting the maximum generation throughput by a factor of 5.76.

We pretrained DeepSeek-V2 on a diverse, high-quality corpus of 8.1 trillion tokens. Pretraining was followed by Supervised Fine-Tuning (SFT) and Reinforcement Learning (RL) to fully unlock the model's capabilities. The evaluation results validate the effectiveness of our approach: DeepSeek-V2 achieves remarkable performance on both standard benchmarks and open-ended generation evaluations.

News

• 2024.05.16: We released DeepSeek-V2-Lite.
• 2024.05.06: We released DeepSeek-V2.

Model Downloads

| **Model** | **#Total Params** | **#Activated Params** | **Context Length** | **Download** |
| :------------: | :------------: | :------------: | :------------: | :------------: |
| DeepSeek-V2-Lite | 16B | 2.4B | 32k | 🤗 HuggingFace |
| DeepSeek-V2-Lite-Chat (SFT) | 16B | 2.4B | 32k | 🤗 HuggingFace |
| DeepSeek-V2 | 236B | 21B | 128k | 🤗 HuggingFace |
| DeepSeek-V2-Chat (RL) | 236B | 21B | 128k | 🤗 HuggingFace |

Due to the constraints of HuggingFace, the open-source code currently runs slower on GPUs than our internal codebase. To run the model efficiently, we offer a dedicated vLLM solution optimized for its performance.
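The gap between total and activated parameters (21B of 236B, about 8.9% per token) comes from MoE routing: a gate scores the experts for each token and only the top-k of them run. The sketch below is a generic top-k gating layer in plain Python, not DeepSeek-V2's actual router; the expert count, k, the toy experts, the random stand-in gate scores, and the choice to renormalize the selected weights are all illustrative assumptions.

```python
import math
import random


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def top_k_route(gate_logits, k):
    """Return the indices of the k highest-scoring experts and their weights.

    Weights are the softmax probabilities of the chosen experts, renormalized
    to sum to 1 (an illustrative convention assumed here).
    """
    probs = softmax(gate_logits)
    chosen = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    norm = sum(probs[i] for i in chosen)
    return chosen, [probs[i] / norm for i in chosen]


def moe_forward(token, experts, gate_logits, k):
    """Mix the outputs of only the k routed experts for a single token."""
    chosen, weights = top_k_route(gate_logits, k)
    out = [0.0] * len(token)
    for i, w in zip(chosen, weights):
        y = experts[i](token)  # unselected experts are never evaluated
        out = [o + w * v for o, v in zip(out, y)]
    return out, chosen


# Hypothetical toy layer: 8 experts, expert i scales its input by (i + 1).
experts = [lambda x, s=s: [s * v for v in x] for s in range(1, 9)]
random.seed(0)
gate_logits = [random.gauss(0.0, 1.0) for _ in experts]  # stand-in for a learned gate

out, used = moe_forward([1.0, 2.0], experts, gate_logits, k=2)
print(f"routed to experts {used}; {len(used)} of {len(experts)} experts ran")
print(f"DeepSeek-V2 activated fraction: {21 / 236:.1%}")  # 21B of 236B params
```

Because only k experts are called per token, per-token compute scales with the activated parameters rather than the total, while all experts must still be held in memory.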
Evaluation Results

Base Model

Standard Benchmark (Models larger than 67B)

| **Benchmark** | **Domain** | **LLaMA3 70B** | **Mixtral 8x22B** | **DeepSeek-V1 (Dense-67B)** | **DeepSeek-V2 (MoE-236B)** |
|:-----------:|:--------:|:------------:|:---------------:|:-------------------------:|:------------------------:|
| **MMLU** | English | 78.9 | 77.6 | 71.3 | 78.5 |
| **BBH** | English | 81.0 | 78.9 | 68.7 | 78.9 |
| **C-Eval** | Chinese | 67.5 | 58.6 | 66.1 | 81.7 |
| **CMMLU** | Chinese | 69.3 | 60.0 | 70.8 | 84.0 |
| **HumanEval** | Code | 48.2 | 53.1 | 45.1 | 48.8 |
| **MBPP** | Code | 68.6 | 64.2 | 57.4 | 66.6 |
| **GSM8K** | Math | 83.0 | 80.3 | 63.4 | 79.2 |
| **Math** | Math | 42.2 | 42.5 | 18.7 | 43.6 |

Standard Benchmark (Models smaller than 16B)

| **Benchmark** | **Domain** | **DeepSeek 7B (Dense)** | **DeepSeekMoE 16B** | **DeepSeek-V2-Lite (MoE-16B)** |
|:-------------:|:----------:|:--------------:|:-----------------:|:--------------------------:|
| **Architecture** | - | MHA+Dense | MHA+MoE | MLA+MoE |
| **MMLU** | English | 48.2 | 45.0 | 58.3 |
| **BBH** | English | 39.5 | 38.9 | 44.1 |
| **C-Eval** | Chinese | 45.0 | 40.6 | 60.3 |
| **CMMLU** | Chinese | 47.2 | 42.5 | 64.3 |
| **HumanEval** | Code | 26.2 | 26.8 | 29.9 |
| **MBPP** | Code | 39.0 | 39.2 | 43.2 |
| **GSM8K** | Math | 17.4 | 18.8 | 41.1 |
| **Math** | Math | 3.3 | 4.3 | 17.1 |

For more evaluation details, such as few-shot settings and prompts, please check our paper.

Context Window

Evaluation results on the Needle In A Haystack (NIAH) tests. DeepSeek-V2 performs well across all context window lengths up to **128K**.
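A Needle In A Haystack test hides a short fact (the needle) at a chosen depth inside long filler text and asks the model to retrieve it, repeating this over a grid of context lengths and depths. The sketch below builds such a grid in Python. It is a generic illustration, not DeepSeek's evaluation harness: the filler text, the needle, the word count used as a proxy for tokens, and the substring scoring are all assumptions, and the call to an actual model is left out.

```python
from dataclasses import dataclass

# Hypothetical filler and needle text; real NIAH suites use their own corpora.
FILLER = "The grass is green and the sky is blue."
NEEDLE = "The secret passphrase is {answer}."
QUESTION = "What is the secret passphrase? Reply with the passphrase only."


@dataclass
class NiahCase:
    context_words: int  # approximate haystack size in words (a proxy for tokens)
    depth: float        # needle position: 0.0 = start of context, 1.0 = end
    prompt: str
    answer: str


def build_case(context_words: int, depth: float, answer: str) -> NiahCase:
    """Hide one needle sentence at `depth` inside a filler haystack."""
    n = max(1, context_words // len(FILLER.split()))
    sentences = [FILLER] * n
    sentences.insert(round(depth * n), NEEDLE.format(answer=answer))
    prompt = " ".join(sentences) + "\n\n" + QUESTION
    return NiahCase(context_words, depth, prompt, answer)


def passed(case: NiahCase, model_output: str) -> bool:
    """Simple substring scoring: did the model reproduce the needle's answer?"""
    return case.answer in model_output


# One case per (length, depth) cell, as typically plotted in a NIAH heatmap.
grid = [
    build_case(n_words, depth, answer="blue-falcon-42")
    for n_words in (1_000, 4_000, 16_000)
    for depth in (0.0, 0.25, 0.5, 0.75, 1.0)
]
```

Each `prompt` in `grid` would be sent to the model under test and scored with `passed`; a full run would extend the lengths toward the 128K limit reported above.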
Chat Model

Standard Benchmark (Models larger than 67B)

| Benchmark | Domain | Qwen1.5 72B Chat | Mixtral 8x22B | LLaMA3 70B Instruct | DeepSeek-V1 Chat (SFT) | DeepSeek-V2 Chat (SFT) | DeepSeek-V2 Chat (RL) |
|:-----------:|:----------------:|:------------------:|:---------------:|:---------------------:|:-------------:|:-----------------------:|:----------------------:|
| **MMLU** | English | 76.2 | 77.8 | 80.3 | 71.1 | 78.4 | 77.8 |
| **BBH** | English | 65.9 | 78.4 | 80.1 | 71.7 | 81.3 | 79.7 |
| **C-Eval** | Chinese | 82.2 | 60.0 | 67.9 | 65.2 | 80.9 | 78.0 |
| **CMMLU** | Chinese | 82.9 | 61.0 | 70.7 | 67.8 | 82.4 | 81.6 |
| **HumanEval** | Code | 68.9 | 75.0 | 76.2 | 73.8 | 76.8 | 81.1 |
| **MBPP** | Code | 52.2 | 64.4 | 69.8 | 61.4 | 70.4 | 72.0 |
| **LiveCodeBench (0901-0401)** | Code | 18.8 | 25.0 | 30.5 | 18.3 | 28.7 | 32.5 |
| **GSM8K** | Math | 81.9 | 87.9 | 93.2 | 84.1 | 90.8 | 92.2 |
| **Math** | Math | 40.6 | 49.8 | 48.5 | 32.6 | 52.7 | 53.9 |

Standard Benchmark (Models smaller than 16B)

| Benchmark | Domain | DeepSeek 7B Chat (SFT) | DeepSeekMoE 16B Chat (SFT) | DeepSeek-V2-Lite 16B Chat (SFT) |
|:-----------:|:----------------:|:------------------:|:---------------:|:---------------------:|
| **MMLU** | English | 49.7 | 47.2 | 55.7 |
| **BBH** | English | 43.1 | 42.2 | 48.1 |
| **C-Eval** | Chinese | 44.7 | 40.0 | 60.1 |
| **CMMLU** | Chinese | 51.2 | 49.3 | 62.5 |
| **HumanEval** | Code | 45.1 | 45.7 | 57.3 |
| **MBPP** | Code | 39.0 | 46.2 | 45.8 |
| **GSM8K** | Math | 62.6 | 62.2 | 72.0 |
| **Math** | Math | 14.7 | 15.2 | 27.9 |

English Open-Ended Generation Evaluation

We evaluate our model on AlpacaEval 2.0 and MTBench, showing the competitive performance of DeepSeek-V2-Chat-RL on English conversation generation.
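The chat results make it possible to see where the RL stage changes scores relative to SFT. The short Python sketch below copies the DeepSeek-V2 Chat (SFT) and DeepSeek-V2 Chat (RL) columns from the larger-model chat table and computes per-benchmark and per-domain deltas; grouping by the table's Domain column and using an unweighted domain average are choices made here for illustration only.

```python
# Scores copied from the chat-model table (models larger than 67B).
SFT = {"MMLU": 78.4, "BBH": 81.3, "C-Eval": 80.9, "CMMLU": 82.4,
       "HumanEval": 76.8, "MBPP": 70.4, "LiveCodeBench": 28.7,  # 0901-0401 split
       "GSM8K": 90.8, "Math": 52.7}
RL = {"MMLU": 77.8, "BBH": 79.7, "C-Eval": 78.0, "CMMLU": 81.6,
      "HumanEval": 81.1, "MBPP": 72.0, "LiveCodeBench": 32.5,
      "GSM8K": 92.2, "Math": 53.9}
DOMAIN = {"MMLU": "English", "BBH": "English", "C-Eval": "Chinese", "CMMLU": "Chinese",
          "HumanEval": "Code", "MBPP": "Code", "LiveCodeBench": "Code",
          "GSM8K": "Math", "Math": "Math"}


def deltas(before, after):
    """Per-benchmark change from `before` to `after`, rounded to 0.1 point."""
    return {bench: round(after[bench] - before[bench], 1) for bench in before}


def mean_by_domain(change):
    """Unweighted average of the deltas within each domain."""
    groups = {}
    for bench, delta in change.items():
        groups.setdefault(DOMAIN[bench], []).append(delta)
    return {dom: round(sum(v) / len(v), 2) for dom, v in groups.items()}


change = deltas(SFT, RL)
for bench, delta in change.items():
    print(f"{bench:>14}: {delta:+.1f}")
print(mean_by_domain(change))
```

On these numbers, every Code and Math score rises after RL, while the English and Chinese scores decrease by 0.6 to 2.9 points.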
Chinese Open-Ended Generation Evaluation

**Alignbench** (https://arxiv.org/abs/2311.18743)

| **Model** | **Open/Closed Source** | **Overall Score** | **Chinese Reasoning** | **Chinese Language** |
| :---: | :---: | :---: | :---: | :---: |
| gpt-4-1106-preview | Closed | 8.01 | 7.73 | 8.29 |
| DeepSeek-V2 Chat (RL) | Open | 7.91 | 7.45 | 8.36 |
| erniebot-4.0-202404 (文心一言) | Closed | 7.89 | 7.61 | 8.17 |
| DeepSeek-V2 Chat (SFT) | Open | 7.74 | 7.30 | 8.17 |
| gpt-4-0613 | Closed | 7.53 | 7.47 | 7.59 |
| erniebot-4.0-202312 (文心一言) | Closed | 7.36 | 6.84 | 7.88 |
| moonshot-v1-32k-202404 (月之暗面) | Closed | 7.22 | 6.42 | 8.02 |
| Qwen1.5-72B-Chat (通义千问) | Open | 7.19 | 6.45 | 7.93 |
| DeepSeek-67B-Chat | Open | 6.43 | 5.75 | 7.11…