DeepSeek has released preview versions of DeepSeek-V4-Pro and DeepSeek-V4-Flash, its most capable models to date. They arrived within hours of OpenAI's GPT-5.5 launch, in what the artificial intelligence community has read as a pointed statement of timing from a Chinese lab that has spent three years operating under United States chip export restrictions. V4-Pro carries 1.6 trillion total parameters but activates only 49 billion per inference pass through the Mixture-of-Experts architecture DeepSeek has refined since V3, making it the largest open-weight model currently available. V4-Flash carries 284 billion total parameters with 13 billion active and is designed for speed and cost efficiency. Both models support one-million-token context windows as standard and are available under an MIT licence on Hugging Face, free for anyone capable of running them locally.
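To see why sparse activation matters for cost, the toy Python sketch below routes each token to only a handful of experts, so per-token compute tracks active parameters rather than total parameters. The expert counts, sizes, and function names here are illustrative assumptions, not DeepSeek's actual configuration.

```python
import numpy as np

# Toy Mixture-of-Experts routing. All sizes are illustrative assumptions,
# not DeepSeek's real configuration.
N_EXPERTS = 64        # total experts in the layer
TOP_K = 2             # experts activated per token
D_MODEL = 16          # hidden size (tiny for the demo)

rng = np.random.default_rng(0)
router_w = rng.normal(size=(D_MODEL, N_EXPERTS))          # router weights
experts = rng.normal(size=(N_EXPERTS, D_MODEL, D_MODEL))  # one matrix per expert

def moe_forward(x: np.ndarray) -> np.ndarray:
    """Route a single token vector to its top-k experts only."""
    logits = x @ router_w                      # score every expert
    top = np.argsort(logits)[-TOP_K:]          # pick the k best
    weights = np.exp(logits[top])
    weights /= weights.sum()                   # softmax over the chosen experts
    # Only TOP_K of the N_EXPERTS matrices are touched, so compute per
    # token scales with active parameters, not total parameters.
    return sum(w * (x @ experts[i]) for w, i in zip(weights, top))

token = rng.normal(size=D_MODEL)
print(moe_forward(token).shape)                       # (16,)
print(f"active fraction: {TOP_K / N_EXPERTS:.1%}")    # ~3.1%, cf. 49B of 1.6T
```

The ratio in the demo, about 3 percent of experts active per token, mirrors the roughly 3 percent of V4-Pro's 1.6 trillion parameters that fire on each inference pass.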
The pricing gap with Western competitors is striking: V4-Pro costs USD 1.74 per million input tokens and USD 3.48 per million output tokens, compared with GPT-5.5 Pro at USD 30 input and USD 180 output per million tokens, a roughly 98 percent cost difference on output. V4-Flash goes further at USD 0.14 input and USD 0.28 output, undercutting every comparable budget model from the major frontier labs. Cline Chief Executive Officer Saoud Rizwan noted that if Uber had used DeepSeek instead of Claude, its 2026 artificial intelligence budget, reportedly sized for four months of usage, would have stretched to seven years. DeepSeek trained V4 partly on Huawei Ascend chips, working around United States export restrictions on Nvidia graphics processing units, and has indicated that once new Huawei Ascend 950 supernodes come online later in 2026, the already-low pricing on V4-Pro will fall further.
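The arithmetic behind those claims is easy to verify. The short sketch below prices a hypothetical monthly workload using the per-million-token figures quoted above; the workload volumes are made up for illustration.

```python
# Back-of-envelope check of the quoted price gap. Prices are the
# per-million-token figures from the article; the workload is invented.
PRICES = {                      # (input USD/M tokens, output USD/M tokens)
    "DeepSeek V4-Pro":   (1.74, 3.48),
    "DeepSeek V4-Flash": (0.14, 0.28),
    "GPT-5.5 Pro":       (30.00, 180.00),
}

def monthly_cost(model: str, input_m: float = 500, output_m: float = 100) -> float:
    """Cost in USD for a workload of N million input/output tokens."""
    p_in, p_out = PRICES[model]
    return input_m * p_in + output_m * p_out

for model in PRICES:
    print(f"{model:>18}: ${monthly_cost(model):>9,.2f}")

# The output-token gap quoted in the article:
gap = 1 - PRICES["DeepSeek V4-Pro"][1] / PRICES["GPT-5.5 Pro"][1]
print(f"V4-Pro output discount vs GPT-5.5 Pro: {gap:.1%}")   # ~98.1%
```

On this invented workload the sketch prices V4-Pro at USD 1,218 against USD 33,000 for GPT-5.5 Pro, the same order-of-magnitude spread behind the Uber comparison.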
The efficiency gains behind this pricing are architectural. DeepSeek developed two new attention mechanisms: Compressed Sparse Attention, which compresses groups of tokens and then selects only the most relevant entries using a Lightning Indexer; and Heavily Compressed Attention, which collapses every 128 tokens into a single entry for an extremely cheap global view of long contexts. The result is that at one million tokens, V4-Pro uses only 27 percent of the compute its predecessor V3.2 required, while key-value cache memory drops to just 10 percent of V3.2's. On benchmarks, V4-Pro-Max scored 90.2 percent on Apex Shortlist against Claude Opus 4.6's 85.9 percent, matched Claude Opus 4.6 at 80.6 percent on SWE-bench Verified, which measures resolving real GitHub issues, and ranked first among all open-weight models on GDPval-AA, an agentic real-world work benchmark covering finance, legal, and research tasks, scoring 1,554 Elo to Claude Opus 4.6's 1,619. The models are text-only for now, with multimodal capabilities still in development, and the existing DeepSeek API endpoints will be retired on July 24, 2026.
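DeepSeek has not published the internals of Heavily Compressed Attention, so the toy sketch below only illustrates the stated idea of collapsing every 128 tokens into a single key-value entry, using simple mean pooling as an assumed stand-in for the real compression function.

```python
import numpy as np

BLOCK = 128   # tokens collapsed per entry, per the article
D_HEAD = 64   # head dimension (illustrative)

def compress_kv(kv: np.ndarray, block: int = BLOCK) -> np.ndarray:
    """Collapse every `block` tokens into one entry by mean pooling.

    Mean pooling is an assumption for illustration only; DeepSeek has not
    disclosed how Heavily Compressed Attention builds its entries.
    """
    n_blocks = kv.shape[0] // block       # drop any ragged tail for simplicity
    return kv[: n_blocks * block].reshape(n_blocks, block, -1).mean(axis=1)

keys = np.random.default_rng(0).normal(size=(100_000, D_HEAD))
compressed = compress_kv(keys)
print(keys.shape, "->", compressed.shape)                      # (100000, 64) -> (781, 64)
print(f"KV entries kept: {compressed.shape[0] / keys.shape[0]:.3%}")
```

At that 128-to-1 ratio, a one-million-token context shrinks to fewer than 8,000 global entries, which is what makes a cheap whole-context view plausible alongside the 90 percent key-value cache reduction the company reports.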