A Chinese AI startup has announced a permanent reduction in the API pricing for its flagship V4-Pro model, offering the service at just 25% of its previous cost, making it one of the most cost-efficient options globally.
The API fee for V4-Pro will stay at 2.5 Chinese cents (approximately 0.36 US cents) per million cache-hit input tokens. The company announced on May 22 that the discount, originally set to expire at the end of this month, would remain in place. Cache-miss input tokens cost CNY3 (roughly 44 US cents) per million, and output tokens cost CNY6 per million.
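To show how the three per-million-token rates combine into a single bill, here is a minimal Python sketch. The prices are those quoted above. The function name, the dictionary keys, and the sample token counts are hypothetical illustrations, not part of the company's API or billing system, and actual billing rules (rounding, minimum charges, how cache hits are counted) may differ.

```python
from decimal import Decimal

# Prices in CNY per one million tokens, as quoted in the article.
# The key names are illustrative, not official API field names.
PRICE_PER_MILLION_CNY = {
    "cache_hit_input": Decimal("0.025"),  # 2.5 Chinese cents
    "cache_miss_input": Decimal("3"),
    "output": Decimal("6"),
}

TOKENS_PER_MILLION = Decimal(1_000_000)


def estimate_cost_cny(cache_hit_input: int, cache_miss_input: int, output: int) -> Decimal:
    """Estimate the charge in CNY for the given token counts (hypothetical helper)."""
    usage = {
        "cache_hit_input": cache_hit_input,
        "cache_miss_input": cache_miss_input,
        "output": output,
    }
    # Each category is billed pro rata against its per-million price.
    return sum(
        (PRICE_PER_MILLION_CNY[kind] * Decimal(count) / TOKENS_PER_MILLION
         for kind, count in usage.items()),
        Decimal(0),
    )


if __name__ == "__main__":
    # Assumed workload: 800,000 cached input tokens, 200,000 uncached, 50,000 output.
    print(estimate_cost_cny(800_000, 200_000, 50_000))
```

For the assumed workload, the three components are CNY0.02, CNY0.60, and CNY0.30, for a total of CNY0.92. Under these rates the cache-miss input and output tokens dominate the bill, while the cached input is close to free.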
The company introduced its next-generation flagship model, V4, on April 24, with substantial improvements in inference speed, long-context processing, and proactive capabilities. V4-Pro has posted the top performance among open-source models on agentic coding benchmarks, with output quality nearing that of Claude Opus 4.6 in non-thinking mode, based on internal assessments.
The price cut contrasts sharply with the broader industry trend of rising API costs. Major cloud providers, including Amazon, Microsoft, and several leading Chinese companies, have raised their rates by as much as 463% amid rising compute expenses. In addition, prices for high-bandwidth memory have surged more than sixfold in the past six months, while growing token consumption by AI agents has pushed operating costs beyond what cloud services can offset through subsidies.
The price drop was achieved not through external subsidies but through fundamental architectural redesigns that significantly lowered costs. According to analysts, the company’s proprietary sparse attention mechanism and mixture-of-experts architecture enable the V4 series to handle million-token contexts at just 27% of the compute cost of its predecessor, with key-value cache memory usage cut to just 10% of the predecessor's.
Furthermore, the company has extensively optimized its models for domestic AI chips, including Huawei Technologies’ Ascend series, substantially reducing hardware procurement expenses. Engineering improvements in inference processes have also enabled fixed costs to be spread across a broader user base.
This pricing strategy appears to be a deliberate effort to secure a stronger ecosystem presence. By lowering barriers to API access, the company aims to attract developers and enterprise clients to its platform, creating a self-sustaining cycle of decreasing prices, increasing usage, a thriving application ecosystem, and ongoing cost reductions.



