Digital Phablet

Testing 7 AI Models’ True Performance with 2025 Beijing Exam

by Seok Chen
July 5, 2025
in AI

The 2025 Beijing High School Entrance Examination has concluded, with over 110,000 students completing the test. This year was the first under a new reform that shortens the exam from three days to two.

The most significant changes in this year’s exam are a reduction in the total score from 670 to 510 and the introduction of an open-book format for the Morality and Law section. With fewer points available overall, each point carries more weight, which intensifies competition among high-scoring students. Questions in each subject now focus more on assessing students’ core competencies and essential skills.

For example, in mathematics, the proportion of simpler questions has been reduced, and the difficulty and innovation of question types have increased. The Chinese language section emphasizes students’ foundational language skills and comprehension, encouraging them to think critically about how to use language effectively in problem-solving scenarios.

Feedback from students assessing the difficulty of the exam can be summed up in three words: “It was tough.”

Take this year’s Chinese essay prompt, for instance. Students had to choose between two topics: one on health and science, “Living a Healthier Life,” and the other on scientific literacy and practical life, “A Science Class.” While the topics may seem straightforward, crafting a standout essay proved challenging, with some students commenting, “I understand the topic, but writing about it was too difficult!”

This raises an intriguing question: if the current mainstream AI models were subject to the same entrance exam, what would their performance look like? Would they measure up to the so-called top students?

To explore this, seven prominent AI models were tested on selected subjects from the 2025 Beijing High School Entrance Examination. The test covered the second essay prompt in each of the Chinese and English papers, along with the full mathematics exam.

The competitors in this test were DeepSeek, ByteDance’s Doubao, iFlyTek’s Spark, Alibaba’s Tongyi Qianwen, Tencent’s Hunyuan, Baidu’s Wenxin Yiyan, and GPT. These models were chosen for their widespread use and relevance.

To ensure fairness, all models were disconnected from the internet and set to their deep reasoning mode. The essays were scored by invited expert educators and examiners, with separate panels grading the Chinese and English essays. Each essay’s final score was the average of the scores it received.
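The averaging step described above can be sketched in a few lines of Python. This is only an illustration: the function name, the rounding to one decimal place, and the sample scores are assumptions, not details reported for the test.

```python
from statistics import mean


def final_essay_score(panel_scores):
    """Average the scores one panel of graders gave to a single essay.

    Rounding to one decimal place is an assumption made for this sketch.
    """
    if not panel_scores:
        raise ValueError("at least one grader score is required")
    return round(mean(panel_scores), 1)


# Hypothetical scores from three graders (not figures from the test).
example_scores = [8.0, 9.0, 8.5]
print(final_essay_score(example_scores))
```

Averaging across several graders dampens the effect of any single unusually strict or lenient examiner, which is presumably why panel averages were used.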

The mathematics exam was given to the models in two input formats: scanned images and LaTeX. Scoring followed uniform standards that separated objective questions from subjectively graded ones: multiple-choice and fill-in-the-blank questions were marked on the final answer alone, while more complex problems were graded on their step-by-step solutions.
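One way to picture the two grading rules is the minimal Python sketch below. The `Question` class, its field names, and the point values are hypothetical and chosen only for illustration; the actual marking scheme used by the graders was not published in this article.

```python
from dataclasses import dataclass, field


@dataclass
class Question:
    max_points: float
    objective: bool                 # True for multiple-choice / fill-in-the-blank
    correct_answer: str = ""        # used only for objective questions
    step_points: list = field(default_factory=list)  # points per solution step


def score_question(question, final_answer="", steps_earned=None):
    """Score one question under the two rules described in the text."""
    if question.objective:
        # Objective questions: only the final answer counts.
        is_correct = final_answer.strip() == question.correct_answer.strip()
        return question.max_points if is_correct else 0.0
    # Complex problems: add up the points for each step judged correct.
    steps_earned = steps_earned or []
    earned = sum(
        points for points, ok in zip(question.step_points, steps_earned) if ok
    )
    return min(earned, question.max_points)


# Hypothetical questions and point values, for illustration only.
multiple_choice = Question(max_points=2, objective=True, correct_answer="B")
proof_problem = Question(max_points=5, objective=False, step_points=[2, 2, 1])

print(score_question(multiple_choice, final_answer="B"))
print(score_question(proof_problem, steps_earned=[True, False, True]))
```

Under this scheme a model can earn partial credit on a long problem even when its final answer is wrong, but gets nothing for a near miss on a multiple-choice question.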

Examining the results reveals notable trends in performance across all tested models:

Mathematics:
In mathematics, iFlyTek’s Spark, Doubao, and GPT ranked highest, each scoring over 85 points. Tongyi Qianwen, Wenxin Yiyan, and DeepSeek ranked lower, with scores of 73, 68, and 63, respectively. DeepSeek’s score was directly affected by image recognition problems.

Chinese Writing:
In essay writing, all AI models scored between 81% and 94% on a standard scale, with an average of around 86. While every model showed significant writing capability, they differed in nuance of detail and emotional expression. iFlyTek’s Spark stood out for its ability to present profound themes smoothly and coherently.

English Writing:
The English essays displayed a larger range in scores, from 7 to a perfect 10. iFlyTek’s Spark achieved the highest score, demonstrating strong thematic coverage and detail. In contrast, GPT fell short of expectations, scoring only 7.5 points despite covering all main points.

Overall, these tests show that AI models have advanced well beyond being mere text generators; they are now capable of producing thoughtful, reasoned responses. The results challenge students to move from rote memorization and mechanical practice to more integrated, thoughtful approaches to learning.

This examination serves as an invitation to rethink educational engagement amid rapid technological changes. The potential collaboration between humans and AI promises to create a new chapter in learning, pushing the boundaries of creativity and understanding.

Seok Chen is a mass communication graduate from the City University of Hong Kong.

© 2025 Digital Phablet
