OpenAI’s “Strawberry” Model Arrives in a Late-Night Launch: PhD-Level Reasoning, Outperforms GPT-4o, Available in ChatGPT

In a surprise move, OpenAI unveiled a preview of its much-anticipated “Strawberry” model, officially named OpenAI o1, in the early hours of September 13. This new series of AI models is designed to tackle complex reasoning tasks, and it outperforms prior models in science, programming, and mathematics.

The OpenAI o1 model stands out for its advanced reasoning: it generates a long internal chain of thought before giving an answer. It ranked in the 89th percentile on competitive programming problems (Codeforces) and placed among the top 500 students in the United States in a qualifier for the USA Math Olympiad (AIME). Its accuracy on a benchmark of physics, biology, and chemistry problems (GPQA) also exceeded that of human PhDs.

OpenAI also introduced o1 mini, a faster and smaller version of the o1 model, trained using a framework similar to that of o1. The o1 mini excels in STEM subjects, particularly mathematics and programming, and is priced 80% lower than the o1 preview version.
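To make the 80% figure concrete, here is a minimal sketch of the arithmetic. The base price of 10.0 is a hypothetical placeholder chosen for illustration, not an actual OpenAI price, and the function name is ours.

```python
def discounted_price(base_price: float, percent_lower: float) -> float:
    """Return the price after reducing base_price by percent_lower percent."""
    if not 0 <= percent_lower <= 100:
        raise ValueError("percent_lower must be between 0 and 100")
    # Multiply before dividing to keep the arithmetic simple for round inputs.
    return base_price * (100 - percent_lower) / 100


# Hypothetical base price (illustration only, not a published rate).
example_base_price = 10.0
print(discounted_price(example_base_price, 80))  # prints 2.0
```

In other words, whatever o1 preview costs for a given workload, the same workload on o1 mini would cost one fifth as much under this pricing.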

These two models mark a major advance in handling intricate reasoning tasks, which prompted OpenAI to reset its model numbering and name the series “o1” instead of continuing the GPT series. Despite these advancements, however, the o1 model still struggles with high-level comparison questions.

Andrej Karpathy, a co-founder of OpenAI and former senior director of AI at Tesla, took to social media to express his frustration, stating that the o1 mini was unwilling to tackle the Riemann Hypothesis, dubbing the model “lazy.”

OpenAI has rigorously tested the o1 preview model to ensure its safety for launch. Users of ChatGPT Plus and Team can now access the new models, while Tier 5 developers have also been granted early API access.

OpenAI revealed the core team behind the o1 model, consisting of 21 foundational contributors, including former chief scientist Ilya Sutskever. The management team comprises seven individuals who have played pivotal roles in the project’s development.

In terms of capabilities, the OpenAI o1 model demonstrates a remarkable ability to handle reasoning tasks. It generates long internal chains of thought before responding, which lets it refine its reasoning. As an early preview model, it currently supports only text-based interactions and lacks features such as web browsing and file uploads.

Compared with the previous-generation model, GPT-4o, the OpenAI o1 model shows substantial improvements across various benchmark tasks, with performance approaching that of human experts. For instance, on a qualifying exam for the International Mathematical Olympiad, GPT-4o solved just 13% of the problems correctly, while the o1 model solved 83%.
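The gap between the two reported scores can be expressed both as a percentage-point gain and as a ratio. The short sketch below uses only the 13% and 83% figures quoted above; the function name is ours.

```python
def compare_accuracy(old_pct: float, new_pct: float) -> tuple[float, float]:
    """Return (percentage-point gain, ratio of new to old accuracy)."""
    if old_pct <= 0:
        raise ValueError("old_pct must be positive to compute a ratio")
    gain_points = new_pct - old_pct
    ratio = new_pct / old_pct
    return gain_points, ratio


# Scores reported for the qualifying exam: GPT-4o 13%, o1 83%.
points, ratio = compare_accuracy(13, 83)
print(f"{points} percentage points, {ratio:.1f}x")  # prints "70 percentage points, 6.4x"
```

Put differently, o1 solved roughly six times as many of the exam problems as GPT-4o did.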

The new models have significant implications for educational contexts and competitive programming, reflecting OpenAI’s commitment to developing AI technologies that can think critically and solve complex problems. As OpenAI continues to refine these models, it remains to be seen how they will impact the broader landscape of artificial intelligence development.