Everything You Need To Know About ChatGPT's New Model o1

By Scott Wilkinson

OpenAI’s new o1 model is being hailed as a leap forward in artificial intelligence, particularly in its ability to tackle complex tasks in fields such as science, coding, and mathematics. Media outlets including CNET and The Verge have reported that the model outperforms its predecessors on several benchmarks, with performance reportedly comparable to PhD-level expertise in various STEM disciplines.

Enhanced Reasoning Capabilities

The o1 model distinguishes itself through its novel approach to reasoning. Unlike previous models, it is designed to "think" before responding, using reinforcement learning to work through problems on its own. This training method enables the AI to approach multi-step challenges holistically, learning from mistakes and planning solutions over time. Whereas earlier models required explicit "chain-of-thought" prompting to reason step by step, o1 surfaces its reasoning by default, making its problem-solving process more transparent and human-like.
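
To make this contrast concrete, here is a minimal sketch using the OpenAI Python SDK. The snail question and the explicit "think step by step" nudge for GPT-4o are illustrative assumptions; o1-preview is the model name used during the beta, which at launch did not accept system messages or custom sampling parameters.

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

question = (
    "A snail climbs 3 m up a wall each day and slips back 2 m each night. "
    "The wall is 10 m high. On which day does it reach the top?"
)

# Earlier models often benefit from an explicit chain-of-thought nudge.
gpt4o = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": question + " Let's think step by step."}],
)

# o1 reasons internally before answering, so a plain question is enough.
o1 = client.chat.completions.create(
    model="o1-preview",
    messages=[{"role": "user", "content": question}],
)

print(gpt4o.choices[0].message.content)
print(o1.choices[0].message.content)
```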

Impressive Benchmark Results 

The o1 model has posted striking results in several key tests. It achieved an 83% accuracy rate on a qualifying exam for the International Mathematics Olympiad, far outpacing GPT-4o's 13% score. In competitive programming, the model performed in the 89th percentile in Codeforces contests. Its capability extends across STEM subjects, where it matches the problem-solving ability of PhD students in fields such as physics, chemistry, and biology. These benchmarks underscore the model’s potential to revolutionise AI applications in technical research and other complex domains.

Strengths in STEM and Fact-Checking 

The model’s strengths lie primarily in STEM-related problems, where it has demonstrated an advanced ability to handle tasks in mathematics, coding, and the physical sciences. A key feature of the o1 model is its capacity to fact-check itself: by spending more time analysing a query, it produces more accurate and reliable responses. This self-verification, combined with its reasoning skills, allows the model to handle multi-step problems with unprecedented sophistication. Its habit of showing its reasoning by default, without special prompting, further sets it apart from its predecessors.

Limitations and Availability 

Despite its strengths, the o1 model has some notable limitations. It lacks capabilities available in GPT-4o, such as web browsing, file uploads, and image processing. It can also be slower, sometimes taking more than 10 seconds to respond to complex queries, and it is more costly to run, with per-token input and output prices significantly higher than GPT-4o's. The o1-preview and o1-mini versions are currently available to ChatGPT Plus and Team users, with Enterprise and Edu access expected next week. Developers on API usage tier 5 can also begin experimenting with the models, though they are limited to 20 requests per minute during the beta phase.
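
For tier 5 developers experimenting during the beta, a simple client-side pacer can keep requests under the 20-per-minute cap. This is a hedged sketch: the ask_o1 helper and its throttling logic are our own illustration, not a feature of the OpenAI SDK.

```python
import time

from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

MIN_INTERVAL = 60 / 20  # beta rate limit: 20 requests per minute
_last_call = 0.0


def ask_o1(prompt: str, model: str = "o1-mini") -> str:
    """Send one prompt to an o1 beta model, pausing so calls stay under the rate limit."""
    global _last_call
    wait = MIN_INTERVAL - (time.monotonic() - _last_call)
    if wait > 0:
        time.sleep(wait)  # client-side throttle; an illustrative assumption, not an SDK feature
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
    )
    _last_call = time.monotonic()
    return response.choices[0].message.content


print(ask_o1("Prove that the sum of two odd integers is even."))
```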

Early Reviews and Impressions 

Early feedback notes a distinct improvement in o1’s performance over earlier models. On platforms such as Reddit, users have praised the model’s ability to solve intricate puzzles and word problems on the first attempt, outperforming GPT-4 and GPT-4o. Reviewers have also highlighted o1’s increased confidence and clarity in its responses, along with its tendency to ask for clarification before tackling complex problems, a shift widely seen as positive. However, some have pointed out that the model’s propensity for deep thinking can slow it down, particularly on simpler queries. Despite these minor drawbacks, early impressions suggest o1 marks a significant step forward in AI reasoning, especially for tasks requiring complex logic and problem-solving.

Debating Progress Towards AGI 

The release of the o1 model has reignited debate over whether OpenAI is moving closer to artificial general intelligence (AGI). Sam Altman, OpenAI’s CEO, has positioned the model as a major development in this quest, suggesting it is nearing the second level in the company’s five-stage AGI classification system. Experts remain divided on the issue. Ethan Mollick from the University of Pennsylvania’s Wharton School has pointed to the model’s ability to solve intricate puzzles as a sign of progress in AI reasoning. On the other hand, Gary Marcus, a cognitive scientist at New York University, has praised o1 as an impressive feat of engineering but maintains that it remains far from true AGI. This debate underscores the ongoing challenges in defining and achieving AGI, with o1 representing a clear step forward in AI capabilities, but still falling short of human-level intelligence.
