DeepMind’s Gemini 3 represents a remarkable leap forward in AI development, the product of extensive teamwork and many incremental innovations rather than a single breakthrough. At its core, the model uses a Transformer-based mixture-of-experts architecture, distinguished by its ability to decouple per-token compute from total parameter count. This design reflects a growing consensus in the industry that while scaling a model’s size still improves pretraining performance, innovations in architecture and data are increasingly what drive progress.
As the field shifts from an era of seemingly limitless data to one of finite data, careful use of synthetic data has become essential. Architectural improvements are helping models achieve better results with less data, which in turn puts more weight on effective evaluation, a crucial yet notoriously difficult part of training large models.
Sebastian Borgeaud, pretraining lead for Gemini 3 and a co-author of the pioneering RETRO paper, recently shared insights into the model’s development during his first podcast interview. He explained that Gemini 3’s significant advancements are primarily due to a collaborative effort of a large, dedicated team employing multiple innovative strategies, rather than relying solely on brute-force computing power.
In a conversation with industry analyst Matt Turck, Sebastian emphasized that Gemini 3’s success hinges on the combination of high-quality pretraining and post-training. He dismissed the notion of a single secret behind the breakthrough, attributing the improvements largely to converging efforts across many teams. Turck asked whether these gains suggest a genuine path toward artificial general intelligence or merely better benchmark scores. Sebastian responded that continued progress on benchmarks, coupled with tangible internal productivity gains, points to a real trajectory toward increasingly capable models, though he is cautious about prematurely declaring AGI.
Sebastian highlighted that even as models grow more capable, the pace of progress keeps outstripping expectations. Reflecting on the journey since 2019, he noted that today’s models and capabilities were once beyond imagination, a rapid evolution driven by scaling laws, architectural advances, and data strategies. He envisioned a future in which AI could enable major scientific breakthroughs, perhaps even Nobel Prize-level discoveries, alongside continued improvements in how these systems are understood and deployed.
Regarding the industry landscape, Sebastian observed that while many labs pursue similar core technology, primarily Transformer architectures, they explore different specialized branches, such as vision or other multimodal domains. The field, he noted, is marked by both convergence and healthy competition, with the larger teams and resources at companies like Google and DeepMind providing an advantage. Still, he remains optimistic that disruptive, potentially revolutionary breakthroughs can emerge from smaller, less resource-rich teams in the future.
He also emphasized the critical role of research taste—a nuanced intuition that guides choices between complexity, performance, and risk. This judgment is vital when balancing short-term fixes against long-term exploratory projects, especially given the unpredictable nature of research success.
Organizational structure and collaboration are equally vital. DeepMind’s approach involves large, specialized teams focusing on pretraining, post-training, infrastructure, and evaluation, with evaluation systems increasingly built internally to ensure integrity amid the proliferation of benchmarks that can become contaminated.
On the technical side, Sebastian explained that Gemini 3’s architecture is an evolution rather than a revolution: it keeps a Transformer backbone enriched with mixture-of-experts layers that decouple parameter count from per-token compute. The design is natively multimodal, allowing the model to process text, images, and video together, at a higher raw computational cost that is offset by efficiency gains elsewhere in the stack.
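To make that decoupling concrete, here is a minimal, hypothetical sketch of a routed mixture-of-experts feed-forward layer in plain NumPy. It is not Gemini’s implementation; the expert count, top-k routing, and dimensions are illustrative assumptions. The point is that total parameters grow with the number of experts while per-token compute grows only with the number of experts actually activated.

```python
import numpy as np

def moe_layer(x, experts, router_w, top_k=2):
    """Toy mixture-of-experts feed-forward layer (illustrative only).

    x:        (tokens, d_model) activations
    experts:  list of (W_in, W_out) weight pairs, one per expert
    router_w: (d_model, n_experts) routing weights
    top_k:    experts activated per token; per-token compute scales with
              top_k, while total parameters scale with len(experts)
    """
    logits = x @ router_w                                  # (tokens, n_experts)
    probs = np.exp(logits - logits.max(axis=-1, keepdims=True))
    probs /= probs.sum(axis=-1, keepdims=True)             # softmax routing

    out = np.zeros_like(x)
    for t in range(x.shape[0]):
        chosen = np.argsort(probs[t])[-top_k:]             # top-k experts for this token
        gate = probs[t, chosen] / probs[t, chosen].sum()   # renormalized gate weights
        for g, e in zip(gate, chosen):
            w_in, w_out = experts[e]
            h = np.maximum(x[t] @ w_in, 0.0)               # expert FFN with ReLU
            out[t] += g * (h @ w_out)
    return out

# Illustrative sizes: 8 experts, only 2 active per token.
rng = np.random.default_rng(0)
d_model, d_ff, n_experts = 64, 256, 8
experts = [(rng.normal(size=(d_model, d_ff)) * 0.02,
            rng.normal(size=(d_ff, d_model)) * 0.02) for _ in range(n_experts)]
router_w = rng.normal(size=(d_model, n_experts)) * 0.02
tokens = rng.normal(size=(16, d_model))
print(moe_layer(tokens, experts, router_w).shape)  # (16, 64)
```

In this toy setting, adding experts multiplies the parameter count by eight while per-token FLOPs only roughly double, which is the separation of scale from cost that the interview describes.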
Data remains central to progress. Gemini 3’s training corpus mixes many sources across text, images, and other modalities. Sebastian cautioned about synthetic data’s pitfalls, stressing that it must be used carefully to avoid misleading results, while still seeing it as a promising tool when handled responsibly.
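The actual Gemini data mixture is not public, but the idea of sampling training examples from weighted sources, with synthetic data kept as a small and explicitly tracked slice, can be sketched as follows. All source names and proportions here are invented for illustration.

```python
import random

# Hypothetical mixture weights over data sources (illustrative only).
mixture = {
    "web_text": 0.50,
    "code": 0.20,
    "image_text_pairs": 0.15,
    "synthetic": 0.05,   # deliberately small and tracked separately
    "other": 0.10,
}

def sample_source(mixture):
    """Draw the source of the next training example according to the mixture weights."""
    sources, weights = zip(*mixture.items())
    return random.choices(sources, weights=weights, k=1)[0]

counts = {s: 0 for s in mixture}
for _ in range(10_000):
    counts[sample_source(mixture)] += 1
print(counts)  # empirical proportions approach the configured mixture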
The conversation also touched on the purported end of scaling laws. Sebastian asserted that, in his experience, scaling continues to deliver significant gains in model performance, though architecture and data remain critical complements. Meanwhile, evaluation strategies have grown more sophisticated, with internal benchmarks designed to predict how models will perform at large scale, which speeds up iteration and improvement.
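The interview does not describe how these predictions are made, but a common generic practice is to fit a power law to losses from small-scale runs and extrapolate to larger compute budgets. The sketch below uses made-up numbers purely to illustrate that procedure.

```python
import numpy as np

# Hypothetical small-scale runs: training compute (FLOPs) vs. eval loss.
# These values are invented; the fit L(C) ~ a * C^(-b) is the generic idea.
compute = np.array([1e18, 3e18, 1e19, 3e19, 1e20])
loss    = np.array([3.10, 2.85, 2.62, 2.44, 2.28])

# Fit the log-linear form: log L ~ log a - b * log C
A = np.vstack([np.ones_like(compute), np.log(compute)]).T
coef, *_ = np.linalg.lstsq(A, np.log(loss), rcond=None)
log_a, slope = coef
b = -slope

# Extrapolate to a 10x larger compute budget.
target = 1e21
predicted = np.exp(log_a) * target ** (-b)
print(f"fitted exponent b = {b:.3f}, predicted loss at 1e21 FLOPs = {predicted:.2f}")
```

The value of such fits is not the exact number but the ability to rank candidate recipes cheaply before committing a full-scale training run.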
Finally, Sebastian looked ahead to exciting research directions, including long context handling, more efficient attention mechanisms, and the integration of retrieval with reasoning. He also underscored the importance of understanding systems comprehensively—spanning hardware, algorithms, and infrastructure—to make meaningful advances in AI.
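As one concrete example of the efficiency direction he mentions, many long-context approaches restrict attention to a local window so cost grows linearly in sequence length rather than quadratically. The sketch below shows generic causal sliding-window attention; it is one well-known technique, not a claim about Gemini’s actual mechanism.

```python
import numpy as np

def sliding_window_attention(q, k, v, window=8):
    """Causal local attention: each query attends only to the `window`
    most recent positions, reducing cost from O(n^2) to O(n * window).
    Generic illustration only, not Gemini's design."""
    n, d = q.shape
    out = np.zeros_like(v)
    for i in range(n):
        lo = max(0, i - window + 1)
        scores = q[i] @ k[lo:i + 1].T / np.sqrt(d)   # scores over the local window
        weights = np.exp(scores - scores.max())
        weights /= weights.sum()                     # softmax within the window
        out[i] = weights @ v[lo:i + 1]
    return out

rng = np.random.default_rng(0)
n, d = 128, 32
q, k, v = (rng.normal(size=(n, d)) for _ in range(3))
print(sliding_window_attention(q, k, v).shape)  # (128, 32)
```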
In summary, the field stands at a dynamic juncture, driven by collaborative initiatives, innovative architectures, and evolving paradigms that emphasize data efficiency and system understanding. As Sebastian noted, the rapid progress is likely to continue, making AI’s future both promising and profoundly complex.





