As we enter 2025, it is impossible to escape the fact that the era of the Transformer is upon us. 2021-2022 were the years when we software engineers and AI enthusiasts first got our hands on modern production LLMs through services like GitHub Copilot and ChatGPT, as well as diffusion models like DALL-E and Midjourney. Then, 2023 saw the proliferation of LLMs from the likes of Google, Meta, and Anthropic, and now 2024-2025 is generating peak AI hype in media, enterprise, and dinner table conversations across the planet.
But what does it take to build a system that can truly match human-like intelligence? Is Attention truly All You Need? Or is the success of Transformers simply a distraction from the real challenges we face in building Artificial General Intelligence?
There is no denying that the Transformer architecture has re-awakened and perhaps revolutionized the field of AI. However, as we scale Transformers to larger and larger models, we are also discovering their limitations. The design of these models leads to inefficiencies in training and inference, and they require vast amounts of training data and computational resources that are not sustainable in the long run.
In comparison, biological neural networks are incredibly efficient: the human brain runs on an estimated 20 watts of power, while state-of-the-art AI models require thousands of times more energy to train and run. Furthermore, biological systems are highly adaptable, capable of learning from minimal data and adapting to new tasks with ease, whereas current AI systems struggle with transfer learning and generalization.
While it is possible that a variant of the Transformer architecture could address these limitations and provide the foundation for a future AGI system (perhaps building on something like Titans from Google Research), we believe that the existence of these limitations warrants a more fundamental rethinking of the design of AI systems. We are not interested in simply scaling up existing architectures, but rather in exploring new approaches that can match the efficiency and adaptability of biological neural networks.
On this path, we are exploring the following challenges as stepping stones towards AGI:
The Five Stepping Stones To Superintelligence
- Continual Learning: The system should be able to learn new tasks continuously without forgetting previously acquired knowledge (see the sketch after this list). A sub-challenge is incorporating long-term memory and the ability to integrate new information with existing knowledge.
- Transfer Learning / Generalization: The system should be able to apply knowledge gained in one context to different, but related, contexts. Indeed, learning a new task after learning many previous tasks should be easier than learning the first task.
- Sample & Compute Efficiency: The system should learn efficiently and from a minimal amount of data, provided it has been primed with simple priors (positional, temporal, causal) learned in an unsupervised fashion.
- Robustness: The system should handle noise and uncertainty in its environment. A sub-challenge is the ability to deal with adversarial examples and unexpected changes in the environment.
- Meta-learning: The system should revise and continually improve its own algorithms.
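To make the first of these stepping stones concrete, here is a minimal sketch of how catastrophic forgetting can be measured, assuming PyTorch and two hypothetical toy classification tasks. The helpers `make_task`, `accuracy`, and `train` are illustrative assumptions for this sketch, not part of any system we have built: a small network is trained on task A, then on task B, and task-A accuracy is checked before and after.

```python
# Minimal sketch: measuring catastrophic forgetting on two toy tasks.
# Assumes PyTorch; tasks, model, and hyperparameters are illustrative only.
import torch
import torch.nn as nn


def make_task(seed: int, n: int = 512, dim: int = 16):
    """Generate a toy binary classification task from a random linear rule."""
    g = torch.Generator().manual_seed(seed)
    x = torch.randn(n, dim, generator=g)
    w = torch.randn(dim, generator=g)
    y = (x @ w > 0).long()
    return x, y


def accuracy(model: nn.Module, x: torch.Tensor, y: torch.Tensor) -> float:
    """Fraction of examples the model classifies correctly."""
    with torch.no_grad():
        return (model(x).argmax(dim=1) == y).float().mean().item()


def train(model: nn.Module, x, y, epochs: int = 200, lr: float = 1e-2):
    """Plain supervised training on one task, full-batch for simplicity."""
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(epochs):
        opt.zero_grad()
        loss_fn(model(x), y).backward()
        opt.step()


model = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 2))
xa, ya = make_task(seed=0)
xb, yb = make_task(seed=1)

train(model, xa, ya)
acc_a_before = accuracy(model, xa, ya)

train(model, xb, yb)  # sequential training on task B, no replay of task A
acc_a_after = accuracy(model, xa, ya)

print(f"task A accuracy before task B: {acc_a_before:.2f}")
print(f"task A accuracy after  task B: {acc_a_after:.2f}  (drop = forgetting)")
```

A continual learner worthy of the name should hold the task-A score roughly steady while acquiring task B, without simply replaying all of task A's data.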
We believe that solving these challenges is essential for building a system that can match our expectations of humanlike AGI. Doing so would give rise to a universal learning machine with drastically improved efficiency and adaptability, capable of learning from minimal data and generalizing knowledge across tasks.
Out Of Scope
Our focus is on building scalable superhuman intelligence, not on creating sentient life, so many aspects of the human experience are out of scope: Emotions, consciousness, qualia, intrinsic motivation, emergent goal formation, etc.
Also, biological plausibility is not a goal in itself, but it can serve as a helpful guide for developing AGI systems that are efficient and effective. In particular, any time state-of-the-art AI systems lag dramatically behind biological efficiency, we find it useful to consider biological systems as a source of inspiration for novel approaches.
Finally, while Spiking Neural Networks are an interesting area of research, current neuromorphic hardware is not yet capable of achieving the practical efficiency and performance we require, and we do not have the resources to develop our own neuromorphic hardware at this time. However, we are keeping an eye on developments in this area and may revisit it in the future.
Until next time.
- dev/213