Andy Beam, Ph.D.
Chief Technology Officer, Lila Sciences
1/12/2026
Scaling Autonomous Science to Build Scientific Superintelligence
Large-scale training on internet data has given rise to AI with general-purpose capabilities. These models can reason, write, code, and solve problems across nearly every domain—a remarkable achievement built on the accumulated text of the web.
But the field of AI is running out of internet data: the supply is finite, and we've largely exhausted it. The path forward requires a new source of tokens, one that is evergreen, information-dense, and verifiable. We believe that source is science.
The scientific method is humanity's most reliable engine for generating new knowledge. Hypothesis, experiment, observation, revision—this loop has driven discovery for centuries. The knowledge it produces is uniquely valuable: verified through physical reality, dense with causal structure, and fundamentally different from the text that dominates the web.
Our mission at Lila Sciences is to build scientific superintelligence. Our thesis is that the path to this goal runs through scaling the scientific method itself, using autonomous experimentation to generate the scientific tokens needed to train the next generation of AI. Rather than building narrow systems for specific domains, we have built general infrastructure that can be applied to nearly any field of science. The bitter lesson of AI holds: scalable methods that learn from data outperform approaches that attempt to encode human expertise directly. We are betting that this lesson applies to science itself.
After two years of building in stealth, we have generated a proprietary scientific corpus spanning biology, chemistry, and materials science—from genetic sequences to molecular structures to catalyst compositions. We're excited that Lila is now at trillion-token scale.
To date, Lila's platform and autonomous labs have produced internet-scale data: trillions of scientific reasoning and data tokens. To put this in context, the entire internet amounts to approximately 15 trillion tokens, and only a small fraction of that is science.
This approach is already yielding important new discoveries. Across drug development, genetic medicines, gene editing, electrocatalysts, metal-organic frameworks, and other domains, Lila's platform has identified novel solutions with real-world significance. Stay tuned for more about these discoveries in the near future.
Scaling this work requires extraordinary compute infrastructure. We are excited to be partnering with NVIDIA to bring the resources necessary to run the scientific method at the scale the world's challenges demand. Integrating NVIDIA's Megatron library to accelerate LLM training yielded a 3x speedup in our core reinforcement learning loop. Training at scale is central to Lila's mission of Scientific Superintelligence™, and NVIDIA tools and libraries are a critical part of our technology stack.
The internet taught AI to process existing knowledge. Science will teach it to create new knowledge.
Interested in partnering with us? Contact our team.