Tech giants and startups race to build “world models” that can simulate reality, potentially transforming robotics and beyond
For all their eloquence in writing poetry or explaining quantum physics, today’s large language models have a fundamental blind spot: They don’t understand how a ball rolls down a hill or why water spills when you tip a glass.
That may soon change. Some of artificial intelligence’s most prominent researchers are now racing to develop “world models” — AI systems that learn to simulate and predict how the physical world operates, from the laws of gravity to the persistence of objects when they move out of sight.
The technology represents a significant departure from language models like ChatGPT, which predict the next word in a sequence. World models instead aim to predict what happens next in reality itself, potentially enabling breakthroughs in robotics, video generation, and autonomous systems.
“Within three to five years, this will be the dominant model for AI architectures, and nobody in their right mind would use LLMs of the type that we have today,” Yann LeCun, Meta’s chief AI scientist and a Turing Award winner, declared at a recent MIT symposium — a provocative claim that he acknowledged has not endeared him to various corners of Silicon Valley.
LeCun is reportedly planning to launch his own world model startup after leaving Meta in the coming months. He joins a growing roster of heavyweights betting on the technology. Fei-Fei Li, the Stanford professor known as the “godmother of AI,” recently unveiled Marble, the first commercial release from her startup World Labs. Meanwhile, Jeff Bezos has quietly launched Project Prometheus, a new AI company focused on engineering and manufacturing applications, with more than $6 billion in funding.
The major tech platforms aren’t sitting idle. Google and Meta are developing world models for robotics applications and to enhance the realism of their video generation systems. OpenAI has suggested that improving video models could itself be a pathway to achieving world model capabilities.
The competition extends well beyond Silicon Valley. Chinese tech giant Tencent is developing world models that incorporate both physics understanding and three-dimensional data processing. Last week, the Mohamed bin Zayed University of Artificial Intelligence in the United Arab Emirates announced PAN, marking the institution’s entry into the world model race.
Learning Physics Without a Textbook
World models fundamentally differ from language models in their approach to learning. Rather than training on text from the internet, they consume video footage, simulation data, and other spatial inputs to build internal representations of how objects, scenes, and physical dynamics work.
The goal is ambitious: create AI systems that intuitively grasp concepts like gravity, object permanence, and cause-and-effect relationships without being explicitly programmed with physics equations. In essence, these models would learn about the world much like a child does — through observation and interaction.
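To make the training objective concrete, here is a minimal, illustrative sketch of next-frame prediction, the simplest form a world-model objective can take. It is not any company's actual architecture; the model, its layer sizes, and the random stand-in data are placeholders chosen for brevity.

```python
# Illustrative sketch only: a toy next-frame predictor in PyTorch.
# Real world models are far larger, multimodal, and trained on real video.
import torch
import torch.nn as nn

class TinyWorldModel(nn.Module):
    """Hypothetical encoder-decoder that maps the current video frame to a
    prediction of the next frame (64x64 RGB, pixel values in [0, 1])."""
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=4, stride=2, padding=1),   # 64 -> 32
            nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=4, stride=2, padding=1),  # 32 -> 16
            nn.ReLU(),
        )
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(64, 32, kernel_size=4, stride=2, padding=1),  # 16 -> 32
            nn.ReLU(),
            nn.ConvTranspose2d(32, 3, kernel_size=4, stride=2, padding=1),   # 32 -> 64
            nn.Sigmoid(),
        )

    def forward(self, frame):
        return self.decoder(self.encoder(frame))

model = TinyWorldModel()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

# Stand-in for real footage: random "current frame" / "next frame" pairs.
current_frames = torch.rand(8, 3, 64, 64)
next_frames = torch.rand(8, 3, 64, 64)

# One training step: penalize the gap between predicted and actual next frames.
prediction = model(current_frames)
loss = loss_fn(prediction, next_frames)
optimizer.zero_grad()
loss.backward()
optimizer.step()
print(f"next-frame prediction loss: {loss.item():.4f}")
```

The contrast with a language model is in the prediction target: a frame of pixels describing the state of a scene rather than the next word in a sentence.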
This capability could prove transformative for robotics, where understanding physical interactions is crucial, and for creating more realistic video content that obeys the laws of physics rather than producing the occasionally surreal outputs of current AI video generators.
Yet building these models faces a fundamental challenge that language models largely avoided: data scarcity.
“One of the biggest hurdles to developing world models has been the fact that they require high-quality multimodal data at massive scale,” said Ulrik Stig Hansen, president of Encord, which offers one of the largest open-source datasets for world model development.
While language model developers could scrape virtually the entire text-based internet, the specialized video and sensor data needed for world models isn’t as readily available or consolidated. Encord’s dataset contains 1 billion data pairs across images, videos, text, audio, and 3D point clouds, assembled over months with a million human annotations. But even this represents just a baseline — production systems will likely need significantly more.
Whether world models can advance as rapidly as language models remains an open question. The technology benefits from substantial new investment and interest from top researchers, but the complexity of modeling physical reality presents challenges that generating coherent text did not.
Still, the potential applications — from more capable robots to AI systems that can reason about real-world problems — have made world models one of the hottest areas in AI research. As the race intensifies, the industry is betting that teaching machines to understand our physical world may be the key to the next breakthrough in artificial intelligence.
