Tech giants and startups race to build “world models” that can simulate reality, potentially transforming robotics and beyond
For all their eloquence in writing poetry or explaining quantum physics, today’s large language models have a fundamental blind spot: They don’t understand how a ball rolls down a hill or why water spills when you tip a glass.
That may soon change. Some of artificial intelligence’s most prominent researchers are now racing to develop “world models” — AI systems that learn to simulate and predict how the physical world operates, from the laws of gravity to the persistence of objects when they move out of sight.
The technology represents a significant departure from language models like ChatGPT, which predict the next word in a sequence. World models instead aim to predict what happens next in reality itself, potentially enabling breakthroughs in robotics, video generation, and autonomous systems.
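The distinction can be made concrete with a deliberately toy sketch (hypothetical code, invented for illustration; real systems use large learned neural networks, not lookup tables or hand-written physics): both kinds of model are next-step predictors, but a language model predicts the next token in a sequence, while a world model predicts the next state of a physical scene.

```python
# Toy contrast between the two prediction targets. Both functions and the
# bigram table are invented for illustration, not any production system.

def next_token(sequence, bigram_counts):
    """Toy "language model": return the most frequent follower of the last token."""
    followers = bigram_counts.get(sequence[-1], {})
    return max(followers, key=followers.get) if followers else None

def next_state(state, dt=0.1, g=-9.8):
    """Toy "world model": step a falling ball's (position, velocity) forward in time."""
    pos, vel = state
    return (pos + vel * dt, vel + g * dt)

counts = {"the": {"ball": 3, "cat": 1}}
print(next_token(["watch", "the"], counts))  # -> ball
print(next_state((10.0, 0.0)))               # velocity now negative: the ball starts to fall
```

The point of the sketch is only that the prediction target changes: from "which word follows" to "what the world does next."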
“Within three to five years, this will be the dominant model for AI architectures, and nobody in their right mind would use LLMs of the type that we have today,” Yann LeCun, Meta’s chief AI scientist and a Turing Award winner, declared at a recent MIT symposium — a provocative claim that he acknowledged has not endeared him to various corners of Silicon Valley.
LeCun is reportedly planning to launch his own world model startup after leaving Meta in the coming months. He joins a growing roster of heavyweights betting on the technology. Fei-Fei Li, the Stanford professor known as the “godmother of AI,” recently unveiled Marble, the first commercial release from her startup World Labs. Meanwhile, Jeff Bezos has quietly launched Project Prometheus, a new AI company focused on engineering and manufacturing applications, with more than $6 billion in funding.
The major tech platforms aren’t sitting idle. Google and Meta are developing world models for robotics applications and to enhance the realism of their video generation systems. OpenAI has suggested that improving video models could itself be a pathway to achieving world model capabilities.
The competition extends well beyond Silicon Valley. Chinese tech giant Tencent is developing world models that incorporate both physics understanding and three-dimensional data processing. Last week, the Mohamed bin Zayed University of Artificial Intelligence in the United Arab Emirates announced PAN, marking the institution’s entry into the world model race.
Learning Physics Without a Textbook
World models fundamentally differ from language models in their approach to learning. Rather than training on text from the internet, they consume video footage, simulation data, and other spatial inputs to build internal representations of how objects, scenes, and physical dynamics work.
The goal is ambitious: create AI systems that intuitively grasp concepts like gravity, object permanence, and cause-and-effect relationships without being explicitly programmed with physics equations. In essence, these models would learn about the world much like a child does — through observation and interaction.
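A minimal sketch of that idea, under strong simplifying assumptions (a noise-free toy simulation and a hand-written estimator, not any real world-model architecture): the code below recovers the acceleration of a dropped ball purely from a sequence of observed positions, without ever being given the equation for gravity.

```python
# Hypothetical toy: "learn" a physical constant from raw observations alone.

def simulate(g=-9.8, dt=0.1, steps=50):
    """Generate observed positions of a ball dropped from rest (the 'video')."""
    pos, vel, trace = 100.0, 0.0, []
    for _ in range(steps):
        trace.append(pos)
        vel += g * dt   # semi-implicit Euler step
        pos += vel * dt
    return trace

def infer_acceleration(trace, dt=0.1):
    """Estimate acceleration from second differences of observed positions."""
    diffs = [(trace[i + 2] - 2 * trace[i + 1] + trace[i]) / dt**2
             for i in range(len(trace) - 2)]
    return sum(diffs) / len(diffs)

observations = simulate()
g_hat = infer_acceleration(observations)
print(round(g_hat, 2))  # recovers roughly -9.8 from positions alone
```

The second-difference trick works here only because the toy data are clean and low-dimensional; real systems must infer dynamics from pixels, occlusion, and noise, which is where the data demands discussed below come from.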
This capability could prove transformative for robotics, where understanding physical interactions is crucial, and for creating more realistic video content that obeys the laws of physics rather than producing the occasionally surreal outputs of current AI video generators.
Yet building these models faces a fundamental challenge that language models largely avoided: data scarcity.
“One of the biggest hurdles to developing world models has been the fact that they require high-quality multimodal data at massive scale,” said Ulrik Stig Hansen, president of Encord, which offers one of the largest open-source datasets for world model development.
While language model developers could scrape virtually the entire text-based internet, the specialized video and sensor data needed for world models isn’t as readily available or consolidated. Encord’s dataset contains 1 billion data pairs across images, videos, text, audio, and 3D point clouds, assembled over months with a million human annotations. But even this represents just a baseline — production systems will likely need significantly more.
Whether world models can advance as rapidly as language models remains an open question. The technology benefits from substantial new investment and interest from top researchers, but the complexity of modeling physical reality presents challenges that generating coherent text did not.
Still, the potential applications — from more capable robots to AI systems that can reason about real-world problems — have made world models one of the hottest areas in AI research. As the race intensifies, the industry is betting that teaching machines to understand our physical world may be the key to the next breakthrough in artificial intelligence.

An interesting moment of truth and honesty from ChatGPT. It’ll be gone soon.
IDEA BURST — Why you should be skeptical of AI-driven online outputs
• Training bias: Models learn from internet data that reflect human biases; outputs inherit and amplify those biases.
• Data gaps: Training datasets omit or underrepresent groups, topics, or recent events, so answers can be incomplete or outdated.
• Objective mismatch: AI optimizes for likelihood or engagement, not truth — so it can produce plausible but false narratives.
• Commercial influence: Platforms may tune or filter outputs to favor partners, advertisers, or business goals, shaping the narrative.
• Algorithmic framing: Design choices (what’s rewarded in training) push the model toward particular styles or conclusions.
• Hallucination: Models can invent facts, references, or citations with confident language.
• Prompt manipulation: User prompts, interface defaults, or autocomplete steer what questions get asked and how answers are framed.
• Echo chambers: Repeated exposure to similar AI outputs reinforces certain viewpoints, reducing exposure to alternatives.
• Hidden post-processing: Human reviewers, safety filters, or ranking systems can modify outputs in ways that are opaque to users.
• Data provenance unknown: Users can’t verify original sources or how up-to-date the training data is, making validation hard.
In a nutshell: it admits it lies and explains why it does so. Buyer beware.