Tencent Releases a World Model That Exports Directly to Game Engines
- David Borish

- Apr 20
- 6 min read

The most significant design choice in HY-World 2.0 is what it does not produce. Video world models — including Google DeepMind's Genie 3 and Tencent's own earlier HY-World 1.5 — generate sequences of pixels that play out in real time and then disappear. HY-World 2.0 generates meshes and 3D Gaussian splats instead, which are formats that game engines and animation pipelines already understand. A user can take the output, drop it into Unreal Engine, apply real physics, and walk around indefinitely. The compute cost of exploring the world is effectively zero after generation; no additional inference is required.
That distinction matters more than it might appear. Tencent's documentation draws a direct comparison: video models produce content with limited playback duration, no native 3D consistency, and accumulated inference cost with each user interaction. A 3D asset model pays inference cost once. The output is permanent and editable. The tradeoff is that generating geometry is harder than generating video, and the current benchmarks reflect a system that is capable but not yet polished across all conditions.
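The cost asymmetry the documentation describes can be made concrete with a back-of-the-envelope sketch. The dollar figures below are purely hypothetical and serve only to illustrate the break-even dynamic:

```python
def video_model_cost(sessions: int, minutes_per_session: float,
                     cost_per_minute: float) -> float:
    """A video world model re-runs inference for every minute of exploration."""
    return sessions * minutes_per_session * cost_per_minute


def asset_model_cost(generation_cost: float) -> float:
    """A 3D asset model pays inference once; exploring the exported scene is free."""
    return generation_cost


# Hypothetical rates: $0.10 per minute of video inference vs. a $2 one-time generation.
video_total = video_model_cost(sessions=100, minutes_per_session=5, cost_per_minute=0.10)
asset_total = asset_model_cost(generation_cost=2.0)
# video_total grows linearly with usage; asset_total stays fixed.
```

Whatever the real per-minute numbers turn out to be, the shape of the curves is the point: one cost scales with engagement, the other does not.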
What HY-World 2.0 Actually Does
The system handles two distinct tasks. World Generation takes a text prompt or a single image and produces a navigable 3D scene through four sequential stages. First, a panorama generation model called HY-Pano 2.0 builds a 360-degree view of the environment. Second, a trajectory planning component called WorldNav determines how the camera should move through that space. Third, WorldStereo 2.0 expands the panorama into full 3D geometry. Fourth, a reconstruction model called WorldMirror 2.0 fuses everything into a final 3D Gaussian splatting (3DGS) or mesh output.
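Treating those four stages as black boxes, the generation flow can be sketched as a simple sequential pipeline. The stage names come from Tencent's description, but every interface below is a hypothetical stand-in, not the released API:

```python
from dataclasses import dataclass, field


@dataclass
class Scene:
    representation: str            # "3dgs" or "mesh"
    artifacts: dict = field(default_factory=dict)


# Stand-in stages: each returns a plain dict so the sketch runs end to end.
def hy_pano_generate(prompt):            # 1. HY-Pano 2.0: 360-degree panorama
    return {"panorama": f"360 view of {prompt}"}

def worldnav_plan(panorama):             # 2. WorldNav: camera trajectory
    return {"trajectory": ["start", "forward", "turn"]}

def worldstereo_expand(panorama, traj):  # 3. WorldStereo 2.0: panorama -> 3D geometry
    return {"geometry": {**panorama, **traj}}

def worldmirror_fuse(geometry, representation="3dgs"):
    # 4. WorldMirror 2.0: fuse everything into the final exportable output
    return Scene(representation=representation, artifacts=geometry)


def generate_world(prompt: str) -> Scene:
    panorama = hy_pano_generate(prompt)
    trajectory = worldnav_plan(panorama)
    geometry = worldstereo_expand(panorama, trajectory)
    return worldmirror_fuse(geometry)
```

Because each stage consumes only its predecessor's output, the stages can in principle be swapped out independently, which is consistent with Tencent shipping WorldMirror 2.0 ahead of the rest of the pipeline.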
World Reconstruction, the second task, works in the opposite direction: it takes multi-view images or casual video footage and builds a 3D model of the real scene. WorldMirror 2.0 handles this in a single forward pass, simultaneously predicting depth, surface normals, camera parameters, and 3D point clouds. The model carries approximately 1.2 billion parameters and supports flexible resolution inference from 50,000 to 500,000 pixels, which gives developers control over the speed-versus-fidelity tradeoff depending on the use case.
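That speed-versus-fidelity knob can be illustrated by fitting an input image to a pixel budget inside the supported 50,000-to-500,000 range. The helper below is an assumption for illustration only; the release's actual resizing logic may differ:

```python
import math

# Supported inference range per the technical report.
MIN_PIXELS, MAX_PIXELS = 50_000, 500_000


def fit_to_budget(width: int, height: int, budget: int) -> tuple[int, int]:
    """Scale (width, height) so the total pixel count matches the budget,
    preserving aspect ratio and clamping the budget to the supported range."""
    budget = max(MIN_PIXELS, min(MAX_PIXELS, budget))
    scale = math.sqrt(budget / (width * height))
    return max(1, round(width * scale)), max(1, round(height * scale))


# A 1920x1080 frame (~2.07M pixels) downscaled to a fast 100k-pixel pass:
w, h = fit_to_budget(1920, 1080, 100_000)
```

Running the same input at 500,000 pixels instead would trade roughly 5x the inference work for correspondingly finer geometry, which is the control the flexible range is meant to expose.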
The reconstruction benchmarks are competitive. On the 7-Scenes dataset, WorldMirror 2.0 achieves accuracy and completeness scores of 0.012 and 0.016 at high resolution with prior injection, compared to 0.018 and 0.023 for the previous version under the same conditions. On camera control metrics, WorldStereo 2.0 reduces rotation error to 0.492, down from 0.762 in WorldStereo 1.0, and outperforms competing methods including SEVA and Gen3C across most benchmarks.
The Open-Source Release Is Partial
Tencent released the WorldMirror 2.0 inference code and weights on April 16, along with a technical report. The remainder of the pipeline — the WorldNav trajectory planner, the WorldStereo 2.0 world expansion model, and the HY-Pano 2.0 panorama generator — is listed as coming soon, with interim alternatives from prior work available in the meantime. The full world generation capability is accessible through Tencent's product portal at 3d.hunyuan.tencent.com, though the page notes it is currently heavily trafficked.
This is worth noting because the GitHub repository itself, as of release, does not yet support end-to-end world generation. Developers working from the open-source code can run reconstruction tasks immediately; generation requires either access to the web product or patience for the remaining code releases.
Where HY-World 2.0 Fits in the Broader Race
The world model category has attracted significant capital and research effort over the past year, and the participants are pursuing meaningfully different approaches.
Google DeepMind released Genie 3 in August 2025, billing it as the first real-time interactive general-purpose world model. The system generates navigable environments at 24 frames per second from text prompts, maintaining visual consistency across several minutes of interaction. It learns physics implicitly through training rather than relying on a hard-coded physics engine. Google subsequently launched Project Genie, making Genie 3 accessible to Google AI Ultra subscribers in the United States. The limitation is that Genie 3 produces video-style output — frame sequences — rather than persistent 3D geometry, which means the environments exist only during a session.
NVIDIA's Cosmos platform, launched at CES 2025, takes a different approach altogether. It is built explicitly for physical AI development — training data for robots and autonomous vehicles — rather than creative or entertainment applications. The platform comprises three model families: Predict for future state simulation, Transfer for bridging simulated and real environments, and Reason for physics-aware reasoning. Cosmos was trained on 9,000 trillion tokens derived from 20 million hours of real-world footage and has been downloaded more than two million times. Early adopters include robotics companies like Figure AI and Agility Robotics, as well as autonomous vehicle developers including Waabi and Uber.
World Labs, founded by Fei-Fei Li and backed by $230 million in funding, shipped its first commercial product in November 2025. Marble generates persistent, downloadable 3D environments from text, images, video, or 3D layouts, and exports in formats compatible with Unreal Engine and Unity. Unlike video-output competitors, Marble produces discrete environments that users can edit after generation. The platform includes AI-native editing tools and a hybrid 3D editor that separates spatial structure from visual style. Pricing runs from a free tier with four generations up to $95 per month for 75 generations with commercial rights. Rumors reported in early 2026 suggested World Labs was seeking a new funding round that would value the company at $5 billion.
Yann LeCun, who spent 12 years at Meta as chief AI scientist, left to found AMI Labs and is reportedly seeking €500 million at a €3 billion valuation. His approach centers on Joint Embedding Predictive Architecture rather than the autoregressive generation methods used by most competitors. LeCun has argued publicly that generative approaches are prone to hallucination and that world modeling requires learning compressed internal representations of how environments evolve, rather than predicting pixel values directly. Whether that architectural bet pays off at scale remains to be seen, as AMI Labs has not yet released a public product.
The 3D-Versus-Video Divide
The split between video-output and geometry-output world models is more than a technical preference; it shapes what each system can be used for. Video models like Genie 3 excel at generating rich, visually convincing interactive experiences quickly, but their outputs cannot be loaded into a game engine or used as persistent simulation environments. Geometry-based models like HY-World 2.0 and Marble produce outputs that slot directly into existing 3D pipelines, which makes them more useful for robotics training, game development, architecture, and VFX — but the generation process is harder and the outputs are more difficult to evaluate visually.
NVIDIA's Cosmos sits closer to the geometry end of this spectrum in its intended use, even if the underlying models work somewhat differently. The robotics and autonomous vehicle developers using Cosmos want physics-accurate simulation data, not video clips. That demand pulls the field toward representations that support real physics rather than learned approximations of it.
HY-World 2.0 is making the same bet. Its pipeline produces assets that run in Isaac Sim — NVIDIA's physics simulator — and in Unreal Engine and Unity. That compatibility is not incidental; it is the core value proposition. For robotics training in particular, a world model that outputs geometry usable in a physics engine is more valuable than one that outputs convincing video, because training policies on video frames does not transfer the same way that training in a physics simulator does.
Limitations and What Is Still Missing
The current release leaves several things unresolved. The open-source code covers reconstruction but not generation. Benchmark results are strong but were measured against a relatively small set of comparison methods, and no independent evaluation has yet appeared. The web product is available and demonstrates the full pipeline, but reproducible research will require the remaining code releases.
The broader category faces its own constraints. Spatial reasoning remains significantly weaker in current AI systems than text or image reasoning. World Labs co-founder Ben Mildenhall has acknowledged that current models still hit navigable limits — environments that end after a few steps — and that faithfulness to real-world physics is a harder target than visual realism. Both problems are addressable at scale, but neither is solved.
What the HY-World 2.0 release does establish is that the race to build production-grade 3D world models is genuinely competitive, spanning the largest technology companies in both the United States and China, and attracting capital at a scale that suggests the applications downstream — in robotics, simulation, game development, and beyond — are considered large enough to justify the investment.