A mechanical hand is on display at the Robot Mall, the world’s first embodied intelligent robot 4S store, on August 13, 2025 in Beijing, China.
VCG | Visual China Group | Getty Images
BEIJING - Alibaba Cloud is investing in a new type of artificial intelligence designed to better reflect the real world, using a different approach from chatbots such as OpenAI’s ChatGPT.
The shift acknowledges the limits of “large language models” trained mostly on text. Instead, developers are starting to focus more on “world models” built on videos and real-life physical scenarios.
To jump on the trend, Alibaba led a 2 billion yuan ($290 million) investment in ShengShu, the startup behind the AI video generation tool Vidu, the company announced Friday. TAL Education and Baidu Ventures also participated in the Series B funding round.
The investment comes about two months after ShengShu raised 600 million yuan from Qiming Venture Partners and other backers. The startup declined to disclose its valuation.
ShengShu said the latest funding will support the development of a “general world model” that uses AI to bridge two currently separate domains: the digital world of games and AI-generated video, and the physical world of autonomous driving and robots.
“ShengShu believes that a general world model, built on multimodal data such as vision, audio, and touch, captures how the physical world works more naturally than large language models,” the three-year-old startup said in a statement.

“We aim to connect perception and action,” Zhu Jun, founder of ShengShu, said in the statement, adding that this would allow AI systems to model and predict real-world behavior more consistently.
ShengShu’s latest Vidu Q3 Pro model, released in January, ranks among the top 10 AI models for generating videos from text and images, according to Artificial Analysis.
The company launched Vidu globally months before OpenAI made its now-shuttered Sora tool for AI video generation widely available. Chinese short-video companies Kuaishou and ByteDance have also released similar competing AI tools for generating videos.
World model competition
Alibaba has expanded its investments in related startups.
The Chinese tech giant and Baidu Ventures last month led a $50 million investment in Tripo AI, a platform that uses AI to quickly generate digital 3D models from photos. Tripo said it is also shifting away from methods used by language models toward AI tools grounded in physical space and is developing its own world model.
In September, Alibaba also led a $60 million investment in PixVerse, which released an AI world model earlier this year that lets users direct how a video unfolds while it is being generated.
Alibaba, which got its start in e-commerce, has also released free, open-source AI models for video generation and, in February, released one for powering robots.
ShengShu said Friday it has strategic partnerships with companies developing embodied AI (systems such as humanoid robots that interact with the physical world) for use across industrial, commercial and household settings.
World models are essential for robotics because the technology needs more than LLMs to work, Kevin Kelly, co-founder of the U.S. tech magazine Wired, wrote last month on his Substack.
Ultimately, to replicate human intelligence, AI will need three things: reasoning, an understanding of the physical world and continuous learning, Kelly said. While AI for the learning category hasn’t been developed yet, LLM-powered chatbots have built out the knowledge side, he said, making world models a key area still requiring a breakthrough.

