Mar. 27, 2025

Advancing the Integration of Robots and AI to Drive Technological Innovation: Collaborative Research on "Robotics Foundation Models"

Toyota Motor Corporation's Frontier Research Center (hereinafter referred to as Toyota) is engaged in research on robots that coexist with humans, aiming at "Mobility for All." The research method incorporates participatory collaboration with many researchers around the world, known as "collaborative research." Now, we will share our efforts in research and development of "Robotics Foundation Models" using the Human Support Robot (HSR) as a collaborative initiative to realize a society where robots can play active roles in various fields.

- First, what exactly are the Robotics Foundation Models?

Arima: The Robotics Foundation Models are robot control models that learn from motion data collected by robots in diverse environments and can generalize to a variety of tasks. For comparison, large language models like GPT and Gemini learn from vast amounts of text data to facilitate smooth conversations in chat applications. In contrast, the Robotics Foundation Models integrate and learn from information obtained in real physical environments, such as sensor data and motion data. As a result, robots can execute tasks based on language instructions in various real-world environments, including homes, retail stores, factories, medical settings, and construction sites. This technology is expected to help solve social issues in Japan and has become an increasingly prominent research area in recent years. To build Robotics Foundation Models, we believe it is necessary to repeat a cycle of 1) large-scale data collection, 2) training the Robotics Foundation Models, 3) validation, and 4) further data collection. While large language models can easily obtain vast amounts of data from the internet, collecting robot motion data at a similar scale is a significant hurdle.

- Could you explain how this project started?

Arima: Within the HSR Development Community, many professors and students have been discussing various aspects of robotics research. Around the fall of 2023, when attention was already turning to the potential of the Robotics Foundation Models, a casual conversation arose among the group: "Since we're operating HSR across so many sites, wouldn't it be interesting to gather all that data for robot training? Could Robotics Foundation Models be born out of such activities?" This informal discussion sparked enthusiasm when Tatsuya Matsushima from the University of Tokyo raised the idea within the community, and many supporters quickly joined in, setting the grassroots project in motion.

The HSR Development Community

The HSR Development Community aims to realize a society where humans and robots coexist by promoting collaborative research^*1. This initiative began in 2015 and is currently operated by the Intelligent Home Robotics Research Committee of the Robotics Society of Japan. Members of the HSR Development Community use HSR as a common platform for research and development, sharing results such as software and know-how. This approach reduces the effort required for traditional hardware development, thereby accelerating the development of component technologies and promoting demonstration experiments. As of March 27, 2025, 67 research institutions from 14 countries are participating as members. Additionally, HSR serves as the standard platform for RoboCup@Home^*2, where members of the HSR Development Community compete annually to showcase their research achievements.

- Could you share some insights into the challenges and measures taken for efficiently collecting data from multiple sites?

Arima: We encountered two major challenges in amassing large amounts of robot data.

The first challenge arises when a human operator remotely controls the robot to collect data. Although the quality of the data collected in this manner is high, it requires an operator to dedicate their full attention to one robot at a time. In addition, annotating the operation, such as specifying which task the robot is performing, adds extra time beyond the operation itself. Because HSR is the standard platform at RoboCup@Home, the HSR Development Community already has autonomous systems capable of, for instance, tidying up in home environments, and we addressed this challenge by collecting data from these autonomous systems. This means that data can still be gathered efficiently even during periods when no human operator can directly control the robot, since the autonomous systems need no remote operation while they are running. Moreover, the robot uses its object recognition capabilities to determine how to handle each object, so annotations can be appended to the data automatically.

The second challenge is that the standard HSR remote control interface, a game controller, requires each axis of the robot arm to be controlled individually, making it difficult to efficiently collect diverse motion trajectories. To overcome this, we developed a new, more intuitive control interface that simplifies HSR's operation and facilitates the efficient gathering of diverse movement data.

- So, you have developed a new operating interface that allows for more intuitive control of HSR. What specifically does this interface look like?

Wakayama: The newly developed remote operation interface, "Teleoperation System for HSR" (THSR), is designed to replicate HSR's joint structure at half scale. The movements made with the THSR are linked directly to HSR, thereby enabling intuitive remote control. This system allows even first-time HSR users to become proficient in a short period of time, and with increased expertise, even complex tasks can be executed.

Figure 1: The Human Support Robot "HSR" (left) and the remote operation interface "THSR" (right), both developed by Toyota

Wakayama: THSR was originally developed to test the limits of HSR's capabilities through remote control. With a conventional game controller, it was challenging to move several axes simultaneously, so complex operations were quite difficult to execute. To remedy this, we set out to develop an interface that was both more user-friendly and more intuitive. Beginning with a prototype model featuring a scaled-down robot arm, we enhanced the system by adding a rotary axis and a vertical axis to address limitations in degrees of freedom, thereby creating THSR.

Generally, a robot needs at least as many degrees of freedom as the number of parameters to be controlled: controlling 3D position (x, y, z) and orientation (roll, pitch, yaw) requires 6 degrees of freedom. However, HSR's arm has only 4 degrees of freedom, which makes it difficult to manipulate objects flexibly. Adding rotation and vertical movement brings the total to 6 degrees of freedom, enough to control both the position and orientation of the robot's end-effector. THSR therefore integrates the robot arm, the rotary component, and the vertical component into a 6-degree-of-freedom system that allows free control of the end-effector's position and orientation.

Video 1: Operating HSR with THSR to perform various tasks
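The linkage described above, in which a half-scale replica drives the full-size robot, can be illustrated with a minimal Python sketch. It is not Toyota's implementation. The joint names, the `LeaderState` type, and the mapping rule are all assumptions made for illustration: rotational joints are assumed to carry over one to one because the joint structure is replicated, while linear travel on the vertical axis is assumed to be scaled up by the inverse of the half-scale factor.

```python
from dataclasses import dataclass

# Hypothetical joint layout. The article only states that a 4-DoF arm is
# combined with one rotary axis and one vertical axis, for 6 DoF in total.
ARM_JOINTS = ("arm_1", "arm_2", "arm_3", "arm_4")  # placeholder names
SCALE = 0.5  # THSR is described as a half-scale replica of HSR


@dataclass
class LeaderState:
    """Readings from the leader device (THSR), as assumed in this sketch."""
    arm_angles_rad: tuple  # four rotational arm joints
    rotary_rad: float      # rotary axis angle
    vertical_m: float      # vertical axis displacement on the leader


def leader_to_follower(state: LeaderState) -> dict:
    """Map leader readings to follower (HSR) joint targets.

    Assumption: angles are copied unchanged, while linear travel is divided
    by the scale factor so that the full-size robot moves proportionally.
    """
    if len(state.arm_angles_rad) != len(ARM_JOINTS):
        raise ValueError("expected one angle per arm joint")
    targets = dict(zip(ARM_JOINTS, state.arm_angles_rad))
    targets["rotary"] = state.rotary_rad
    targets["vertical"] = state.vertical_m / SCALE
    return targets


targets = leader_to_follower(LeaderState((0.1, 0.2, 0.3, 0.4), 0.5, 0.1))
print(len(targets))  # six joint targets, one per degree of freedom
```

Because every leader joint has a single counterpart on the follower, the operator never has to select an axis, which is the property that makes this style of interface feel intuitive compared with a game controller.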

- THSR's resemblance to HSR certainly makes it more intuitive to operate. By the way, where is all the data collected stored?

Takaba: After preprocessing, the collected data is stored on a cloud storage service. Not only does this storage solution facilitate data retention, but we have also built a system using various cloud services to ensure that the vast amounts of data are collected reliably and used efficiently in our research and development efforts.

One concrete example of our system architecture involves introducing MLOps to accelerate large-scale AI development. MLOps streamlines the entire machine learning process, from data preprocessing, model training, and validation of trained models to the deployment of models in real-world environments. Traditionally, we had to manually convert the data collected by the robots into a format suitable for training and then place it within the learning environment. By automating these tasks via an MLOps pipeline, we are able to efficiently advance the development of the Robotics Foundation Models.
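As a rough illustration of the conversion step that used to be done by hand, the following Python sketch chains two preprocessing stages that turn raw episode records into per-step training samples. The record fields (`task`, `frames`, `obs`, `action`) and the function names are hypothetical. They are not taken from the actual pipeline, which the article does not describe in detail.

```python
from typing import Callable, List

Stage = Callable[[list], list]


def drop_incomplete(episodes: list) -> list:
    """Discard episodes that lack a task label or recorded frames."""
    return [e for e in episodes if e.get("task") and e.get("frames")]


def to_training_format(episodes: list) -> list:
    """Flatten each episode into one sample per time step."""
    samples = []
    for episode in episodes:
        for step, frame in enumerate(episode["frames"]):
            samples.append({
                "instruction": episode["task"],  # language annotation
                "step": step,
                "observation": frame["obs"],
                "action": frame["action"],
            })
    return samples


def run_pipeline(data: list, stages: List[Stage]) -> list:
    """Apply each stage in order, feeding its output to the next stage."""
    for stage in stages:
        data = stage(data)
    return data


raw = [
    {"task": "pick up cup",
     "frames": [{"obs": "img0", "action": [0.0]},
                {"obs": "img1", "action": [0.1]}]},
    {"task": "", "frames": []},  # incomplete record, filtered out
]
samples = run_pipeline(raw, [drop_incomplete, to_training_format])
print(len(samples))
```

Expressing each step as a plain function makes it straightforward to add validation or upload stages later without changing the code that runs the pipeline.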

In addition, we are developing a web service to visualize and analyze the collected robot data. This service offers features such as viewing videos of HSR's movements for each task based on the collected data, as well as analyzing the types of tasks performed. By learning from diverse task data, the Robotics Foundation Models can acquire the flexibility needed to execute a wide variety of tasks. With an environment for data visualization and analysis, we are able to devise effective plans for targeted data collection, such as focusing on tasks that are currently underrepresented.

Figure 2: Data visualization and analysis service under development
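The kind of analysis mentioned above, finding tasks that are underrepresented in the collected data, can be sketched in a few lines of Python. This is only an illustration under assumed inputs: the episode fields (`task`, `duration_s`) and the 10% share threshold are hypothetical choices, not details of the service under development.

```python
from collections import defaultdict


def hours_per_task(episodes: list) -> dict:
    """Sum the recorded duration of each task, in hours."""
    totals = defaultdict(float)
    for episode in episodes:
        totals[episode["task"]] += episode["duration_s"] / 3600.0
    return dict(totals)


def underrepresented(totals: dict, min_share: float = 0.1) -> list:
    """Return tasks whose share of total hours falls below min_share."""
    total = sum(totals.values())
    if total == 0:
        return []
    return sorted(task for task, hours in totals.items()
                  if hours / total < min_share)


episodes = [
    {"task": "tidy up", "duration_s": 7200},
    {"task": "tidy up", "duration_s": 3600},
    {"task": "open drawer", "duration_s": 180},
]
totals = hours_per_task(episodes)
print(underrepresented(totals))  # tasks to prioritize in the next collection round
```

A list like this could feed directly into a collection plan, so that operators are assigned the tasks for which the dataset currently holds the least data.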

- It is impressive that you have implemented measures for effective data collection. Could you update us on the current progress of data collection?

Arima: Using a combination of autonomous systems and remote operation, we have managed to collect around 350 hours of data from eight locations, including Toyota. While remote operation by humans has traditionally been conducted on-site where the robot was installed, we are now developing capabilities that allow control from remote locations. In fact, we have already conducted tests, such as remotely operating robots at the University of Tokyo from Toyota City, demonstrating that a broader range of participants can contribute to data collection, thereby accelerating the overall speed of gathering data.

- That sounds promising. And what about the progress on training the Robotics Foundation Models?

Arima: In parallel with data collection from HSR, we are also training the Robotics Foundation Models. At present, we have trained the open Robotics Foundation Models "Octo^*3" and "π0^*4" with HSR data, and they are beginning to execute short tasks such as picking up objects. However, because the volume of data is still insufficient, the models are not yet flexible enough to handle diverse environments. Moving forward, we plan to both collect a wider range of data more efficiently and accelerate the research on learning algorithms for the Robotics Foundation Models.

- I see. Finally, could you share your vision for the future?

Arima: Although this project began with volunteers from the HSR Development Community, it has since evolved through collaboration with the AI Robot Association (AIRoA)^*5 and grown into a larger collaborative research effort. Moving forward, we will further expand our network of cooperation, with researchers and companies from around the world working together to accelerate research on the Robotics Foundation Models. We aim to implement our initiatives in society as soon as possible.

Authors

Jumpei Arima
Behavior Learning Robotics Research Group, R-Frontier Div., Frontier Research Center
Joined Toyota Motor Corporation in 2021. He majored in robotics during his graduate studies. After joining Toyota, he worked on the development of advanced safety systems. Since 2024, he has been engaged in research and development related to robot learning using large-scale data.

Yuta Takaba
Collaborative Robotics Research Group, R-Frontier Div., Frontier Research Center
Joined Toyota Motor Corporation in 2018. He majored in information engineering during his graduate studies. At Toyota, he has been involved in the development of software platforms for autonomous driving and cloud systems. Currently, he builds the infrastructure for robot data collection and the MLOps pipeline.

Yuki Wakayama
Mobile Manipulator Group, R-Frontier Div., Frontier Research Center
Joined Toyota Motor Corporation in 2024. He majored in robotics during his graduate studies. Since joining Toyota, he has focused on research and development for household support robots. His passion for science fiction led him to the world of robotics.

References

*1	Toyota Shifts Home Helper Robot R&D into High Gear with New Developer Community and Upgraded Prototype
*2	The RoboCup@Home league
*3	Octo Model Team: Octo: An Open-Source Generalist Robot Policy. Proceedings of Robotics: Science and Systems, 2024.
*4	K. Black et al.: π0: A Vision-Language-Action Flow Model for General Robot Control. arXiv preprint arXiv:2410.24164, 2024.
*5	AI Robot Association