May 17, 2024

Resilience for a Sustainable Society

With the recent advancements in Internet of Things (IoT) devices, various elements such as vehicles, robots, power sources, and distribution centers are interconnected to form a large network system that encompasses society, cities, and facilities. Examples include robot networks operating in factories and cities, power networks supporting local energy needs, and logistics networks that ensure smooth operations throughout all of society, all existing at different scales and levels of granularity.

The environments in which these network systems are deployed are open spaces where unexpected fluctuations can occur. To achieve a sustainable society in the face of challenges such as labor shortages, energy issues, and frequent disasters, future network systems need to be resilient, adapting flexibly to environmental changes and recovering quickly from unforeseen disruptions. If the components of the network system, such as vehicles, robots, and batteries, can autonomously recover in response to environmental changes, the system as a whole will become resilient.

At Toyota Motor Corporation's Frontier Research Center (hereinafter referred to as Toyota), we are researching the design technology for resilient network systems that can adapt to environmental changes. In this article, we would like to introduce four research cases.

Case 1: Robot Network Integrated with IoT Infrastructure

The introduction of autonomous robots that provide various services within facilities has been increasing due to labor shortages, longer working hours per person, and the need to mitigate infection risks. However, building a network system from IoT infrastructure (cloud servers, data centers) and autonomous robots specialized for pre-defined services can complicate centralized management, increase system integration costs, and make it difficult to provide services to customers without interruption. Therefore, we aimed to build a self-contained robot network that reduces the burden of centralized management by integrating with cloud servers.

There may be cases where a service assigned by the user (in this case, a transportation task) cannot be performed by an autonomous robot, either alone or in coordination with others. In an environment where an arbitrary number of unknown tasks are assigned, we need a system design in which an arbitrary number of autonomous robots act individually (division of labor) for efficiency, act collaboratively (cooperation) for effectiveness, and judge the feasibility of the task for the current team for resilience, so that the entire task is executed without stopping (Figure 1). This requires learning algorithms for autonomous distributed policies.

: Figure 1 Effect of autonomous robot network method

The left in Figure 2 shows the control structure in which robots autonomously allocate tasks and share information with cloud servers^*1*2. Each robot communicates with neighboring robots through local communication and updates the priority of surrounding objects. As a result, division of labor among robots emerges. When a robot encounters a task that it cannot execute alone, it obtains the priorities of other robots from the cloud server at a timing it determines itself (event-driven), and collaboration among multiple robots emerges. A system that can cope even when infeasible tasks are assigned is considered resilient. In our method^*2, each robot temporarily avoids objects that cannot be transported by updating an exclusion degree for each object based on the task experience held by the cloud server, using the same structure as the priority (the right in Figure 2).

: Figure 2 Allocation with infeasible tasks^*2
Left: Calculate temporary avoidance based on task experience and degree of exclusion.
Right: Transport simulation when infeasible tasks are assigned.

: Video 1 Simulation of autonomous transport by six robots
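
The temporary avoidance described above can be illustrated with a toy sketch in Python. It is not the learned policy of ^*2: the priorities here are fixed by hand rather than learned, a single team acts in turn, and the names choose_object, update_exclusion, and run_transport, as well as the penalty and decay values, are hypothetical choices made only for illustration.

```python
def choose_object(priorities, exclusion):
    """Return the remaining object with the highest (priority - exclusion) score."""
    return max(priorities, key=lambda obj: priorities[obj] - exclusion.get(obj, 0.0))


def update_exclusion(exclusion, obj, succeeded, penalty=2.0, decay=0.5):
    """Decay every exclusion degree, then penalise the object that just failed.

    The decay makes the avoidance temporary: a failed object becomes
    attractive again after a while. Both constants are assumed values.
    """
    updated = {name: value * decay for name, value in exclusion.items()}
    if not succeeded:
        updated[obj] = updated.get(obj, 0.0) + penalty
    return updated


def run_transport(priorities, feasible, max_steps=10):
    """Simulate one team that repeatedly chooses and attempts an object."""
    remaining = dict(priorities)   # objects not yet delivered
    exclusion = {}                 # exclusion degree per object
    delivered = []
    for _ in range(max_steps):
        if not remaining:
            break
        obj = choose_object(remaining, exclusion)
        succeeded = feasible[obj]  # stand-in for an actual transport attempt
        if succeeded:
            delivered.append(obj)
            del remaining[obj]
        exclusion = update_exclusion(exclusion, obj, succeeded)
    return delivered, remaining
```

With priorities {"heavy": 3.0, "box": 2.0, "bag": 1.0} and only "heavy" marked infeasible, the team attempts "heavy" first, fails, and then delivers "box" and "bag" while "heavy" stays in the queue, so the run does not stall on the infeasible task.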

In addition to task allocation, providing services in a real environment also requires robot control for cooperatively transporting heavy loads. For this reason, we also created a hierarchical multi-agent reinforcement learning framework with priorities in the middle layer (the left in Figure 3)^*3. The upper layer of each robot uses local information to update the priorities of neighboring objects, while the middle layer allows each robot to share the priorities of all objects with other robots in an event-driven manner^*1. Furthermore, each robot learns to control its forward, backward, and turning movements (lower-level robot control) in coordination with other robots to transport the highest-priority cargo to the goal. By learning in a simulation environment similar to the real environment, shown in the middle of Figure 3, we were able to confirm that task allocation and robot control can be applied simultaneously in the real environment shown on the right in Figure 3.

: Figure 3 Hierarchical reinforcement learning for task allocation and robot control^*3
Left: Learning to control multiple robots for cooperative transport. Middle: Learning environment.
Right: Verification experiment of individual and cooperative behavior.

: Video 2 Verification experiment of individual and collaborative actions

Case 2: Network with Heterogeneous Rotors Operating Autonomously

We believe that there is a high demand for aerial transportation of heavy objects in various situations such as rough terrains, construction and maintenance sites, and emergencies like disasters. However, fixed-rotor drones have difficulty accommodating non-standard payloads, and redundant drones increase the size of the system. Therefore, we considered a scalable system that can freely change the types and numbers of rotors, allowing them to be used collectively for cooperative transportation, similar to how humans cooperate to carry objects (the left in Figure 4). From a safety perspective, it is essential to ensure that the system can continue flying even if some of the rotors fail during transportation (the middle in Figure 4). Additionally, if the types and numbers of rotors can be freely changed and they can be collected and coordinated, they can be reused (the right in Figure 4).

: Figure 4 Effect of autonomous decentralized control method

To achieve this, we created a transportation drone that can attach multiple rotors to its payload (the top left in Figure 5)^*4*5. Realizing the functions shown in Figure 4 requires decentralized control rather than centralized control. Each rotor autonomously determines the thrust it needs for stabilization and goal tracking from shared information on the payload specifications and the position and orientation of the payload, thereby eliminating the need for system-wide reconfiguration even when the rotor configuration changes (the bottom left in Figure 5). As a result, even if the rotor configuration changes, the entire system does not need to be re-qualified. Through experiments, we confirmed that the system can continue flight even when the center of gravity of the payload shifts due to the combination of rotors with different maximum thrusts or when some rotors fail (the right in Figure 5).

: Figure 5 Flight experiment
Top left: Transport drone.
Bottom left: Autonomous decentralized controller for rotor.
Right: Flight experiment with autonomous decentralized control.

: Video 3 Flight experiment using autonomous decentralized control with collective rotors
One rotor fails at around 1 min 12 sec. The center of gravity of the payload also shifts.
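
As a rough illustration of why local controllers can tolerate a rotor failure, the following Python sketch reduces the problem to altitude only. It is not the controller of ^*4*5: the one-dimensional point-mass model, the PID-style law, the gain values, and the function name simulate_altitude are all assumptions made for this example. Each rotor sees only the shared payload altitude and velocity and computes its own thrust, scaled by its own maximum thrust.

```python
def simulate_altitude(max_thrusts, fail_index, fail_time, target=1.0,
                      mass=2.0, duration=40.0, dt=0.01):
    """Return the final payload altitude [m] in a 1-D toy model.

    max_thrusts: maximum thrust [N] of each rotor (rotors may differ).
    fail_index:  index of the rotor that fails, or None for no failure.
    fail_time:   time [s] at which that rotor stops producing thrust.
    """
    g = 9.81
    kp, kd, ki = 4.0, 3.0, 2.0                   # assumed per-rotor gains
    scales = [m / 10.0 for m in max_thrusts]     # stronger rotors contribute more
    integrals = [0.0] * len(max_thrusts)         # each rotor keeps its own state
    z, v = 0.0, 0.0
    for step in range(round(duration / dt)):
        t = step * dt
        error = target - z                       # shared payload information
        total = 0.0
        for i, limit in enumerate(max_thrusts):
            if i == fail_index and t >= fail_time:
                continue                         # failed rotor: no thrust
            integrals[i] += ki * error * dt
            command = scales[i] * (kp * error - kd * v + integrals[i])
            total += min(max(command, 0.0), limit)   # respect thrust limits
        v += (total / mass - g) * dt             # semi-implicit Euler step
        z += v * dt
    return z
```

Calling simulate_altitude([10.0, 10.0, 15.0, 15.0], fail_index=3, fail_time=20.0) removes one of the stronger rotors halfway through. No rotor is told about the failure; the integral terms of the remaining rotors grow until their combined thrust again balances the payload weight, and the altitude returns to the target.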

Case 3: Electric Power Network with Electric Vehicles Used Secondarily as Batteries

In order to achieve a sustainable society, the introduction of renewable energy is essential. However, the amount of power generated from renewable sources fluctuates with the time of day and weather conditions, which may disrupt the supply-demand balance of the power grid. Therefore, a Virtual Power Plant (VPP) that puts a large number of electric vehicles to secondary use as batteries, beyond their original purpose of transportation, is anticipated (the left in Figure 6). The batteries used in such VPPs are numerous and diverse, varying in type and degradation level. The batteries are also expected to be plugged in and out frequently due to factors such as battery lifespan, malfunctions, and outings. As a result, fully centralized management is predicted to become difficult in terms of operation, computation, and communication costs.

To address this issue, we created a distributed control system in which each battery operates autonomously (the right in Figure 6)^*6. To achieve the power required of the electric vehicle VPP as a whole, each battery determines its output power based on its own characteristics, using only the error signal transmitted from the management server. As a result, the electric vehicle VPP successfully delivered the required power as a whole (the left in Figure 7). Additionally, aligning the State of Charge (SOC) of each battery to an appropriate value as closely as possible is highly effective in suppressing battery degradation and in predicting the acceptable power capacity of the electric vehicle VPP as a whole. With this method, we achieved SOC equalization without sharing SOC information between batteries (the middle in Figure 7). Furthermore, we confirmed through experiments that even if some batteries are disconnected due to unexpected malfunctions, the required power for the entire system can be achieved without detailed centralized control from the server (the right in Figure 7). We have also confirmed in practice, using 20 PHEVs, that the proposed distributed control can independently achieve the overall required power for the VPP.

: Figure 6 Electric Vehicle VPP and Autonomous Decentralized Control

: Figure 7 Effect of ASC method
Even if one of the batteries (#5) leaves the system in the middle of the run, the total power output remains at the target value.
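
The idea of steering every battery with a single broadcast error signal can be sketched in Python as follows. This is a simplified illustration, not the method of ^*6: the integral update rule, the gain and limit values, and the function name simulate_vpp are assumptions, and SOC equalization is not modelled. The server only measures the total output and broadcasts the error; each battery adjusts its own output using its own gain and power limit.

```python
def simulate_vpp(gains, limits, target, steps, dropout=None):
    """Return the total VPP output power after each control step.

    gains:   per-battery update gain (stands in for battery characteristics).
    limits:  per-battery maximum charge/discharge power.
    target:  power required of the VPP as a whole.
    dropout: optional (step, index); that battery unplugs at that step.
    """
    outputs = [0.0] * len(gains)
    active = [True] * len(gains)
    history = []
    for t in range(steps):
        if dropout is not None and t == dropout[0]:
            active[dropout[1]] = False
            outputs[dropout[1]] = 0.0
        # The server's only job: measure the total and broadcast the error.
        error = target - sum(outputs)
        for i in range(len(outputs)):
            if active[i]:
                # Local update from the broadcast error, clipped to own limit.
                updated = outputs[i] + gains[i] * error
                outputs[i] = max(-limits[i], min(limits[i], updated))
        history.append(sum(outputs))
    return history
```

For example, simulate_vpp([0.1, 0.2, 0.15, 0.05, 0.1], [3.0, 5.0, 4.0, 2.0, 3.0], target=6.0, steps=60, dropout=(30, 4)) unplugs the fifth battery at step 30. The total dips at that step and then returns to the target as the remaining batteries react to the same broadcast error, without the server assigning individual setpoints. In this sketch the sum of the gains must stay between 0 and 2 for the loop to converge.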

Case 4: Resilient Logistics Network

In recent years, there has been a demand for Supply Chain Resilience (SCR) that ensures economic activities can continue even during crises such as natural disasters. Since disruptions in the supply chain are often caused by logistical interruptions, it is necessary to not only optimize logistics efficiency but also achieve diversification such as by decentralizing production and logistics facilities and securing multiple transportation methods.

For example, let's consider a logistics network from Shipping Base 1 to nationwide sales nodes (the left in Figure 8). Now, let's assume that a disaster occurs as shown in the right of Figure 8, and the direct land transportation costs between Shipping Base 1 and the sales nodes in the Kanto region (7-11, 18) increase significantly. In addition to land transportation, we also have the option of using sea transportation. The transportation strategy with the minimum cost is to consolidate goods at the shipping base and transport them directly to each sales node (hub-and-spoke system). However, when considering resilience, it is necessary to take into account the diversity (entropy) of transportation methods in addition to efficiency. Furthermore, on-site stakeholders may want to keep the current transportation strategy (model) as much as possible. Therefore, this transportation problem is formulated as an imitation optimization problem that considers transportation efficiency, diversity, and proximity to the model (the left (1) in Figure 9).

: Figure 8 Logistics network and assumed disasters
Left: Land transportation [blue] and sea transportation [brown]. Right: Example of transportation route disruption.
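
The trade-off among efficiency, diversity, and proximity to the model can be illustrated with a deliberately small Python sketch. It is not the network formulation of ^*7*8: it considers a single shipment choosing among a few routes, and the function names, the weight eps, and the cost values are assumptions for illustration. For a probability vector p over routes, minimizing the expected cost plus eps times the Kullback-Leibler divergence from the model has the closed-form solution p_i proportional to model_i * exp(-cost_i / eps).

```python
import math


def route_shares(costs, model, eps):
    """Minimise sum(p * cost) + eps * KL(p || model) over route shares p.

    costs: transportation cost of each route.
    model: current strategy to imitate (positive shares summing to 1).
    eps:   weight on staying close to the model; small eps favours pure cost.
    """
    weights = [m * math.exp(-c / eps) for c, m in zip(costs, model)]
    total = sum(weights)
    return [w / total for w in weights]


def objective(shares, costs, model, eps):
    """Expected cost plus eps times the KL divergence from the model."""
    cost = sum(p * c for p, c in zip(shares, costs))
    kl = sum(p * math.log(p / m) for p, m in zip(shares, model) if p > 0.0)
    return cost + eps * kl


# Routes: [land, sea]. The model strategy mostly uses land transportation.
model = [0.9, 0.1]
before = route_shares([1.0, 3.0], model, eps=1.0)   # normal costs
after = route_shares([6.0, 3.0], model, eps=1.0)    # land cost rises after a disaster
```

With these assumed numbers, the land route dominates before the disaster and the sea route takes the larger share afterwards, while both routes keep a nonzero share in either case. Letting eps approach zero recovers the pure minimum-cost choice, which puts essentially everything on a single route.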

In our research^*7*8, we showed that this imitation optimization problem (the left (1) in Figure 9) is equivalent to a robust optimization problem that considers the upper limit of transportation cost variations (the left (2) in Figure 9), as well as an SB problem that takes the model into account (the left (3) in Figure 9). From these, we were able to consider transportation strategies that take into account the expected damage caused by disasters, which is one of the evaluation criteria for SCR, and solve them quickly. Furthermore, by approximating the cost and the model of transportation using Markov approximation^*8, we can decompose the transportation strategies obtained as patterns from the shipping base to the sales nodes into transition probabilities between nodes. This enables us to derive decentralized strategies for each node, making it possible to explore solutions in conjunction with inventory management issues at each location in the future.

The middle and right panels of Figure 9 show the minimum-cost strategy (the aforementioned hub-and-spoke system) and the resilient strategy, respectively. When a disaster is assumed, the cost-prioritized strategy may leave some nodes without the necessary parts. The resilient strategy, however, was able to use costly sea transportation to reduce costs after the disaster.

: Figure 9 Transportation simulation before and after disasters
Left: Three equivalent optimal transport problems. Middle: Efficiency-oriented. Right: Resilience-oriented.

In Conclusion

To achieve sustainable facilities, cities, and societies, resilient systems that can adapt to environmental changes are essential. At Toyota, we will continue to research the resilience of network systems to support services and businesses that will produce future happiness.

Author

Tomohiko Jimbo
Joined Toyota Central R&D Labs., Inc. in 2002. Engaged in research on modeling and control of automotive engines, machine learning-based health monitoring and optimal design of vehicles and structures, learning and control of aerial robots, as well as distributed control and reinforcement learning of multi-robots. Seconded to Toyota from April 2021 to March 2024, conducting research on distributed control and reinforcement learning of network systems, as well as optimal transportation. Returned to Toyota Central R&D Labs. in April 2024. Ph.D. in Engineering.

References

*1	Kazuki Shibata, Tomohiko Jimbo, Tadashi Odashima, Keisuke Takeshita, Takamitsu Matsubara, "Learning Locally, Communicating Globally: Reinforcement Learning of Multi-robot Task Allocation for Cooperative Transport," The 22nd World Congress of the International Federation of Automatic Control (IFAC 2023), 2023.
*2	Yuma Shida, Tomohiko Jimbo, Tadashi Odashima, Takamitsu Matsubara, "Reinforcement Learning of Multi-robot Task Allocation for Multi-object Transportation with Infeasible Tasks," arXiv:2404.11817, 2024.
*3	Yusei Naito, Tomohiko Jimbo, Tadashi Odashima, Takamitsu Matsubara, "Task-priority Intermediated Hierarchical Distributed Policies: Reinforcement Learning of Adaptive Multi-robot Cooperative Transport," arXiv:2404.02362, 2024.
*4	Koshi Oishi, Yasushi Amano, Tomohiko Jimbo, "Cooperative Transportation using Multiple Single-Rotor Robots and Decentralized Control for Unknown Payloads," IEEE International Conference on Robotics and Automation (ICRA), 2022.
*5	Koshi Oishi, Yasushi Amano, Tomohiko Jimbo, "Scratch Team of Single-Rotor Robots and Decentralized Cooperative Transportation with Robot Failure," arXiv:2307.00705, 2023.
*6	Yusuke Hakuta, Yasushi Amano, Tomohiko Jimbo, Shuji Tomura, "Decentralized Control for Heterogeneous Battery Energy Storage System," The 22nd World Congress of the International Federation of Automatic Control (IFAC 2023), 2023.
*7	Koshi Oishi, Yota Hashizume, Tomohiko Jimbo, Hirotaka Kaji, Kenji Kashima, "Resilience Evaluation of Entropy Regularized Logistic Networks with Probabilistic Cost," The 22nd World Congress of the International Federation of Automatic Control (IFAC 2023), 2023.
*8	Koshi Oishi, Yota Hashizume, Tomohiko Jimbo, Hirotaka Kaji, Kenji Kashima, "Imitation-regularized Optimal Transport on Networks: Provable Robustness and Application to Logistics Planning," arXiv:2402.17967, 2024.