This paper focuses on transferring control policies between robot manipulators with different morphologies. While reinforcement learning (RL) methods have shown successful results in robot manipulation tasks, transferring a trained policy from simulation to a real robot, or deploying it on a robot with different kinematics or dynamics, remains challenging. Our key insight for achieving cross-embodiment policy transfer is to project the state and action spaces of the source and target robots into a common latent space representation. We first introduce encoders and decoders that project the source robot's states and actions between its own space and the latent space. To regularize the latent space so that latent state evolution remains consistent with the encoded transitions, we introduce a latent dynamics constraint; in this first stage, the encoders, decoders, and latent dynamics model are trained jointly with RL. Next, we use adversarial training with a cycle consistency constraint to align the latent distributions of the source and target domains using unpaired, unaligned, randomly collected data. The latent policy trained in the first stage is then combined with the encoders and decoders trained in the second stage to transfer the policy to the target robot without access to the task reward or any reward tuning in the target domain.
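The two-stage structure can be summarized with a short sketch. The following PyTorch code is a minimal illustration, not the paper's implementation: all module names, network sizes, and latent dimensions are assumptions, and the adversarial term (a discriminator over latent states trained on the unpaired data) is omitted for brevity.

```python
import torch
import torch.nn as nn

def mlp(inp, out, hidden=256):
    # Small MLP used for every encoder/decoder/dynamics head (assumed sizes).
    return nn.Sequential(nn.Linear(inp, hidden), nn.ReLU(),
                         nn.Linear(hidden, out))

class LatentSpace(nn.Module):
    """Hypothetical per-robot projection into a shared latent space."""
    def __init__(self, state_dim, action_dim, z_state=32, z_action=16):
        super().__init__()
        self.enc_s = mlp(state_dim, z_state)         # state  -> latent state
        self.enc_a = mlp(action_dim, z_action)       # action -> latent action
        self.dec_s = mlp(z_state, state_dim)         # latent state  -> state
        self.dec_a = mlp(z_action, action_dim)       # latent action -> action
        self.dyn = mlp(z_state + z_action, z_state)  # latent dynamics model

    def dynamics_loss(self, s, a, s_next):
        # Stage one: the latent dynamics model must predict the encoding of
        # the next state, regularizing how latent states evolve.
        z, u, z_next = self.enc_s(s), self.enc_a(a), self.enc_s(s_next)
        return ((self.dyn(torch.cat([z, u], dim=-1)) - z_next) ** 2).mean()

def cycle_loss(src, tgt, s_tgt):
    # Stage two (one plausible form of the cycle): map a target state into
    # the latent space, decode it as a source state, re-encode, and decode
    # back to the target domain; the round trip should reconstruct the input.
    s_src = src.dec_s(tgt.enc_s(s_tgt))
    s_back = tgt.dec_s(src.enc_s(s_src))
    return ((s_back - s_tgt) ** 2).mean()

# Example with made-up dimensions for a Panda (source) and xArm6 (target).
panda = LatentSpace(state_dim=32, action_dim=7)
xarm = LatentSpace(state_dim=29, action_dim=6)
s, a, s_next = torch.randn(8, 32), torch.randn(8, 7), torch.randn(8, 32)
loss = panda.dynamics_loss(s, a, s_next) + cycle_loss(panda, xarm, torch.randn(8, 29))
```

At deployment, the stage-one latent policy pi would be composed with the target robot's projections, e.g. a_tgt = xarm.dec_a(pi(xarm.enc_s(s_tgt))), which is how the transferred policy can act without any target-domain reward.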
Figure panels: (a) Adversarial loss; (b) cycle consistency loss; (c) latent dynamics loss.
Transferring a Panda robot policy (top row) to a Sawyer robot (middle row) and an xArm6 robot (bottom row) for the Lift task in robosuite.
Simulation-to-real transfer for the PickPlace task. The source policy is trained with behavior cloning in simulation on a Panda robot (top row) and transferred to a real xArm6 robot (bottom row).