We present a system for collision-free control of a robot manipulator that uses only RGB views of the world. Perceptual input of a tabletop scene is provided by multiple images from an RGB camera (without depth) that is either handheld or mounted on the robot end effector. A NeRF-like process is used to reconstruct the 3D geometry of the scene, from which the Euclidean signed distance function (ESDF) is computed. A model predictive control algorithm is then used to control the manipulator to reach a desired pose while avoiding obstacles in the ESDF. We show results on a real dataset collected and annotated in our lab.
@inproceedings{tang2023icra:rgbonly,
author = "Zhenggang Tang and Balakumar Sundaralingam and Jonathan Tremblay and Bowen Wen and Ye Yuan and Stephen Tyree and Charles Loop and Alexander Schwing and Stan Birchfield",
title = "{RGB}-Only Reconstruction of Tabletop Scenes for Collision-Free Manipulator Control",
booktitle = "Proceedings of the IEEE International Conference on Robotics and Automation (ICRA)",
year = 2023
}
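The collision-avoidance step described in the abstract can be illustrated with a minimal sketch: querying a dense ESDF grid by trilinear interpolation and turning the resulting distance into a hinge-style penalty, as an MPC cost term might. This is a hypothetical illustration with made-up helper names (`esdf_query`, `collision_cost`), not the paper's implementation.

```python
import numpy as np

def esdf_query(grid, origin, voxel, p):
    """Trilinearly interpolate a dense ESDF grid at world point p.

    grid:   (X, Y, Z) array of signed distances in meters,
            positive outside obstacles, negative inside
    origin: world coordinates of voxel (0, 0, 0)
    voxel:  edge length of one voxel in meters
    """
    idx = (np.asarray(p, dtype=float) - origin) / voxel
    lo = np.floor(idx).astype(int)        # lower corner voxel index
    frac = idx - lo                       # fractional position in the cell
    d = 0.0
    # Blend the 8 corner distances of the enclosing voxel cell.
    for dx in (0, 1):
        for dy in (0, 1):
            for dz in (0, 1):
                w = (frac[0] if dx else 1 - frac[0]) * \
                    (frac[1] if dy else 1 - frac[1]) * \
                    (frac[2] if dz else 1 - frac[2])
                d += w * grid[lo[0] + dx, lo[1] + dy, lo[2] + dz]
    return d

def collision_cost(dist, margin=0.05):
    """Hinge penalty: zero when farther than `margin` from obstacles,
    growing quadratically as the query point approaches them."""
    return max(0.0, margin - dist) ** 2
```

In an MPC loop, a cost like this would be summed over the manipulator's collision spheres at each predicted state, so the optimizer trades goal progress against clearance from the reconstructed geometry.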