Top robotics names discuss humanoids, generative AI and more

Last month, I took an extended break. In a bid to keep my robotics newsletter Actuator (subscribe here) up and running, however, I reached out to some of the biggest names in the industry. I asked people from CMU, UC Berkeley, Meta, Nvidia, Boston Dynamics and the Toyota Research Institute the same six questions, covering topics like generative AI, the humanoid form factor, home robots and more. You’ll find all of the answers organized by question below. You would be hard-pressed to find a more comprehensive breakdown of robotics in 2023 and the path it’s blazing for future technologies.

What role(s) will generative AI play in the future of robotics?

Matthew Johnson-Roberson, CMU: Generative AI, through its ability to generate novel data and solutions, will significantly bolster the capabilities of robots. It could enable them to better generalize across a wide range of tasks, enhance their adaptability to new environments and improve their ability to autonomously learn and evolve.

Dhruv Batra, Meta: I see generative AI playing two distinct roles in embodied AI and robotics research:

  • Data/experience generators
    Generating 2D images, video, 3D scenes, or 4D (3D + time) simulated experiences (particularly action/language conditioned experiences) for training robots because real-world experience is so scarce in robotics. Basically, think of these as “learned simulators.” And I believe robotics research simply cannot scale without training and testing in simulation.
  • Architectures for self-supervised learning
    Generating sensory observations that an agent will observe in the future, to be compared against actual observations, and used as an annotation-free signal for learning. See Yann’s paper on AMI for more details.

Aaron Saunders, Boston Dynamics: The current rate of change makes it hard to predict very far into the future. Foundation models represent a major shift in how the best machine learning models are created, and we are already seeing some impressive near-term accelerations in natural language interfaces. They offer opportunities to create conversational interfaces to our robots, improve the quality of existing computer vision functions and potentially enable new customer-facing capabilities such as visual question answering. Ultimately we feel these more scalable architectures and training strategies are likely to extend past language and vision into robotic planning and control. Being able to interpret the world around a robot will lead to a much richer understanding of how to interact with it. It’s a really exciting time to be a roboticist!

Russ Tedrake, TRI: Generative AI has the potential to bring revolutionary new capabilities to robotics. Not only are we able to communicate with robots in natural language, but connecting to internet-scale language and image data is giving robots a much more robust understanding and reasoning about the world. But we are still in the early days; more work is needed to understand how to ground image and language knowledge in the types of physical intelligence required to make robots truly useful.

Ken Goldberg, UC Berkeley: Although the rumblings started a bit earlier, 2023 will be remembered as the year when generative AI transformed robotics. Large language models like ChatGPT can allow robots and humans to communicate in natural language. Words evolved over time to represent useful concepts from “chair” to “chocolate” to “charisma.” Roboticists also discovered that large Vision-Language-Action models can be trained to facilitate robot perception and to control the motions of robot arms and legs. Training requires vast amounts of data so labs around the world are now collaborating to share data. Results are pouring in and although there are still open questions about generalization, the impact will be profound.
