Danfei Xu: Human Data, Behavior Cloning, Robot GPT-3, Full Stack, EgoMimic, Teleoperation, UMI

00:02:31 - How Danfei Xu first became interested in robotics
00:05:34 - Why he decided in high school to go to the United States for college
00:06:38 - His experience applying to U.S. colleges on his own
00:08:44 - Setbacks in the U.S. college application process and choosing Dickinson College
00:11:40 - How his experiences before age 18 shaped his ability to adapt to uncertainty
00:13:40 - How he “cold-called” his way into SynTouch as an undergraduate to do robotics research
00:18:36 - Why he preferred “making things move” over pure algorithms
00:19:19 - A CMU summer research experience and the story of driving to knock on a professor’s office door
00:21:02 - Autonomous vehicle localization and early full-stack robotics experience
00:24:00 - Why he chose Stanford for his PhD, even though it was then seen as a “robotics desert”
00:26:34 - The atmosphere around Stanford CS and deep learning in 2015
00:27:13 - PhD rotations, VR headsets, and early human motion capture
00:28:34 - Why he refused to continue working on scene graphs and returned to robotics
00:30:18 - The two mainstream directions in robot learning around 2016/2017
00:32:06 - From one-shot imitation learning to task and motion planning
00:33:15 - How he now thinks about structure, compositionality, and reflections on the Bitter Lesson
00:35:51 - Why he moved from task-and-motion planning toward teleoperation and behavior cloning
00:36:12 - How a DeepMind internship convinced him that behavior cloning could work
00:39:47 - Why the field largely looked down on behavior cloning at the time
00:42:27 - His RSS 2020 paper: Franka teleoperation, a BC system, and a sign of life
00:44:31 - How academia can encourage systematic research that truly works
00:46:14 - Why he did not believe RL at the time could scale to real robots
00:47:08 - Why this behavior cloning work did not trigger a paradigm shift at the time
00:49:45 - Why the hardest part of behavior cloning is the system, not the model
00:52:53 - Key insights from several internships during his PhD
00:54:33 - Whether the future of robotics will be broken into pipelines like autonomous driving
00:54:59 - Looking back on his PhD: which judgments were right, and which directions he later abandoned
00:57:17 - Why he chose academia: freedom, resources, and independent research taste
00:59:24 - What robot learning is, and how it differs from traditional robotics
01:01:24 - What is most overestimated and underestimated in robot learning
01:01:53 - Basic categories of robot data, human data, and human-derived data
01:03:52 - The origins and evolution of EgoMimic: why he bet on first-person human data
01:09:39 - Why he shifted from teleoperation data to human data
01:11:40 - What can actually be learned from ego video
01:15:20 - Why first-person video is more critical than third-person YouTube video
01:20:17 - Why SLAM / VIO is key to treating humans as robots
01:24:07 - Where the moat of SLAM lies today
01:27:02 - How important touch and force really are in human data
01:30:16 - Ranking the importance of different modalities in human data
01:32:27 - Whether UMI data counts as robot data or human data
01:34:40 - The long-term relationship between teleoperation, UMI, and pure human data
01:36:35 - Why dexterous five-fingered hands and robot embodiment determine the upper limit of transfer
01:38:21 - Whether human data enables humanoid robots, or humanoid robots enable human data
01:39:46 - Whether human data will lock robots into human configurations and human-level capabilities
01:42:51 - The intelligence ceiling of human data if data, compute, and hardware were unlimited
01:44:16 - How an “internet data infrastructure” for robotics might emerge
01:47:22 - How much data is needed to train a human-level robot
01:49:08 - Why casual, everyday human data is the most valuable
01:53:37 - Whether the human data pipeline will become a moat or a commodity
01:55:03 - EgoVerse and the significance of open human data in academia
01:56:31 - Whether the success of human data will inevitably lead to closed commercial systems
01:58:31 - If human data does not become the foundation, which layer of assumptions might be wrong
01:59:39 - Why full-stack robotics is a core capability
02:00:47 - Buy or build: which capabilities robotics teams must keep in-house
02:02:34 - What kinds of modeling methods human data may favor
02:04:33 - How far today’s robots are from the tool intelligence of Betty the crow
02:08:26 - The culture of Danfei Lab: why everyone needs to be full stack
02:10:07 - How young researchers can find their place in academia and industry
02:11:10 - Whether doing a robotics PhD in 2026 is harder or easier than ten years ago
02:12:30 - How to judge whether a direction is merely trendy or truly important
02:13:47 - His personal goal: pushing robotics toward its GPT-3 moment
02:14:26 - Advice he would give to himself ten years ago, when he had just entered the field