Title#
LookOut: Real-World Humanoid Egocentric Navigation
Abstract#
The ability to predict collision-free future trajectories from egocentric observations is crucial in applications such as humanoid robotics, VR/AR, and assistive navigation. In this work, we introduce the challenging problem of predicting a sequence of future 6D head poses from an egocentric video. In particular, we predict both head translations and rotations to learn the active information-gathering behavior expressed through head-turning events. To solve this task, we propose a framework that reasons over temporally aggregated 3D latent features, modeling the geometric and semantic constraints of both the static and dynamic parts of the environment. Motivated by the lack of training data in this space, we further contribute a data collection pipeline using the Project Aria glasses, and present a dataset collected through this approach. Our dataset, dubbed the Aria Navigation Dataset (AND), consists of 4 hours of recordings of users navigating in real-world scenarios. It includes diverse situations and navigation behaviors, providing a valuable resource for learning real-world egocentric navigation policies. Extensive experiments show that our model learns human-like navigation behaviors such as waiting/slowing down, rerouting, and looking around for traffic, while generalizing to unseen environments. Check out our project webpage at https://sites.google.com/stanford.edu/lookout.
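To make the prediction target concrete, below is a minimal sketch (not the authors' code) of how a sequence of future 6D head poses could be represented and rolled out. It assumes each predicted step is a relative pose (a 3x3 rotation plus a 3-vector translation) with respect to the previous head frame; the helper names `pose_to_matrix` and `roll_out` and this parameterization are illustrative assumptions, and the paper's actual output format may differ.

```python
# Minimal sketch of a 6D head-pose trajectory: each pose is a 4x4 rigid transform
# (rotation + translation). Relative per-step predictions are composed into
# absolute poses. This is an assumed representation for illustration only.
import numpy as np


def pose_to_matrix(rotation: np.ndarray, translation: np.ndarray) -> np.ndarray:
    """Pack a 3x3 rotation and a 3-vector translation into a 4x4 transform."""
    T = np.eye(4)
    T[:3, :3] = rotation
    T[:3, 3] = translation
    return T


def roll_out(initial_pose: np.ndarray, relative_poses: list[np.ndarray]) -> list[np.ndarray]:
    """Compose a sequence of relative head poses into absolute world-frame poses."""
    poses = [initial_pose]
    for rel in relative_poses:
        poses.append(poses[-1] @ rel)
    return poses


if __name__ == "__main__":
    # Toy example: the head moves 0.5 m forward while turning 15 degrees left at
    # each step, loosely mimicking a head-turning (look-around) event.
    theta = np.deg2rad(15.0)
    turn_left = np.array([
        [np.cos(theta), -np.sin(theta), 0.0],
        [np.sin(theta),  np.cos(theta), 0.0],
        [0.0,            0.0,           1.0],
    ])
    step = pose_to_matrix(turn_left, np.array([0.5, 0.0, 0.0]))
    trajectory = roll_out(np.eye(4), [step] * 6)  # six future head poses
    for t, pose in enumerate(trajectory):
        print(f"t={t}: position={pose[:3, 3].round(2)}")
```

Representing each prediction as a full rotation, rather than a heading angle alone, is what lets a model express the head-turning, information-gathering behavior described in the abstract.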