"Does the cafe entrance look accessible? Where is the door?" Towards Geospatial AI Agents for Visual Inquiries

2508.15752v1

English Title

"Does the cafe entrance look accessible? Where is the door?" Towards Geospatial AI Agents for Visual Inquiries

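Abstract (Translated from Chinese)

Interactive digital maps have transformed how people travel and learn about the world. However, they depend on existing structured data in GIS databases (e.g., road networks, point-of-interest indices), which limits their ability to answer geo-visual questions about what the world looks like. We present our vision for Geo-Visual Agents: multimodal AI agents that understand and respond to nuanced visual-spatial queries about the world by analyzing large-scale repositories of geospatial imagery, including streetscapes (e.g., Google Street View), place-based photos (e.g., TripAdvisor, Yelp), and aerial imagery (e.g., satellite photos), together with traditional GIS data sources. We define this vision, describe sensing and interaction approaches, provide three exemplars, and outline key challenges and opportunities for future work.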

English Abstract

Interactive digital maps have revolutionized how people travel and learn about the world; however, they rely on pre-existing structured data in GIS databases (e.g., road networks, POI indices), limiting their ability to address geo-visual questions related to what the world looks like. We introduce our vision for Geo-Visual Agents--multimodal AI agents capable of understanding and responding to nuanced visual-spatial inquiries about the world by analyzing large-scale repositories of geospatial images, including streetscapes (e.g., Google Street View), place-based photos (e.g., TripAdvisor, Yelp), and aerial imagery (e.g., satellite photos) combined with traditional GIS data sources. We define our vision, describe sensing and interaction approaches, provide three exemplars, and enumerate key challenges and opportunities for future work.
