Who Sees What? Structured Thought-Action Sequences for Epistemic Reasoning in LLMs

2508.14564v1

Title#

Who Sees What? Structured Thought-Action Sequences for Epistemic Reasoning in LLMs

Abstract#

Recent advances in large language models (LLMs) and reasoning frameworks have opened new possibilities for improving the perspective-taking capabilities of autonomous agents. However, tasks that involve active perception, collaborative reasoning, and perspective taking (understanding what another agent can see or knows) pose persistent challenges for current LLM-based systems. This study investigates the potential of structured examples derived from transformed solution graphs generated by the Fast Downward planner to improve the performance of LLM-based agents within a ReAct framework. We propose a structured solution-processing pipeline that generates three distinct categories of examples: optimal goal paths (G-type), informative node paths (E-type), and step-by-step optimal decision sequences contrasting alternative actions (L-type). These solutions are further converted into "thought-action" examples by prompting an LLM to explicitly articulate the reasoning behind each decision. While L-type examples slightly reduce clarification requests and overall action steps, they do not yield consistent improvements. Agents succeed in tasks requiring basic attentional filtering but struggle in scenarios that require mentalising about occluded spaces or weighing the costs of epistemic actions. These findings suggest that structured examples alone are insufficient for robust perspective taking, underscoring the need for explicit belief tracking, cost modelling, and richer environments to enable socially grounded collaboration in LLM-based agents.
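To make the example categories more concrete, the following is a minimal sketch of how a G-type optimal goal path and L-type contrastive steps might be read off a solution graph. It is an illustration under assumptions, not the authors' pipeline: the toy graph, the state and action names, and the helper functions `optimal_goal_path` and `contrastive_steps` are all hypothetical, and the graph is written by hand rather than produced by the Fast Downward planner. E-type examples are left out because the abstract does not specify how informative nodes are selected.

```python
from collections import deque

# Hypothetical toy solution graph: state -> {action: next_state}.
# Hand-written for illustration; not actual planner output.
GRAPH = {
    "start": {"move_to_door": "door", "ask_clarification": "start_informed"},
    "start_informed": {"move_to_door": "door"},
    "door": {"open_door": "room"},
    "room": {"pick_up_object": "goal"},
    "goal": {},
}


def optimal_goal_path(graph, start, goal):
    """Return a shortest list of (state, action, next_state) steps, found by BFS."""
    queue = deque([start])
    parent = {start: None}  # maps state -> (previous state, action taken)
    while queue:
        state = queue.popleft()
        if state == goal:
            break
        for action, nxt in graph[state].items():
            if nxt not in parent:
                parent[nxt] = (state, action)
                queue.append(nxt)
    if goal not in parent:
        return []  # goal unreachable
    steps = []
    node = goal
    while parent[node] is not None:
        prev, action = parent[node]
        steps.append((prev, action, node))
        node = prev
    return list(reversed(steps))


def contrastive_steps(graph, path):
    """Pair each chosen action on the path with the alternatives not taken."""
    return [
        {
            "state": state,
            "chosen": action,
            "alternatives": sorted(a for a in graph[state] if a != action),
        }
        for state, action, _ in path
    ]


g_type = optimal_goal_path(GRAPH, "start", "goal")  # G-type style path
l_type = contrastive_steps(GRAPH, g_type)           # L-type style contrasts

for step in l_type:
    print(step)
```

In this sketch, each L-type record could then be handed to an LLM with a request to explain why the chosen action beats the alternatives, which is roughly the "thought-action" conversion the abstract describes. The prompt wording and the model call are deliberately omitted, since the source does not give them.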

