Privileged Self-Access Matters for Introspection in AI

2508.14802v1

English Title

Privileged Self-Access Matters for Introspection in AI

English Abstract

Whether AI models can introspect is an increasingly important practical question. But there is no consensus on how introspection is to be defined. Beginning from a recently proposed "lightweight" definition, we argue instead for a thicker one. According to our proposal, introspection in AI is any process which yields information about internal states through a process more reliable than one with equal or lower computational cost available to a third party. Using experiments where LLMs reason about their internal temperature parameters, we show they can appear to have lightweight introspection while failing to meaningfully introspect per our proposed definition.
