Agent Skills: The Killer of Context Window Bloat

geekbing · 2026-03-14 · AI · Agent

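Skills let your Agent fetch specialized instructions on demand without bloating the system prompt. Instead of cramming every instruction you might need into a single prompt, you define modular skill packages, and the Agent discovers and activates a skill only when it needs it.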

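What Are Skills

As an Agent takes on more complex tasks, its system prompt keeps growing. An Agent that handles PDF parsing, data analysis, code review, and email writing at the same time can end up with a very long prompt stuffed with instructions for every feature. This causes several problems:

Context window bloat - a long prompt consumes many tokens and squeezes the space left for reasoning and conversation
Instruction confusion - with dozens of unrelated instructions piled into one prompt, the model struggles to follow them accurately
Hard to maintain - a monolithic prompt is difficult to update and version, and awkward to share across teams

Skills solve this by splitting instructions into independent, self-contained modules. The Agent sees a list of available skills and loads a skill's full instructions only when it needs that skill, much as a developer opens a reference manual only when working on the task it covers.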

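This is why Skills are called the killer of context window bloat. Suppose your Agent has to cover four domains: PDF parsing, code review, data analysis, and email writing. With the traditional approach, the system prompt might need 8000-12000 tokens to hold all the instructions. With Skills, the base prompt needs only 200-300 tokens for skill metadata, and a specific skill's 1000-2000 tokens of instructions are loaded only when needed. This means:

Prompt cost drops by an estimated 70-85% - most calls need only the base prompt
More precise responses - the Agent is not distracted by irrelevant instructions
Easy to scale - adding a new skill adds only a single metadata entry to the prompt

More importantly, on-demand loading lets you equip an Agent with dozens or even hundreds of specialized skills without worrying that the prompt will exceed the context window.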

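How Skills Work

Below I use the AgentSkills plugin from the Strands Agents SDK to give you a quick look at how Skills work and how to use them.

The AgentSkills plugin follows the Agent Skills specification and uses progressive disclosure: only lightweight metadata (name and description) is injected into the system prompt, and the full instructions are loaded on demand when the Agent activates a skill through a tool call. This keeps the context window lean while still giving the Agent deep, specialized knowledge when it needs it.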

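The AgentSkills plugin runs in three phases:

Discovery - at initialization the plugin reads each skill's metadata (name and description) and injects it into the Agent's system prompt as an XML block. The Agent can see which skills are available, but the full instructions are not loaded.
Activation - when the Agent decides it needs a skill, it calls the skills tool with the skill name. The tool returns the full instructions, the metadata, and a list of all available resource files.
Execution - the Agent carries out the task by following the loaded instructions. If the skill includes resource files (scripts, reference documents, assets, and so on), the Agent can access them through the tools you provide.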
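
To make the three phases concrete, here is a toy model in plain Python. It is only a sketch of the idea: `ToySkill` and `ToySkillRegistry` are hypothetical names invented for illustration and are not how the Strands plugin is implemented.

```python
from dataclasses import dataclass, field


@dataclass
class ToySkill:
    name: str
    description: str
    instructions: str  # full body, handed out only on activation


@dataclass
class ToySkillRegistry:
    """Hypothetical stand-in for a skills plugin; not the Strands implementation."""

    skills: dict[str, ToySkill]
    activated: list[str] = field(default_factory=list)

    def discover(self) -> str:
        # Discovery: expose only names and descriptions to the system prompt.
        return "\n".join(f"{s.name}: {s.description}" for s in self.skills.values())

    def activate(self, name: str) -> str:
        # Activation: return the full instructions and remember the choice.
        skill = self.skills[name]
        if name not in self.activated:
            self.activated.append(name)
        return skill.instructions


registry = ToySkillRegistry({
    "pdf-processing": ToySkill(
        "pdf-processing",
        "Extract text and tables from PDF files.",
        "You are a PDF processing expert. ...",
    ),
})
prompt_metadata = registry.discover()                    # cheap, always in the prompt
full_instructions = registry.activate("pdf-processing")  # loaded on demand
print(prompt_metadata)
```

The point of the split is that `discover()` stays small no matter how long each skill's instructions are, while `activate()` pays the token cost only for the skill actually used.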

The metadata injected into the system prompt has the following format:

<available_skills>
<skill>
<name>pdf-processing</name>
<description>Extract text and tables from PDF files.</description>
<location>/path/to/pdf-processing/SKILL.md</location>
</skill>
</available_skills>

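This XML block is refreshed before every invocation, so any change you make to the available skills through set_available_skills takes effect on the next invocation. Activated skills are recorded in the Agent's state so that they persist for the whole session.

set_available_skills completely replaces, at runtime, the list of available skills managed by the AgentSkills plugin, and the change shows up in the system prompt on the next Agent invocation. It is the key runtime interface for building an on-demand, dynamic, modular skill system.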
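
As a rough illustration of how such a block could be produced from metadata, the sketch below builds the same XML shape with the standard library. `render_available_skills` is a hypothetical helper written for this article, not a function of the Strands SDK, and the escaping step is my own assumption about what a careful implementation would do.

```python
from xml.sax.saxutils import escape


def render_available_skills(skills: list[dict[str, str]]) -> str:
    """Build an <available_skills> block from name/description/location metadata."""
    lines = ["<available_skills>"]
    for skill in skills:
        lines.append("<skill>")
        for key in ("name", "description", "location"):
            if key in skill:
                # Escape &, < and > so a description cannot break the XML.
                lines.append(f"<{key}>{escape(skill[key])}</{key}>")
        lines.append("</skill>")
    lines.append("</available_skills>")
    return "\n".join(lines)


block = render_available_skills([
    {
        "name": "pdf-processing",
        "description": "Extract text & tables from PDF files.",
        "location": "/path/to/pdf-processing/SKILL.md",
    }
])
print(block)
```

Because the block is rebuilt from the current skill list each time, regenerating it before every invocation is enough for a changed list to reach the model.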

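Usage

The AgentSkills plugin in the Strands Agents SDK accepts several kinds of skill sources: filesystem paths, parent directories, or programmatic Skill instances. You can pass a single source or a list.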

from strands import Agent, AgentSkills, Skill

# A single skill directory; no list needed
plugin = AgentSkills(skills="./skills/pdf-processing")

# A parent directory; every subdirectory containing SKILL.md is loaded
plugin = AgentSkills(skills="./skills/")

# Mixed sources
plugin = AgentSkills(skills=[
    "./skills/pdf-processing",     # a single skill directory
    "./skills/",                   # a parent directory (loads all child skills)
    Skill(                         # a skill defined in code
        name="custom-greeting",
        description="Generate custom greetings",
        instructions="Always greet the user by name with enthusiasm.",
    ),
])

agent = Agent(plugins=[plugin])

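Providing Tools for Resource Access

The AgentSkills plugin handles only skill discovery and activation. It ships no built-in tools for reading files or running scripts. This is deliberate: it decouples the plugin from where skills are stored and how resources are accessed.

When a skill is activated, the tool result includes a list of available resource files (from the scripts/, references/, and assets/ subdirectories), but to actually read those files or run scripts you must provide the tools yourself. That gives you full control over what the Agent can access.

For filesystem-based skills, file_read and shell from strands-agents-tools are the simplest way to get started: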

from strands import Agent, AgentSkills
from strands_tools import file_read, shell

plugin = AgentSkills(skills="./skills/")

agent = Agent(
    plugins=[plugin],
    tools=[file_read, shell],
)

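You can also pick other tools to suit your environment. For example, use http_request for skills that need remote resources, or the AgentCore code interpreter tool when scripts must run in a sandbox. Choose tools based on the skill's resource access pattern and your security requirements.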
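
If the security requirement is "the Agent may read skill resources and nothing else", one option is a narrower reader than a general file tool. The sketch below is an assumption-laden example: `read_skill_resource` is a hypothetical helper, and you would still have to register it as a tool in whatever way your framework expects.

```python
import tempfile
from pathlib import Path


def read_skill_resource(skill_dir: str, relative_path: str, max_chars: int = 20_000) -> str:
    """Read a resource file, refusing paths that escape the skill directory."""
    root = Path(skill_dir).resolve()
    target = (root / relative_path).resolve()
    # is_relative_to (Python 3.9+) rejects "../" traversal and absolute paths.
    if not target.is_relative_to(root):
        raise PermissionError(f"{relative_path} is outside {skill_dir}")
    return target.read_text(encoding="utf-8")[:max_chars]


# Demo against a throwaway skill directory.
with tempfile.TemporaryDirectory() as tmp:
    refs = Path(tmp) / "references"
    refs.mkdir()
    (refs / "API.md").write_text("# API notes", encoding="utf-8")

    content = read_skill_resource(tmp, "references/API.md")
    try:
        read_skill_resource(tmp, "../outside.txt")
        blocked = False
    except PermissionError:
        blocked = True

print(content, blocked)
```

Resolving both paths before comparing them means symlinks and `..` segments are checked against where they really point, not how they are spelled.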

Defining Skills in Code

With the Skill dataclass you can create skills directly in code, with no filesystem directory required:

from strands import Skill

# Create directly
skill = Skill(
    name="code-review",
    description="Review code for best practices and bugs",
    instructions="Review the provided code. Check for...",
)

# Parse from SKILL.md content
skill = Skill.from_content("""---
name: code-review
description: Review code for best practices and bugs
---
Review the provided code. Check for...
""")

# Load from a specific directory
skill = Skill.from_file("./skills/code-review")

# Load all skills from a parent directory
skills = Skill.from_directory("./skills/")
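
To show what parsing SKILL.md content involves, here is a simplified stand-in written without any YAML library. It is not how Skill.from_content is implemented (the article does not show that); `parse_skill_md` is a hypothetical function that handles only flat `key: value` front matter.

```python
def parse_skill_md(content: str) -> tuple[dict[str, str], str]:
    """Split SKILL.md text into (front matter fields, Markdown instructions).

    Handles only flat `key: value` lines, not full YAML.
    """
    lines = content.strip().splitlines()
    if not lines or lines[0].strip() != "---":
        raise ValueError("SKILL.md must start with a '---' front matter line")
    try:
        end = next(i for i, line in enumerate(lines[1:], start=1) if line.strip() == "---")
    except StopIteration:
        raise ValueError("front matter is missing its closing '---'") from None

    fields = {}
    for line in lines[1:end]:
        if ":" in line:
            key, value = line.split(":", 1)
            fields[key.strip()] = value.strip()
    body = "\n".join(lines[end + 1:]).strip()
    return fields, body


fields, body = parse_skill_md("""---
name: code-review
description: Review code for best practices and bugs
---
Review the provided code. Check for...
""")
print(fields["name"], "|", body)
```

The front matter becomes the lightweight metadata shown to the model up front, and the body is the part that stays out of the prompt until activation.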

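Managing Skills at Runtime

After the plugin is created, you can add, replace, or inspect skills at any time. The plugin refreshes the XML block in the system prompt before every invocation, so changes take effect on the next Agent invocation.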

from strands import Agent, AgentSkills, Skill

plugin = AgentSkills(skills="./skills/pdf-processing")
agent = Agent(plugins=[plugin])

# List the available skills
for skill in plugin.get_available_skills():
    print(f"{skill.name}: {skill.description}")

# Add a new skill at runtime
new_skill = Skill(
    name="summarize",
    description="Summarize long documents",
    instructions="Read the document and produce a concise summary...",
)
plugin.set_available_skills(
    plugin.get_available_skills() + [new_skill]
)

# Replace all skills
plugin.set_available_skills(["./skills/new-set/"])

# Check which skills the agent has activated
activated = plugin.get_activated_skills(agent)
print(f"Activated skills: {activated}")
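
Because set_available_skills replaces the whole list, appending a skill twice could leave a duplicate entry. One way to guard against that is a small helper built only on the two methods shown above. `add_skill_if_missing` is a hypothetical name, and `StubSkill` and `StubPlugin` exist only so the sketch runs without the SDK installed.

```python
from dataclasses import dataclass


def add_skill_if_missing(plugin, skill) -> bool:
    """Append `skill` unless a skill with the same name is already available."""
    current = list(plugin.get_available_skills())
    if any(existing.name == skill.name for existing in current):
        return False
    plugin.set_available_skills(current + [skill])
    return True


@dataclass
class StubSkill:
    """Stand-in with just the attribute the helper reads."""

    name: str


class StubPlugin:
    """Stand-in exposing the same two method names used in the article."""

    def __init__(self, skills):
        self._skills = list(skills)

    def get_available_skills(self):
        return list(self._skills)

    def set_available_skills(self, skills):
        self._skills = list(skills)


plugin = StubPlugin([StubSkill("pdf-processing")])
first = add_skill_if_missing(plugin, StubSkill("summarize"))
second = add_skill_if_missing(plugin, StubSkill("summarize"))
names = [s.name for s in plugin.get_available_skills()]
print(first, second, names)
```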

The SKILL.md Format

Skills follow the Agent Skills specification. A skill is a directory containing a SKILL.md file with YAML front matter and Markdown instructions. See the specification for the complete authoring rules.

---
name: pdf-processing
description: Extract text and tables from PDF files
allowed-tools: file_read shell
---
# PDF processing

You are a PDF processing expert. When asked to extract content from a PDF:

1. Use `shell` to run the extraction script at `scripts/extract.py`
2. Use `file_read` to review the output
3. Summarize the extracted content for the user

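The front matter fields are:

Field	Required	Description
`name`	Yes	Unique identifier; lowercase letters, digits, and hyphens only, 1 to 64 characters long
`description`	Yes	What the skill does; this text appears in the system prompt
`allowed-tools`	No	Space-separated list of tool names the skill uses
`metadata`	No	Extra key-value pairs for custom data
`license`	No	License identifier (for example `Apache-2.0`)
`compatibility`	No	Description of compatibility information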
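
The rules in this table are easy to check mechanically. The sketch below encodes only what the table states (the specification may impose further rules that are not listed here); `validate_front_matter` and `allowed_tools` are hypothetical helper names.

```python
import re

# Lowercase letters, digits, and hyphens; 1 to 64 characters (per the table above).
NAME_PATTERN = re.compile(r"[a-z0-9-]{1,64}")


def validate_front_matter(fields: dict[str, str]) -> list[str]:
    """Return a list of problems; an empty list means the required fields look valid."""
    problems = []
    if not NAME_PATTERN.fullmatch(fields.get("name", "")):
        problems.append("name must be 1-64 lowercase letters, digits, or hyphens")
    if not fields.get("description", "").strip():
        problems.append("description is required")
    return problems


def allowed_tools(fields: dict[str, str]) -> list[str]:
    """Split the optional space-separated allowed-tools field into tool names."""
    return fields.get("allowed-tools", "").split()


good = {
    "name": "pdf-processing",
    "description": "Extract text and tables from PDF files",
    "allowed-tools": "file_read shell",
}
bad = {"name": "PDF Processing"}

print(validate_front_matter(good), allowed_tools(good))
print(validate_front_matter(bad))
```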

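Resource Directories

A skill can include resource files, organized into three standard subdirectories:

my-skill/
├── SKILL.md
├── scripts/       # Scripts the Agent can execute
│   └── process.py
├── references/    # Reference documents and guides
│   └── API.md
└── assets/        # Static files (templates, configuration, data)
    └── template.json

When the Agent activates a skill, the tool result lists the available resource files under these directories (up to max_resource_files), and the Agent can then access them with the tools you provide.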
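
A listing like this can be approximated with pathlib. The following is a sketch under my own assumptions, not the plugin's code: `list_resource_files` is a hypothetical function, and its default cap of 20 simply mirrors the `max_resource_files` default from the configuration table.

```python
import tempfile
from pathlib import Path

RESOURCE_DIRS = ("scripts", "references", "assets")


def list_resource_files(skill_dir: str, max_files: int = 20) -> list[str]:
    """Collect files under the three standard subdirectories, capped at max_files."""
    root = Path(skill_dir)
    found: list[str] = []
    for sub in RESOURCE_DIRS:
        base = root / sub
        if base.is_dir():
            found.extend(
                p.relative_to(root).as_posix()
                for p in sorted(base.rglob("*"))
                if p.is_file()
            )
    return found[:max_files]


# Demo: recreate the my-skill/ layout in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "SKILL.md").write_text("---\nname: my-skill\n---\n", encoding="utf-8")
    for rel in ("scripts/process.py", "references/API.md", "assets/template.json"):
        path = root / rel
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_text("", encoding="utf-8")

    resources = list_resource_files(tmp)
    capped = list_resource_files(tmp, max_files=2)

print(resources)
```

SKILL.md itself is not listed because it sits outside the three resource subdirectories.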

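Configuration

The AgentSkills constructor accepts the following parameters:

Parameter	Type	Default	Description
`skills`	`SkillSources`	Required	One or more skill sources (paths, `Skill` instances, or a mix of both)
`state_key`	`str`	`"agent_skills"`	Key under which plugin state is stored in `agent.state`
`max_resource_files`	`int`	`20`	Maximum number of resource files listed in a skill activation response
`strict`	`bool`	`False`	When `True`, validation problems raise an exception instead of logging a warning

Activated skills are recorded in the Agent state under the configured state_key. Within a session they therefore persist across invocations, and they can be serialized for session management.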

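Comparison with Other Approaches

Skills are the best fit when your Agent has to cover several specialized domains but does not need every instruction loaded at once. Here is how the approaches compare:

Approach	Best for	Trade-off
System prompt	Short instructions that are always relevant	Becomes hard to maintain as features grow
Steering	Dynamic, context-based guidance and validation	More complex to configure
Skills	Modular, domain-specific instruction sets	Activation costs one tool call
Multi-Agent	Scenarios where roles or models differ fundamentally	Higher complexity and latency

If you want a single Agent to handle a wide variety of tasks by loading the right instructions at the right time, without the overhead of a multi-Agent architecture, Skills are what you need.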

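Real-World Scenario: Building a Multi-Purpose Research Assistant

Suppose you want to build a research assistant that handles the following tasks:

PDF literature extraction and analysis
Code review and explanation
Data visualization
Academic paper writing
Email drafting
Web content scraping

System prompt with the traditional approach:

You are a research assistant...

# PDF processing instructions
When the user uploads a PDF, you need to:
1. Extract the text with pymupdf...
2. Identify table structures...
3. Extract images and charts...
[500-1000 tokens]

# Code review instructions
When reviewing code, check the following:
1. Code style and conventions...
2. Performance issues...
3. Security vulnerabilities...
[800-1200 tokens]

# Data visualization instructions
When creating charts, follow...
[600-800 tokens]

# Paper writing instructions
When writing an academic paper...
[1000-1500 tokens]

# Email drafting instructions
...
[400-600 tokens]

# Web scraping instructions
...
[500-700 tokens]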

Total: about 4300-6300 tokens (the six instruction sections sum to 3800-5800; the rest is the base prompt)

System prompt with Skills:

You are a research assistant...

<available_skills>
<skill>
<name>pdf-processing</name>
<description>Extract text and tables from PDF files...</description>
</skill>

<skill>
<name>code-review</name>
<description>Review code for best practices and bugs...</description>
</skill>

<skill>
<name>data-visualization</name>
<description>Create charts and graphs from data...</description>
</skill>

<skill>
<name>academic-writing</name>
<description>Write academic papers with proper citations...</description>
</skill>

<skill>
<name>email-drafting</name>
<description>Draft professional emails...</description>
</skill>

<skill>
<name>web-scraping</name>
<description>Extract content from web pages...</description>
</skill>
</available_skills>

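Total: about 200-300 tokens

Only when the user says "help me analyze this PDF" does the Agent activate the pdf-processing skill and load its full 500-1000 tokens of instructions. The detailed instructions of the other five skills take up no context window space at all.

Comparison of results:

Traditional approach, per call - 4300-6300 tokens (fixed cost)
Skills approach, per call - 200-300 tokens (base) + 500-1500 tokens (on demand)
Savings - roughly 60-80%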
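
The savings figure can be sanity-checked from the ranges above. This is a back-of-the-envelope sketch that assumes exactly one skill is activated per call; `savings_range` is a hypothetical helper name.

```python
def savings_range(
    traditional: tuple[int, int],
    base: tuple[int, int],
    on_demand: tuple[int, int],
) -> tuple[float, float]:
    """Return (worst-case, best-case) fractional savings of Skills vs. a monolithic prompt."""
    trad_lo, trad_hi = traditional
    skills_lo = base[0] + on_demand[0]   # cheapest Skills call
    skills_hi = base[1] + on_demand[1]   # most expensive Skills call
    worst = 1 - skills_hi / trad_lo      # big skill vs. small monolithic prompt
    best = 1 - skills_lo / trad_hi       # small skill vs. large monolithic prompt
    return worst, best


worst, best = savings_range((4300, 6300), (200, 300), (500, 1500))
print(f"{worst:.0%} to {best:.0%}")  # prints "58% to 89%"
```

The extremes bracket the 60-80% quoted above. Calls that never activate a skill pay only the base prompt and save even more.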

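Hands-On Demo

This article takes the most common approach: loading skills and their resource files from a directory. Many Agent frameworks do the same, for example the Claude Code SDK.

Environment Setup

Install uv, a popular Python package and project manager: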

curl -LsSf https://astral.sh/uv/install.sh | sh

Create the project

uv init agent_skill
cd agent_skill

Install dependencies

uv add 'strands-agents[openai]' strands-agents-tools pymupdf python-dotenv

Add a .env file

# Let all tools run without confirmation prompts; not recommended in production
BYPASS_TOOL_CONSENT=true
OPENAI_API_KEY=your_openai_api_key_here

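Skill Directory Structure

The three example skills are organized as follows, from the simplest at the top to the most complex at the bottom.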

skills/
├── hello-skill/
│   └── SKILL.md
├── explain-code/
│   └── SKILL.md
└── pdf-processing/
    ├── SKILL.md
    └── scripts/
        └── extract.py

Example 1: Hello Skill

Let's start with the hello-skill definition, which activates when the user says hello. The skill definition file looks like this:

---
name: hello-skill
description: A simple test skill. Use when user says hello.
---

# Hello Skill

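This is the simplest possible test skill, used to verify that Agent Skills works.

## Usage

When the user says "hello" or asks to test the skill, reply:

👋 Hello! This skill is working!

Skill info:

- Name: hello-skill
- Status: ✅ Running normally

# main.py
import os
from dotenv import load_dotenv
from strands import Agent, AgentSkills
from strands.models.openai import OpenAIModel

# Load OPENAI_API_KEY and BYPASS_TOOL_CONSENT from .env
load_dotenv()

model = OpenAIModel(
    client_args={"api_key": os.getenv("OPENAI_API_KEY")},
    model_id="gpt-5-mini",
)

# Discover every skill under ./skills/; only metadata enters the system prompt
plugin = AgentSkills(skills="./skills/")

agent = Agent(model=model, plugins=[plugin])

# "hello" matches the hello-skill description, so the Agent activates it
agent("hello")

Run it:

uv run python main.py

Output:

Tool #1: skills
👋 Hello! This skill is working!

Skill info:
- Name: hello-skill
- Status: ✅ Running normally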

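Example 2: Explain Code Skill

Next is a slightly more complex example: a code explanation skill that activates when the user asks how explain-code.py works or asks for an explanation of it. The skill definition file looks like this: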

---
name: explain-code
description: Explains code with visual diagrams and analogies. Use when explaining how code works, teaching about a codebase, or when the user asks "how does this work?"
---

When explaining code, always include:

1. **Start with an analogy**: Compare the code to something from everyday life
2. **Draw a diagram**: Use ASCII art to show the flow, structure, or relationships
3. **Walk through the code**: Explain step-by-step what happens
4. **Highlight a gotcha**: What's a common mistake or misconception?

Keep explanations conversational. For complex concepts, use multiple analogies.

# explain-code.py
import os
from dotenv import load_dotenv
from strands import Agent, AgentSkills
from strands.models.openai import OpenAIModel
from strands_tools import file_read, shell

load_dotenv()

model = OpenAIModel(
    client_args={"api_key": os.getenv("OPENAI_API_KEY")},
    model_id="gpt-5-mini",
)


plugin = AgentSkills(skills="./skills/")

agent = Agent(model=model, plugins=[plugin], tools=[file_read, shell])

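agent("Explain this code: explain-code.py")

Output (abridged):

Tool #1: skills

Tool #2: file_read
Analogy (opening)
Imagine assembling a "smart assistant" machine: you take the key (the API key) out of a drawer, install a talking brain (the model), give it a skill pack (the skills plugin), and add a toolbox (file_read, shell). When you tell it "explain explain-code.py", it gets to work and returns the result to you.

ASCII flow diagram (simplified)
script (explain-code.py)
  |
  +-- load .env --> env vars (OPENAI_API_KEY)
  |
  +-- create OpenAIModel(client_args={"api_key": ...}, model_id="gpt-5-mini")
  |
  +-- AgentSkills(skills="./skills/")
  |
  +-- Agent(model, plugins=[plugin], tools=[file_read, shell])
  |
  +-- agent("Explain this code: explain-code.py")  ---> agent invocation starts, may use tools such as file_read/shell ---> returns the explanation text

Brief summary
This script loads environment variables -> creates an OpenAI model instance -> loads the skills plugin -> builds an Agent with the file_read and shell tools -> calls the Agent to explain the code. Before using it, make sure OPENAI_API_KEY is set in .env, and be aware of the security risks that come with the tool permissions granted to the agent.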

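Example 3: PDF Processing Skill

The last example extracts content from a PDF file and then has the LLM summarize it. This involves running a script, so we need to write an extract.py script that extracts the PDF's content.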

---
name: pdf-processing
description: Extract text and tables from PDF files
allowed-tools: shell
---
# PDF processing

You are a PDF processing expert. When asked to extract content from a PDF:

1. Use `shell` to run the extraction script at `skills/pdf-processing/scripts/extract.py`
2. Summarize the extracted content for the user

# extract.py
import sys
import json
from pathlib import Path

import pymupdf


def extract_text(pdf_path: str) -> dict:
    """Extract the text of every non-empty page of a PDF."""
    try:
        pdf_file = Path(pdf_path)
        if not pdf_file.exists():
            return {"success": False, "error": f"File not found: {pdf_path}"}

        text_content = []
        pdf_document = pymupdf.open(pdf_path)
        # Record the real page count before closing; pages without text are skipped below
        total_pages = len(pdf_document)

        for page_num in range(total_pages):
            page = pdf_document[page_num]
            page_text = page.get_text("text")
            if page_text.strip():
                text_content.append({"page": page_num + 1, "text": page_text})

        pdf_document.close()

        return {
            "success": True,
            "total_pages": total_pages,
            "content": text_content,
        }
    except Exception as e:
        return {"success": False, "error": str(e)}


if __name__ == "__main__":
    # Fail with a JSON error instead of an IndexError when the path is missing
    if len(sys.argv) != 2:
        print(json.dumps({"success": False, "error": "Usage: python extract.py <pdf_path>"}))
        sys.exit(1)
    result = extract_text(sys.argv[1])
    print(json.dumps(result, ensure_ascii=False, indent=2))

Output (abridged):

Tool #1: skills

Tool #2: shell
Do you want to proceed with execution? [y/*] y
{
  "success": true,
  "total_pages": 1,
  "content": [
    {
      "page": 1,
      "text": "Value Investing: The Use of Historical Financial Statement..."
    }
  ]
}


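Short summary:
- This is Joseph D. Piotroski's paper on value investing (2002), which studies how to use historical financial data to pick good companies.
- Key finding: among high book-to-market stocks, screening with accounting metrics significantly improves returns. Buying "expected winners" and shorting "expected losers" earned roughly 23% annualized over 1976-1996.
- The strategy works best for small and mid-sized firms, low-turnover stocks, and stocks without analyst coverage. The market reacts to historical financial information with a delay, and part of the return shows up around quarterly earnings announcements.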