Function Calling in Production: Getting LLMs to Call Your Tools Reliably

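In one sentence: Function Calling is not "letting the model call functions"; it is your code using an LLM as its decision-maker. Get the schema right and you have won 70% of the battle.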

I. The mental model

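Many people get the roles backwards the first time they write Function Calling code:

❌ Wrong mental model: the model calls my functions.
✅ Right mental model: I provide a set of tools, the model tells me which one to call and with what arguments, and I execute it and feed the result back.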

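The model itself never actually calls a function. The whole flow is:

You → send messages + tools to the model
Model → returns "I want to call get_weather(city='北京')"
You → actually call get_weather("北京") and get the result
You → append the result as a tool message and send another request
Model → gives the final answer

So your code is the agent; the model is only the decision-maker.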
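
The five steps above can be traced without any network call. The sketch below is a minimal illustration, assuming an OpenAI-style message shape; `get_weather` is a stub and `assistant_msg` is a hand-written stand-in for what a model might return, not real model output.

```python
import json

def get_weather(city: str) -> dict:
    # Stub tool: a real implementation would call a weather API.
    return {"city": city, "temp_c": 25}

# Step 1: what you send (the tools list is omitted here for brevity).
messages = [{"role": "user", "content": "What is the weather in 北京?"}]

# Step 2: hand-written stand-in for the model's reply. It is only data.
assistant_msg = {
    "role": "assistant",
    "content": None,
    "tool_calls": [{
        "id": "call_1",
        "type": "function",
        "function": {"name": "get_weather", "arguments": '{"city": "北京"}'},
    }],
}
messages.append(assistant_msg)

# Step 3: your code, not the model, runs the function.
call = assistant_msg["tool_calls"][0]
args = json.loads(call["function"]["arguments"])
result = get_weather(**args)

# Step 4: append the result as a tool message. The next request would send
# this whole list back so the model can write its final answer (step 5).
messages.append({
    "role": "tool",
    "tool_call_id": call["id"],
    "content": json.dumps(result, ensure_ascii=False),
})
```

Note that the model's "call" is just a name and a JSON string; nothing runs until your code decides to run it.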

II. Five iron rules of schema design

1. Use semantic, snake_case names

# ❌
{"name": "f1", "description": "get user info"}
{"name": "fetchUserDataFromCRM", "description": "..."}

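# ✅
{"name": "get_user_profile", "description": "Look up a user's name, email, and registration time by user ID"}

The model's tool selection depends heavily on names. f1 carries no meaning at all, and fetchUserDataFromCRM is camelCase and leaks an implementation detail; both can drop the hit rate by 30% or more.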
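
A naming rule like this is easy to enforce mechanically. The following is a minimal sketch of a lint check; `is_good_tool_name` is a hypothetical helper, and the "at least two words" requirement is an assumption used here as a rough proxy for "semantic". It checks form only and cannot tell whether a name is actually meaningful.

```python
import re

# Lowercase words joined by underscores, with at least two words,
# e.g. verb_noun. Digits are allowed after the first character.
SNAKE_CASE = re.compile(r"^[a-z][a-z0-9]*(_[a-z0-9]+)+$")

def is_good_tool_name(name: str) -> bool:
    """Accept only lowercase snake_case names made of two or more words."""
    return bool(SNAKE_CASE.match(name))

for candidate in ["f1", "fetchUserDataFromCRM", "get_user_profile"]:
    print(candidate, is_good_tool_name(candidate))
```

Running a check like this when tools are registered catches bad names before they reach the model.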

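2. Descriptions should give scenarios, not documentation

# ❌ (describes the implementation)
"Queries the user_profile table and returns the name/email/created_at fields"

# ✅ (describes the usage scenario)
"Use when the user asks things like 'my account / who am I / when did I sign up'.
Answers the user's questions about their own basic account information.
Do not use it to look up other people's information."

The model does not care about your implementation; it cares about "when should I use this tool".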

3. Mark every parameter as required, or constrain it with an enum

# ❌
"properties": {
    "city": {"type": "string"},
    "date": {"type": "string"}
}
# The model will pass junk data such as city="" or date="today"

# ✅
"properties": {
    "city": {
        "type": "string",
        "description": "The city's name in Chinese, for example: 北京, 上海, 深圳"
    },
    "date": {
        "type": "string",
        "format": "date",
        "description": "YYYY-MM-DD format. If the user says 'today', convert it to today's date yourself"
    }
},
"required": ["city", "date"]

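Even with a strict schema, it is worth validating arguments again on your side before executing the tool. The sketch below is one way to do that for the city/date example, using only the standard library; `validate_weather_args` is a hypothetical helper and the error wording is an assumption.

```python
import re
from datetime import date

def validate_weather_args(args: dict) -> list[str]:
    """Return a list of problems; an empty list means the arguments are usable."""
    problems = []

    city = args.get("city")
    if not isinstance(city, str) or not city.strip():
        problems.append("city must be a non-empty string")

    raw_date = args.get("date")
    if not isinstance(raw_date, str) or not re.fullmatch(r"\d{4}-\d{2}-\d{2}", raw_date):
        problems.append("date must be in YYYY-MM-DD format")
    else:
        try:
            date.fromisoformat(raw_date)
        except ValueError:
            problems.append("date is not a real calendar date")

    return problems
```

The returned problem strings can be sent back to the model as the tool result, so it has something concrete to correct.
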
4. Use enums to lock down the possibilities

"properties": {
    "category": {
        "type": "string",
        "enum": ["weather", "news", "stock", "translate"]
    }
}

Enums are the model's "guide rails". If a value can be enumerated, do not accept free text.
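
One way to keep the schema's enum and your code from drifting apart is to derive both from a single Python `Enum`. This is a minimal sketch of that idea; the `Category` class and `parse_category` helper are illustrative names, not part of any library.

```python
from enum import Enum

class Category(str, Enum):
    WEATHER = "weather"
    NEWS = "news"
    STOCK = "stock"
    TRANSLATE = "translate"

# The schema fragment is generated from the Enum, so the two cannot diverge.
category_schema = {
    "type": "string",
    "enum": [c.value for c in Category],
}

def parse_category(raw: str) -> Category:
    # Raises ValueError for anything outside the enum.
    return Category(raw)
```

If the model ever sends a value outside the list, `parse_category` fails loudly instead of letting the bad value flow downstream.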

5. Nest complex parameters in objects instead of flattening them

# ❌
"book_flight(from='北京', to='上海', date='2026-06-15', class='economy', passenger_name='张三', passenger_id='110...')"

# ✅
"book_flight(trip={from:..., to:..., date:...}, passenger={name:..., id:...})"

With a flat parameter list the model tends to drop parameters. Nested objects are clearer.
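
Written out as JSON Schema, the nested version might look like the sketch below. The field names follow the book_flight example; placing `class` inside `trip`, the enum values for it, and the description strings are assumptions made for illustration.

```python
book_flight_parameters = {
    "type": "object",
    "properties": {
        "trip": {
            "type": "object",
            "properties": {
                "from": {"type": "string", "description": "Departure city"},
                "to": {"type": "string", "description": "Arrival city"},
                "date": {"type": "string", "format": "date"},
                # Assumed cabin classes; adjust to what your backend supports.
                "class": {"type": "string", "enum": ["economy", "business", "first"]},
            },
            "required": ["from", "to", "date", "class"],
        },
        "passenger": {
            "type": "object",
            "properties": {
                "name": {"type": "string"},
                "id": {"type": "string", "description": "ID document number"},
            },
            "required": ["name", "id"],
        },
    },
    "required": ["trip", "passenger"],
}
```

Each nested object carries its own `required` list, so a missing passenger ID is a schema violation rather than a silently absent top-level key.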

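III. Claude vs GPT-5: tool use differences

Behavior	Claude Opus 4.7	GPT-5
Single-turn call accuracy	96%	94%
Multi-step call planning	Stronger (returns several tool calls at once)	Tends to call one tool at a time
Parameter hallucination rate	1.2%	3.8%
Compliance with `tool_choice=required`	100%	99%
Misspelled tool names	Very rare	Occasional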

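Lessons from practice:

For complex agents (more than 3 tools), choose Claude Opus 4.7; it is better at "thinking it through before calling".
For simple single-step tools (search, calculator), choose GPT-5 mini; it wins on cost-effectiveness.
On critical paths, set tool_choice: required to force the model to call a tool and keep it from "improvising".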
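
As a rough illustration of the last point, the sketch below assembles request arguments with and without a forced tool call. It assumes an OpenAI-style Chat Completions request, where `tool_choice` accepts the string "required"; Anthropic's native API expresses the same idea differently (`{"type": "any"}`), so check which shape your gateway expects. `build_request` is a hypothetical helper.

```python
def build_request(messages: list, tools: list, force_tool: bool = False) -> dict:
    """Assemble keyword arguments for a Chat Completions-style request."""
    kwargs = {"messages": messages, "tools": tools}
    if force_tool:
        # "required": the model must return at least one tool call.
        kwargs["tool_choice"] = "required"
    return kwargs

# Critical path: the model is not allowed to answer from memory.
critical = build_request([{"role": "user", "content": "Refund order 123"}], [], force_tool=True)
```

Leaving `tool_choice` unset keeps the default behavior, where the model decides for itself whether a tool is needed.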

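IV. Error handling: models make bad calls too

When a tool call fails, do not let the exception abort the loop; return the error as an ordinary tool result so the model can retry or change strategy:

messages.append({
    "role": "tool",
    "tool_call_id": call.id,
    "content": "ERROR: city '巴黎' is not in the list of supported cities. Supported cities: 北京/上海/广州/深圳/杭州. Please retry."
})

When the model sees this ERROR string, it will usually fix the arguments and call the tool again on its own. This is the hidden power of Function Calling.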
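
The pattern generalizes to a small wrapper that turns any exception into a tool message. The following is a sketch under assumptions: `tool_message` is a hypothetical helper, `get_weather` is a stub, and the "ERROR: ..." prefix is just a convention, not something the API requires.

```python
import json

SUPPORTED_CITIES = ["北京", "上海", "广州", "深圳", "杭州"]

def get_weather(city: str) -> dict:
    # Stub tool that rejects unsupported cities with an actionable message.
    if city not in SUPPORTED_CITIES:
        supported = "/".join(SUPPORTED_CITIES)
        raise ValueError(f"city '{city}' is not supported. Supported cities: {supported}. Please retry.")
    return {"city": city, "temp_c": 25}

def tool_message(call_id: str, fn, raw_arguments: str) -> dict:
    """Run one tool call and always return a well-formed tool message."""
    try:
        result = fn(**json.loads(raw_arguments))
        content = json.dumps(result, ensure_ascii=False)
    except Exception as e:
        # Bad JSON, wrong argument names, and tool failures all end up here.
        content = f"ERROR: {type(e).__name__}: {e}"
    return {"role": "tool", "tool_call_id": call_id, "content": content}
```

The more specific the error text (what was wrong, what is allowed), the better the model's chance of fixing the call on the next turn.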

V. Concurrency: never run serially what can run in parallel

Both GPT-5 and Claude Opus 4.7 support parallel tool use, returning multiple tool calls in a single response:

{
  "tool_calls": [
    { "id": "1", "function": { "name": "get_weather", "arguments": "{\"city\":\"北京\"}" }},
    { "id": "2", "function": { "name": "get_weather", "arguments": "{\"city\":\"上海\"}" }},
    { "id": "3", "function": { "name": "get_news", "arguments": "{}" }}
  ]
}

Your code should execute all three concurrently and then return the results together:

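import asyncio

async def run_parallel(tool_calls):
    # execute_tool is your own async function: it takes one tool call
    # and returns the result as a string.
    # return_exceptions=True keeps one failing tool from sinking the batch.
    results = await asyncio.gather(
        *(execute_tool(call) for call in tool_calls),
        return_exceptions=True,
    )
    return [
        {
            "role": "tool",
            "tool_call_id": call.id,
            # Failures are fed back as ERROR strings, like any other result
            "content": f"ERROR: {type(r).__name__}: {r}" if isinstance(r, BaseException) else r,
        }
        for call, r in zip(tool_calls, results)
    ]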

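Without concurrency, three serial calls take roughly three times as long as one parallel batch, assuming each call has similar latency.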
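
The difference is easy to measure with simulated tools. The sketch below uses `asyncio.sleep` as a stand-in for network latency; `fake_tool` and the 0.1 s delays are illustrative assumptions, and real tools will show more variance.

```python
import asyncio
import time

async def fake_tool(delay: float) -> str:
    # Simulates an I/O-bound tool call such as an HTTP request.
    await asyncio.sleep(delay)
    return "ok"

async def run_serial(delays):
    # Each call waits for the previous one to finish.
    return [await fake_tool(d) for d in delays]

async def run_concurrent(delays):
    # All calls are in flight at the same time.
    return await asyncio.gather(*(fake_tool(d) for d in delays))

def timed(coro_fn, delays) -> float:
    start = time.perf_counter()
    asyncio.run(coro_fn(delays))
    return time.perf_counter() - start

delays = [0.1, 0.1, 0.1]
serial_s = timed(run_serial, delays)
concurrent_s = timed(run_concurrent, delays)
print(f"serial: {serial_s:.2f}s, concurrent: {concurrent_s:.2f}s")
```

Serial time is the sum of the delays, while concurrent time is close to the slowest single call, so the gain shrinks when one tool dominates the batch.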

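VI. Observability: six fields you must log

Log every tool call; otherwise you will have no way to debug problems after launch:

Field	Purpose
`tool_name`	Which tool was called
`tool_arguments`	The arguments the model supplied
`tool_result`	The actual execution result (success/failure/data summary)
`tool_latency_ms`	Tool execution time
`model_latency_ms`	Model inference time
`iteration`	Which iteration of the loop this was

Our 渡 AI console records these fields by default and lets you filter success rate by tool_name, which is the most important metric for optimizing an agent.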
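
If you are rolling your own logging, the six fields map naturally onto a small record type. This is a minimal sketch, assuming JSON lines printed to stdout as the log sink; `ToolCallLog` and `run_and_log` are hypothetical names, and the 200-character truncation of the result is an arbitrary choice.

```python
import json
import time
from dataclasses import dataclass, asdict

@dataclass
class ToolCallLog:
    tool_name: str
    tool_arguments: dict
    tool_result: str
    tool_latency_ms: float
    model_latency_ms: float
    iteration: int

def run_and_log(fn, name: str, arguments: dict, model_latency_ms: float, iteration: int):
    """Execute one tool, time it, and emit a structured log record."""
    start = time.perf_counter()
    try:
        result = json.dumps(fn(**arguments), ensure_ascii=False)
    except Exception as e:
        result = f"ERROR: {type(e).__name__}: {e}"
    latency_ms = (time.perf_counter() - start) * 1000
    # Store only a summary of the result to keep log lines small.
    record = ToolCallLog(name, arguments, result[:200], latency_ms, model_latency_ms, iteration)
    print(json.dumps(asdict(record), ensure_ascii=False))  # stand-in for a real log sink
    return result, record
```

With records in this shape, a per-tool success rate is just a count of results that do not start with "ERROR", grouped by `tool_name`.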

VII. A production-ready loop template

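import json
from openai import OpenAI

# Assumes an OpenAI-compatible endpoint that serves the model below;
# set base_url / api_key for your own gateway.
client = OpenAI()

def run_agent(user_message, tools_def, tools_impl, max_iter=10):
    messages = [{"role": "user", "content": user_message}]

    for i in range(max_iter):
        resp = client.chat.completions.create(
            model="claude-opus-4-7",
            messages=messages,
            tools=tools_def,
        )
        msg = resp.choices[0].message
        # Keep the raw assistant message, including tool_calls, in the history
        messages.append(msg.model_dump())

        # Termination condition: the model made no further tool calls
        if not msg.tool_calls:
            return msg.content

        # Execute each tool call in turn (serial here for simplicity;
        # swap in asyncio.gather for concurrent execution)
        for call in msg.tool_calls:
            try:
                result = tools_impl[call.function.name](
                    **json.loads(call.function.arguments)
                )
                content = json.dumps(result, ensure_ascii=False)
            except Exception as e:
                # Unknown tool names, bad JSON, and tool failures all land here
                content = f"ERROR: {type(e).__name__}: {e}"

            messages.append({
                "role": "tool",
                "tool_call_id": call.id,
                "content": content,
            })

    raise RuntimeError(f"Did not converge within {max_iter} iterations")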

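We have used this template internally for a year, across several hundred million calls. Key points:

Cap max_iter to avoid infinite loops (Claude is occasionally stubborn)
Feed errors back as strings and let the model handle them
Always keep the raw tool_calls in messages (do not simplify the structure)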
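
One failure mode this template only surfaces at runtime is a tool that is declared in tools_def but missing from tools_impl (it shows up as a KeyError fed back to the model). A startup check can catch that earlier. The sketch below assumes tools_def uses the OpenAI-style `{"type": "function", "function": {"name": ...}}` wrapper; `check_tools` is a hypothetical helper.

```python
def check_tools(tools_def: list, tools_impl: dict) -> list[str]:
    """Report tools that are declared without an implementation, or vice versa."""
    declared = {t["function"]["name"] for t in tools_def}
    implemented = set(tools_impl)
    problems = [f"declared but not implemented: {n}" for n in sorted(declared - implemented)]
    problems += [f"implemented but not declared: {n}" for n in sorted(implemented - declared)]
    return problems

tools_def = [
    {"type": "function", "function": {"name": "get_weather", "parameters": {"type": "object", "properties": {}}}},
    {"type": "function", "function": {"name": "get_news", "parameters": {"type": "object", "properties": {}}}},
]
tools_impl = {"get_weather": lambda **kwargs: {"temp_c": 25}}

print(check_tools(tools_def, tools_impl))
```

Calling this once before the first `run_agent` call turns a confusing mid-conversation error into a clear configuration error.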

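Summary

The core of doing Function Calling well:

A clear schema (70% of success or failure)
Errors fed back as results (so the model can retry on its own)
Concurrent execution (roughly 3x lower latency when three calls run in parallel)
Full observability (so you can review what happened afterwards)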
