🔧Tool Learning in the Wild:
Empowering Language Models as Automatic Tool Agents

1Shandong University 2University of Electronic Science and Technology of China
3Baidu Inc. 4University of Birmingham
5Leiden University

Brief Introduction

To integrate LLMs with tools, most previous work represents diverse tool-calling actions as special tokens, integrates these tokens into the text generation process of LLMs, and guides the LLMs with hand-specified tool-use workflows. However, these methods face two challenges in realistic scenarios. First, parsing tool documentation and creating examples that cover diverse usage requires intensive expertise, so these methods struggle to scale to the large toolsets found in practical applications. Consequently, LLMs show diminished performance when in-context examples are incomplete or missing, which limits the scope of tools available to them. Second, manually defining the tool-use workflow (e.g., the step-by-step procedure and tool-calling format) is ad hoc: it generalizes poorly to diverse tool specifications and restricts the flexibility to integrate multiple tools dynamically within a single tool-calling action.


Comparison between conventional tool-use flow (a) and the proposed framework (b).

In this work, we ask: can we empower LLMs to automate the tool-use workflow and effectively manipulate diverse tools?

To achieve this, we propose a novel framework named AutoTools, which diverges from previous work by enabling LLMs as agents to automate tool-use workflow. AutoTools consists of two stages: (1) Tool Encapsulation and (2) Tool Programming.

Our AutoTools Framework

Different from the handcrafted and ad-hoc tool-use workflow, the proposed AutoTools consists of two main steps:

  • Tool Encapsulation In the Tool Encapsulation stage, the LLM automatically transforms the toolset into a list of well-structured, callable functions with generated demonstrations. Specifically, for each tool, the LLM is provided with its raw documentation and induced to encapsulate it into a callable function. To verify correctness, beyond syntax checking, the LLM is prompted to generate function-calling instances for each function to test its runtime behavior.
  • Tool Programming In the Tool Programming stage, the LLM is prompted to read the encapsulated functions and flexibly integrate them through a unified programming language (e.g., Python). Concretely, we first load the encapsulated functions to initialize an execution environment. Then, the LLM is equipped with the created function library and generates executable programs as a solution. The programs sequentially call a chain of functions, parse useful intermediates to resolve input-output dependencies among functions, and ultimately derive the final answer.
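The verification loop in the Tool Encapsulation stage above can be sketched as follows. This is a minimal illustration, not the paper's implementation: `GENERATED_SOURCE` stands in for a function the LLM might produce from a tool's documentation, and `GENERATED_TEST_CALL` for a function-calling instance it might generate to test it.

```python
import ast

# A function the LLM might generate from a (hypothetical) weather tool's
# documentation; the body is a stub standing in for the real API call.
GENERATED_SOURCE = '''
def get_weather(city: str, unit: str = "celsius") -> dict:
    """Return a forecast for `city` (stub for the underlying API)."""
    return {"city": city, "unit": unit, "temperature": 21}
'''

# A function-calling instance the LLM might generate to exercise the function.
GENERATED_TEST_CALL = 'get_weather("Leiden")'

def verify_encapsulation(source: str, test_call: str) -> bool:
    """Check (1) the source compiles and (2) the generated call runs without error."""
    try:
        ast.parse(source)           # syntax check
        namespace = {}
        exec(source, namespace)     # load the encapsulated function
        eval(test_call, namespace)  # runtime check with the generated instance
        return True
    except Exception:
        return False

print(verify_encapsulation(GENERATED_SOURCE, GENERATED_TEST_CALL))  # True
```

Functions that fail either check can be sent back to the LLM for repair before being added to the function library.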

An overview of the proposed framework AutoTools, in which the LLM (1) automatically encapsulates diverse tools into unified callable functions and (2) directly utilizes these functions through programming.
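The Tool Programming stage can be illustrated with a short sketch. The two stub functions below are hypothetical stand-ins for encapsulated tools; the three-line program is the kind of solution the LLM might emit, chaining function calls and parsing an intermediate result to resolve the input-output dependency between them.

```python
def search_movie(title: str) -> dict:
    """Stub for an encapsulated search function (would call a real API)."""
    return {"id": 42, "title": title}

def get_director(movie_id: int) -> dict:
    """Stub for an encapsulated detail-lookup function."""
    return {"movie_id": movie_id, "director": "Jane Doe"}

# A program the LLM might generate for "Who directed <movie>?":
movie = search_movie("Example Movie")   # first tool call
director = get_director(movie["id"])    # parse intermediate, resolve dependency
answer = director["director"]           # derive the final answer
print(answer)
```

Because the solution is an ordinary program, the LLM can freely combine multiple functions, loops, and conditionals in a single generation instead of following a fixed call-by-call protocol.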

Further improvement with AutoTools-Learning

We further investigate how to improve the LLM's expertise within AutoTools, especially for LLMs with fewer parameters. We propose AutoTools-learning, a multi-task learning approach that trains the LLM as an automated tool agent from synthetic datasets. We design three core learning tasks: (1) documentation understanding, where the LLM is trained to parse diverse tool documentation and generate structured functions; (2) relevance learning, where the LLM learns to select relevant tools based on a query and a candidate tool list; and (3) function learning, where we optimize the LLM to call in-context functions and solve practical queries. To enable this learning process, we filter and synthesize training data from large-scale public resources for each task, transforming it into a unified format. This enables us to collect high-quality examples without intensive human annotation.
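The unified format mentioned above might look like the following sketch; the field names and sample contents are illustrative assumptions, not the paper's actual schema.

```python
# Hypothetical unified training examples for the three AutoTools-Learning tasks.
examples = [
    {   # (1) documentation understanding: raw docs -> structured function
        "task": "doc_understanding",
        "input": "Tool: get_weather. Args: city (str). Returns a forecast.",
        "output": "def get_weather(city: str) -> dict: ...",
    },
    {   # (2) relevance learning: query + candidate list -> relevant tools
        "task": "relevance",
        "input": {"query": "weather in Leiden",
                  "candidates": ["get_weather", "search_movie"]},
        "output": ["get_weather"],
    },
    {   # (3) function learning: query + in-context functions -> program
        "task": "function_learning",
        "input": {"query": "weather in Leiden", "functions": ["get_weather"]},
        "output": 'print(get_weather("Leiden"))',
    },
]
print(len(examples))  # 3
```

Casting all three tasks into one input-output format lets a single model be fine-tuned jointly on data filtered from public resources.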

Experimental Results

We evaluated the proposed AutoTools on ToolBench, RestBench and a newly-collected benchmark, i.e., AutoTools-Eval.


Case Study

Given the natural-language documentation of the "SEARCH_TOOL" API, the LLM can understand and encapsulate it into a structured function.
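An encapsulated SEARCH_TOOL function might resemble the sketch below. The parameter names and return structure are assumptions for illustration; the body is a self-contained stub standing in for the real HTTP request the tool would issue.

```python
import urllib.parse

def search(query: str, top_k: int = 3) -> list:
    """Search for `query` and return up to `top_k` ranked results.

    Stub standing in for the real SEARCH_TOOL call; it URL-encodes the
    query (as a real endpoint would require) and echoes mock results.
    """
    encoded = urllib.parse.quote(query)
    return [{"rank": i + 1, "query": encoded} for i in range(top_k)]

results = search("tool learning", top_k=2)
print(len(results))  # 2
```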


Citation

@inproceedings{autotools,
	title     = {Tool Learning in the Wild: Empowering Language Models as Automatic Tool Agents},
	author    = {Shi, Zhengliang and Gao, Shen and Yan, Lingyong and Feng, Yue and Chen, Xiuyi and Chen, Zhumin and Yin, Dawei and Verberne, Suzan and Ren, Zhaochun},
	year      = 2025,
	booktitle = {WWW}
}