# Function Calling with Triton Inference Server

This tutorial focuses on function calling, a common approach to easily connect
large language models (LLMs) to external tools. This method empowers AI agents
with effective tool usage and seamless interaction with external APIs,
significantly expanding their capabilities and practical applications.

## Table of Contents

- [What is Function Calling?](#what-is-function-calling)
- [Tutorial Overview](#tutorial-overview)
  + [Prerequisite: Hermes-2-Pro-Llama-3-8B](#prerequisite-hermes-2-pro-llama-3-8b)
- [Function Definitions](#function-definitions)
- [Prompt Engineering](#prompt-engineering)
- [Combining Everything Together](#combining-everything-together)
- [Further Optimizations](#further-optimizations)
  + [Enforcing Output Format](#enforcing-output-format)
  + [Parallel Tool Call](#parallel-tool-call)
- [References](#references)

## What is Function Calling?

Function calling refers to the ability of LLMs to:
* Recognize when a specific function or tool needs to be used to answer a query
or perform a task.
* Generate a structured output containing the necessary arguments to call
that function.
* Integrate the results of the function call into its response.

Function calling is a powerful mechanism that allows LLMs to perform
more complex tasks (e.g. agent orchestration in multi-agent systems)
that require specific computations or data retrieval
beyond their inherent knowledge. By recognizing when a particular function
is needed, LLMs can dynamically extend their functionality, making them more
versatile and useful in real-world applications.

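To make these three steps concrete, below is a minimal, framework-agnostic sketch
of the resulting loop. The helper `generate` and the `AVAILABLE_TOOLS` mapping are
illustrative placeholders, not part of this tutorial's code:

```python
import json

# Illustrative stub: maps tool names to plain Python callables.
AVAILABLE_TOOLS = {
    "get_current_stock_price": lambda symbol: 13.37,
}


def answer_with_tools(generate, user_query: str) -> str:
    """One round of recognize -> structured call -> execute -> integrate."""
    # 1. The model recognizes a tool is needed and emits a structured call,
    #    e.g. '{"tool": "get_current_stock_price", "arguments": {"symbol": "RIVN"}}'.
    call = json.loads(generate(user_query))

    # 2. The client executes the requested function with the generated arguments.
    result = AVAILABLE_TOOLS[call["tool"]](**call["arguments"])

    # 3. The result is fed back so the model can integrate it into its response.
    return generate(f"{user_query}\nTool result: {result}")
```
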
## Tutorial Overview

This tutorial demonstrates function calling using the
[Hermes-2-Pro-Llama-3-8B](https://huggingface.co/NousResearch/Hermes-2-Pro-Llama-3-8B)
model, which is pre-fine-tuned for this capability. We'll create a basic
stock reporting agent that provides up-to-date stock information and summarizes
recent company news.

### Prerequisite: Hermes-2-Pro-Llama-3-8B

Before proceeding, please make sure that you've successfully deployed the
[Hermes-2-Pro-Llama-3-8B](https://huggingface.co/NousResearch/Hermes-2-Pro-Llama-3-8B)
model with Triton Inference Server and the TensorRT-LLM backend
following [these steps](../../Popular_Models_Guide/Hermes-2-Pro-Llama-3-8B/README.md).

> [!IMPORTANT]
> Make sure that the `tutorials` folder is mounted to `/tutorials` when you
> start the docker container.

## Function Definitions

We'll define three functions for our stock reporting agent:
1. `get_current_stock_price`: Retrieves the current stock price for a given symbol.
2. `get_company_news`: Retrieves company news and press releases for a given stock symbol.
3. `final_answer`: Used as a no-op and to indicate the final response.

Each function includes its name, description, and input parameter schema:
```python
TOOLS = [
    {
        "type": "function",
        "function": {
            "name": "get_current_stock_price",
            "description": "Get the current stock price for a given symbol.\n\nArgs:\n symbol (str): The stock symbol.\n\nReturns:\n float: The current stock price, or None if an error occurs.",
            "parameters": {
                "type": "object",
                "properties": {"symbol": {"type": "string"}},
                "required": ["symbol"],
            },
        },
    },
    {
        "type": "function",
        "function": {
            "name": "get_company_news",
            "description": "Get company news and press releases for a given stock symbol.\n\nArgs:\nsymbol (str): The stock symbol.\n\nReturns:\npd.DataFrame: DataFrame containing company news and press releases.",
            "parameters": {
                "type": "object",
                "properties": {"symbol": {"type": "string"}},
                "required": ["symbol"],
            },
        },
    },
    {
        "type": "function",
        "function": {
            "name": "final_answer",
            "description": "Return final generated answer",
            "parameters": {
                "type": "object",
                "properties": {"final_response": {"type": "string"}},
                "required": ["final_response"],
            },
        },
    },
]
```
These function definitions will be passed to our model through a prompt,
enabling it to recognize and utilize them appropriately during the conversation.

For the actual implementations, please refer to [client_utils.py](./artifacts/client_utils.py).

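As a rough sketch of what such implementations can look like (an illustrative
approximation built on `yfinance`, not a copy of the code in `client_utils.py`;
the real `get_company_news` returns a DataFrame per its description above):

```python
import yfinance as yf


def get_current_stock_price(symbol: str):
    """Return the latest closing price for `symbol`, or None if the lookup fails."""
    try:
        history = yf.Ticker(symbol).history(period="1d")
        return float(history["Close"].iloc[-1])
    except Exception:
        return None


def get_company_news(symbol: str):
    """Return recent news headlines for `symbol` (simplified here to a list of titles)."""
    try:
        # The shape of `Ticker.news` items varies across yfinance versions.
        return [item.get("title", "") for item in yf.Ticker(symbol).news]
    except Exception:
        return []
```
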
## Prompt Engineering

**Prompt engineering** is a crucial aspect of function calling, as it guides
the LLM in recognizing when and how to utilize specific functions.
By carefully crafting prompts, you can effectively define the LLM's role,
objectives, and the tools it can access, ensuring accurate and efficient task
execution.

For our task, we've organized a sample prompt structure, provided
in the accompanying [`system_prompt_schema.yml`](./artifacts/system_prompt_schema.yml)
file. This file meticulously outlines:

- **Role**: Defines the specific role the LLM is expected to perform.
- **Objective**: Clearly states the goal or desired outcome of the interaction.
- **Tools**: Lists the available functions or tools the LLM can use to achieve
  its objective.
- **Schema**: Specifies the structure and format required for calling each tool
  or function.
- **Instructions**: Provides a clear set of guidelines to ensure the LLM follows
  the intended path and utilizes the tools appropriately.

By leveraging prompt engineering, you can enhance the LLM's ability
to perform complex tasks and integrate function calls seamlessly into
its responses, thereby maximizing its utility in various applications.

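As a rough illustration of how such a schema file can be turned into the final
system prompt (the actual rendering logic lives in [client.py](./artifacts/client.py);
the key names below are assumptions based on the list above):

```python
import json

import yaml


def build_system_prompt(schema_path: str, tools: list) -> str:
    """Assemble a system prompt from the YAML schema and the tool definitions."""
    with open(schema_path) as f:
        schema = yaml.safe_load(f)
    # Assumed keys; inspect system_prompt_schema.yml for the real structure.
    sections = [str(schema[key]) for key in ("Role", "Objective", "Tools", "Schema", "Instructions")]
    # Inject the TOOLS list from the "Function Definitions" section above.
    sections.append("Available tools:\n" + json.dumps(tools, indent=2))
    return "\n\n".join(sections)


# Example: system_prompt = build_system_prompt("artifacts/system_prompt_schema.yml", TOOLS)
```
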
## Combining Everything Together

First, let's start the Triton SDK container:
```bash
# Using the SDK container as an example
docker run --rm -it --net host --shm-size=2g \
    --ulimit memlock=-1 --ulimit stack=67108864 --gpus all \
    -v /path/to/tutorials:/tutorials \
    nvcr.io/nvidia/tritonserver:<xx.yy>-py3-sdk
```

The provided client script uses the `pydantic` and `yfinance` libraries, which
are not shipped with the SDK container. Make sure to install them before proceeding:

```bash
pip install pydantic yfinance
```

Run the provided [`client.py`](./artifacts/client.py) as follows:

```bash
python3 /tutorials/AI_Agents_Guide/Function_Calling/artifacts/client.py --prompt "Tell me about Rivian. Include current stock price in your final response." -o 200
```

You should expect to see a response similar to:

```bash
+++++++++++++++++++++++++++++++++++++
RESPONSE: Rivian, with its current stock price of <CURRENT STOCK PRICE>, <NEWS SUMMARY>
+++++++++++++++++++++++++++++++++++++
```

To see what tools were "called" by our LLM, simply add the `--verbose` flag as follows:
```bash
python3 /tutorials/AI_Agents_Guide/Function_Calling/artifacts/client.py --prompt "Tell me about Rivian. Include current stock price in your final response." -o 200 --verbose
```

This will show the step-by-step process of function calling, including:
- The tools being called
- The arguments passed to each tool
- The responses from each function call
- The final summarized response

```bash
[b'\n{\n "step": "1",\n "description": "Get the current stock price for Rivian",\n "tool": "get_current_stock_price",\n "arguments": {\n "symbol": "RIVN"\n }\n}']
=====================================
Executing function: get_current_stock_price({'symbol': 'RIVN'})
Function response: <CURRENT STOCK PRICE>
=====================================
[b'\n{\n "step": "2",\n "description": "Get company news and press releases for Rivian",\n "tool": "get_company_news",\n "arguments": {\n "symbol": "RIVN"\n }\n}']
=====================================
Executing function: get_company_news({'symbol': 'RIVN'})
Function response: [<LIST OF RECENT NEWS TITLES>]
=====================================
[b'\n{\n "step": "3",\n "description": "Summarize the company news and press releases for Rivian",\n "tool": "final_answer",\n "arguments": {\n "final_response": "Rivian, with its current stock price of <CURRENT STOCK PRICE>, <NEWS SUMMARY>"\n }\n}']


+++++++++++++++++++++++++++++++++++++
RESPONSE: Rivian, with its current stock price of <CURRENT STOCK PRICE>, <NEWS SUMMARY>
+++++++++++++++++++++++++++++++++++++
```

> [!TIP]
> In this tutorial, all functionalities (tool definitions, implementations,
> and executions) are implemented on the client side (see
> [client.py](./artifacts/client.py)).
> For production scenarios, especially when functions are known beforehand,
> consider implementing this logic on the server side.
> A recommended approach for server-side implementation is to deploy your
> workflow through a Triton [ensemble](https://github.com/triton-inference-server/server/blob/a6fff975a214ff00221790dd0a5521fb05ce3ac9/docs/user_guide/architecture.md#ensemble-models)
> or a [BLS](https://github.com/triton-inference-server/python_backend?tab=readme-ov-file#business-logic-scripting).
> Use a pre-processing model to combine and format the user prompt with the
> system prompt and available tools. Employ a post-processing model to manage
> multiple calls to the deployed LLM as needed to reach the final answer.

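To make the server-side option more tangible, here is a heavily simplified BLS-style
sketch of that post-processing loop. The model name (`tensorrt_llm_bls`), tensor names
(`text_input`, `text_output`, `user_prompt`, `final_response`), and the omission of other
required LLM inputs (e.g. sampling parameters) are assumptions that must be adapted to
your actual deployment:

```python
import json

import numpy as np
import triton_python_backend_utils as pb_utils


class TritonPythonModel:
    """Sketch of a BLS orchestrator that resolves tool calls on the server."""

    def execute(self, requests):
        responses = []
        for request in requests:
            prompt = pb_utils.get_input_tensor_by_name(request, "user_prompt").as_numpy().flatten()[0].decode()
            conversation = prompt  # combine with the system prompt and tool definitions here
            final_answer = ""
            for _ in range(8):  # bound the number of tool-calling rounds
                llm_request = pb_utils.InferenceRequest(
                    model_name="tensorrt_llm_bls",           # assumed LLM model name
                    requested_output_names=["text_output"],  # assumed output tensor name
                    inputs=[pb_utils.Tensor("text_input", np.array([conversation.encode()], dtype=np.object_))],
                )
                llm_response = llm_request.exec()
                if llm_response.has_error():
                    raise pb_utils.TritonModelException(llm_response.error().message())
                text = pb_utils.get_output_tensor_by_name(llm_response, "text_output").as_numpy().flatten()[0].decode()
                call = json.loads(text)
                if call["tool"] == "final_answer":
                    final_answer = call["arguments"]["final_response"]
                    break
                # Execute the requested tool and feed its result back to the LLM.
                result = self._run_tool(call["tool"], call["arguments"])
                conversation += f"\nTool {call['tool']} returned: {result}"
            responses.append(
                pb_utils.InferenceResponse(
                    output_tensors=[
                        pb_utils.Tensor("final_response", np.array([final_answer.encode()], dtype=np.object_))
                    ]
                )
            )
        return responses

    def _run_tool(self, name, arguments):
        # Dispatch to the actual tool implementations (e.g. the stock helpers above).
        raise NotImplementedError
```
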
## Further Optimizations
### Enforcing Output Format

In this tutorial, we used prompt engineering to steer the model toward a specific
output format. The desired structure is as follows:
```python
{
    "step": <Step number>,
    "description": <Description of what the step does and its output>,
    "tool": <Tool to use>,
    "arguments": {
        <Parameters to pass to the tool as a valid dict>
    }
}
```
However, there may be instances where the output deviates from this
required schema. For example, consider the following prompt execution:

```bash
python3 /tutorials/AI_Agents_Guide/Function_Calling/artifacts/client.py --prompt "How Rivian is doing?" -o 500 --verbose
```
This execution may fail with an invalid JSON format error. The verbose
output will reveal that the final LLM response contained plain text
instead of the expected JSON format:
```
{
    "step": "3",
    "description": <Description of what the step does and its output>
    "tool": "final_answer",
    "arguments": {
        "final_response": <Final Response>
    }
}
```
Fortunately, this behavior can be controlled using constrained decoding,
a technique that guides the model to generate outputs that meet specific
formatting and content requirements. We strongly recommend exploring our
dedicated [tutorial](../Constrained_Decoding/README.md) on constrained decoding
to gain deeper insights and enhance your ability to manage model outputs
effectively.

> [!TIP]
> For optimal results, utilize the `FunctionCall` class defined in
> [client_utils.py](./artifacts/client_utils.py) as the JSON schema
> for your Logits Post-Processor. This approach ensures consistent
> and properly formatted outputs, aligning with the structure we've
> established throughout this tutorial.

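Since the client depends on `pydantic`, a plausible shape for such a class and the
JSON schema it produces is sketched below; the exact field definitions in
[client_utils.py](./artifacts/client_utils.py) may differ:

```python
import json

from pydantic import BaseModel


class FunctionCall(BaseModel):
    """Illustrative approximation of the output schema used for constrained decoding."""

    step: str
    description: str
    tool: str
    arguments: dict


# Pydantic v2 API; this JSON schema is what a logits post-processor would consume
# to constrain generation to the structure above.
print(json.dumps(FunctionCall.model_json_schema(), indent=2))
```
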
### Parallel Tool Call

This tutorial focuses on a single-turn forced call, where the LLM is prompted
to make a specific function call within a single interaction. This approach is
useful when a precise action is needed immediately, ensuring that
the function is executed as part of the current conversation.

It is possible that some function calls can be executed simultaneously.
This technique is beneficial for tasks that can be divided into independent
operations, allowing for increased efficiency and reduced response time.

We encourage our readers to take on the challenge of implementing
parallel tool calls as a practical exercise. A small starting-point sketch
is provided below.

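As a hint for that exercise, independent calls returned by the model could be
dispatched concurrently, for example with a thread pool (an illustrative sketch,
not part of the tutorial code):

```python
from concurrent.futures import ThreadPoolExecutor


def execute_tool_calls_in_parallel(tool_calls, available_tools):
    """Run independent tool calls concurrently and return their results in order.

    `tool_calls` is a list of dicts like {"tool": name, "arguments": {...}} and
    `available_tools` maps tool names to plain Python callables.
    """
    with ThreadPoolExecutor(max_workers=max(1, len(tool_calls))) as pool:
        futures = [
            pool.submit(available_tools[call["tool"]], **call["arguments"])
            for call in tool_calls
        ]
        return [future.result() for future in futures]
```
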
## References

Parts of this tutorial are based on [Hermes-Function-Calling](https://github.com/NousResearch/Hermes-Function-Calling).
