# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
-->

# Function Calling with Triton Inference Server

This tutorial focuses on function calling, a common approach to easily connect
large language models (LLMs) to external tools. This method empowers AI agents
with effective tool usage and seamless interaction with external APIs,
significantly expanding their capabilities and practical applications.

## Table of Contents

- [What is Function Calling?](#what-is-function-calling)
- [Tutorial Overview](#tutorial-overview)
  + [Prerequisite: Hermes-2-Pro-Llama-3-8B](#prerequisite-hermes-2-pro-llama-3-8b)
- [Function Definitions](#function-definitions)
- [Prompt Engineering](#prompt-engineering)
- [Combining Everything Together](#combining-everything-together)
- [Further Optimizations](#further-optimizations)
  + [Enforcing Output Format](#enforcing-output-format)
  + [Parallel Tool Call](#parallel-tool-call)
- [References](#references)

## What is Function Calling?

Function calling refers to the ability of LLMs to:
* Recognize when a specific function or tool needs to be used to answer a query
or perform a task.
* Generate a structured output containing the necessary arguments to call
that function.
* Integrate the results of the function call into its response.

Function calling is a powerful mechanism that allows LLMs to perform
more complex tasks (e.g. agent orchestration in multi-agent systems)
that require specific computations or data retrieval
beyond their inherent knowledge. By recognizing when a particular function
is needed, LLMs can dynamically extend their functionality, making them more
versatile and useful in real-world applications.

## Tutorial Overview

This tutorial demonstrates function calling using the
[Hermes-2-Pro-Llama-3-8B](https://huggingface.co/NousResearch/Hermes-2-Pro-Llama-3-8B)
model, which is pre-fine-tuned for this capability. We'll create a basic
stock reporting agent that provides up-to-date stock information and summarizes
recent company news.

### Prerequisite: Hermes-2-Pro-Llama-3-8B

Before proceeding, please make sure that you've successfully deployed the
[Hermes-2-Pro-Llama-3-8B](https://huggingface.co/NousResearch/Hermes-2-Pro-Llama-3-8B)
model with Triton Inference Server and the TensorRT-LLM backend by
following [these steps](../../Popular_Models_Guide/Hermes-2-Pro-Llama-3-8B/README.md).

> [!IMPORTANT]
> Make sure that the `tutorials` folder is mounted to `/tutorials` when you
> start the Docker container.

## Function Definitions

We'll define three functions for our stock reporting agent:
1. `get_current_stock_price`: Retrieves the current stock price for a given symbol.
2. `get_company_news`: Retrieves company news and press releases for a given stock symbol.
3. `final_answer`: Used as a no-op and to indicate the final response.

Each function includes its name, description, and input parameter schema:
```python
TOOLS = [
    {
        "type": "function",
        "function": {
            "name": "get_current_stock_price",
            "description": "Get the current stock price for a given symbol.\n\nArgs:\n symbol (str): The stock symbol.\n\nReturns:\n float: The current stock price, or None if an error occurs.",
            "parameters": {
                "type": "object",
                "properties": {"symbol": {"type": "string"}},
                "required": ["symbol"],
            },
        },
    },
    {
        "type": "function",
        "function": {
            "name": "get_company_news",
            "description": "Get company news and press releases for a given stock symbol.\n\nArgs:\nsymbol (str): The stock symbol.\n\nReturns:\npd.DataFrame: DataFrame containing company news and press releases.",
            "parameters": {
                "type": "object",
                "properties": {"symbol": {"type": "string"}},
                "required": ["symbol"],
            },
        },
    },
    {
        "type": "function",
        "function": {
            "name": "final_answer",
            "description": "Return final generated answer",
            "parameters": {
                "type": "object",
                "properties": {"final_response": {"type": "string"}},
                "required": ["final_response"],
            },
        },
    },
]
```
These function definitions will be passed to our model through a prompt,
enabling it to recognize and utilize them appropriately during the conversation.

For the actual implementations, please refer to [client_utils.py](./artifacts/client_utils.py).

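To make this concrete, below is a minimal sketch of what an implementation of
`get_current_stock_price` might look like on top of the `yfinance` library.
This is an illustration only; the code actually used by this tutorial lives in
[client_utils.py](./artifacts/client_utils.py) and may differ in its error
handling and return values.

```python
# Hypothetical sketch of one tool implementation; see client_utils.py
# for the version this tutorial actually uses.
import yfinance as yf


def get_current_stock_price(symbol: str):
    """Return the latest closing price for `symbol`, or None on failure."""
    try:
        # Fetch one day of price history and take the most recent close.
        history = yf.Ticker(symbol).history(period="1d")
        if history.empty:
            return None
        return float(history["Close"].iloc[-1])
    except Exception:
        return None
```
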
## Prompt Engineering

**Prompt engineering** is a crucial aspect of function calling, as it guides
the LLM in recognizing when and how to utilize specific functions.
By carefully crafting prompts, you can effectively define the LLM's role,
objectives, and the tools it can access, ensuring accurate and efficient task
execution.

For our task, we've organized a sample prompt structure, provided
in the accompanying [`system_prompt_schema.yml`](./artifacts/system_prompt_schema.yml)
file. This file outlines:

- **Role**: Defines the specific role the LLM is expected to perform.
- **Objective**: Clearly states the goal or desired outcome of the interaction.
- **Tools**: Lists the available functions or tools the LLM can use to achieve
its objective.
- **Schema**: Specifies the structure and format required for calling each tool
or function.
- **Instructions**: Provides a clear set of guidelines to ensure the LLM follows
the intended path and utilizes the tools appropriately.

By leveraging prompt engineering, you can enhance the LLM's ability
to perform complex tasks and integrate function calls seamlessly into
its responses, thereby maximizing its utility in various applications.

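To illustrate how such a file might be consumed, the sketch below loads the
YAML sections and joins them into a single system prompt, appending the
`TOOLS` definitions from earlier. The section names and formatting here are
assumptions made for illustration; the assembly logic this tutorial actually
uses is in [client_utils.py](./artifacts/client_utils.py).

```python
# Hypothetical sketch of assembling a system prompt from the YAML file.
# Section names are assumptions; see client_utils.py for the real logic.
import json

import yaml

with open("artifacts/system_prompt_schema.yml") as f:
    sections = yaml.safe_load(f)

# Concatenate the prompt sections in a fixed order.
system_prompt = "\n\n".join(
    str(sections[name])
    for name in ("Role", "Objective", "Tools", "Schema", "Instructions")
    if name in sections
)
# Advertise the available tools (TOOLS from the previous section).
system_prompt += "\n\nAvailable tools: " + json.dumps(TOOLS)
```
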
## Combining Everything Together

First, let's start the Triton SDK container:
```bash
# Using the SDK container as an example
docker run --rm -it --net host --shm-size=2g \
    --ulimit memlock=-1 --ulimit stack=67108864 --gpus all \
    -v /path/to/tutorials/:/tutorials \
    nvcr.io/nvidia/tritonserver:<xx.yy>-py3-sdk
```

The provided client script uses the `pydantic` and `yfinance` libraries, which
are not shipped with the SDK container. Make sure to install them before
proceeding:

```bash
pip install pydantic yfinance
```

Run the provided [`client.py`](./artifacts/client.py) as follows:

```bash
python3 /tutorials/AI_Agents_Guide/Function_Calling/artifacts/client.py --prompt "Tell me about Rivian. Include current stock price in your final response." -o 200
```

You should expect to see a response similar to:

```bash
+++++++++++++++++++++++++++++++++++++
RESPONSE: Rivian, with its current stock price of <CURRENT STOCK PRICE>, <NEWS SUMMARY>
+++++++++++++++++++++++++++++++++++++
```

To see what tools were "called" by our LLM, simply add the `--verbose` flag as follows:
```bash
python3 /tutorials/AI_Agents_Guide/Function_Calling/artifacts/client.py --prompt "Tell me about Rivian. Include current stock price in your final response." -o 200 --verbose
```

This will show the step-by-step process of function calling, including:
- The tools being called
- The arguments passed to each tool
- The responses from each function call
- The final summarized response

```bash
[b'\n{\n "step": "1",\n "description": "Get the current stock price for Rivian",\n "tool": "get_current_stock_price",\n "arguments": {\n "symbol": "RIVN"\n }\n}']
=====================================
Executing function: get_current_stock_price({'symbol': 'RIVN'})
Function response: <CURRENT STOCK PRICE>
=====================================
[b'\n{\n "step": "2",\n "description": "Get company news and press releases for Rivian",\n "tool": "get_company_news",\n "arguments": {\n "symbol": "RIVN"\n }\n}']
=====================================
Executing function: get_company_news({'symbol': 'RIVN'})
Function response: [<LIST OF RECENT NEWS TITLES>]
=====================================
[b'\n{\n "step": "3",\n "description": "Summarize the company news and press releases for Rivian",\n "tool": "final_answer",\n "arguments": {\n "final_response": "Rivian, with its current stock price of <CURRENT STOCK PRICE>, <NEWS SUMMARY>"\n }\n}']


+++++++++++++++++++++++++++++++++++++
RESPONSE: Rivian, with its current stock price of <CURRENT STOCK PRICE>, <NEWS SUMMARY>
+++++++++++++++++++++++++++++++++++++
```

> [!TIP]
> In this tutorial, all functionalities (tool definitions, implementations,
> and executions) are implemented on the client side (see
> [client.py](./artifacts/client.py)).
> For production scenarios, especially when functions are known beforehand,
> consider implementing this logic on the server side.
> A recommended approach for server-side implementation is to deploy your
> workflow through a Triton [ensemble](https://github.com/triton-inference-server/server/blob/a6fff975a214ff00221790dd0a5521fb05ce3ac9/docs/user_guide/architecture.md#ensemble-models)
> or a [BLS](https://github.com/triton-inference-server/python_backend?tab=readme-ov-file#business-logic-scripting).
> Use a pre-processing model to combine and format the user prompt with the
> system prompt and available tools. Employ a post-processing model to manage
> multiple calls to the deployed LLM as needed to reach the final answer.

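As a sketch of what such a server-side orchestrator could look like, the
hypothetical Python-backend `model.py` below formats the user prompt with a
system prompt and forwards it to the deployed LLM via BLS. The model name
(`tensorrt_llm_bls`) and tensor names (`text_input`, `text_output`) are
assumptions and would need to match your deployment.

```python
# Hypothetical BLS sketch; model and tensor names are assumptions.
import numpy as np
import triton_python_backend_utils as pb_utils

SYSTEM_PROMPT = "..."  # combined role/objective/tools/schema/instructions


class TritonPythonModel:
    def execute(self, requests):
        responses = []
        for request in requests:
            user = pb_utils.get_input_tensor_by_name(request, "text_input")
            user_text = user.as_numpy()[0].decode("utf-8")
            full_prompt = SYSTEM_PROMPT + "\n" + user_text
            # Forward the combined prompt to the deployed LLM.
            llm_request = pb_utils.InferenceRequest(
                model_name="tensorrt_llm_bls",
                requested_output_names=["text_output"],
                inputs=[
                    pb_utils.Tensor(
                        "text_input",
                        np.array([full_prompt.encode("utf-8")], dtype=object),
                    )
                ],
            )
            llm_response = llm_request.exec()
            output = pb_utils.get_output_tensor_by_name(
                llm_response, "text_output"
            )
            responses.append(
                pb_utils.InferenceResponse(output_tensors=[output])
            )
        return responses
```

A post-processing model would extend this sketch into a loop: parse each tool
call from `text_output`, execute it, and re-invoke the LLM until the
`final_answer` tool is selected.
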
## Further Optimizations

### Enforcing Output Format

In this tutorial, we demonstrated how to enforce a specific output format
using prompt engineering. The desired structure is as follows:
```python
{
    "step": <Step number>,
    "description": <Description of what the step does and its output>,
    "tool": <Tool to use>,
    "arguments": {
        <Parameters to pass to the tool as a valid dict>
    }
}
```
However, there may be instances where the output deviates from this
required schema. For example, consider the following prompt execution:

```bash
python3 /tutorials/AI_Agents_Guide/Function_Calling/artifacts/client.py --prompt "How Rivian is doing?" -o 500 --verbose
```
This execution may fail with an invalid JSON format error. The verbose
output will reveal that the final LLM response contained plain text
instead of the expected JSON structure:
```
{
    "step": "3",
    "description": <Description of what the step does and its output>
    "tool": "final_answer",
    "arguments": {
        "final_response": <Final Response>
    }
}
```
Fortunately, this behavior can be controlled using constrained decoding,
a technique that guides the model to generate outputs that meet specific
formatting and content requirements. We strongly recommend exploring our
dedicated [tutorial](../Constrained_Decoding/README.md) on constrained decoding
to gain deeper insights and enhance your ability to manage model outputs
effectively.

> [!TIP]
> For optimal results, utilize the `FunctionCall` class defined in
> [client_utils.py](./artifacts/client_utils.py) as the JSON schema
> for your Logits Post-Processor. This approach ensures consistent
> and properly formatted outputs, aligning with the structure we've
> established throughout this tutorial.

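For reference, a schema along these lines could be expressed with `pydantic`
roughly as follows. This is a sketch mirroring the output format used
throughout this tutorial (assuming pydantic v2), not necessarily the exact
class from [client_utils.py](./artifacts/client_utils.py).

```python
# Hypothetical sketch of a FunctionCall schema; the actual class used by
# this tutorial is defined in client_utils.py.
from typing import Any, Dict

from pydantic import BaseModel


class FunctionCall(BaseModel):
    step: str  # step number in the plan
    description: str  # what the step does and its output
    tool: str  # name of the tool to call
    arguments: Dict[str, Any]  # parameters to pass to the tool


# JSON schema that a logits post-processor can enforce during generation:
schema = FunctionCall.model_json_schema()
```
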
### Parallel Tool Call

This tutorial focuses on a single-turn forced call, in which the LLM is
prompted to make a specific function call within a single interaction.
This approach is useful when a precise action is needed immediately,
ensuring that the function is executed as part of the current conversation.

It is possible that some function calls can be executed simultaneously.
This technique is beneficial for tasks that can be divided into independent
operations, allowing for increased efficiency and reduced response time.

We encourage our readers to take on the challenge of implementing
parallel tool calls as a practical exercise; a starting-point sketch
follows below.

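As a starting point, independent tool calls can be dispatched concurrently
with Python's standard library. In the sketch below, `tool_registry` (a dict
mapping tool names to implementations such as `get_current_stock_price`) is
an assumed helper.

```python
# Hypothetical sketch of executing independent tool calls in parallel.
from concurrent.futures import ThreadPoolExecutor


def run_parallel_calls(calls, tool_registry):
    """Execute [(tool_name, arguments_dict), ...] concurrently, in order."""
    with ThreadPoolExecutor() as pool:
        futures = [
            pool.submit(tool_registry[name], **arguments)
            for name, arguments in calls
        ]
        return [future.result() for future in futures]
```
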
## References

Parts of this tutorial are based on [Hermes-Function-Calling](https://github.com/NousResearch/Hermes-Function-Calling).