
Commit f4b2875

Add trtllm TriggerPhrase LP

Signed-off-by: aerdem4 <ahmeterd4@gmail.com>

1 parent cc08f36

6 files changed (+161 -40 lines)

example_notebooks/transformers/trigger_phrase_logits_processor.ipynb

Lines changed: 60 additions & 38 deletions
@@ -28,11 +28,7 @@
 "name": "stderr",
 "output_type": "stream",
 "text": [
-"/home/aerdem/projects/LLM/llmenv/lib/python3.10/site-packages/huggingface_hub/file_download.py:1132: FutureWarning: `resume_download` is deprecated and will be removed in version 1.0.0. Downloads always resume when possible. If you want to force a new download, use `force_download=True`.\n",
-"  warnings.warn(\n",
-"Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.\n",
-"/home/aerdem/projects/LLM/llmenv/lib/python3.10/site-packages/huggingface_hub/file_download.py:1132: FutureWarning: `resume_download` is deprecated and will be removed in version 1.0.0. Downloads always resume when possible. If you want to force a new download, use `force_download=True`.\n",
-"  warnings.warn(\n"
+"Sliding Window Attention is enabled but not implemented for `sdpa`; unexpected results may be encountered.\n"
 ]
 }
 ],
@@ -70,14 +66,9 @@
 "name": "stderr",
 "output_type": "stream",
 "text": [
-"/home/aerdem/projects/LLM/llmenv/lib/python3.10/site-packages/transformers/generation/configuration_utils.py:392: UserWarning: `do_sample` is set to `False`. However, `temperature` is set to `None` -- this flag is only used in sample-based generation modes. You should set `do_sample=True` or unset `temperature`.\n",
-"  warnings.warn(\n",
-"/home/aerdem/projects/LLM/llmenv/lib/python3.10/site-packages/transformers/generation/configuration_utils.py:397: UserWarning: `do_sample` is set to `False`. However, `top_p` is set to `None` -- this flag is only used in sample-based generation modes. You should set `do_sample=True` or unset `top_p`.\n",
-"  warnings.warn(\n",
-"/home/aerdem/projects/LLM/llmenv/lib/python3.10/site-packages/transformers/generation/configuration_utils.py:407: UserWarning: `do_sample` is set to `False`. However, `top_k` is set to `None` -- this flag is only used in sample-based generation modes. You should set `do_sample=True` or unset `top_k`.\n",
-"  warnings.warn(\n",
 "The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.\n",
-"Setting `pad_token_id` to `eos_token_id`:151643 for open-end generation.\n"
+"Setting `pad_token_id` to `eos_token_id`:151643 for open-end generation.\n",
+"The attention mask is not set and cannot be inferred from input because pad token is same as eos token. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.\n"
 ]
 },
 {
@@ -113,9 +104,9 @@
 "\n",
 "Let me test this function with some examples. For n=0, it returns 0. For n=1, returns 1. For n=2, it's F(1)+F(0) = 1+0=1. For n=3, F(2)+F(1)=1+1=2. That looks correct.\n",
 "\n",
-"Wait, but sometimes people define the Fibonacci sequence starting with F(1)=1, F(2)=1, F(3)=2, etc. So, if the function is called with n=5, it should return 5. Let me see: F(5) is 5, which matches the standard definition. So, the function should work regardless of the starting point as long as the base cases are correct.\n",
+"Wait, but sometimes people define the Fibonacci sequence starting with F(1)=1, F(2)=1, F(3)=2, etc. So, if the function is called with n=5, it should return 5. Let me see: F(5) is 5, which is correct.\n",
 "\n",
-"Another thing to consider is the base cases. If the function is called with n=0, it returns 0, which is correct. For n=1, returns 1. For n=2, returns 1, which is correct. So, the function should handle all non-negative integers correctly.\n",
+"Another test case: n=5. Let's compute it step by step. F(0)=0, F(1)=1, F(2)=1, F(3)=2, F(4)=3, F(5)=5. So the function should return 5 for n=5.\n",
 "\n",
 "I think this should work. So, the function is straightforward. It's a simple recursive implementation, but it's not the most efficient for large n. However, for the purpose of this problem, it's acceptable.\n",
 "</think>\n",
@@ -215,39 +206,64 @@
 "\n",
 "Let me test this function with some examples. For n=0, it returns 0. For n=1, returns 1. For n=2, it's F(1)+F(0) = 1+0=1. For n=3, F(2)+F(1)=1+1=2. That looks correct.\n",
 "\n",
-"Wait, but sometimes people define the Fibonacci sequence starting with F(1)=1, F(2)=1, F(3)=2, etc. So, if the function is called with n=5, it should return 5. Let me see: F(5) is 5, which matches the standard definition. So, the function should work regardless of the starting point as long as the base cases are correct.\n",
+"Wait, but sometimes people define the Fibonacci sequence starting with F(1)=1, F(2)=1, F(3)=2, etc. So, if the function is called with n=5, it should return 5. Let me see: F(5) is 5, which is correct.\n",
 "\n",
-"Another thing to consider is the base cases. If the function is called with n=0, it returns 0, which is correct. For n=1, returns 1. For n=2, returns 1, which is correct. So, the function should handle all non-negative integers correctly.\n",
+"Another test case: n=5. Let's compute it step by step. F(0)=0, F(1)=1, F(2)=1, F(3)=2, F(4)=3, F(5)=5. So the function should return 5 for n=5.\n",
 "\n",
 "I think this should work. So, the function is straightforward. It's a simple recursive implementation, but it's not the most efficient for large n. However, for the purpose of this problem, it's acceptable.\n",
-",,,\n",
+"Wait, but in the problem statement, it says to make it recursive. So, the function as written is recursive, but it's not optimized. So, I think this should be the solution.\n",
+"Wait, but the problem says to make it recursive, so perhaps the function should handle larger n efficiently. But without memoization, it's not efficient. So, maybe the function should be written with memoization or an iterative approach.\n",
+"\n",
+"Alternatively, I can implement it iteratively, which is more efficient. Let's think about that.\n",
+"\n",
+"An iterative approach would start from the bottom and build up the Fibonacci numbers. For example:\n",
+"\n",
+"def fibonacci(n):\n",
+"    if n == 0:\n",
+"        return 0\n",
+"    a, b = 0, 1\n",
+"    for _ in range(2, n+1):\n",
+"        a, b = b, a + b\n",
+"    return b\n",
+"\n",
+"Wait, let's test this. For n=5, the loop runs from 2 to 5, which is 4 iterations. Let's see:\n",
+"\n",
+"Initial a=0, b=1.\n",
+"\n",
+"After first iteration (i=2): a becomes 1, b becomes 0+1=1.\n",
+"\n",
+"After second iteration (i=3): a=1, b=1+1=2.\n",
+"\n",
+"After third iteration (i=4): a=2, b=1+2=3.\n",
+"\n",
+"After fourth iteration (i=5): a=3, b=2+3=5.\n",
+"\n",
+"So, return b=5, which is correct.\n",
+"\n",
+"This approach is more efficient as it runs in O(n) time and uses O(1) space, which is better for large n.\n",
 "\n",
-"Wait, but the problem says to make it recursive. So, the function should call itself with smaller arguments. The approach I have is correct and recursive.\n",
+"But the problem says to make it recursive. So, perhaps the recursive approach is acceptable, but the iterative approach is better for larger n.\n",
 "\n",
-"So, the final function is as I wrote above.\n",
-",,,\n",
+"But since the problem doesn't specify, I'll proceed with the recursive approach as it's straightforward.\n",
 "\n",
-"Wait, but in the function, for n=2, it's F(1)+F(0) = 1+0=1, which is correct. For n=3, F(2)+F(1)=1+1=2. So, the function works as expected.\n",
+"Wait, but the problem says to make it recursive. So, perhaps the function should be written recursively, but without memoization. So, the function as I wrote earlier is correct.\n",
 "\n",
-"I think this should solve the problem.\n",
+"Alternatively, I can write it with memoization using a dictionary to store previously computed values. But that's more advanced and perhaps beyond the scope of this problem.\n",
+"\n",
+"So, the function I wrote earlier is correct and meets the requirements.\n",
 "</think>\n",
 "\n",
 "To solve this problem, we need to generate the nth Fibonacci number using a recursive approach. The Fibonacci sequence is a series of numbers where each number is the sum of the two preceding ones, starting from 0 and 1. \n",
 "\n",
 "### Approach\n",
-"The approach to solve this problem involves using recursion, which is a method where a function calls itself with a modified parameter to achieve the desired result. Here's a step-by-step breakdown of the approach:\n",
+"The Fibonacci sequence is defined as follows:\n",
+"- F(0) = 0\n",
+"- F(1) = 1\n",
+"- F(n) = F(n-1) + F(n-2) for n >= 2\n",
 "\n",
-"1. **Base Cases**: \n",
-"   - If `n` is 0, return 0.\n",
-"   - If `n` is 1, return 1.\n",
-"   \n",
-"2. **Recursive Case**:\n",
-"   - For any `n` greater than 1, the nth Fibonacci number is the sum of the (n-1)th and (n-2)th Fibonacci numbers. This is achieved by recursively calling the function with `n-1` and `n-2` and adding their results.\n",
-"\n",
-"This approach ensures that each Fibonacci number is computed by breaking down the problem into smaller subproblems, which are then solved recursively.\n",
+"Given the requirement to use a recursive approach, we can define a function that calls itself with smaller values of n until it reaches the base cases. The function will handle the base cases directly and use recursion for the general case.\n",
 "\n",
 "### Solution Code\n",
-"\n",
 "```python\n",
 "def fibonacci(n):\n",
 "    if n == 0:\n",
@@ -259,10 +275,16 @@
 "```\n",
 "\n",
 "### Explanation\n",
-"- **Base Cases**: The function first checks if `n` is 0 or 1. If `n` is 0, it returns 0. If `n` is 1, it returns 1. These are the simplest cases of the Fibonacci sequence.\n",
-"- **Recursive Case**: For any `n` greater than 1, the function calls itself with `n-1` and `n-2`, and returns the sum of these two recursive calls. This builds up the solution by solving smaller subproblems and combining their results.\n",
+"The function `fibonacci` takes an integer `n` as input and returns the nth Fibonacci number. \n",
+"\n",
+"1. **Base Cases**:\n",
+"   - If `n` is 0, the function returns 0.\n",
+"   - If `n` is 1, the function returns 1.\n",
+"\n",
+"2. **Recursive Case**:\n",
+"   - For `n >= 2`, the function calls itself with `n-1` and `n-2` and returns the sum of these two recursive calls. This builds up the Fibonacci sequence from the bottom up, ensuring that each value is computed only once.\n",
 "\n",
-"This approach is straightforward and leverages the divide-and-conquer strategy inherent in recursion, making it easy to understand and implement. However, it's important to note that this approach has a time complexity of O(2^n) due to the exponential number of function calls, which is not efficient for large values of `n`. For larger values, an iterative approach or memoization would be more efficient.\n",
+"This approach is straightforward and leverages the recursive nature of the Fibonacci sequence, making it easy to understand and implement. However, it's important to note that for very large values of `n`, this approach can be inefficient due to repeated calculations. For larger values, an iterative approach or memoization would be more efficient.\n",
 "-----END-----\n",
 "\n"
 ]
@@ -332,9 +354,9 @@
 "\n",
 "Let me test this function with some examples. For n=0, it returns 0. For n=1, returns 1. For n=2, it's F(1)+F(0) = 1+0=1. For n=3, F(2)+F(1)=1+1=2. That looks correct.\n",
 "\n",
-"Wait, but sometimes people define the Fibonacci sequence starting with F(1)=1, F(2)=1, F(3)=2, etc. So, if the function is called with n=5, it should return 5. Let me see: F(5) is 5, which matches the standard definition. So, the function should work regardless of the starting point as long as the base cases are correct.\n",
+"Wait, but sometimes people define the Fibonacci sequence starting with F(1)=1, F(2)=1, F(3)=2, etc. So, if the function is called with n=5, it should return 5. Let me see: F(5) is 5, which is correct.\n",
 "\n",
-"Another thing to consider is the base cases. If the function is called with n=0, it returns 0, which is correct. For n=1, returns 1. For n=2, returns 1, which is correct. So, the function should handle all non-negative integers correctly.\n",
+"Another test case: n=5. Let's compute it step by step. F(0)=0, F(1)=1, F(2)=1, F(3)=2, F(4)=3, F(5)=5. So the function should return 5 for n=5.\n",
 "\n",
 "I think this should work. So, the function is straightforward. It's a simple recursive implementation, but it's not the most efficient for large n. However, for the purpose of this problem, it's acceptable.\n",
 "</think>\n",
@@ -348,7 +370,7 @@
 "    return fibonacci(n-1) + fibonacci(n-2)\n",
 "```\n",
 "\n",
-"This function calculates the nth Fibonacci number using a recursive approach. It handles the base cases where n is 0 or 1 and recursively computes the value for larger n by summing the two preceding Fibonacci numbers.\n",
+"This function calculates the nth Fibonacci number using a recursive approach. It handles the base cases where n is 0 or 1 and for other values, it recursively calculates the sum of the two preceding Fibonacci numbers. While this implementation is straightforward, it's not the most efficient for large values of n due to repeated calculations.\n",
 "-----END-----\n",
 "\n"
 ]

example_notebooks/trtllm/README.md

Lines changed: 2 additions & 0 deletions
@@ -26,4 +26,6 @@ python example_notebooks/trtllm/multiple_choice_logits_processor.py -p "I am get
 3. Battery"
 
 python example_notebooks/trtllm/prevent_hallucination_logits_processor.py -p "Tell me the Nobel Prizes in 1977"
+
+python example_notebooks/trtllm/trigger_phrase_logits_processor.py -p "Generate a python function to calculate nth fibonacci number. Make it recursive. Keep thinking short."
 ```
Lines changed: 17 additions & 0 deletions
@@ -0,0 +1,17 @@
+from transformers import AutoTokenizer
+from logits_processor_zoo.trtllm import TriggerPhraseLogitsProcessor
+from utils import TRTLLMTester, get_parser
+
+
+if __name__ == "__main__":
+    args = get_parser()
+
+    tokenizer = AutoTokenizer.from_pretrained(args.model_name)
+    llm_tester = TRTLLMTester(args.model_name, args.backend)
+
+    lp = TriggerPhraseLogitsProcessor("...Wait, let me think more.", " function", tokenizer,
+                                      trigger_count=2, trigger_after=False)
+    llm_tester.run([args.prompt], logits_processor=lp)
+
+    lp = TriggerPhraseLogitsProcessor("\n```python", " function", tokenizer, trigger_count=1, trigger_after=True)
+    llm_tester.run([args.prompt], logits_processor=lp)
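The script drives the processor end to end, but the core mechanism underneath is the `enforce_tokens` call, which narrows the next-token distribution to the single phrase token being forced. As a rough, torch-free sketch of that masking idea (the `enforce_tokens_sketch` helper and the toy scores are hypothetical; the library's real `enforce_tokens` operates on torch logits tensors):

```python
import math

def enforce_tokens_sketch(scores, allowed):
    """Keep only `allowed` token ids viable by pushing every other score to -inf."""
    for tok in range(len(scores)):
        if tok not in allowed:
            scores[tok] = -math.inf
    return scores

scores = [0.1, 2.3, -0.5, 1.7]               # toy logits over a 4-token vocabulary
masked = enforce_tokens_sketch(scores, [2])  # force token id 2
best = max(range(len(masked)), key=lambda t: masked[t])  # greedy pick is now token 2
```

Repeating this one step per phrase token is what walks the model through the full trigger phrase.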

logits_processor_zoo/transformers/trigger_phrase.py

Lines changed: 1 addition & 1 deletion
@@ -55,7 +55,7 @@ def _process(self, input_ids: torch.LongTensor, scores: torch.FloatTensor) -> to
                 if scores[i, :].argmax() == self.trigger_token and it == -1:
                     self.iterators[i] = 0
                     if not self.trigger_after:
-                        scores[i] = enforce_tokens(scores[i], [self.phrase_tokens[it]])
+                        scores[i] = enforce_tokens(scores[i], [self.phrase_tokens[0]])
                         self.iterators[i] += 1
                 elif len(self.phrase_tokens) > it >= 0:
                     scores[i] = enforce_tokens(scores[i], [self.phrase_tokens[it]])
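The one-token fix above matters because `it` is always -1 in that branch, and Python's negative indexing silently wraps around: the old code enforced the *last* token of the phrase where the phrase should *start*. A minimal illustration (the token ids are made up):

```python
phrase_tokens = [101, 202, 303]  # hypothetical token ids for a 3-token phrase

it = -1                       # iterator value in the branch that detects the trigger
buggy = phrase_tokens[it]     # negative index wraps around to the LAST phrase token
fixed = phrase_tokens[0]      # the token the phrase actually begins with
```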

logits_processor_zoo/trtllm/__init__.py

Lines changed: 2 additions & 1 deletion
@@ -20,6 +20,7 @@
 from .cite_prompt import CiteFromPromptLogitsProcessor
 from .multiple_choice import MultipleChoiceLogitsProcessor
 from .prevent_hallucination import PreventHallucinationLogitsProcessor
+from .trigger_phrase import TriggerPhraseLogitsProcessor
 
 __all__ = ['GenLengthLogitsProcessor', 'ForceLastPhraseLogitsProcessor', 'CiteFromPromptLogitsProcessor',
-           'MultipleChoiceLogitsProcessor', 'PreventHallucinationLogitsProcessor']
+           'MultipleChoiceLogitsProcessor', 'PreventHallucinationLogitsProcessor', 'TriggerPhraseLogitsProcessor']
Lines changed: 79 additions & 0 deletions
@@ -0,0 +1,79 @@
+#
+# SPDX-FileCopyrightText: Copyright (c) 1993-2024 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+# SPDX-License-Identifier: Apache-2.0
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+from typing import List, Optional
+from transformers import PreTrainedTokenizer
+import torch
+from logits_processor_zoo.utils import enforce_tokens, text_to_token
+from tensorrt_llm.sampling_params import LogitsProcessor
+
+
+class TriggerPhraseLogitsProcessor(LogitsProcessor):
+    """
+    A logits processor which triggers phrases when it encounters a given token.
+
+    Parameters
+    ----------
+    phrase (str): The phrase to be generated by the LLM when it encounters the trigger token.
+    trigger_token_phrase (str): One-token phrase in string to trigger phrases.
+    tokenizer (PreTrainedTokenizer): The tokenizer used by the LLM.
+    trigger_count (int): How many times the phrase will be triggered.
+    trigger_after (bool): Whether the phrase is written after the trigger token or instead of the trigger token.
+    """
+    def __init__(self, phrase: str, trigger_token_phrase: str, tokenizer: PreTrainedTokenizer,
+                 trigger_count: int = 1, trigger_after: bool = False):
+        self.tokenizer = tokenizer
+        self.trigger_token = text_to_token(self.tokenizer, trigger_token_phrase, last=False)
+        self.phrase_tokens = self.tokenizer.encode(phrase, add_special_tokens=False)
+        self.initial_trigger_count = trigger_count
+        self.trigger_after = trigger_after
+        self.iterators = None
+        self.trigger_counts = None
+
+    def _init_before_gen(self, beam_width):
+        self.iterators = -torch.ones(beam_width, dtype=torch.int32)
+        self.trigger_counts = self.initial_trigger_count * torch.ones(beam_width, dtype=torch.int32)
+
+    def __call__(self, req_id: int, logits: torch.Tensor,
+                 token_ids: List[List[int]], stream_ptr: Optional[int],
+                 client_id: Optional[int]) -> None:
+        beam_width = len(token_ids)
+        if self.iterators is None:
+            self._init_before_gen(beam_width)
+
+        stream = None if stream_ptr is None else torch.cuda.ExternalStream(stream_ptr)
+
+        with torch.cuda.stream(stream):
+            for i in range(beam_width):  # iterate over beams
+                if self.trigger_counts[i] <= 0:
+                    continue
+
+                current_index = self.iterators[i].item()
+
+                if logits[0, i].argmax() == self.trigger_token and current_index == -1:
+                    self.iterators[i] = 0
+                    print("triggering...")
+                    if not self.trigger_after:
+                        enforce_tokens(logits[0, i], [self.phrase_tokens[0]])
+                        self.iterators[i] += 1
+                elif len(self.phrase_tokens) > current_index >= 0:
+                    enforce_tokens(logits[0, i], [self.phrase_tokens[current_index]])
+                    self.iterators[i] += 1
+
+                if len(self.phrase_tokens) == self.iterators[i].item():  # phrase completed, reset for next trigger
+                    self.iterators[i] = -1
+                    self.trigger_counts[i] -= 1
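The per-beam bookkeeping above is a small state machine: `iterators[i]` sits at -1 while waiting for the trigger token to become the argmax, then walks through `phrase_tokens` one position per decoding step, and resets while decrementing `trigger_counts[i]` once the phrase is complete. A rough pure-Python simulation of the `trigger_after=False` path, with a list of would-be greedy tokens standing in for the logits argmax (the `simulate_trigger` helper and the token ids are hypothetical, not part of the library):

```python
def simulate_trigger(stream, trigger_token, phrase_tokens, trigger_count=1):
    """Replay the per-beam state machine over a stream of would-be greedy
    tokens and return the tokens actually emitted (trigger_after=False:
    the phrase replaces the trigger token)."""
    iterator, remaining, out = -1, trigger_count, []
    for tok in stream:
        if remaining <= 0:          # all triggers consumed: pass tokens through
            out.append(tok)
            continue
        if tok == trigger_token and iterator == -1:
            iterator = 0
            out.append(phrase_tokens[0])          # force first phrase token instead
            iterator += 1
        elif len(phrase_tokens) > iterator >= 0:
            out.append(phrase_tokens[iterator])   # keep forcing the phrase
            iterator += 1
        else:
            out.append(tok)
        if iterator == len(phrase_tokens):        # phrase done: reset, consume a trigger
            iterator = -1
            remaining -= 1
    return out

emitted = simulate_trigger([5, 9, 7, 7, 7, 9, 1], trigger_token=9,
                           phrase_tokens=[100, 101], trigger_count=1)
# → [5, 100, 101, 7, 7, 9, 1]: the first 9 is replaced by the phrase,
#   the second 9 passes through because trigger_count is exhausted.
```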
