||`beam_width`|[batch_size]| uint32 |**Optional**. beam width for beam search; sampling is used when set to 1 |
||`bad_words_list`|[batch_size, 2, word_list_len]| int32 |**Optional**. List of tokens (words) to never sample. Should be generated with FasterTransformer/examples/pytorch/gpt/utils/word_list.py |
||`stop_words_list`|[batch_size, 2, word_list_len]| int32 |**Optional**. List of tokens (words) that stop sampling. Should be generated with FasterTransformer/examples/pytorch/gpt/utils/word_list.py |
||`top_p_min`|[batch_size]| float |**Optional**. minimum top_p values for top-p factual-nucleus sampling |
||`top_p_reset_ids`|[batch_size]| uint32 |**Optional**. reset IDs for resetting top_p values in top-p factual-nucleus sampling |
| output |||||
||`output_ids`|[batch_size, beam_width, -1]| int32 | output ids before detokenization |
||`sequence_length`|[batch_size]| int32 | real sequence length of each output |
||`cum_log_probs`|[batch_size, beam_width]| float |**Optional**. cumulative log probability of the output sentence |
||`output_log_probs`|[batch_size, beam_width, request_output_seq_len]| float |**Optional**. records the log probability of the logits at each sampling step |
| parameter |||||
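
The request sketch below shows how a client might populate the tensors listed above with `tritonclient`. It is a minimal sketch, not the backend's reference client: the model name `fastertransformer` and the tensors `input_ids`, `input_lengths`, and `request_output_len` are assumptions borrowed from the common backend examples and are not documented in the table above; `beam_width`, `bad_words_list`, `output_ids`, and `sequence_length` are.

```python
# Minimal request sketch (hedged): the model name "fastertransformer" and the
# tensors input_ids / input_lengths / request_output_len are assumptions taken
# from common backend examples, not from the table above.
import numpy as np
import tritonclient.http as httpclient
from tritonclient.utils import np_to_triton_dtype

def prepare_tensor(name, array):
    # Wrap a numpy array as an InferInput with the matching Triton dtype.
    tensor = httpclient.InferInput(name, list(array.shape),
                                   np_to_triton_dtype(array.dtype))
    tensor.set_data_from_numpy(array)
    return tensor

client = httpclient.InferenceServerClient("localhost:8000")

input_ids = np.array([[9915, 27221, 59, 77, 383]], dtype=np.uint32)  # pre-tokenized prompt
inputs = [
    prepare_tensor("input_ids", input_ids),
    prepare_tensor("input_lengths", np.array([input_ids.shape[1]], dtype=np.uint32)),
    prepare_tensor("request_output_len", np.array([32], dtype=np.uint32)),
    # Optional tensors from the table above:
    prepare_tensor("beam_width", np.array([2], dtype=np.uint32)),  # >1 switches to beam search
    # bad_words_list packs [flat token ids, prefix-sum offsets] per batch entry,
    # as produced by FasterTransformer/examples/pytorch/gpt/utils/word_list.py;
    # here a single one-token "bad word" with a hypothetical id 5765.
    prepare_tensor("bad_words_list", np.array([[[5765], [1]]], dtype=np.int32)),
]

result = client.infer("fastertransformer", inputs)
print(result.as_numpy("output_ids"))       # [batch_size, beam_width, -1], int32
print(result.as_numpy("sequence_length"))  # [batch_size], int32
```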
## Build an entire processing pipeline with Triton
For T5-Encoder, there is an example tokenizer in `all_models/t5-encoder/tokenizer`. This Python model accepts sentences and converts them to token lists. It can be integrated into a Triton [ensemble model](https://github.com/triton-inference-server/server/blob/main/docs/user_guide/architecture.md#ensemble-models) together with the `fastertransformer` model, so that clients can send raw text instead of token IDs, as sketched below. You may also consult the GPT [documentation](./gpt_guide.md).
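
As a hedged illustration of calling such a pipeline, the sketch below sends raw sentences to the ensemble and lets the tokenizer model handle tokenization. The ensemble name `ensemble` and the tensor names `INPUT`/`OUTPUT` are placeholders for whatever your ensemble `config.pbtxt` declares:

```python
# Hypothetical client for a tokenizer -> fastertransformer ensemble.
# "ensemble", "INPUT", and "OUTPUT" are placeholder names; use the
# names declared in your ensemble config.pbtxt.
import numpy as np
import tritonclient.http as httpclient
from tritonclient.utils import np_to_triton_dtype

client = httpclient.InferenceServerClient("localhost:8000")

# Raw sentences; the tokenizer model inside the ensemble converts them
# to token lists before they reach the fastertransformer model.
sentences = np.array([["translate English to German: Hello, world!"]], dtype=object)
inp = httpclient.InferInput("INPUT", list(sentences.shape),
                            np_to_triton_dtype(sentences.dtype))
inp.set_data_from_numpy(sentences)

result = client.infer("ensemble", [inp])
print(result.as_numpy("OUTPUT"))
```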