-    transformer_layer_spec (ModuleSpec): Specifies module to use for transformer layers
-    vocab_size (int): Vocabulary size
-    max_sequence_length (int): maximum size of sequence. This is used for positional embedding
-    pre_process (bool, optional): Include embedding layer (used with pipeline parallelism). Defaults to True.
-    post_process (bool, optional): Include an output layer (used with pipeline parallelism). Defaults to True.
-    fp16_lm_cross_entropy (bool, optional): Defaults to False.
-    parallel_output (bool, optional): Do not gather the outputs, keep them split across tensor parallel ranks. Defaults to True.
-    share_embeddings_and_output_weights (bool, optional): When True, input embeddings and output logit weights are shared. Defaults to False.
-    position_embedding_type (Literal[learned_absolute,rope], optional): Position embedding type.. Defaults to 'learned_absolute'.
-    rotary_percent (float, optional): Percent of rotary dimension to use for rotary position embeddings. Ignored unless position_embedding_type is 'rope'. Defaults to 1.0.
-    rotary_base (int, optional): Base period for rotary position embeddings. Ignored unless position_embedding_type is 'rope'. Defaults to 10000.
-    seq_len_interpolation_factor (Optional[float], optional): scale of linearly interpolating RoPE for longer sequences. The value must be a float larger than 1.0. Defaults to None.
+    config (TransformerConfig):
+        Transformer config
+    transformer_layer_spec (ModuleSpec):
+        Specifies module to use for transformer layers
+    vocab_size (int):
+        Vocabulary size
+    max_sequence_length (int):
+        maximum size of sequence. This is used for positional embedding
+    pre_process (bool, optional):
+        Include embedding layer (used with pipeline parallelism). Defaults to True.
+    post_process (bool, optional):
+        Include an output layer (used with pipeline parallelism). Defaults to True.
+    fp16_lm_cross_entropy (bool, optional):
+        Defaults to False.
+    parallel_output (bool, optional):
+        Do not gather the outputs, keep them split across tensor
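
The RoPE-related arguments in the docstring above (`rotary_percent`, `rotary_base`, `seq_len_interpolation_factor`) interact when computing rotary position angles. A minimal sketch of that interaction, assuming the usual RoPE frequency formula; the helper name `rope_angles` and its `kv_channels` argument are hypothetical and not part of this diff:

```python
def rope_angles(kv_channels, rotary_percent=1.0, rotary_base=10000,
                seq_len_interpolation_factor=None, seq_len=4):
    """Return per-position rotary angles for a single attention head.

    Illustrative only: shows how rotary_percent trims the rotated
    dimension, rotary_base sets the frequency spectrum, and
    seq_len_interpolation_factor linearly rescales positions.
    """
    # rotary_percent < 1.0 applies RoPE to only part of the head dim.
    dim = int(kv_channels * rotary_percent)
    # Standard RoPE inverse frequencies: rotary_base ** (-i / dim)
    # for even indices i, giving dim // 2 frequencies.
    inv_freq = [1.0 / (rotary_base ** (i / dim)) for i in range(0, dim, 2)]
    positions = list(range(seq_len))
    if seq_len_interpolation_factor is not None:
        # Linear interpolation squeezes longer sequences into the
        # position range the model was trained on.
        positions = [p / seq_len_interpolation_factor for p in positions]
    # Angle for position p at frequency f is p * f.
    return [[p * f for f in inv_freq] for p in positions]
```

For example, with `kv_channels=8` the first position gets all-zero angles and the second position's lowest-index angle is exactly 1.0 radian (since `rotary_base ** 0 == 1`); setting `rotary_percent=0.5` halves the number of rotated frequency pairs.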