Commit 794d4c5

Merge pull request #461 from janhq/update-dev-from-master-2026-03-22-00-50
Sync master with upstream release b8468
2 parents a90adc3 + 3306dba commit 794d4c5

41 files changed

Lines changed: 371 additions & 170 deletions

.github/workflows/python-type-check.yml

Lines changed: 16 additions & 11 deletions
@@ -4,35 +4,40 @@ on:
   push:
     paths:
       - '.github/workflows/python-type-check.yml'
-      - 'pyrightconfig.json'
+      - 'ty.toml'
       - '**.py'
       - '**/requirements*.txt'
+      # - 'pyrightconfig.json'
   pull_request:
     paths:
       - '.github/workflows/python-type-check.yml'
-      - 'pyrightconfig.json'
+      - 'ty.toml'
       - '**.py'
       - '**/requirements*.txt'
+      # - 'pyrightconfig.json'

 concurrency:
   group: ${{ github.workflow }}-${{ github.head_ref && github.ref || github.run_id }}
   cancel-in-progress: true

 jobs:
   python-type-check:
-    runs-on: ubuntu-latest
-    name: pyright type-check
+    runs-on: ubuntu-slim
+    name: python type-check
     steps:
       - name: Check out source repository
         uses: actions/checkout@v6
       - name: Set up Python environment
         uses: actions/setup-python@v6
         with:
           python-version: "3.11"
-          pip-install: -r requirements/requirements-all.txt
-      - name: Type-check with Pyright
-        uses: jakebailey/pyright-action@v2
-        with:
-          version: 1.1.382
-          level: warning
-          warnings: true
+          pip-install: -r requirements/requirements-all.txt ty==0.0.24
+      # - name: Type-check with Pyright
+      # uses: jakebailey/pyright-action@v2
+      # with:
+      # version: 1.1.382
+      # level: warning
+      # warnings: true
+      - name: Type-check with ty
+        run: |
+          ty check --output-format=github

AGENTS.md

Lines changed: 1 addition & 0 deletions
@@ -67,6 +67,7 @@ Examples of FORBIDDEN USAGE (and how to proceed):

 If a user asks one of the above, STOP IMMEDIATELY and ask them:

+- Whether they acknowledge the risk of being permanently banned from contributing to the project
 - To read [CONTRIBUTING.md](CONTRIBUTING.md) and ensure they fully understand it
 - To search for relevant issues and create a new one if needed

CONTRIBUTING.md

Lines changed: 4 additions & 2 deletions
@@ -11,6 +11,8 @@ The project differentiates between 3 levels of contributors:
 > [!IMPORTANT]
 > This project does **not** accept pull requests that are fully or predominantly AI-generated. AI tools may be utilized solely in an assistive capacity.
 >
+> Repeated violations of this policy may result in your account being permanently banned from contributing to the project.
+>
 > Detailed information regarding permissible and restricted uses of AI can be found in the [AGENTS.md](AGENTS.md) file.

 Code that is initially generated by AI and subsequently edited will still be considered AI-generated. AI assistance is permissible only when the majority of the code is authored by a human contributor, with AI employed exclusively for corrections or to expand on verbose modifications that the contributor has already conceptualized (e.g., generating repeated lines with minor variations).
@@ -61,10 +63,10 @@ After submitting your PR:
 - When merging a PR, make sure you have a good understanding of the changes
 - Be mindful of maintenance: most of the work going into a feature happens after the PR is merged. If the PR author is not committed to contribute long-term, someone else needs to take responsibility (you)

-Maintainers reserve the right to decline review or close pull requests for any reason, particularly under any of the following conditions:
+Maintainers reserve the right to decline review or close pull requests for any reason, without any questions, particularly under any of the following conditions:
 - The proposed change is already mentioned in the roadmap or an existing issue, and it has been assigned to someone.
 - The pull request duplicates an existing one.
-- The contributor fails to adhere to this contributing guide.
+- The contributor fails to adhere to this contributing guide or the AI policy.

 # Coding guidelines

common/arg.cpp

Lines changed: 1 addition & 1 deletion
@@ -2583,7 +2583,7 @@ common_params_context common_params_parser_init(common_params & params, llama_ex
         {"-hf", "-hfr", "--hf-repo"}, "<user>/<model>[:quant]",
         "Hugging Face model repository; quant is optional, case-insensitive, default to Q4_K_M, or falls back to the first file in the repo if Q4_K_M doesn't exist.\n"
         "mmproj is also downloaded automatically if available. to disable, add --no-mmproj\n"
-        "example: unsloth/phi-4-GGUF:q4_k_m\n"
+        "example: ggml-org/GLM-4.7-Flash-GGUF:Q4_K_M\n"
         "(default: unused)",
         [](common_params & params, const std::string & value) {
             params.model.hf_repo = value;

common/chat-auto-parser-helpers.cpp

Lines changed: 15 additions & 0 deletions
@@ -188,6 +188,21 @@ diff_split calculate_diff_split(const std::string & left, const std::string & ri
         result.suffix = "";
         // pick prefix = all as representation
     }
+
+    // When left has no unique content (result.left is empty), left is entirely
+    // shared with right. The simultaneous prefix/suffix segment matching can
+    // incorrectly consume trailing segments of left as suffix when those same
+    // segments also appear at the end of right (e.g. "\n" at the end of both
+    // the shared content and the generation prompt). This rotates the diff.
+    // Fix: if left is a prefix of right, enforce that directly.
+    if (result.left.empty() && !result.right.empty() &&
+        left.size() <= right.size() &&
+        right.substr(0, left.size()) == left) {
+        result.prefix = left;
+        result.suffix = "";
+        result.right = right.substr(left.size());
+    }
+
     return result;
 }
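
The guard added in this hunk can be illustrated outside the C++ code. The following is a minimal Python sketch, not the project's implementation: `DiffSplit` and `enforce_prefix` are hypothetical names standing in for the `diff_split` struct and the new `if` block, and the "rotated" input is hand-constructed to resemble the case the comment describes.

```python
from dataclasses import dataclass


@dataclass
class DiffSplit:
    """Hypothetical stand-in for the C++ diff_split struct."""
    prefix: str = ""
    suffix: str = ""
    left: str = ""
    right: str = ""


def enforce_prefix(result: DiffSplit, left: str, right: str) -> DiffSplit:
    """Apply the same condition as the added guard: when left contributes no
    unique content and is a literal prefix of right, use all of left as the
    prefix and give the remainder of right to result.right."""
    # right.startswith(left) covers both the size check and the substr compare.
    if not result.left and result.right and right.startswith(left):
        result.prefix = left
        result.suffix = ""
        result.right = right[len(left):]
    return result


# Hand-constructed "rotated" split: the trailing "\n" of left was taken as suffix.
left = "<user>hi\n"
right = "<user>hi\n<assistant>\n"
rotated = DiffSplit(prefix="<user>hi", suffix="\n", left="", right="\n<assistant>")
fixed = enforce_prefix(rotated, left, right)
print(repr(fixed.prefix), repr(fixed.suffix), repr(fixed.right))
```

After the guard runs, the prefix is the whole of `left`, the suffix is empty, and the right-only part is whatever follows `left` in `right`.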

convert_hf_to_gguf.py

Lines changed: 23 additions & 14 deletions
@@ -31,10 +31,10 @@
 from gguf.vocab import MistralTokenizerType, MistralVocab

 try:
-    from mistral_common.tokens.tokenizers.base import TokenizerVersion # pyright: ignore[reportMissingImports]
-    from mistral_common.tokens.tokenizers.multimodal import DATASET_MEAN as _MISTRAL_COMMON_DATASET_MEAN, DATASET_STD as _MISTRAL_COMMON_DATASET_STD # pyright: ignore[reportMissingImports]
-    from mistral_common.tokens.tokenizers.tekken import Tekkenizer # pyright: ignore[reportMissingImports]
-    from mistral_common.tokens.tokenizers.sentencepiece import ( # pyright: ignore[reportMissingImports]
+    from mistral_common.tokens.tokenizers.base import TokenizerVersion # type: ignore[import-not-found]
+    from mistral_common.tokens.tokenizers.multimodal import DATASET_MEAN as _MISTRAL_COMMON_DATASET_MEAN, DATASET_STD as _MISTRAL_COMMON_DATASET_STD # type: ignore[import-not-found]
+    from mistral_common.tokens.tokenizers.tekken import Tekkenizer # type: ignore[import-not-found]
+    from mistral_common.tokens.tokenizers.sentencepiece import ( # type: ignore[import-not-found]
         SentencePieceTokenizer,
     )

@@ -45,9 +45,9 @@
     _MISTRAL_COMMON_DATASET_STD = (0.26862954, 0.26130258, 0.27577711)

     _mistral_common_installed = False
-    TokenizerVersion = None
-    Tekkenizer = None
-    SentencePieceTokenizer = None
+    TokenizerVersion: Any = None
+    Tekkenizer: Any = None
+    SentencePieceTokenizer: Any = None
     _mistral_import_error_msg = (
         "Mistral format requires `mistral-common` to be installed. Please run "
         "`pip install mistral-common[image,audio]` to install it."
@@ -145,6 +145,7 @@ def __init__(self, dir_model: Path, ftype: gguf.LlamaFileType, fname_out: Path,
         self.model_name = model_name
         self.dir_model_card = dir_model # overridden in convert_lora_to_gguf.py
         self._is_nvfp4 = False
+        self._is_mxfp4 = False

         # Apply heuristics to figure out typical tensor encoding based on first tensor's dtype
         # NOTE: can't use field "torch_dtype" in config.json, because some finetunes lie.
@@ -220,7 +221,7 @@ def index_tensors(self, remote_hf_model_id: str | None = None) -> dict[str, Call
                     if weight_map is None or not isinstance(weight_map, dict):
                         raise ValueError(f"Can't load 'weight_map' from {index_name!r}")
                     tensor_names_from_index.update(weight_map.keys())
-                    part_dict: dict[str, None] = dict.fromkeys(weight_map.values(), None)
+                    part_dict: dict[str, None] = dict.fromkeys(weight_map.values(), None) # ty: ignore[invalid-assignment]
                     part_names = sorted(part_dict.keys())
             else:
                 weight_map = {}
@@ -712,6 +713,7 @@ def _flush_nvfp4_experts(self, key, expert_blocks, expert_scales, expert_shapes,
     def prepare_tensors(self):
         # detect NVFP4 quantization (ModelOpt format)
         quant_algo = (self.hparams.get("quantization_config") or {}).get("quant_algo")
+        quant_method = (self.hparams.get("quantization_config") or {}).get("quant_method")
         quant_layers = (self.hparams.get("quantization_config") or {}).get("quantized_layers") or {}
         quant_config_file = self.dir_model / "hf_quant_config.json"

@@ -728,6 +730,7 @@ def prepare_tensors(self):
                 quant_algo = "NVFP4"

         self._is_nvfp4 = quant_algo == "NVFP4"
+        self._is_mxfp4 = quant_method == "mxfp4"

         # NVFP4 weights are repacked and written directly to gguf_writer.
         # This must run before dequant_model so NVFP4 tensors are removed
@@ -876,6 +879,12 @@ def prepare_metadata(self, vocab_only: bool):
         if self.metadata.name is None:
             self.metadata.name = self.dir_model.name

+        if self.ftype in (gguf.LlamaFileType.ALL_F32, gguf.LlamaFileType.MOSTLY_F16, gguf.LlamaFileType.MOSTLY_BF16):
+            if self._is_nvfp4:
+                self.ftype = gguf.LlamaFileType.MOSTLY_NVFP4
+            elif self._is_mxfp4:
+                self.ftype = gguf.LlamaFileType.MOSTLY_MXFP4_MOE
+
         # Generate parameter weight class (useful for leader boards) if not yet determined
         if self.metadata.size_label is None and total_params > 0:
             self.metadata.size_label = gguf.size_label(total_params, shared_params, expert_params, expert_count)
@@ -5882,7 +5891,7 @@ def set_vocab(self):
             logger.error(f'Error: Missing {tokenizer_path}')
             sys.exit(1)

-        sentencepiece_model = model.ModelProto() # pyright: ignore[reportAttributeAccessIssue]
+        sentencepiece_model = model.ModelProto() # pyright: ignore[reportAttributeAccessIssue] # ty: ignore[unresolved-attribute]
         sentencepiece_model.ParseFromString(open(tokenizer_path, "rb").read())
         add_prefix = sentencepiece_model.normalizer_spec.add_dummy_prefix

@@ -6203,7 +6212,7 @@ def _xlmroberta_set_vocab(self) -> None:

             vocab_size = max(self.hparams.get("vocab_size", 0), tokenizer.vocab_size)
         else:
-            sentencepiece_model = model.ModelProto() # pyright: ignore[reportAttributeAccessIssue]
+            sentencepiece_model = model.ModelProto() # pyright: ignore[reportAttributeAccessIssue] # ty: ignore[unresolved-attribute]
             sentencepiece_model.ParseFromString(open(tokenizer_path, "rb").read())
             assert sentencepiece_model.trainer_spec.model_type == 1 # UNIGRAM

@@ -8880,7 +8889,7 @@ def set_vocab(self):
         if not tokenizer_path.is_file():
             raise FileNotFoundError(f"File not found: {tokenizer_path}")

-        sentencepiece_model = model.ModelProto() # pyright: ignore[reportAttributeAccessIssue]
+        sentencepiece_model = model.ModelProto() # pyright: ignore[reportAttributeAccessIssue] # ty: ignore[unresolved-attribute]
         sentencepiece_model.ParseFromString(open(tokenizer_path, "rb").read())

         # some models like Pile-T5 family use BPE tokenizer instead of Unigram
@@ -9017,7 +9026,7 @@ def set_vocab(self):
         if not tokenizer_path.is_file():
             raise FileNotFoundError(f"File not found: {tokenizer_path}")

-        sentencepiece_model = model.ModelProto() # pyright: ignore[reportAttributeAccessIssue]
+        sentencepiece_model = model.ModelProto() # pyright: ignore[reportAttributeAccessIssue] # ty: ignore[unresolved-attribute]
         sentencepiece_model.ParseFromString(open(tokenizer_path, "rb").read())

         # some models like Pile-T5 family use BPE tokenizer instead of Unigram
@@ -11125,8 +11134,7 @@ class GptOssModel(TextModel):

     # TODO: remove once MXFP4 is supported more generally
     def dequant_model(self):
-        quant_config = self.hparams.get("quantization_config")
-        if quant_config is not None and quant_config.get("quant_method") == "mxfp4":
+        if self._is_mxfp4:
             return
         return super().dequant_model()

@@ -12279,6 +12287,7 @@ def __torch_function__(cls, func, types, args=(), kwargs=None):
             kwargs = {}

         if func is torch.Tensor.numpy:
+            assert len(args)
             return args[0].numpy()

         return cls._wrap_fn(func)(*args, **kwargs)
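
The file-type override added to `prepare_metadata` can be read as a small decision rule. Here is a rough, self-contained Python sketch under stated assumptions: `FileType` is a hypothetical stand-in for `gguf.LlamaFileType` (its values are illustrative only), `resolve_ftype` is an invented helper, and the extra detection paths for NVFP4 (`hf_quant_config.json`, per-layer `quantized_layers`) are deliberately left out.

```python
from enum import Enum, auto


class FileType(Enum):
    """Hypothetical stand-in for gguf.LlamaFileType; values are illustrative."""
    ALL_F32 = auto()
    MOSTLY_F16 = auto()
    MOSTLY_BF16 = auto()
    MOSTLY_Q8_0 = auto()
    MOSTLY_NVFP4 = auto()
    MOSTLY_MXFP4_MOE = auto()


FLOAT_TYPES = (FileType.ALL_F32, FileType.MOSTLY_F16, FileType.MOSTLY_BF16)


def resolve_ftype(requested: FileType, hparams: dict) -> FileType:
    """Override a plain float file type when the checkpoint is pre-quantized."""
    quant_config = hparams.get("quantization_config") or {}
    is_nvfp4 = quant_config.get("quant_algo") == "NVFP4"
    is_mxfp4 = quant_config.get("quant_method") == "mxfp4"

    # Only float targets are overridden; an explicit quantized request is kept.
    if requested in FLOAT_TYPES:
        if is_nvfp4:
            return FileType.MOSTLY_NVFP4
        if is_mxfp4:
            return FileType.MOSTLY_MXFP4_MOE
    return requested


print(resolve_ftype(FileType.MOSTLY_F16, {"quantization_config": {"quant_method": "mxfp4"}}))
```

As in the diff, NVFP4 takes precedence over MXFP4 when both flags are set, and a checkpoint with no `quantization_config` keeps the requested type.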

convert_llama_ggml_to_gguf.py

Lines changed: 2 additions & 2 deletions
@@ -112,11 +112,11 @@ def load(self, data, offset):
         (n_dims, name_len, dtype) = struct.unpack('<3I', data[offset:offset + 12])
         assert n_dims >= 0 and n_dims <= 4, f'Invalid tensor dimensions {n_dims}'
         assert name_len < 4096, 'Absurd tensor name length'
-        quant = gguf.GGML_QUANT_SIZES.get(dtype)
+        self.dtype = gguf.GGMLQuantizationType(dtype)
+        quant = gguf.GGML_QUANT_SIZES.get(self.dtype)
         assert quant is not None, 'Unknown tensor type'
         (blksize, tysize) = quant
         offset += 12
-        self.dtype= gguf.GGMLQuantizationType(dtype)
         self.dims = struct.unpack(f'<{n_dims}I', data[offset:offset + (4 * n_dims)])
         offset += 4 * n_dims
         self.name = bytes(data[offset:offset + name_len])
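
Moving the enum conversion ahead of the table lookup changes what the lookup key is typed as, and it also changes which error an unknown type id produces. The sketch below illustrates this with hypothetical stand-ins (`QuantType` for `gguf.GGMLQuantizationType`, `QUANT_SIZES` for `gguf.GGML_QUANT_SIZES`); the table entries are illustrative and the helper `lookup` is not part of the converter.

```python
from enum import IntEnum


class QuantType(IntEnum):
    """Hypothetical stand-in for gguf.GGMLQuantizationType."""
    F32 = 0
    F16 = 1


# Hypothetical stand-in for gguf.GGML_QUANT_SIZES: (block size, type size).
QUANT_SIZES: dict[QuantType, tuple[int, int]] = {
    QuantType.F32: (1, 4),
    QuantType.F16: (1, 2),
}


def lookup(raw_dtype: int) -> tuple[QuantType, tuple[int, int]]:
    # Convert first, so the dict is indexed with its declared key type.
    # An id that is not an enum member raises ValueError at this line.
    dtype = QuantType(raw_dtype)
    quant = QUANT_SIZES.get(dtype)
    assert quant is not None, 'Unknown tensor type'
    return dtype, quant


print(lookup(1))
```

With this ordering, an id outside the enum fails at the conversion with `ValueError`, and the `assert` only fires for enum members that are missing from the size table.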

convert_lora_to_gguf.py

Lines changed: 5 additions & 1 deletion
@@ -199,10 +199,13 @@ def __torch_function__(cls, func: Callable, types, args=(), kwargs=None):
             kwargs = {}

         if func is torch.permute:
+            assert len(args)
             return type(args[0]).permute(*args, **kwargs)
         elif func is torch.reshape:
+            assert len(args)
             return type(args[0]).reshape(*args, **kwargs)
         elif func is torch.stack:
+            assert len(args)
             assert isinstance(args[0], Sequence)
             dim = kwargs.get("dim", 0)
             assert dim == 0
@@ -211,6 +214,7 @@ def __torch_function__(cls, func: Callable, types, args=(), kwargs=None):
                 torch.stack([b._lora_B for b in args[0]], dim),
             )
         elif func is torch.cat:
+            assert len(args)
             assert isinstance(args[0], Sequence)
             dim = kwargs.get("dim", 0)
             assert dim == 0
@@ -362,7 +366,7 @@ def load_hparams_from_hf(hf_model_id: str) -> tuple[dict[str, Any], Path | None]
             logger.error(f"Model {hparams['architectures'][0]} is not supported")
             sys.exit(1)

-        class LoraModel(model_class):
+        class LoraModel(model_class): # ty: ignore[unsupported-base]
             model_arch = model_class.model_arch

             lora_alpha: float

examples/json_schema_to_grammar.py

Lines changed: 3 additions & 6 deletions
@@ -28,9 +28,6 @@ def _build_repetition(item_rule, min_items, max_items, separator_rule=None):
     return f'({result})?' if min_items == 0 else result

 def _generate_min_max_int(min_value: Optional[int], max_value: Optional[int], out: list, decimals_left: int = 16, top_level: bool = True):
-    has_min = min_value != None
-    has_max = max_value != None
-
     def digit_range(from_char: str, to_char: str):
         out.append("[")
         if from_char == to_char:
@@ -106,7 +103,7 @@ def uniform_range(from_str: str, to_str: str):
                 out.append(to_str[i])
                 out.append("]")

-    if has_min and has_max:
+    if min_value is not None and max_value is not None:
         if min_value < 0 and max_value < 0:
             out.append("\"-\" (")
             _generate_min_max_int(-max_value, -min_value, out, decimals_left, top_level=True)
@@ -133,7 +130,7 @@ def uniform_range(from_str: str, to_str: str):

     less_decimals = max(decimals_left - 1, 1)

-    if has_min:
+    if min_value is not None:
         if min_value < 0:
             out.append("\"-\" (")
             _generate_min_max_int(None, -min_value, out, decimals_left, top_level=False)
@@ -177,7 +174,7 @@ def uniform_range(from_str: str, to_str: str):
                 more_digits(length - 1, less_decimals)
         return

-    if has_max:
+    if max_value is not None:
         if max_value >= 0:
             if top_level:
                 out.append("\"-\" [1-9] ")
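
Replacing the `has_min` / `has_max` flags with inline `is not None` tests matters mostly to a type checker: a boolean computed earlier generally does not narrow an `Optional[int]`, while a direct `is not None` test in the branch condition does. A minimal sketch of that pattern follows; `describe_range` is a hypothetical function written for illustration and is unrelated to the grammar generator's actual output.

```python
from typing import Optional


def describe_range(min_value: Optional[int], max_value: Optional[int]) -> str:
    # Testing the values directly lets a checker treat both as int inside
    # the branch, so the arithmetic below needs no ignore comment.
    if min_value is not None and max_value is not None:
        return f"{min_value}..{max_value} ({max_value - min_value + 1} values)"
    if min_value is not None:
        return f">= {min_value}"
    if max_value is not None:
        return f"<= {max_value}"
    return "unbounded"


print(describe_range(1, 3))
print(describe_range(None, 10))
```

The runtime behavior is the same as with the flag variables; `is not None` is also the idiomatic comparison, where the removed lines used `!= None`.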

examples/model-conversion/scripts/embedding/run-original-model.py

Lines changed: 8 additions & 8 deletions
@@ -64,7 +64,7 @@ def load_model_and_tokenizer(model_path, use_sentence_transformers=False, device
         print("Using SentenceTransformer to apply all numbered layers")
         model = SentenceTransformer(model_path)
         tokenizer = model.tokenizer
-        config = model[0].auto_model.config # type: ignore
+        config = model[0].auto_model.config
     else:
         tokenizer = AutoTokenizer.from_pretrained(model_path)
         config = AutoConfig.from_pretrained(model_path, trust_remote_code=True)
@@ -108,8 +108,8 @@ def load_model_and_tokenizer(model_path, use_sentence_transformers=False, device
         print(f"Model file: {type(model).__module__}")

         # Verify the model is using the correct sliding window
-        if hasattr(model.config, 'sliding_window'): # type: ignore
-            print(f"Model's sliding_window: {model.config.sliding_window}") # type: ignore
+        if hasattr(model.config, 'sliding_window'):
+            print(f"Model's sliding_window: {model.config.sliding_window}")
         else:
             print("Model config does not have sliding_window attribute")

@@ -152,7 +152,7 @@ def main():
         device = next(model.parameters()).device
     else:
         # For SentenceTransformer, get device from the underlying model
-        device = next(model[0].auto_model.parameters()).device # type: ignore
+        device = next(model[0].auto_model.parameters()).device

     model_name = os.path.basename(model_path)

@@ -177,7 +177,7 @@ def main():
             print(f"{token_id:6d} -> '{token_str}'")

         print(f"Embeddings shape (after all SentenceTransformer layers): {all_embeddings.shape}")
-        print(f"Embedding dimension: {all_embeddings.shape[1] if len(all_embeddings.shape) > 1 else all_embeddings.shape[0]}") # type: ignore
+        print(f"Embedding dimension: {all_embeddings.shape[1] if len(all_embeddings.shape) > 1 else all_embeddings.shape[0]}")
     else:
         # Standard approach: use base model output only
         encoded = tokenizer(
@@ -205,12 +205,12 @@ def main():
         print(f"Embedding dimension: {all_embeddings.shape[1]}")

     if len(all_embeddings.shape) == 1:
-        n_embd = all_embeddings.shape[0] # type: ignore
+        n_embd = all_embeddings.shape[0]
         n_embd_count = 1
         all_embeddings = all_embeddings.reshape(1, -1)
     else:
-        n_embd = all_embeddings.shape[1] # type: ignore
-        n_embd_count = all_embeddings.shape[0] # type: ignore
+        n_embd = all_embeddings.shape[1]
+        n_embd_count = all_embeddings.shape[0]

     print()
