
Commit 972e464

Merge pull request #464 from janhq/update-dev-from-master-2026-03-25-00-49
Sync master with upstream release b8508
2 parents b61c2f5 + 9f102a1 commit 972e464

37 files changed

Lines changed: 1296 additions & 452 deletions

.github/ISSUE_TEMPLATE/010-bug-compilation.yml

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -41,7 +41,7 @@ body:
4141
attributes:
4242
label: GGML backends
4343
description: Which GGML backends do you know to be affected?
44-
options: [AMX, BLAS, CANN, CPU, CUDA, Hexagon, HIP, Metal, Musa, OpenCL, RPC, SYCL, VirtGPU, Vulkan, WebGPU, zDNN, ZenDNN]
44+
options: [AMX, BLAS, CANN, CPU, CUDA, Hexagon, HIP, Metal, Musa, OpenCL, OpenVINO, RPC, SYCL, VirtGPU, Vulkan, WebGPU, zDNN, ZenDNN]
4545
multiple: true
4646
validations:
4747
required: true

.github/ISSUE_TEMPLATE/011-bug-results.yml

Lines changed: 1 addition & 1 deletion
@@ -42,7 +42,7 @@ body:
4242
attributes:
4343
label: GGML backends
4444
description: Which GGML backends do you know to be affected?
45-
options: [AMX, BLAS, CANN, CPU, CUDA, Hexagon, HIP, Metal, Musa, OpenCL, RPC, SYCL, VirtGPU, Vulkan, WebGPU, zDNN, ZenDNN]
45+
options: [AMX, BLAS, CANN, CPU, CUDA, Hexagon, HIP, Metal, Musa, OpenCL, OpenVINO, RPC, SYCL, VirtGPU, Vulkan, WebGPU, zDNN, ZenDNN]
4646
multiple: true
4747
validations:
4848
required: true

README.md

Lines changed: 4 additions & 3 deletions
@@ -17,6 +17,7 @@ LLM inference in C/C++
1717

1818
## Hot topics
1919

20+
- **Hugging Face cache migration: models downloaded with `-hf` are now stored in the standard Hugging Face cache directory, enabling sharing with other HF tools.**
2021
- **[guide : using the new WebUI of llama.cpp](https://github.com/ggml-org/llama.cpp/discussions/16938)**
2122
- [guide : running gpt-oss with llama.cpp](https://github.com/ggml-org/llama.cpp/discussions/15396)
2223
- [[FEEDBACK] Better packaging for llama.cpp to support downstream consumers 🤗](https://github.com/ggml-org/llama.cpp/discussions/15313)
@@ -241,7 +242,7 @@ Instructions for adding support for new models: [HOWTO-add-model.md](docs/develo
241242
<details>
242243
<summary>Tools</summary>
243244

244-
- [akx/ggify](https://github.com/akx/ggify) – download PyTorch models from HuggingFace Hub and convert them to GGML
245+
- [akx/ggify](https://github.com/akx/ggify) – download PyTorch models from Hugging Face Hub and convert them to GGML
245246
- [akx/ollama-dl](https://github.com/akx/ollama-dl) – download models from the Ollama library to be used directly with llama.cpp
246247
- [crashr/gppm](https://github.com/crashr/gppm) – launch llama.cpp instances utilizing NVIDIA Tesla P40 or P100 GPUs with reduced idle power consumption
247248
- [gpustack/gguf-parser](https://github.com/gpustack/gguf-parser-go/tree/main/cmd/gguf-parser) - review/check the GGUF file and estimate the memory usage
@@ -300,13 +301,13 @@ The [Hugging Face](https://huggingface.co) platform hosts a [number of LLMs](htt
300301
- [Trending](https://huggingface.co/models?library=gguf&sort=trending)
301302
- [LLaMA](https://huggingface.co/models?sort=trending&search=llama+gguf)
302303

303-
You can either manually download the GGUF file or directly use any `llama.cpp`-compatible models from [Hugging Face](https://huggingface.co/) or other model hosting sites, such as [ModelScope](https://modelscope.cn/), by using this CLI argument: `-hf <user>/<model>[:quant]`. For example:
304+
You can either manually download the GGUF file or directly use any `llama.cpp`-compatible models from [Hugging Face](https://huggingface.co/) or other model hosting sites, by using this CLI argument: `-hf <user>/<model>[:quant]`. For example:
304305

305306
```sh
306307
llama-cli -hf ggml-org/gemma-3-1b-it-GGUF
307308
```
308309

309-
By default, the CLI would download from Hugging Face, you can switch to other options with the environment variable `MODEL_ENDPOINT`. For example, you may opt to downloading model checkpoints from ModelScope or other model sharing communities by setting the environment variable, e.g. `MODEL_ENDPOINT=https://www.modelscope.cn/`.
310+
By default, the CLI would download from Hugging Face, you can switch to other options with the environment variable `MODEL_ENDPOINT`. The `MODEL_ENDPOINT` must point to a Hugging Face compatible API endpoint.
310311

311312
After downloading a model, use the CLI tools to run it locally - see below.
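For orientation, the "standard Hugging Face cache directory" mentioned in the new README bullet can be sketched as follows. This is a minimal sketch and not the code from `hf-cache.cpp`, whose contents are not shown in this diff. It assumes the lookup order documented for the `huggingface_hub` library (`HF_HUB_CACHE`, then `HF_HOME`, then `XDG_CACHE_HOME`, then `~/.cache`) and the `models--<user>--<model>` folder naming. The helper names `env_or_empty`, `hf_hub_cache_dir`, and `hf_repo_folder_name` are hypothetical.

```cpp
#include <cassert>
#include <cstdlib>
#include <string>

// Hypothetical helper: read an environment variable, "" when unset.
static std::string env_or_empty(const char * name) {
    const char * v = std::getenv(name);
    return v ? std::string(v) : std::string();
}

// Assumed lookup order (huggingface_hub convention, not taken from this commit):
// HF_HUB_CACHE > $HF_HOME/hub > $XDG_CACHE_HOME/huggingface/hub > ~/.cache/huggingface/hub
std::string hf_hub_cache_dir() {
    std::string dir = env_or_empty("HF_HUB_CACHE");
    if (!dir.empty()) {
        return dir;
    }
    dir = env_or_empty("HF_HOME");
    if (!dir.empty()) {
        return dir + "/hub";
    }
    dir = env_or_empty("XDG_CACHE_HOME");
    if (!dir.empty()) {
        return dir + "/huggingface/hub";
    }
    return env_or_empty("HOME") + "/.cache/huggingface/hub";
}

// Map "<user>/<model>" to the per-repo folder name used inside the hub cache.
std::string hf_repo_folder_name(const std::string & repo) {
    assert(!repo.empty());
    std::string out = "models--";
    for (char c : repo) {
        if (c == '/') {
            out += "--"; // path separators become "--"
        } else {
            out += c;
        }
    }
    return out;
}
```

Under these assumptions, a model fetched with `-hf ggml-org/gemma-3-1b-it-GGUF` would live under `hf_hub_cache_dir() + "/" + hf_repo_folder_name("ggml-org/gemma-3-1b-it-GGUF")`, which is the same location other Hugging Face tools read from.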
312313

common/CMakeLists.txt

Lines changed: 2 additions & 0 deletions
@@ -63,6 +63,8 @@ add_library(${TARGET} STATIC
6363
debug.h
6464
download.cpp
6565
download.h
66+
hf-cache.cpp
67+
hf-cache.h
6668
http.h
6769
json-partial.cpp
6870
json-partial.h

common/arg.cpp

Lines changed: 45 additions & 51 deletions
@@ -3,6 +3,7 @@
33
#include "chat.h"
44
#include "common.h"
55
#include "download.h"
6+
#include "hf-cache.h"
67
#include "json-schema-to-grammar.h"
78
#include "log.h"
89
#include "sampling.h"
@@ -326,60 +327,48 @@ struct handle_model_result {
326327
common_params_model mmproj;
327328
};
328329

329-
static handle_model_result common_params_handle_model(
330-
struct common_params_model & model,
331-
const std::string & bearer_token,
332-
bool offline) {
330+
static handle_model_result common_params_handle_model(struct common_params_model & model,
331+
const std::string & bearer_token,
332+
bool offline) {
333333
handle_model_result result;
334-
// handle pre-fill default model path and url based on hf_repo and hf_file
335-
{
336-
if (!model.docker_repo.empty()) { // Handle Docker URLs by resolving them to local paths
337-
model.path = common_docker_resolve_model(model.docker_repo);
338-
model.name = model.docker_repo; // set name for consistency
339-
} else if (!model.hf_repo.empty()) {
340-
// short-hand to avoid specifying --hf-file -> default it to --model
341-
if (model.hf_file.empty()) {
342-
if (model.path.empty()) {
343-
auto auto_detected = common_get_hf_file(model.hf_repo, bearer_token, offline);
344-
if (auto_detected.repo.empty() || auto_detected.ggufFile.empty()) {
345-
exit(1); // error message already printed
346-
}
347-
model.name = model.hf_repo; // repo name with tag
348-
model.hf_repo = auto_detected.repo; // repo name without tag
349-
model.hf_file = auto_detected.ggufFile;
350-
if (!auto_detected.mmprojFile.empty()) {
351-
result.found_mmproj = true;
352-
result.mmproj.hf_repo = model.hf_repo;
353-
result.mmproj.hf_file = auto_detected.mmprojFile;
354-
}
355-
} else {
356-
model.hf_file = model.path;
357-
}
358-
}
359334

360-
std::string model_endpoint = get_model_endpoint();
361-
model.url = model_endpoint + model.hf_repo + "/resolve/main/" + model.hf_file;
362-
// make sure model path is present (for caching purposes)
363-
if (model.path.empty()) {
364-
// this is to avoid different repo having same file name, or same file name in different subdirs
365-
std::string filename = clean_file_name(model.hf_repo + "_" + model.hf_file);
366-
model.path = fs_get_cache_file(filename);
367-
}
335+
if (!model.docker_repo.empty()) {
336+
model.path = common_docker_resolve_model(model.docker_repo);
337+
model.name = model.docker_repo;
338+
} else if (!model.hf_repo.empty()) {
339+
// If -m was used with -hf, treat the model "path" as the hf_file to download
340+
if (model.hf_file.empty() && !model.path.empty()) {
341+
model.hf_file = model.path;
342+
model.path = "";
343+
}
344+
common_download_model_opts opts;
345+
opts.download_mmproj = true;
346+
opts.offline = offline;
347+
auto download_result = common_download_model(model, bearer_token, opts);
348+
349+
if (download_result.model_path.empty()) {
350+
LOG_ERR("error: failed to download model from Hugging Face\n");
351+
exit(1);
352+
}
368353

369-
} else if (!model.url.empty()) {
370-
if (model.path.empty()) {
371-
auto f = string_split<std::string>(model.url, '#').front();
372-
f = string_split<std::string>(f, '?').front();
373-
model.path = fs_get_cache_file(string_split<std::string>(f, '/').back());
374-
}
354+
model.name = model.hf_repo;
355+
model.path = download_result.model_path;
375356

357+
if (!download_result.mmproj_path.empty()) {
358+
result.found_mmproj = true;
359+
result.mmproj.path = download_result.mmproj_path;
360+
}
361+
} else if (!model.url.empty()) {
362+
if (model.path.empty()) {
363+
auto f = string_split<std::string>(model.url, '#').front();
364+
f = string_split<std::string>(f, '?').front();
365+
model.path = fs_get_cache_file(string_split<std::string>(f, '/').back());
376366
}
377-
}
378367

379-
// then, download it if needed
380-
if (!model.url.empty()) {
381-
bool ok = common_download_model(model, bearer_token, offline);
382-
if (!ok) {
368+
common_download_model_opts opts;
369+
opts.offline = offline;
370+
auto download_result = common_download_model(model, bearer_token, opts);
371+
if (download_result.model_path.empty()) {
383372
LOG_ERR("error: failed to download model from %s\n", model.url.c_str());
384373
exit(1);
385374
}
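The `-m` plus `-hf` rule in the rewritten `common_params_handle_model` can be isolated as a small sketch. The struct `model_spec` and the function `normalize_hf_request` below are hypothetical stand-ins, reduced to the three fields the rule touches. The real `common_params_model` has more members, and the actual download is delegated to `common_download_model`, which is not reproduced here.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for common_params_model (only the fields used by the rule).
struct model_spec {
    std::string path;    // value of -m
    std::string hf_repo; // value of -hf
    std::string hf_file; // value of --hf-file
};

// When -hf is given without --hf-file but with -m, the -m value names the
// file to download rather than a local path, so it moves to hf_file.
void normalize_hf_request(model_spec & m) {
    if (m.hf_repo.empty()) {
        return; // rule only applies to Hugging Face requests
    }
    if (m.hf_file.empty() && !m.path.empty()) {
        m.hf_file = m.path;
        m.path.clear(); // the final path comes from the download result
    }
}
```

Clearing `path` matters because, in the new flow, the final local path is taken from the download result instead of being precomputed from the repo and file name.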
@@ -539,6 +528,13 @@ static bool common_params_parse_ex(int argc, char ** argv, common_params_context
539528
// parse the first time to get -hf option (used for remote preset)
540529
parse_cli_args();
541530

531+
// TODO: Remove later
532+
try {
533+
hf_cache::migrate_old_cache_to_hf_cache(params.hf_token, params.offline);
534+
} catch (const std::exception & e) {
535+
LOG_WRN("HF cache migration failed: %s\n", e.what());
536+
}
537+
542538
// maybe handle remote preset
543539
if (!params.model.hf_repo.empty()) {
544540
std::string cli_hf_repo = params.model.hf_repo;
@@ -1061,12 +1057,10 @@ common_params_context common_params_parser_init(common_params & params, llama_ex
10611057
{"-cl", "--cache-list"},
10621058
"show list of models in cache",
10631059
[](common_params &) {
1064-
printf("model cache directory: %s\n", fs_get_cache_directory().c_str());
10651060
auto models = common_list_cached_models();
10661061
printf("number of models in cache: %zu\n", models.size());
10671062
for (size_t i = 0; i < models.size(); i++) {
1068-
auto & model = models[i];
1069-
printf("%4d. %s\n", (int) i + 1, model.to_string().c_str());
1063+
printf("%4zu. %s\n", i + 1, models[i].to_string().c_str());
10701064
}
10711065
exit(0);
10721066
}
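The `--cache-list` hunk replaces `%4d` plus an `(int)` cast with `%4zu`, the `printf` conversion for `size_t`. A minimal sketch of the same formatting, using a hypothetical helper `format_cache_entry` that writes to a string instead of stdout:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <string>

// Hypothetical helper: format one numbered entry as the new loop does.
// %zu matches size_t directly, so no narrowing cast to int is needed.
std::string format_cache_entry(size_t index, const std::string & name) {
    assert(name.size() < 200); // keep the sketch's fixed buffer large enough
    char buf[256];
    std::snprintf(buf, sizeof(buf), "%4zu. %s", index + 1, name.c_str());
    return buf;
}
```

The index is right-aligned in a field of width four, so `format_cache_entry(0, "user/model")` yields three spaces followed by `1. user/model`.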

common/chat-auto-parser-generator.cpp

Lines changed: 1 addition & 2 deletions
@@ -112,8 +112,7 @@ common_peg_arena autoparser::build_parser(const generation_params & inputs) cons
112112
} else {
113113
parser = content.build_parser(ctx);
114114
}
115-
parser = wrap_for_generation_prompt(p, parser, inputs, reasoning.start);
116-
return parser;
115+
return p.prefix(inputs.generation_prompt, reasoning.start) + parser;
117116
});
118117
}
119118

common/chat-auto-parser-helpers.cpp

Lines changed: 0 additions & 16 deletions
@@ -308,22 +308,6 @@ std::vector<segment> prune_whitespace_segments(const std::vector<segment> & segm
308308
return result;
309309
}
310310

311-
common_peg_parser wrap_for_generation_prompt(common_chat_peg_builder & p,
312-
const common_peg_parser & prs,
313-
const autoparser::generation_params & inputs,
314-
const std::string & reasoning_start) {
315-
auto parser = prs;
316-
if (!inputs.generation_prompt.empty()) {
317-
size_t end_pos = inputs.generation_prompt.size();
318-
if (!reasoning_start.empty() && inputs.generation_prompt.find(reasoning_start) != std::string::npos) {
319-
end_pos = inputs.generation_prompt.find(reasoning_start);
320-
}
321-
std::string cut_genprompt = inputs.generation_prompt.substr(0, end_pos);
322-
parser = p.literal(cut_genprompt) + parser;
323-
}
324-
return parser;
325-
}
326-
327311
namespace autoparser {
328312

329313
std::string apply_template(const common_chat_template & tmpl, const template_params & params) {

common/chat-auto-parser-helpers.h

Lines changed: 0 additions & 5 deletions
@@ -58,11 +58,6 @@ std::vector<segment> segmentize_markers(const std::string & text);
5858
// (MARKER, "</function>"), (MARKER, "</tool_call>") ]
5959
std::vector<segment> prune_whitespace_segments(const std::vector<segment> & segments);
6060

61-
// Wrap parser with generation prompt parser
62-
common_peg_parser wrap_for_generation_prompt(common_chat_peg_builder & p,
63-
const common_peg_parser & prs,
64-
const autoparser::generation_params & inputs,
65-
const std::string & reasoning_start = {});
6661
namespace autoparser {
6762

6863
// Apply a template with the given parameters, returning the rendered string (empty on failure)

common/chat-peg-parser.cpp

Lines changed: 10 additions & 0 deletions
@@ -802,6 +802,16 @@ common_peg_parser common_chat_peg_builder::build_json_tools_flat_keys(
802802
return tool_choices;
803803
}
804804

805+
common_peg_parser common_chat_peg_builder::prefix(const std::string & s, const std::string & delimiter) {
806+
if (s.empty()) {
807+
return eps();
808+
}
809+
if (delimiter.empty()) {
810+
return literal(s);
811+
}
812+
return literal(s.substr(0, s.rfind(delimiter)));
813+
}
814+
805815
common_peg_parser common_chat_peg_builder::standard_json_tools(
806816
const std::string & section_start,
807817
const std::string & section_end,

common/chat-peg-parser.h

Lines changed: 4 additions & 0 deletions
@@ -82,6 +82,10 @@ class common_chat_peg_builder : public common_peg_parser_builder {
8282
common_peg_parser tool_arg_string_value(const common_peg_parser & p) { return tag(TOOL_ARG_STRING_VALUE, p); }
8383
common_peg_parser tool_arg_json_value(const common_peg_parser & p) { return atomic(tag(TOOL_ARG_VALUE, p)); }
8484

85+
86+
// Return a parser that parses the prefix of a string, up to a given delimiter.
87+
common_peg_parser prefix(const std::string & s, const std::string & delimiter = {});
88+
8589
// Legacy-compatible helper for building standard JSON tool calls
8690
// Used by tests and manual parsers
8791
// name_key/args_key: JSON key names for function name and arguments
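The string handling inside the new `common_chat_peg_builder::prefix` can be modeled without the parser types. The function `prefix_text` below is a hypothetical stand-in: it returns the text that the real method would pass to `literal()`, and an empty string where the real method returns `eps()`.

```cpp
#include <cassert>
#include <string>

// Standalone model of the cut performed by prefix(); not the parser itself.
std::string prefix_text(const std::string & s, const std::string & delimiter = {}) {
    if (s.empty() || delimiter.empty()) {
        return s; // nothing to cut: empty input, or no delimiter given
    }
    // rfind returns npos when the delimiter is absent, and substr(0, npos)
    // then keeps the whole string, so a missing delimiter is harmless.
    return s.substr(0, s.rfind(delimiter));
}
```

One behavioral difference from the removed `wrap_for_generation_prompt` is visible here: the old helper cut at the first occurrence of the reasoning start marker (`find`), while `prefix` cuts at the last one (`rfind`). The two agree whenever the marker appears at most once in the generation prompt.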
