Description
Discussed in #107
Originally posted by zhouruicn November 9, 2024
I am using a test case adapted from "jlama-tests":
```java
@Test
@Order(1)
public void llamaRun() throws Exception {
    var dir = appContext.nodeContext().nodePath().commonsLlm();
    try (WeightLoader weights = SafeTensorSupport.loadWeights(dir.toFile())) {
        LlamaTokenizer tokenizer = new LlamaTokenizer(dir);
        Config c = om.readValue(dir.resolve("config.json").toFile(), LlamaConfig.class);
        LlamaModel model = new LlamaModel(c, weights, tokenizer, DType.F32, DType.F32,
                Optional.empty());
        PromptSupport.Builder builder = model.promptSupport().get().builder();
        builder.addUserMessage("您能讲中文吗"); // Chinese text meaning "Can you speak Chinese?"
        builder.addGenerationPrompt(true);
        PromptContext promptContext = builder.build();
        LOGGER.info("First prompt \n{}", promptContext);
        Generator.Response r =
                model.generate(UUID.randomUUID(), promptContext, 0.7f, 1024, (l, f) -> {
                });
        LOGGER.info("Response: {}", r.responseText);
    }
}
```
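One thing worth ruling out before blaming the model: if the test source file is compiled with a non-UTF-8 encoding, the Chinese string literal can reach the tokenizer already garbled. A minimal, self-contained check (plain Java, no jlama APIs; the class name is just for illustration) that prints the code points of the prompt literal:

```java
public class PromptEncodingCheck {
    public static void main(String[] args) {
        String prompt = "您能讲中文吗"; // same literal as in the test

        // Each character should print as one CJK code point (U+4E00..U+9FFF
        // range); mojibake shows up as runs of Latin-1 garbage instead.
        prompt.codePoints().forEach(cp ->
                System.out.printf("U+%04X %s%n", cp, new String(Character.toChars(cp))));
    }
}
```

If the output is six CJK code points (U+60A8 您, U+80FD 能, U+8BB2 讲, U+4E2D 中, U+6587 文, U+5417 吗), the literal itself is intact and the problem lies further downstream.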
```
23:50:04.938 [main] DEBUG com.github.tjake.jlama.safetensors.SafeTensorIndex -- Adding split 0-2008209408 with 451 tensors of 451
23:50:05.954 [main] INFO com.github.tjake.jlama.model.AbstractModel -- Model type = Q4, Working memory type = F32, Quantized memory type = F32
23:50:06.037 [main] INFO com.x.app.ai.jlama.JlamaLlamaTest -- First prompt
PromptContext{prompt='<|start_header_id|>system<|end_header_id|>
Cutting Knowledge Date: December 2023
Today Date: 26 Jul 2024
<|eot_id|><|start_header_id|>user<|end_header_id|>
您能讲中文吗<|eot_id|><|start_header_id|>assistant<|end_header_id|>
', optionalTools=Optional.empty}
23:50:06.048 [main] DEBUG com.github.tjake.jlama.tensor.KvBufferCache -- Optimal page size: 28 layers, 36 context length, 8257536 bytes, 1 layer pages, 3641 length pages
23:50:06.049 [main] DEBUG com.github.tjake.jlama.model.AbstractModel -- Starting at token 0 for session 1c54efb0-8879-4604-b9f3-03ba378a6052 with prompt <|start_header_id|>system<|end_header_id|>
Cutting Knowledge Date: December 2023
Today Date: 26 Jul 2024
<|eot_id|><|start_header_id|>user<|end_header_id|>
您能讲中文吗<|eot_id|><|start_header_id|>assistant<|end_header_id|>
23:50:06.167 [main] DEBUG com.github.tjake.jlama.tensor.operations.util.JarSupport -- Loaded jlama-native library: /tmp/jlama13277383875011588717/libjlama.so
23:50:06.191 [main] DEBUG com.github.tjake.jlama.util.MachineSpec -- Machine Vector Spec: AVX_256
23:50:06.191 [main] DEBUG com.github.tjake.jlama.util.MachineSpec -- Byte Order: LITTLE_ENDIAN
23:50:06.191 [main] INFO com.github.tjake.jlama.tensor.operations.TensorOperationsProvider -- Using Native SIMD Operations (OffHeap)
23:50:08.263 [main] DEBUG com.github.tjake.jlama.model.AbstractModel -- 34 prompt tokens in 2210ms | 65.0ms per token
23:50:15.437 [main] DEBUG com.github.tjake.jlama.model.AbstractModel --
elapsed: 9s, prompt 65.0ms per token, gen 152.0ms per token
23:50:15.440 [main] INFO com.x.app.ai.jlama.JlamaLlamaTest -- Response: You didn't provide any information or context for me to understand what you'd like to talk about. If you have a specific question or topic in mind, feel free to share it, and I'll do my best to assist you!
```
As the log shows, the prompt contains the Chinese text, but the model answers as if the user message were empty. Please help me!
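Since the prompt appears correctly in the log, another thing I would check (this is a guess at the cause, not a confirmed jlama issue) is the JVM's default charset on the machine running the test, because a non-UTF-8 default can corrupt text at I/O boundaries such as file reads or console logging:

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class CharsetCheck {
    public static void main(String[] args) {
        // JDK 18+ defaults to UTF-8 (JEP 400); older JDKs follow the OS locale.
        System.out.println("file.encoding  = " + System.getProperty("file.encoding"));
        System.out.println("defaultCharset = " + Charset.defaultCharset());

        // Round-trip the prompt through UTF-8 to confirm the text survives
        // an explicit encode/decode cycle.
        String prompt = "您能讲中文吗";
        byte[] bytes = prompt.getBytes(StandardCharsets.UTF_8);
        String back = new String(bytes, StandardCharsets.UTF_8);
        System.out.println("utf-8 round-trip ok = " + prompt.equals(back));
    }
}
```

If `defaultCharset` is not UTF-8, running the test with `-Dfile.encoding=UTF-8` would be a cheap experiment to rule this out.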