
Different Embeddings from sentence-transformers/all-MiniLM-L6-v2 compared to Python #188

@udaychandra

Description

First, thank you for the amazing work on Jlama! It's great to have native Java libraries for embeddings and LLMs.

Issue

We're getting different embedding values from Jlama compared to Python's sentence-transformers for the all-MiniLM-L6-v2 model, even though we've verified that tokenization is identical.

Java Code (Jlama)

// Imports assumed from the Jlama artifacts; package paths follow the Jlama README
import com.github.tjake.jlama.model.ModelSupport;
import com.github.tjake.jlama.model.functions.Generator;
import com.github.tjake.jlama.safetensors.DType;
import com.github.tjake.jlama.util.Downloader;

var modelName = "sentence-transformers/all-MiniLM-L6-v2";
var workingDirectory = System.getProperty("user.home") + "/.jlama/models/";
var downloader = new Downloader(workingDirectory, modelName);
var modelPath = downloader.huggingFaceModel();

var model = ModelSupport.loadEmbeddingModel(modelPath, DType.F32, DType.F32);

String text = "This is a test document about machine learning";
float[] embedding = model.embed(text, Generator.PoolingType.AVG);

System.out.println("First 10 values:");
for (int i = 0; i < 10; i++) {
    System.out.println("  [" + i + "] = " + embedding[i]);
}

Java Output:

Magnitude: 1.0000001
[0] = -0.0009431843
[1] = 0.006532612
[2] = 0.070363656
[3] = 0.0154365115

Python Code (sentence-transformers)

from sentence_transformers import SentenceTransformer

model = SentenceTransformer('sentence-transformers/all-MiniLM-L6-v2')
text = "This is a test document about machine learning"
embedding = model.encode(text)

print("First 10 values:")
for i in range(10):
    print(f"  [{i}] = {embedding[i]}")

Python Output:

Magnitude: 1.0
[0] = -0.038466498255729675
[1] = 0.00013165567361284047
[2] = 0.01088548544794321
[3] = 0.040931958705186844

What We've Verified

  1. Tokenization is identical: Both produce the same token IDs: [101, 2023, 2003, 1037, 3231, 6254, 2055, 3698, 4083, 102]
  2. Same pooling strategy: Both use mean/average pooling (PoolingType.AVG in Java, pooling_mode_mean_tokens=True in Python)
  3. Same model source: Both download from HuggingFace sentence-transformers/all-MiniLM-L6-v2
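For reference, the pooling step both stacks claim to perform (Jlama's `PoolingType.AVG`, sentence-transformers' `pooling_mode_mean_tokens` followed by its `Normalize` module) can be sketched as mean pooling plus L2 normalization over the per-token vectors. This is an illustrative standalone sketch, not Jlama API; the class and method names are ours.

```java
// Sketch of mean pooling + L2 normalization over per-token embeddings.
// MeanPool and meanPoolAndNormalize are hypothetical names, not Jlama API.
public class MeanPool {
    // Average the token vectors element-wise, then scale to unit length.
    static float[] meanPoolAndNormalize(float[][] tokenEmbeddings) {
        int dim = tokenEmbeddings[0].length;
        float[] pooled = new float[dim];
        for (float[] token : tokenEmbeddings)
            for (int d = 0; d < dim; d++)
                pooled[d] += token[d] / tokenEmbeddings.length;
        double norm = 0;
        for (float v : pooled) norm += v * v;
        norm = Math.sqrt(norm);
        for (int d = 0; d < dim; d++) pooled[d] /= (float) norm;
        return pooled;
    }

    public static void main(String[] args) {
        // mean of [1,3] and [3,1] is [2,2]; normalized, both entries ≈ 0.7071
        float[] out = meanPoolAndNormalize(new float[][]{{1f, 3f}, {3f, 1f}});
        System.out.println(out[0] + " " + out[1]);
    }
}
```

If both libraries implement this step the same way, the remaining divergence would have to come from the transformer forward pass itself (e.g. attention-mask handling or which hidden layer is pooled).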

The Problem

The actual embedding values are completely different (not just minor floating-point differences).
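One way to quantify "completely different" is the cosine similarity between the two vectors; since both outputs report a magnitude of ~1.0, this reduces to a dot product. A minimal sketch (the class and method names are ours, not from either library):

```java
// Cosine similarity between two embeddings; for unit-length vectors this
// reduces to the dot product. Cosine/cosine are illustrative names only.
public class Cosine {
    static double cosine(float[] a, float[] b) {
        double dot = 0, na = 0, nb = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            na += a[i] * a[i];
            nb += b[i] * b[i];
        }
        return dot / (Math.sqrt(na) * Math.sqrt(nb));
    }

    public static void main(String[] args) {
        float[] v = {0.1f, 0.2f, 0.3f};
        System.out.println(cosine(v, v)); // identical vectors give ≈ 1.0
    }
}
```

Running this over the full 384-dimensional vectors from both stacks would show whether the divergence is numerical noise (cosine ≈ 1) or a genuinely different embedding.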

Questions

  1. Is the all-MiniLM-L6-v2 model fully supported/tested with Jlama?
  2. Are we missing any configuration or preprocessing steps?

Any guidance would be greatly appreciated!
