
Release v2.4.2 #303


Merged: 34 commits, Feb 19, 2025
Commits
d3ec141
remove libundreamai_ios.a from embed frameworks
amakropoulos Jan 10, 2025
556162e
use relative path of library in build
amakropoulos Jan 10, 2025
4f26da8
fix embed frameworks code
amakropoulos Jan 13, 2025
c648e1f
update changelogs
amakropoulos Jan 17, 2025
d310cde
add warm-up for provided prompt
amakropoulos Jan 21, 2025
d030272
update changelogs
amakropoulos Jan 21, 2025
f37f25d
add tooltips for the different parameters
amakropoulos Jan 21, 2025
bea75d5
remove tail-free sampling
amakropoulos Jan 21, 2025
9741585
script to automatically add tooltips based on <summary>
amakropoulos Jan 21, 2025
5952baf
workflow to automatically add tooltips based on <summary>
amakropoulos Jan 21, 2025
7174aa8
deprecate options page
amakropoulos Jan 21, 2025
b304a41
update tooltips
amakropoulos Jan 21, 2025
543e743
Merge b304a419428ce50861c1d3550bd4fcd00fadb38f into e040c976794eb48fd…
amakropoulos Jan 21, 2025
a4c435c
update VERSION
amakropoulos Jan 21, 2025
e7c4fdc
allow to override tooltip removal
amakropoulos Jan 21, 2025
473c91b
note on Allow Downloads Over HTTP
amakropoulos Jan 21, 2025
7c44b29
automatically update llamalib url
amakropoulos Jan 21, 2025
519e70e
add link to MaiMai AI Agent System project
amakropoulos Jan 21, 2025
48d667f
persist debug mode and use of extras to the build
amakropoulos Jan 21, 2025
a446045
update changelogs
amakropoulos Jan 21, 2025
390a1b2
add caller graph
amakropoulos Jan 22, 2025
7f7accd
remove support for CUDA 11.7.1
amakropoulos Feb 13, 2025
df3d837
implement DeepSeek chat templates
amakropoulos Feb 14, 2025
5fd5597
add Qwen 2.5 and DeepSeek R1 Distil models
amakropoulos Feb 14, 2025
774ede0
bump LlamaLib to v1.2.3
amakropoulos Feb 14, 2025
f264b1b
fix DeepSeek chat templates
amakropoulos Feb 18, 2025
e2abc59
adapt unit tests
amakropoulos Feb 18, 2025
364e9a5
adapt CUDA full tests
amakropoulos Feb 19, 2025
0c5819c
adapt tests for windows
amakropoulos Feb 19, 2025
ec70b64
update changelogs
amakropoulos Feb 19, 2025
a8f2efe
load dependencies for full CUDA and vulkan architectures
amakropoulos Feb 19, 2025
04f3cf6
free up dependencies
amakropoulos Feb 19, 2025
3e7044b
update changelogs
amakropoulos Feb 19, 2025
fdeea25
update changelogs
amakropoulos Feb 19, 2025
4 changes: 2 additions & 2 deletions .github/doxygen/Doxyfile
@@ -48,7 +48,7 @@ PROJECT_NAME = "LLM for Unity"
# could be handy for archiving the generated documentation or if some version
# control system is used.

PROJECT_NUMBER = v2.4.1
PROJECT_NUMBER = v2.4.2

# Using the PROJECT_BRIEF tag one can provide an optional one line description
# for a project that appears at the top of each page and should give viewer a
@@ -2689,7 +2689,7 @@ CALL_GRAPH = NO
# The default value is: NO.
# This tag requires that the tag HAVE_DOT is set to YES.

CALLER_GRAPH = NO
CALLER_GRAPH = YES

# If the GRAPHICAL_HIERARCHY tag is set to YES then doxygen will graphical
# hierarchy of all classes instead of a textual one.
104 changes: 104 additions & 0 deletions .github/update_tooltips.py
@@ -0,0 +1,104 @@
import os
import sys


def get_classname(line):
    if ' class ' not in line or ':' not in line:
        return None, None
    classParts = line.strip().split(' ')
    delimInd = classParts.index(':')
    className = classParts[delimInd - 1]
    parentName = classParts[delimInd + 1]
    return className, parentName


def find_eligible_classes(file_paths):
    child_classes = {}
    for file_path in file_paths:
        with open(file_path, 'r') as file:
            lines = file.readlines()
        for line in lines:
            className, parentName = get_classname(line)
            if className is not None:
                child_classes[parentName] = child_classes.get(parentName, []) + [className]

    ret_classes = []
    check_classes = ['MonoBehaviour']
    while len(check_classes) > 0:
        check_class = check_classes.pop()
        if check_class in ret_classes:
            continue
        if check_class != 'MonoBehaviour':
            ret_classes.append(check_class)
        check_classes += child_classes.get(check_class, [])
    return ret_classes


def add_tooltips_to_unity_file(file_path, allowed_classes):
    # Read the content of the file
    with open(file_path, 'r') as file:
        lines = file.readlines()

    # Initialize variables
    updated_lines = []
    in_summary = False
    allowed_class = False
    summary_text = ""

    for line in lines:
        stripped_line = line.strip()
        className, __ = get_classname(line)
        if className is not None:
            allowed_class = className in allowed_classes

        if allowed_class:
            if '<summary>' in stripped_line:
                in_summary = True
                summary_text = ''

            if in_summary:
                if summary_text != "": summary_text += ' '
                summary_text += stripped_line.replace("///", "").replace("<summary>", "").replace("</summary>", "").strip()

            if '</summary>' in stripped_line:
                in_summary = False

            if 'Tooltip' in stripped_line:
                if ('Tooltip: ignore' not in stripped_line):
                    continue

            include_terms = ['public', ';']
            exclude_terms = ['{', 'static', 'abstract']
            if all([x in stripped_line for x in include_terms]) and not any([x in stripped_line for x in exclude_terms]):
                if summary_text != '':
                    num_spaces = len(line) - len(line.lstrip())
                    tooltip = ''.join([' '] * num_spaces + [f'[Tooltip("{summary_text}")]', '\n'])
                    updated_lines.append(tooltip)
                    summary_text = ''

            if not in_summary and ('{' in stripped_line or '}' in stripped_line):
                summary_text = ''

        # Add the current line to the updated lines
        updated_lines.append(line)

    # Write the updated content back to the file
    with open(file_path, 'w') as file:
        file.writelines(updated_lines)


if __name__ == '__main__':
    # Find all .cs files
    search_directory = 'Runtime'
    cs_files = []
    for root, _, files in os.walk(search_directory):
        for file in files:
            if file.endswith(".cs"):
                cs_files.append(os.path.join(root, file))

    classes = find_eligible_classes(cs_files)
    for file in cs_files:
        add_tooltips_to_unity_file(file, classes)
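To illustrate the effect, the transformation the script applies to an eligible (MonoBehaviour-derived) class looks roughly like this — the class and field below are placeholders, not actual Runtime code:

```csharp
using UnityEngine;

public class ExampleSettings : MonoBehaviour
{
    /// <summary>
    /// port to run the LLM server (if Remote is set)
    /// </summary>
    [Tooltip("port to run the LLM server (if Remote is set)")]  // line generated by update_tooltips.py from the <summary> above
    public int port = 13333;
}
```

Running `python .github/update_tooltips.py` from the repository root applies this to every eligible public field under `Runtime/`.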
30 changes: 30 additions & 0 deletions .github/workflows/update_tooltips.yaml
@@ -0,0 +1,30 @@
name: Changelog
on:
  pull_request:
    types: [closed]

jobs:
  build:
    runs-on: ubuntu-latest
    if: startsWith(github.base_ref, 'release/') && github.event.pull_request.merged == true
    steps:
      - uses: actions/checkout@v2
        with:
          fetch-depth: 0
          persist-credentials: false
          token: ${{ github.token }}

      - name: Update tooltips
        id: update_tooltips
        run: |
          python .github/update_tooltips.py
          git config --global user.name $GITHUB_ACTOR
          git config --global user.email [email protected]
          git add Runtime
          git commit -m "update tooltips"

      - name: Push changes
        uses: ad-m/github-push-action@master
        with:
          github_token: ${{ github.token }}
          branch: ${{ github.base_ref }}
6 changes: 6 additions & 0 deletions .github/workflows/version.yml
@@ -31,6 +31,12 @@ jobs:
git add package.json
git add Runtime/LLMUnitySetup.cs
git add .github/doxygen/Doxyfile

llamalibVersion=`cat Runtime/LLMUnitySetup.cs | grep LlamaLibVersion | grep -oE 'v[0-9]+\.[0-9]+\.[0-9]+'`
sed -Ei "s:https\://github.com/undreamai/LlamaLib/releases/download/v[0-9]+\.[0-9]+\.[0-9]+/undreamai-v[0-9]+\.[0-9]+\.[0-9]+:https\://github.com/undreamai/LlamaLib/releases/download/$llamalibVersion/undreamai-$llamalibVersion:g" README.md

git add README.md

git commit -m "update VERSION"

- name: Push changes
16 changes: 16 additions & 0 deletions CHANGELOG.md
@@ -1,3 +1,19 @@
## v2.4.2
#### 🚀 Features

- Integrate DeepSeek models (PR: #312)
- Update LlamaLib to v1.2.3 (llama.cpp b4688) (PR: #312)
- Drop CUDA 11.7.1 support (PR: #312)
- Add warm-up function for provided prompt (PR: #301)
- Add documentation in Unity tooltips (PR: #302)

#### 🐛 Fixes

- Fix code signing on iOS (PR: #298)
- Persist debug mode and use of extras to the build (PR: #304)
- Fix dependency resolution for full CUDA and Vulkan architectures (PR: #313)


## v2.4.1
#### 🚀 Features

10 changes: 8 additions & 2 deletions CHANGELOG.release.md
@@ -1,8 +1,14 @@
### 🚀 Features

- Static library linking on mobile (fixes iOS signing) (PR: #289)
- Integrate DeepSeek models (PR: #312)
- Update LlamaLib to v1.2.3 (llama.cpp b4688) (PR: #312)
- Drop CUDA 11.7.1 support (PR: #312)
- Add warm-up function for provided prompt (PR: #301)
- Add documentation in Unity tooltips (PR: #302)

### 🐛 Fixes

- Fix support for extras (flash attention, iQ quants) (PR: #292)
- Fix code signing on iOS (PR: #298)
- Persist debug mode and use of extras to the build (PR: #304)
- Fix dependency resolution for full CUDA and Vulkan architectures (PR: #313)

44 changes: 36 additions & 8 deletions Editor/LLMBuildProcessor.cs
@@ -2,6 +2,8 @@
 using UnityEditor.Build;
 using UnityEditor.Build.Reporting;
 using UnityEngine;
+using System.IO;
+
 #if UNITY_IOS
 using UnityEditor.iOS.Xcode;
 #endif
@@ -48,24 +50,50 @@ private void OnBuildError(string condition, string stacktrace, LogType type)

 #if UNITY_IOS
     /// <summary>
-    /// Adds the Accelerate framework (for ios)
+    /// Postprocess the iOS Build
     /// </summary>
-    public static void AddAccelerate(string outputPath)
+    public static void PostprocessIOSBuild(string outputPath)
     {
         string projPath = PBXProject.GetPBXProjectPath(outputPath);
-        PBXProject proj = new PBXProject();
-        proj.ReadFromFile(projPath);
-        proj.AddFrameworkToProject(proj.GetUnityMainTargetGuid(), "Accelerate.framework", false);
-        proj.AddFrameworkToProject(proj.GetUnityFrameworkTargetGuid(), "Accelerate.framework", false);
-        proj.WriteToFile(projPath);
+        PBXProject project = new PBXProject();
+        project.ReadFromFile(projPath);
+
+        string targetGuid = project.GetUnityFrameworkTargetGuid();
+        string frameworkTargetGuid = project.GetUnityFrameworkTargetGuid();
+        string unityMainTargetGuid = project.GetUnityMainTargetGuid();
+        string embedFrameworksGuid = project.GetResourcesBuildPhaseByTarget(frameworkTargetGuid);
+
+        // Add Accelerate framework
+        project.AddFrameworkToProject(unityMainTargetGuid, "Accelerate.framework", false);
+        project.AddFrameworkToProject(targetGuid, "Accelerate.framework", false);
+
+        // Remove libundreamai_ios.a from Embed Frameworks
+        string libraryFile = Path.Combine("Libraries", LLMBuilder.PluginLibraryDir("iOS", true), "libundreamai_ios.a");
+        string fileGuid = project.FindFileGuidByProjectPath(libraryFile);
+        if (string.IsNullOrEmpty(fileGuid)) Debug.LogError($"Library file {libraryFile} not found in project");
+        else
+        {
+            foreach (var phaseGuid in project.GetAllBuildPhasesForTarget(unityMainTargetGuid))
+            {
+                if (project.GetBuildPhaseName(phaseGuid) == "Embed Frameworks")
+                {
+                    project.RemoveFileFromBuild(phaseGuid, fileGuid);
+                    break;
+                }
+            }
+            project.RemoveFileFromBuild(unityMainTargetGuid, fileGuid);
+        }
+
+        project.WriteToFile(projPath);
     }

 #endif

     // called after the build
     public void OnPostprocessBuild(BuildReport report)
     {
 #if UNITY_IOS
-        AddAccelerate(report.summary.outputPath);
+        PostprocessIOSBuild(report.summary.outputPath);
 #endif
         BuildCompleted();
     }
116 changes: 116 additions & 0 deletions Options.md
@@ -0,0 +1,116 @@
# Options (deprecated)
## LLM Settings

- `Show/Hide Advanced Options` toggle to show/hide the advanced options below
- `Log Level` select how verbose the log messages are
- `Use extras` select to install and allow the use of extra features (flash attention and IQ quants)

## 💻 Setup Settings

<div>
<img width="300" src=".github/LLM_GameObject.png" align="right"/>
</div>

- `Remote` select to provide remote access to the LLM
- `Port` port to run the LLM server (if `Remote` is set)
- `Num Threads` number of threads to use (default: -1 = all)
- `Num GPU Layers` number of model layers to offload to the GPU.
If set to 0 the GPU is not used. Use a large number, e.g. >30, to utilise the GPU as much as possible.
Note that higher values of context size will use more VRAM.
If the user's GPU is not supported, the LLM will fall back to the CPU (a configuration sketch follows at the end of this section)
- `Debug` select to log the output of the model in the Unity Editor
- <details><summary>Advanced options</summary>

- <details><summary><code>Parallel Prompts</code> number of prompts / slots that can be processed in parallel (default: -1 = number of LLMCharacter objects). Note that the context size is divided among the slots.</summary> If you want to retain as much context as possible for the LLM and don't need all the characters present at the same time, you can set this number and specify the slot for each LLMCharacter object.
e.g. Setting `Parallel Prompts` to 1 and slot 0 for all LLMCharacter objects will use the full context, but the entire prompt will need to be computed (no caching) whenever an LLMCharacter object is used for chat. </details>
- `Dont Destroy On Load` select to not destroy the LLM GameObject when loading a new Scene

</details>
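These settings can also be set from code. A minimal sketch, assuming the `LLM` component lives in the `LLMUnity` namespace and that its field names follow the inspector labels above (`numThreads`, `numGPULayers`, ...) — verify against the `LLM` class:

```csharp
using LLMUnity;
using UnityEngine;

public class LLMSetupSketch : MonoBehaviour
{
    void Awake()
    {
        // Field names below are assumed from the inspector labels; check the LLM class for the exact names.
        LLM llm = gameObject.AddComponent<LLM>();
        llm.remote = false;       // no remote access, the LLM only serves this build
        llm.numThreads = -1;      // -1 = use all available threads
        llm.numGPULayers = 30;    // offload layers to the GPU; 0 keeps everything on the CPU
        llm.debug = true;         // log model output in the Unity Editor
        llm.parallelPrompts = 1;  // one slot: full context, but no per-character prompt caching
    }
}
```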

## Server Security Settings

- `API key` the API key required to allow access to requests from LLMCharacter objects (if `Remote` is set)
- <details><summary>Advanced options</summary>

- `Load SSL certificate` allows loading an SSL certificate for end-to-end encryption of requests (if `Remote` is set). Requires an SSL key as well.
- `Load SSL key` allows loading an SSL key for end-to-end encryption of requests (if `Remote` is set). Requires an SSL certificate as well.
- `SSL certificate path` the SSL certificate used for end-to-end encryption of requests (if `Remote` is set).
- `SSL key path` the SSL key used for end-to-end encryption of requests (if `Remote` is set).

</details>

## 🤗 Model Settings
- `Download model` click to download one of the default models
- `Load model` click to load your own model in .gguf format
- `Download on Start` enable to download the LLM models the first time the game starts. Otherwise the LLM models will be copied directly into the build
- <details><summary><code>Context Size</code> size of the prompt context (0 = context size of the model)</summary> This is the number of tokens the model can take as input when generating responses. Higher values use more RAM or VRAM (if using GPU). </details>

- <details><summary>Advanced options</summary>

- `Download lora` click to download a LoRA model in .gguf format
- `Load lora` click to load a LoRA model in .gguf format
- `Batch Size` batch size for prompt processing (default: 512)
- `Model` the path of the model being used (relative to the Assets/StreamingAssets folder)
- `Chat Template` the chat template being used for the LLM
- `Lora` the path of the LoRAs being used (relative to the Assets/StreamingAssets folder)
- `Lora Weights` the weights of the LoRAs being used
- `Flash Attention` click to use flash attention in the model (if `Use extras` is enabled)

</details>
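The model itself can also be wired up from code. A rough sketch with assumed method and field names (`SetModel`, `SetLora`, `contextSize`, ...) and placeholder file names — verify against the `LLM` class:

```csharp
using LLMUnity;
using UnityEngine;

public class LLMModelSketch : MonoBehaviour
{
    public LLM llm;  // assign the LLM GameObject in the inspector

    void Awake()
    {
        // Method/field names and file names below are assumptions, not the exact API.
        llm.SetModel("qwen2.5-1.5b-instruct-q4_k_m.gguf");  // model placed in Assets/StreamingAssets
        llm.SetLora("my-adapter.gguf");                     // optional LoRA adapter (placeholder)
        llm.contextSize = 4096;                             // 0 = use the model's own context size
        llm.batchSize = 512;                                // prompt-processing batch size
        llm.flashAttention = true;                          // only if Use extras is enabled
    }
}
```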

## LLMCharacter Settings

- `Show/Hide Advanced Options` toggle to show/hide the advanced options below
- `Log Level` select how verbose the log messages are
- `Use extras` select to install and allow the use of extra features (flash attention and IQ quants)

## 💻 Setup Settings
<div>
<img width="300" src=".github/LLMCharacter_GameObject.png" align="right"/>
</div>

- `Remote` whether the LLM used is remote or local
- `LLM` the LLM GameObject (if `Remote` is not set)
- `Host` IP of the LLM server (if `Remote` is set; see the sketch at the end of this section)
- `Port` port of the LLM server (if `Remote` is set)
- `Num Retries` number of HTTP request retries from the LLM server (if `Remote` is set)
- `API key` API key of the LLM server (if `Remote` is set)
- <details><summary><code>Save</code> save filename or relative path</summary> If set, the chat history and LLM state (if save cache is enabled) are automatically saved to the specified file. <br> The chat history is saved with a json suffix and the LLM state with a cache suffix. <br> Both files are saved in the [persistentDataPath folder of Unity](https://docs.unity3d.com/ScriptReference/Application-persistentDataPath.html).</details>
- `Save Cache` select to save the LLM state along with the chat history. The LLM state is typically 100MB or more.
- `Debug Prompt` select to log the constructed prompts in the Unity Editor
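As a rough sketch of the remote setup from code (field names assumed from the inspector labels above — verify against the `LLMCharacter` class):

```csharp
using LLMUnity;
using UnityEngine;

public class RemoteCharacterSketch : MonoBehaviour
{
    void Awake()
    {
        // Field names below are assumed from the inspector labels; check the LLMCharacter class for the exact names.
        LLMCharacter character = gameObject.AddComponent<LLMCharacter>();
        character.remote = true;             // talk to a remote LLM server instead of a local LLM GameObject
        character.host = "192.168.1.10";     // placeholder IP of the machine running the LLM server
        character.port = 13333;              // must match the Port of the remote LLM
        character.numRetries = 3;            // retry failed HTTP requests a few times
        character.APIKey = "my-secret-key";  // placeholder; must match the API key of the remote LLM
        character.save = "chat1";            // persist chat history (and state, if Save Cache is set)
    }
}
```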

## 🗨️ Chat Settings
- `Player Name` the name of the player
- `AI Name` the name of the AI
- `Prompt` description of the AI role

## 🤗 Model Settings
- `Stream` select to receive the reply from the model as it is produced (recommended!).<br>
If it is not selected, the full reply from the model is received in one go
- <details><summary><code>Num Predict</code> maximum number of tokens to predict (default: 256, -1 = infinity, -2 = until context filled)</summary>This is the maximum number of tokens the model will predict. When the limit is reached the model stops generating, so words or sentences might get cut off if this is set too low. </details>

- <details><summary>Advanced options</summary>

- `Load grammar` click to load a grammar in .gbnf format
- `Grammar` the path of the grammar being used (relative to the Assets/StreamingAssets folder)
- <details><summary><code>Cache Prompt</code> save the ongoing prompt from the chat (default: true)</summary> Saves the prompt while it is being created by the chat to avoid reprocessing the entire prompt every time</details>
- `Slot` slot of the server to use for computation. Value can be set from 0 to `Parallel Prompts`-1 (default: -1 = new slot for each character)
- `Seed` seed for reproducibility. For random results every time use -1
- <details><summary><code>Temperature</code> LLM temperature, lower values give more deterministic answers (default: 0.2)</summary>The temperature setting adjusts how random the generated responses are. Turning it up makes the generated choices more varied and unpredictable. Turning it down makes the generated responses more predictable and focused on the most likely options (a usage sketch follows at the end of this section).</details>
- <details><summary><code>Top K</code> top-k sampling (default: 40, 0 = disabled)</summary>Top-k sampling restricts generation to the k most probable tokens at each step. This can help fine-tune the output and make it adhere to specific patterns or constraints.</details>
- <details><summary><code>Top P</code> top-p sampling (default: 0.9, 1.0 = disabled)</summary>Top-p sampling restricts generation to the smallest set of tokens whose cumulative probability reaches the threshold p. Lowering this value makes the output more focused and less diverse.</details>
- <details><summary><code>Min P</code> minimum probability for a token to be used (default: 0.05)</summary> The probability is defined relative to the probability of the most likely token.</details>
- <details><summary><code>Repeat Penalty</code> control the repetition of token sequences in the generated text (default: 1.1)</summary>The penalty is applied to repeated tokens.</details>
- <details><summary><code>Presence Penalty</code> repeated token presence penalty (default: 0.0, 0.0 = disabled)</summary> Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics.</details>
- <details><summary><code>Frequency Penalty</code> repeated token frequency penalty (default: 0.0, 0.0 = disabled)</summary> Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model's likelihood to repeat the same line verbatim.</details>
- `Typical P`: enable locally typical sampling with parameter p (default: 1.0, 1.0 = disabled).
- `Repeat Last N`: last N tokens to consider for penalizing repetition (default: 64, 0 = disabled, -1 = ctx-size).
- `Penalize Nl`: penalize newline tokens when applying the repeat penalty (default: true).
- `Penalty Prompt`: prompt for the purpose of the penalty evaluation. Can be either `null`, a string or an array of numbers representing tokens (default: `null` = use original `prompt`).
- `Mirostat`: enable Mirostat sampling, controlling perplexity during text generation (default: 0, 0 = disabled, 1 = Mirostat, 2 = Mirostat 2.0).
- `Mirostat Tau`: set the Mirostat target entropy, parameter tau (default: 5.0).
- `Mirostat Eta`: set the Mirostat learning rate, parameter eta (default: 0.1).
- `N Probs`: if greater than 0, the response also contains the probabilities of top N tokens for each generated token (default: 0)
- `Ignore Eos`: enable to ignore end of stream tokens and continue generating (default: false).

</details>
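A minimal usage sketch tying the main generation options together. The field names and the `Chat(message, callback)` call are assumptions based on the labels above, so treat this as illustrative rather than exact:

```csharp
using LLMUnity;
using UnityEngine;

public class ChatSketch : MonoBehaviour
{
    public LLMCharacter llmCharacter;  // assign in the inspector

    async void Start()
    {
        // Field names below are assumed from the inspector labels; check the LLMCharacter class for the exact names.
        llmCharacter.stream = true;       // receive the reply as it is produced
        llmCharacter.numPredict = 256;    // cap the reply length in tokens
        llmCharacter.temperature = 0.2f;  // lower = more deterministic answers
        llmCharacter.topK = 40;           // sample only from the 40 most probable tokens
        llmCharacter.topP = 0.9f;         // nucleus sampling threshold
        llmCharacter.seed = -1;           // -1 = different results every run

        // Stream partial replies to the console, then log the final reply.
        string reply = await llmCharacter.Chat("Hello there!", Debug.Log);
        Debug.Log($"Final reply: {reply}");
    }
}
```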
7 changes: 7 additions & 0 deletions Options.md.meta

Some generated files are not rendered by default.