Commit 8f5679d (parent: b37684a)

add extra and flash attention options to readme

File tree: 1 file changed (+3 −0 lines)

1 file changed

+3
-0
lines changed

README.md (+3)
```diff
@@ -345,6 +345,7 @@ If you have loaded a model locally you need to set its URL through the expanded
 - `Show/Hide Advanced Options` Toggle to show/hide advanced options from below
 - `Log Level` select how verbose the log messages are
+- `Use extras` select to install and allow the use of extra features (flash attention and IQ quants)

 #### 💻 Setup Settings
```
```diff
@@ -381,6 +382,7 @@ If the user's GPU is not supported, the LLM will fall back to the CPU
 - `Model` the path of the model being used (relative to the Assets/StreamingAssets folder)
 - `Chat Template` the chat template being used for the LLM
 - `Lora` the path of the LoRA being used (relative to the Assets/StreamingAssets folder)
+- `Flash Attention` click to use flash attention in the model (if `Use extras` is enabled)

 </details>
```
```diff
@@ -395,6 +397,7 @@ If the user's GPU is not supported, the LLM will fall back to the CPU
 - `Show/Hide Advanced Options` Toggle to show/hide advanced options from below
 - `Log Level` select how verbose the log messages are
+- `Use extras` select to install and allow the use of extra features (flash attention and IQ quants)

 #### 💻 Setup Settings
```
