Flare AI SDK for Consensus Learning.

## 🚀 Key Features

- **Consensus Learning Implementation**
  A Python implementation of single-node, multi-model Consensus Learning (CL). CL is a decentralized ensemble learning paradigm introduced in [arXiv:2402.16157](https://arxiv.org/abs/2402.16157), which is now being generalized to large language models (LLMs).

- **300+ LLM Support**
  Leverages OpenRouter to access over 300 models via a unified interface.
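At its core, a CL round queries several models and aggregates their answers. The sketch below is a minimal, self-contained illustration of one such aggregation round: the stub functions stand in for OpenRouter calls, and `consensus_round` with its majority-vote aggregator is a hypothetical helper, not the SDK's actual API — the real algorithm also iterates, feeding aggregated responses back to the models.

```python
from collections import Counter

# Stub "models": in the real SDK these would be OpenRouter API calls.
# The names and canned answers below are illustrative only.
def model_a(prompt: str) -> str:
    return "4"

def model_b(prompt: str) -> str:
    return "4"

def model_c(prompt: str) -> str:
    return "5"

def consensus_round(prompt: str, models) -> str:
    """One CL round: query every model, then aggregate by majority vote."""
    answers = [m(prompt) for m in models]
    winner, _count = Counter(answers).most_common(1)[0]
    return winner

print(consensus_round("What is 2 + 2?", [model_a, model_b, model_c]))  # -> 4
```

A full implementation would replace majority voting with an LLM-based aggregator and loop until the responses converge.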
### Prerequisites

- **Google Cloud Platform Account:**
  Access to the [`verifiable-ai-hackathon`](https://console.cloud.google.com/welcome?project=verifiable-ai-hackathon) project is required.

- **OpenRouter API Key:**
  Ensure your [OpenRouter API key](https://openrouter.ai/settings/keys) is in your `.env`.

- **gcloud CLI:**
  Install and authenticate the [gcloud CLI](https://cloud.google.com/sdk/docs/install).

### Environment Configuration

1. **Set Environment Variables:**
   Update your `.env` file with:
If you encounter issues, follow these steps:

1. **Check Serial Logs:**

   ```bash
   gcloud compute instances get-serial-port-output $INSTANCE_NAME --project=verifiable-ai-hackathon
   ```
2. **Verify API Key(s):**
   Ensure that all API keys are set correctly (e.g. `OPEN_ROUTER_API_KEY`).

3. **Check Firewall Settings:**
   Confirm that your instance is publicly accessible on port `80`.
## 💡 Next Steps

- **Security & TEE Integration:**
  - Ensure execution within a Trusted Execution Environment (TEE) to maintain confidentiality and integrity.
- **Factual Correctness**:
  - In line with the main theme of the hackathon, one important aspect of the outputs generated by the LLMs is their accuracy. In this regard, producing sources/citations with the answers would lead to higher trust in the setup. Sample prompts that can be used for this purpose can be found in the appendices of [arXiv:2305.14627](https://arxiv.org/pdf/2305.14627), or in [James' Coffee Blog](https://jamesg.blog/2023/04/02/llm-prompts-source-attribution).
  - _Note_: only certain models may be suitable for this purpose, as references generated by LLMs are often inaccurate or not even real!
- **Prompt Engineering**:
  - Our approach is very similar to the **Mixture-of-Agents (MoA)** approach introduced in [arXiv:2406.04692](https://arxiv.org/abs/2406.04692), which uses iterative aggregations of model responses. Their [GitHub repository](https://github.com/togethercomputer/MoA) includes further examples of prompts that can be used to provide additional context to the LLMs.
  - New iterations of the consensus learning algorithm could use different prompts for improving the previous responses. In this regard, the _few-shot_ prompting techniques introduced by OpenAI in [arXiv:2005.14165](https://arxiv.org/pdf/2005.14165) work by providing models with a _few_ examples of similar queries and responses in addition to the initial prompt. (See also previous work by [Radford et al.](https://cdn.openai.com/better-language-models/language_models_are_unsupervised_multitask_learners.pdf).)
  - _Chain-of-Thought_ prompting is a linear problem-solving approach where each step builds upon the previous one. Google's approach in [arXiv:2201.11903](https://arxiv.org/pdf/2201.11903) augments each prompt with an additional example and a chain of thought for an associated answer. (See the paper for multiple examples.)
- **Dynamic Resource Allocation and Semantic Filters**:
  - An immediate improvement to the current approach would be to use dynamically adjusted parameters. Namely, the number of iterations and the number of models used in the algorithm could be adapted to the input prompt: _e.g._ simple prompts do not require too many resources. For this, a centralized model could be used to assess the complexity of the task prior to sending the prompt to the other LLMs.
  - On a similar note, the number of iterations could be adjusted according to how _different_ the model responses are. Semantic entailment for LLM outputs is an active field of research, but a rather quick solution is to rely on _embeddings_. [TBC] The use of [LLM-as-a-Judge](https://arxiv.org/pdf/2306.05685) for evaluating other LLM outputs has shown good progress -- see also this [Confident AI blogpost](https://www.confident-ai.com/blog/why-llm-as-a-judge-is-the-best-llm-evaluation-method).
  - In line with the previously mentioned LLM-as-a-Judge, a model could potentially be used for filtering _bad_ responses. LLM-Blender, for instance, introduced in [arXiv:2306.02561](https://arxiv.org/abs/2306.02561), uses a PairRanker that achieves a ranking of outputs through pairwise comparisons via a _cross-attention encoder_.
- **AI Agent Swarm**:
  - The structure of the reference CL implementation can be changed to accommodate _swarm_-type algorithms, where tasks are broken down and distributed among specialized agents for parallel processing. In this case, a centralized LLM would act as an orchestrator managing the distribution of tasks -- see _e.g._ the [swarms repo](https://github.com/kyegomez/swarms).
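The embeddings idea above can be prototyped with plain cosine similarity: stop iterating once every pair of response embeddings is sufficiently aligned. The sketch below uses toy vectors as stand-ins for real embeddings (which would come from an embedding model), and the `0.9` threshold is an arbitrary assumption, not a tuned value.

```python
import math
from itertools import combinations

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def responses_agree(embeddings, threshold=0.9):
    """True if every pair of response embeddings is at least `threshold` similar."""
    return all(cosine(u, v) >= threshold for u, v in combinations(embeddings, 2))

# Toy embeddings for three model responses (real ones come from an embedding model).
close = [[1.0, 0.1, 0.0], [0.9, 0.2, 0.0], [1.0, 0.0, 0.1]]
spread = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [1.0, 0.1, 0.0]]
print(responses_agree(close))   # -> True  (near-identical directions)
print(responses_agree(spread))  # -> False (an orthogonal pair is present)
```

A controller loop could run another CL iteration whenever `responses_agree` is `False`, which is one concrete way to make the number of iterations dynamic.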