Flare AI SDK for Consensus Learning.

## 🚀 Key Features

- **Consensus Learning Implementation**
  A Python implementation of single-node, multi-model Consensus Learning (CL). CL is a decentralized ensemble learning paradigm introduced in [arXiv:2402.16157](https://arxiv.org/abs/2402.16157), which is now being generalized to large language models (LLMs).

- **300+ LLM Support**
  Leverages OpenRouter to access over 300 models via a unified interface.
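At its core, a CL round queries several models and aggregates their answers. The sketch below is a minimal, self-contained illustration of one such aggregation round: the stub functions stand in for OpenRouter calls, and `consensus_round` with its majority-vote aggregator is a hypothetical helper, not the SDK's actual API — the real algorithm also iterates, feeding aggregated responses back to the models.

```python
from collections import Counter

# Stub "models": in the real SDK these would be OpenRouter API calls.
# The names and canned answers below are illustrative only.
def model_a(prompt: str) -> str:
    return "4"

def model_b(prompt: str) -> str:
    return "4"

def model_c(prompt: str) -> str:
    return "5"

def consensus_round(prompt: str, models) -> str:
    """One CL round: query every model, then aggregate by majority vote."""
    answers = [m(prompt) for m in models]
    winner, _count = Counter(answers).most_common(1)[0]
    return winner

print(consensus_round("What is 2 + 2?", [model_a, model_b, model_c]))  # -> 4
```

A full implementation would replace majority voting with an LLM-based aggregator and loop until the responses converge.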
### Prerequisites

- **Google Cloud Platform Account:**
  Access to the [`verifiable-ai-hackathon`](https://console.cloud.google.com/welcome?project=verifiable-ai-hackathon) project is required.

- **OpenRouter API Key:**
  Ensure your [OpenRouter API key](https://openrouter.ai/settings/keys) is in your `.env`.

- **gcloud CLI:**
  Install and authenticate the [gcloud CLI](https://cloud.google.com/sdk/docs/install).

### Environment Configuration

1. **Set Environment Variables:**
   Update your `.env` file with:
If you encounter issues, follow these steps:

1. **Check Serial Logs:**

   ```bash
   gcloud compute instances get-serial-port-output $INSTANCE_NAME --project=verifiable-ai-hackathon
   ```
2. **Verify API Key(s):**
   Ensure that all API keys are set correctly (e.g. `OPEN_ROUTER_API_KEY`).

3. **Check Firewall Settings:**
   Confirm that your instance is publicly accessible on port `80`.
## 💡 Next Steps

- **Security & TEE Integration:**
  - Ensure execution within a Trusted Execution Environment (TEE) to maintain confidentiality and integrity.
- **Factual Correctness**:
  - In line with the main theme of the hackathon, one important aspect of the outputs generated by the LLMs is their accuracy. In this regard, producing sources/citations with the answers would lead to higher trust in the setup. Sample prompts that can be used for this purpose can be found in the appendices of [arXiv:2305.14627](https://arxiv.org/pdf/2305.14627), or in [James' Coffee Blog](https://jamesg.blog/2023/04/02/llm-prompts-source-attribution).
  - _Note_: only certain models may be suitable for this purpose, as references generated by LLMs are often inaccurate or not even real!
- **Prompt Engineering**:
  - Our approach is very similar to the **Mixture-of-Agents (MoA)** approach introduced in [arXiv:2406.04692](https://arxiv.org/abs/2406.04692), which uses iterative aggregations of model responses. Their [GitHub repository](https://github.com/togethercomputer/MoA) includes further examples of prompts that can be used to provide additional context to the LLMs.
  - New iterations of the consensus learning algorithm could use different prompts for improving the previous responses. In this regard, the _few-shot_ prompting techniques introduced by OpenAI in [arXiv:2005.14165](https://arxiv.org/pdf/2005.14165) work by providing models with a _few_ examples of similar queries and responses in addition to the initial prompt. (See also previous work by [Radford et al.](https://cdn.openai.com/better-language-models/language_models_are_unsupervised_multitask_learners.pdf).)
  - _Chain-of-Thought_ prompting is a linear problem-solving approach where each step builds upon the previous one. Google's approach in [arXiv:2201.11903](https://arxiv.org/pdf/2201.11903) augments each prompt with an additional example and a chain of thought for an associated answer. (See the paper for multiple examples.)
- **Dynamic Resource Allocation and Semantic Filters**:
  - An immediate improvement to the current approach would be to use dynamically adjusted parameters. Namely, the number of iterations and the number of models used in the algorithm could be adapted to the input prompt: _e.g._ simple prompts do not require too many resources. For this, a centralized model could be used to assess the complexity of the task prior to sending the prompt to the other LLMs.
  - On a similar note, the number of iterations could be adjusted according to how _different_ the model responses are. Semantic entailment for LLM outputs is an active field of research, but a rather quick solution is to rely on _embeddings_. [TBC] The use of [LLM-as-a-Judge](https://arxiv.org/pdf/2306.05685) for evaluating other LLM outputs has shown good progress -- see also this [Confident AI blogpost](https://www.confident-ai.com/blog/why-llm-as-a-judge-is-the-best-llm-evaluation-method).
  - In line with the previously mentioned LLM-as-a-Judge, a model could potentially be used for filtering _bad_ responses. LLM-Blender, for instance, introduced in [arXiv:2306.02561](https://arxiv.org/abs/2306.02561), uses a PairRanker that achieves a ranking of outputs through pairwise comparisons via a _cross-attention encoder_.
- **AI Agent Swarm**:
  - The structure of the reference CL implementation can be changed to accommodate _swarm_-type algorithms, where tasks are broken down and distributed among specialized agents for parallel processing. In this case, a centralized LLM would act as an orchestrator managing the distribution of tasks -- see _e.g._ the [swarms repo](https://github.com/kyegomez/swarms).
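The embeddings idea above can be prototyped with plain cosine similarity: stop iterating once every pair of response embeddings is sufficiently aligned. The sketch below uses toy vectors as stand-ins for real embeddings (which would come from an embedding model), and the `0.9` threshold is an arbitrary assumption, not a tuned value.

```python
import math
from itertools import combinations

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def responses_agree(embeddings, threshold=0.9):
    """True if every pair of response embeddings is at least `threshold` similar."""
    return all(cosine(u, v) >= threshold for u, v in combinations(embeddings, 2))

# Toy embeddings for three model responses (real ones come from an embedding model).
close = [[1.0, 0.1, 0.0], [0.9, 0.2, 0.0], [1.0, 0.0, 0.1]]
spread = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [1.0, 0.1, 0.0]]
print(responses_agree(close))   # -> True  (near-identical directions)
print(responses_agree(spread))  # -> False (an orthogonal pair is present)
```

A controller loop could run another CL iteration whenever `responses_agree` is `False`, which is one concrete way to make the number of iterations dynamic.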