Commit 3c1dc51

update documentation

1 parent 0b19cf4 commit 3c1dc51

File tree

4 files changed: +16 additions, -3 deletions

.gitignore

Lines changed: 1 addition & 1 deletion
@@ -165,7 +165,7 @@ cython_debug/
 *.pdf
 *.svg
 # *.jpeg
-*.png
+# *.png
 *.bmp

 ### VirtualEnv template

README.md

Lines changed: 2 additions & 2 deletions
@@ -249,8 +249,8 @@ If you encounter issues, follow these steps:
 - _Chain of Thought_ prompting techniques are a linear problem-solving approach where each step builds upon the previous one. Google's approach in [arXiv:2201.11903](https://arxiv.org/pdf/2201.11903) is to augment each prompt with an additional example and chain of thought for an associated answer. (See the paper for multiple examples.)
 - **Dynamic resource allocation and Semantic Filters**:
   - An immediate improvement to the current approach would be to use dynamically adjusted parameters. Namely, the number of iterations and number of models used in the algorithm could be tuned to the input prompt: _e.g._ simple prompts do not require many resources. For this, a centralized model could be used to assess the complexity of the task before sending the prompt to the other LLMs.
-  - On a similar note, the number of iterations for making progress could adjusted according to how _different_ are the model responses. Semantic entailment for LLM outputs is an active field of research, but a rather quick solution is to rely on _embeddings_. [TBC]
-the use of [LLM-as-a-Judge](https://arxiv.org/pdf/2306.05685) for evaluating other LLM outputs has shown good progress -- see also this [Confident AI blogpost](https://www.confident-ai.com/blog/why-llm-as-a-judge-is-the-best-llm-evaluation-method).
+  - On a similar note, the number of iterations for making progress could be adjusted according to how _different_ the model responses are. Semantic entailment for LLM outputs is an active field of research, but a rather quick solution is to rely on _embeddings_. These are commonly used in RAG pipelines, and could also be used here with _e.g._ cosine similarity. You can get started with [GCloud's text embeddings](https://cloud.google.com/vertex-ai/generative-ai/docs/embeddings/get-text-embeddings) -- see [flare-ai-rag](https://github.com/flare-foundation/flare-ai-rag/tree/main) for more details.
+  - The use of [LLM-as-a-Judge](https://arxiv.org/pdf/2306.05685) for evaluating other LLM outputs has shown good progress -- see also this [Confident AI blogpost](https://www.confident-ai.com/blog/why-llm-as-a-judge-is-the-best-llm-evaluation-method).
 - In line with the previously mentioned LLM-as-a-Judge, a model could potentially be used for filtering _bad_ responses. LLM-Blender, for instance, introduced in [arXiv:2306.02561](https://arxiv.org/abs/2306.02561), uses a PairRanker that achieves a ranking of outputs through pairwise comparisons via a _cross-attention encoder_.
 - **AI Agent Swarm**:
   - The structure of the reference CL implementation can be changed to adopt _swarm_-type algorithms, where tasks are broken down and distributed among specialized agents for parallel processing. In this case a centralized LLM would act as an orchestrator for managing distribution of tasks -- see _e.g._ [swarms repo](https://github.com/kyegomez/swarms).
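The embedding-based convergence check described in the README changes above can be sketched as follows. This is a minimal sketch, not the repo's implementation: the plain float vectors stand in for embeddings returned by a real API (e.g. GCloud's text embeddings), and the `0.9` threshold is an arbitrary assumption.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def responses_converged(embeddings: list[list[float]], threshold: float = 0.9) -> bool:
    """Return True once every pair of model-response embeddings is at least
    `threshold`-similar, signalling that further improvement rounds can stop."""
    for i in range(len(embeddings)):
        for j in range(i + 1, len(embeddings)):
            if cosine_similarity(embeddings[i], embeddings[j]) < threshold:
                return False
    return True
```

The consensus loop could call `responses_converged` after each improvement round and exit early, instead of always running a fixed number of iterations.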

src/README.md

Lines changed: 13 additions & 0 deletions
@@ -3,6 +3,19 @@

 # Flare AI Consensus

+## flare-ai-consensus Pipeline
+
+The flare-ai-consensus template consists of the following components:
+
+* **Router:** The primary interface that receives user requests, distributes them to the various AI models, and collects their intermediate responses.
+* **Aggregator:** Synthesizes multiple model responses into a single, coherent output.
+* **Consensus Layer:** Defines the logic for the consensus algorithm. The reference implementation is set up in the following steps:
+  * The initial prompt is sent to a set of models, with additional system instructions.
+  * Initial responses are aggregated by the Aggregator.
+  * Improvement rounds follow, where aggregated responses are sent as additional context or system instructions to the models.
+
+<img width="500" alt="flare-ai-consensus" src="./cl_pipeline.png" />
+
 ## OpenRouter Clients

 We implement two OpenRouter clients for interacting with the OpenRouter API: a standard sync client and an asynchronous client.
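The pipeline steps added in this commit (initial prompts, aggregation, improvement rounds) can be sketched as a plain loop. This is a hypothetical outline, not the template's actual code: `query_model` and `aggregate` are stand-in callables for the OpenRouter client calls and the Aggregator component.

```python
from typing import Callable, List


def run_consensus(
    prompt: str,
    models: List[str],
    query_model: Callable[[str, str], str],  # (model, prompt) -> response
    aggregate: Callable[[List[str]], str],   # responses -> aggregated answer
    rounds: int = 2,
) -> str:
    """Send the prompt to each model, aggregate the responses, then run
    improvement rounds where the aggregated answer is fed back as context."""
    responses = [query_model(m, prompt) for m in models]
    aggregated = aggregate(responses)
    for _ in range(rounds):
        followup = f"{prompt}\n\nAggregated answer so far:\n{aggregated}"
        responses = [query_model(m, followup) for m in models]
        aggregated = aggregate(responses)
    return aggregated
```

In the actual template the Router plays the role of the fan-out step and the Consensus Layer decides how many rounds to run and how the aggregated answer is injected (as extra context or as system instructions).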

src/cl_pipeline.png

23.8 KB
