
RAG & Text Embedding #84

Draft
mhughes2k wants to merge 21 commits into bycs-lp:main from mhughes2k:RAG

Conversation

@mhughes2k

Following conversations with Peter at Moot Global 25, I started looking at an approach to implement the key steps for RAG (embedding and retrieval) as "purposes" that would be available.

So this is some really early precursor work, trying to understand the bycs-lp model and concepts.

At the moment this is "scaffolding": the back-end parts are not yet implemented and will require much more work. But by following the manager's models, I have also been able to update my "reference" activity chat module (https://github.com/mhughes2k/moodle-mod_xaichat/tree/ai_manager_version) to use the AI Manager as its AI provider, and to call these two new purposes to perform RAG.

The "RAG" purpose at the moment simply returns exactly one "document" (which lies that "yellow is a shade of blue"), but it's enough to see that the orchestration of these purposes should work.

This seems to be about 9 lines of code (once migrated to AI manager):

// $data is a stdClass object with the data from an mform.

$chatmanager = new \local_ai_manager\manager('chat');
$embeddingmanager = new \local_ai_manager\manager('embedding');
$ragmanager = new \local_ai_manager\manager('rag');
[...]
$embeddingrequest = $embeddingmanager->perform_request($data->userprompt, 'local_xaichat', $modulecontext->id);
$embedding = $embeddingrequest->get_content();

/*
 * Docs are returned as a simple string:
 * Title: Document Title
 * URL: https://someurl
 * document / fragment content
 *
 * The RAG purpose would basically implement dynamic access checking on the resulting documents from the underlying
 * store *before* returning them back out to a developer/user.
 */
$ragrequest = $ragmanager->perform_request($embedding, 'local_xaichat', $modulecontext->id);
$docs = $ragrequest->get_content();
$prompt = $data->userprompt;
if (!empty($docs)) {
    debugging("Got RAG content returned: " . $docs);
    $prompt = "Use the following information\n\n{$docs} to answer: \n\n{$prompt}";
}
[...]
$response = $chatmanager->perform_request(
    $prompt,
    'mod_xaichat',
    $modulecontext->id
);

$result = $response->get_content();

// Do stuff with the AI-synthesised response.

Anyway I thought I'd open this draft PR early to allow for a conversation about this.

One issue I've already discovered, relating to text embedding, is that OpenAI has a different endpoint URL for text embedding vs the endpoint that is currently encoded into the chatgpt tool, and so I'm not sure how to approach this aspect.

I'd originally thought that this could simply leverage the existing "tools" plugins, but does this mean each tool needs to support a slightly different option/configuration for this extra action it can perform?

I also don't know what this would look like across all the AI tools, or whether they each approach the text-embedding process differently.

My next step is to start looking at what back-end engineering is necessary to implement text embedding, and at the more complicated aspects of indexing, processing and storing the vector data from Moodle content. (On this last one, it occurred to me that the text embedding for the user prompt and for the stored vectors needs to be the same, so having them as separate purposes could create conflicts through the configurations admins set up, i.e. if they use different embedding models.)
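That embedding-model mismatch could be guarded against mechanically. A minimal Python sketch of the idea (all names here are hypothetical; in the real plugin such a check would live in the purposes' PHP code): each stored vector is stamped with the model that produced it, and a query embedded with a different model is rejected rather than silently returning degraded results.

```python
class VectorStore:
    """Toy store that records which embedding model produced its vectors."""

    def __init__(self, embedding_model):
        self.embedding_model = embedding_model
        self.points = []

    def store(self, vector, payload, model):
        # Refuse to mix vectors from different embedding models in one collection.
        if model != self.embedding_model:
            raise ValueError(f"collection expects {self.embedding_model!r}, got {model!r}")
        self.points.append((vector, payload))

    def search(self, query_vector, model):
        # The query must be embedded with the same model as the stored vectors.
        if model != self.embedding_model:
            raise ValueError(f"query embedded with {model!r}, collection uses {self.embedding_model!r}")
        return [payload for _, payload in self.points]

store = VectorStore("text-embedding-3-small")
store.store([0.1, 0.2], {"title": "Test Document"}, "text-embedding-3-small")
```

In practice the model identifier could come from the aitool configuration, so a mismatch would surface as a configuration error at setup time.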

Anyway, thanks to Peter for his time. I hope this is useful/interesting, and please do redirect me if I'm drifting away from the intended approach or misunderstanding anything (also, the dev docs branch was really useful!).

Regards

Michael

@PhMemmel
Member

Hi @mhughes2k ,

thank you very much for this. I will try to have a look and pick up on your questions hopefully next week!

@PhMemmel
Member

Also, I'm not sure if you noticed #76.

I started to create some (dev) docs. Maybe this already helps as well.

@mhughes2k
Author

Also, I'm not sure if you noticed #76.

I started to create some (dev) docs. Maybe this already helps as well.

Yes, those were the docs I meant! Peter pointed me at them; they're really helpful. If there's anything I can contribute to them too, or you want an external person to look through them, I'm happy to do so!

@PhMemmel
Member

Oh yes, sorry, I totally missed the line you were mentioning in your post!

@mhughes2k
Author

After doing some more work I fudged getting text-embedding to work with the OpenAI plugin.
This had a few quirks and "hacks", since the endpoint part of the connector objects presumes that the endpoint URL is the same for every operation, but it's fundamentally different for text embedding vs chat completion...

At the same time, I'm not sure that having a second connector implementation for each AI provider, simply so it can call a different URL, is a good idea... so I've done some internal logic switching to offer an "endpoint" choice.

I think the limitation is due to the concept (correct me if I'm wrong) that a connector offers different models, but they all have to go to the same endpoint...

So this allowed me to create OpenAI provider A with a purpose for chat completion, and provider B with a purpose for text embedding. As far as I could see, getting one provider to serve both purposes would mean the connector has to know more about the purpose than I'd like.
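To make the endpoint-switching idea concrete, here is a sketch (in Python rather than the plugin's PHP, purely for illustration) of one connector serving two purposes by switching the URL. The two OpenAI endpoint paths are the real API ones; the function and table names are hypothetical.

```python
# One hypothetical connector serving two purposes by switching the URL,
# instead of duplicating the whole connector per purpose.
OPENAI_ENDPOINTS = {
    "chat": "https://api.openai.com/v1/chat/completions",
    "embedding": "https://api.openai.com/v1/embeddings",
}

def endpoint_for(purpose):
    """Return the endpoint URL for a purpose, or fail loudly for unknown ones."""
    try:
        return OPENAI_ENDPOINTS[purpose]
    except KeyError:
        raise ValueError(f"connector does not support purpose {purpose!r}") from None
```

The trade-off is exactly the one discussed above: the connector now knows about purposes, but only through one small lookup rather than scattered if/else branches.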

Making these changes, I've got a rudimentary function that does a perform_request() call against an "embedding" purpose and gets back a string representation of the vector (you can see this working in https://github.com/mhughes2k/moodle-mod_xaichat/blob/ai_manager_version/view.php, around about L123).

RAG, however, became more problematic, because it isn't actually an LLM interaction at all... it's a vector DB search. It just needs to be a $somevectordbplugin->search($embedding); call, but with its own ecosystem of vector DBs, indexing etc. (which is basically just Moodle Global Search with some bells and whistles)...

However, integrating the text-embedding function and the vector DB search into the same purpose (text embedding could of course still be kept as a standalone purpose) does now actually seem to make more sense than having a "just" embedding purpose and a "just" vector search purpose...

(hope this makes sense)

@mhughes2k
Author

OK I have a whole new approach to try out :-) will close this and re-raise once I've got my head around it.

@mhughes2k
Author

I have done a complete rework of this. I have implemented the qdrant vector DB as a backend, exposed as a "tool" class, on the basis that it is effectively an HTTP endpoint.

The qdrant aitool can be added via the AI tools menu:
[screenshot: AI tools menu]
With the following settings form:
[screenshot: qdrant settings form]

For testing purposes I created a qdrant instance using Docker and access it via the host.docker.internal name (which required relaxing the endpoint validation slightly).

Not entirely sure why, but I ended up creating a qdrant collection with (multiple) named vectors using:

PUT http://localhost:6333/collections/moodle
{
    "vectors": {
        "contentvector": {
            "size": 1536,
            "distance": "Cosine"
        }
    }
}
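For reference, "Cosine" in that collection config means qdrant scores points by cosine similarity (higher is closer). A plain-Python sketch of the definition, just to pin down what the 1536-dimensional "contentvector" entries will be compared with:

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity of two vectors; 1.0 means identical direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

print(cosine_similarity([1.0, 0.0], [1.0, 0.0]))  # identical direction
print(cosine_similarity([1.0, 0.0], [0.0, 1.0]))  # orthogonal
```

Since cosine similarity ignores magnitude, only the direction of the embedding matters for retrieval.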

In addition, I have defined both "(text-)embedding" and "rag" purposes:
[screenshot: purposes configuration]
In this case I've configured text embedding to use an OpenAI back end, and I extended that connector to accommodate a second endpoint:
[screenshot: extended OpenAI connector settings]

With all three of these in place, I have implemented a very simple test script in the "rag" purpose.

This has two functions, "store" and "retrieve" (pass these values via ?action=XXX):

Store

This simply places a "document" into the vector db:

    $storeprompt = json_encode([
        'action' => 'store',
        'content' => 'This is a test document. It is only a test document.',
        'metadata' => [
            'title' => 'Test Document',
            'author' => 'Moodle AI Manager',
            'source' => 'Generated',
        ],
    ]);

The perform_request() method will use the chatgpt connector (in this case) to perform a "rag" purpose with a "store" action.

This is connected to the qdrant aitool, which in turn calls the "embedding" purpose to get the "content" vector and then stores the document and vector in qdrant.

Retrieve

The Retrieve action is simply a test that the same document is found again.
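The store/retrieve round trip can be mimicked end to end with an in-memory stand-in. Everything below is a toy: fake_embed replaces the real "embedding" purpose, the DB list replaces qdrant, and only the JSON "prompt" shape follows the store snippet above.

```python
import json
import math

DB = []  # toy replacement for the qdrant collection

def fake_embed(text):
    # Stand-in for the "embedding" purpose: a deterministic toy vector.
    return [float(len(text) % 7 + 1), float(sum(map(ord, text)) % 11 + 1)]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def handle(prompt_json):
    """Toy version of the rag purpose's store/retrieve dispatch."""
    request = json.loads(prompt_json)
    if request["action"] == "store":
        DB.append({"vector": fake_embed(request["content"]), "payload": request})
        return {"stored": len(DB)}
    if request["action"] == "retrieve":
        queryvector = fake_embed(request["query"])
        best = max(DB, key=lambda point: cosine(point["vector"], queryvector))
        return {"document": best["payload"]["content"]}
    raise ValueError("unknown action")

handle(json.dumps({
    "action": "store",
    "content": "This is a test document. It is only a test document.",
    "metadata": {"title": "Test Document"},
}))
result = handle(json.dumps({"action": "retrieve", "query": "test document"}))
```

The real flow is the same shape, just with perform_request() in front and qdrant behind.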

Next steps

  1. Move the "options" for the request out of the prompt and into the request_options. This should make the "prompt" parameter clearer. In the case of a "store" action it should just be "document" or some reference to a "document", and for retrieval it should just be the query from the user.
  2. Clean up debugging code.
  3. Model is a redundant setting on the aitool for qdrant / vector DBs, so this should probably get suppressed.
  4. Azure settings do odd things with endpoints (probably because aitools "expect" only one endpoint, but I didn't want to double the number of connectors simply to add an extra operation), so I had to disable the "freeze" on the endpoint to allow the selector to work...

@PhMemmel
Member

Hi @mhughes2k ,

thank you so much for all your work. I had a quick look and it looks really promising to me. Regarding your next steps:

  1. I agree. Besides that, if the purpose/call does not need a prompt, it's totally OK for the prompt to be empty and for all request data to come via the request options. The $prompt is really intended for direct user input. If a call has no user input to pass, only technical information (which should go into the request options), an empty prompt is totally fine. Everything else the connector should take care of.
  2. Always great ;-)
  3. Yes, but you will have to make sure that a model is specified in some way, since it is used for logging etc. For example, aitool_option_azure also defines a hardcoded string and removes the model option from the mform, because the model is configured in the Azure backend.
  4. Yes, I saw the issue about different endpoints. Let me think about how this can be solved easily.

Also, we still have to think about how to make other connectors work. I'm still torn on whether we should use a separate connector plugin, which makes implementation a lot easier and reduces the need for "if-else" logic. If you look at the current structure, I basically could also have created a "unified OpenAI connector" able to serve text, image and speech generation. But I decided to split it up into three connectors: aitool_chatgpt, aitool_dalle and aitool_openaitts.

Would it be helpful to make a different connector that only allows embedding? For this I've already used two basic techniques. First: inheriting from chatgpt and adapting just what I need (overriding the endpoint, for example); in this case it should also be easy to handle the Azure endpoint issue. Second: creating a different connector object and passing the calls through to it, like I did in the telli connector, which is basically a wrapper around the chatgpt and dalle connectors.
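The first option can be as small as overriding a single method. Schematically (a Python stand-in with illustrative class names, not the plugin's actual API):

```python
class ChatGPTConnector:
    """Stand-in for aitool_chatgpt: all shared request logic lives here."""

    def get_endpoint(self):
        return "https://api.openai.com/v1/chat/completions"

class OpenAIEmbeddingConnector(ChatGPTConnector):
    """Inherits the whole connector, overrides only the URL it talks to."""

    def get_endpoint(self):
        return "https://api.openai.com/v1/embeddings"
```

The payload mapping for embeddings would be overridden the same way, leaving everything else (auth, retries, logging) shared with the parent.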

Maybe you're open to discussing some of the points and could show/explain some more things to me. I'm going to reach out to you via Matrix.

Looking forward to it! I'm very excited about your code! :)

@mhughes2k
Author

Absolutely open to discussing all the points :-) so please do grab me via Matrix.

I did wrestle with the idea of a wholly separate connector, since text embedding is a different "mode"; that probably makes the most sense, but I did wonder how much code duplication ends up being generated for what is really just a different endpoint and payload shape. However, for text embedding, moving the code out into an aitool_openaite ("OpenAI text embedding") plugin makes sense. It deals with (3) in the best way, and allows the same approach to be used for OpenAI-on-Azure text embedding to provide the extra configuration.

However, for the qdrant connector, which has lots of different suffixes appended to the base endpoint, I implemented some switching logic to make endpoint generation more dynamic, as I didn't think a connector for each endpoint it needs to call made sense.
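That switching logic might reduce to a small action-to-suffix table. A hypothetical Python sketch (the path suffixes follow the qdrant HTTP API; the function and table names are illustrative, not the connector's actual code):

```python
# Hypothetical mapping from the qdrant aitool's internal actions to the REST
# path suffixes appended to the base endpoint (paths as in the qdrant HTTP API).
QDRANT_SUFFIXES = {
    "create_collection": "/collections/{collection}",               # PUT
    "store":             "/collections/{collection}/points",        # PUT (upsert)
    "retrieve":          "/collections/{collection}/points/search", # POST
}

def build_url(base, action, collection):
    """Join the configured base endpoint with the suffix for one action."""
    return base.rstrip("/") + QDRANT_SUFFIXES[action].format(collection=collection)

print(build_url("http://localhost:6333", "retrieve", "moodle"))
# → http://localhost:6333/collections/moodle/points/search
```

One table per backend keeps the dynamic endpoint generation in a single place instead of scattering URL strings through the connector.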

The option here would be to define a new sub-plugin type, like "aidb_" (I had a different branch that played with this), but after I tried it, and duplicated lots of plumbing code simply to manage it, I found that just having it as an "aitool_" plugin worked quite well and was simpler.

Maybe a softer categorisation of "aitools", distinguishing between "AI provider connectors" and "vector DB backends", is all that's needed; it would also eliminate an extra UI.

Other next steps that I forgot:

  • I'm going to think about content indexing. I'd have liked to just re-use the Moodle Global Search indexer, but I think it's too closely coupled to its own back-end DB engines, so I was going to replicate the relevant parts and put them under the "control" of the "rag" purpose. This indexer would then work in tandem with a configured aitool_* plugin that supports the rag purpose to get the "documents" into the backend.

@mhughes2k
Author

I've put the work I've been doing specifically on "indexing" into a separate branch for the moment: https://github.com/mhughes2k/moodle-local_ai_manager/tree/RAG_indexer

@PM84
Contributor

PM84 commented Dec 16, 2025

Hello Michael,
I hope you are well!
What is the current status of the RAG development?
I am currently planning for the first half of 2026 and am considering whether, and if so how many, resources we need to allocate for this on our side. :-)
Best regards
Peter

@mhughes2k
Author

mhughes2k commented Dec 17, 2025 via email
