This project is the boilerplate for the "Build your own AI-powered internal search engine" workshop at TechSauce Global Summit 2024 on 8 August 2024.
This workshop is aimed at developers who have never worked with AI/LLMs before. The goal is to introduce how LLMs work, Retrieval-Augmented Generation (RAG), and embeddings, and then to build a simple search engine and a chatbot using these concepts.
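To make these concepts concrete before the workshop, the sketch below shows the core retrieval idea behind RAG: texts are turned into embedding vectors, and the documents whose vectors are most similar to the query vector are returned. This is an illustration only, not the workshop code; it assumes the official OpenAI Python client (with `OPENAI_API_KEY` set in the environment) and `numpy`, and the model name is an assumption.

```python
# Minimal illustration of embedding-based retrieval (the "R" in RAG).
# Assumes the official OpenAI Python client and numpy; not the workshop code.
import numpy as np
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

documents = [
    "The cafeteria is open from 8am to 6pm.",
    "VPN access requires a hardware token.",
    "Quarterly reviews happen every January, April, July, and October.",
]

def embed(texts: list[str]) -> np.ndarray:
    """Return one embedding vector per input text."""
    response = client.embeddings.create(model="text-embedding-3-small", input=texts)
    return np.array([item.embedding for item in response.data])

doc_vectors = embed(documents)
query_vector = embed(["When can I get lunch at the office?"])[0]

# Cosine similarity: higher score means more semantically similar.
scores = doc_vectors @ query_vector / (
    np.linalg.norm(doc_vectors, axis=1) * np.linalg.norm(query_vector)
)
print(documents[int(np.argmax(scores))])  # prints the most relevant document
```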
The `main` branch has incomplete code. The workshop is divided into 3 stages, and each stage has its own branch: the `main` branch has the starting code for the workshop, the `pre-workshop-2` branch has the completed code from stage 1, and so on. The final code is in the `final` branch.
There are 4 main components in this project:
- Embedding API - A simple REST API to save text and retrieve text relevant to an input query, hiding the complexity of embeddings and vector manipulation (see the usage sketch after this list).
- Search UI - A simple search engine UI to search for text using the Embedding API.
- Chatbot UI - A simple chatbot UI to chat with information retrieved via the Embedding API.
- Retriever - A simple script to scan the knowledge base and save its text to the Embedding API.
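As a rough idea of how the other components talk to the Embedding API, the sketch below saves one document and then searches for relevant text over HTTP. The base URL, endpoint paths, and payload shapes here are assumptions for illustration only; the actual routes are defined in the Embedding API's `server.py`.

```python
# Hypothetical example of calling the Embedding API over REST.
# Endpoint names and payloads are assumptions, not the project's actual routes.
import requests

EMBEDDING_API_URL = "http://localhost:8000"  # assumed host and port

# Save a piece of text to the knowledge store (what the Retriever does).
requests.post(
    f"{EMBEDDING_API_URL}/documents",
    json={"text": "Our office Wi-Fi password is rotated every month."},
)

# Retrieve text relevant to a query (what the Search UI and Chatbot do).
response = requests.get(
    f"{EMBEDDING_API_URL}/search",
    params={"query": "How often does the Wi-Fi password change?"},
)
print(response.json())
```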
Each component runs as its own service and communicates with the others via REST APIs. The system diagram is as follows:
The folder structure is as follows:
- `common` - Contains the common code used by multiple components (config, LLM, models).
- `embedding-api`, `search_web`, `chatbot`, `retriever` - Contain the code for each component. The `server.py` file is the entry point of each.
- `docs` - Contains the images and other files used in the README.
- `demo` - Contains sample data or prompts.
This workshop primarily uses OpenAI services, so you need an OpenAI account and API key to run this project. Feel free to update the code (`common/llm.py`) to use any other provider.
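As a hint of what such a provider swap involves, here is a minimal sketch of the kind of wrapper `common/llm.py` could expose, assuming the official OpenAI Python client. The function names and model names are illustrative assumptions, not the actual project code; to use another provider, you would replace the client calls inside these functions while keeping the same interface.

```python
# Sketch of a thin LLM wrapper, assuming the official OpenAI Python client.
# Function and model names are illustrative, not the project's actual code.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def embed(text: str) -> list[float]:
    """Return an embedding vector for the given text."""
    response = client.embeddings.create(model="text-embedding-3-small", input=text)
    return response.data[0].embedding

def chat(prompt: str) -> str:
    """Return a chat completion for the given prompt."""
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content
```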
Again, to skip the workshop, you can go directly to the `final` branch to see the final code.
