This wiki provides an overview of the folder structure, tools, libraries, and a step-by-step guide on how to set up and run the project. Additionally, it explains the functionality of each file at a high level.
For the full implementation, see the h_events_persistence folder.
YouTube Video Walkthrough Playlist
h_events_persistence/
│
├── docker-compose.yml # Docker Compose file for setting up PostgreSQL and Adminer
├── requirements.txt # Python dependencies for the project
├── create_table.py # Script to create the PostgreSQL database table
├── models.py # SQLAlchemy model and database connection setup
├── hooks.py # Prefect hooks for handling flow completion and failure
├── llm_flow.py # Prefect flow to query LLM and store results
├── trigger_flow.py # Script to trigger flows via the Prefect API
└── README.md # Instructions on how to set up and run the project
- Prefect is a workflow orchestration tool that manages data pipelines.
- Why we use it: It orchestrates flows and handles retries and error handling. Version 3.0.4 is used to manage tasks and flows and to integrate with PostgreSQL.
- The OpenAI Python client is used to interact with the LLM served by Ollama.
- Why we use it: To send questions to an LLM and retrieve responses that we log and store.
- SQLAlchemy is a Python SQL toolkit and Object Relational Mapper (ORM).
- Why we use it: It simplifies the creation and interaction with PostgreSQL databases, enabling us to define models and handle database queries.
- Psycopg2-binary is a PostgreSQL adapter for Python.
- Why we use it: It enables Python to connect and interact with the PostgreSQL database where flow results are stored.
- Docker Compose is a tool used for defining and running multi-container Docker applications.
- Why we use it: To set up PostgreSQL and Adminer services for database management in a containerized environment.
First, create a virtual environment or activate an existing one, then install all dependencies using the requirements.txt file.
pip install -r requirements.txt
requirements.txt:
prefect
openai
sqlalchemy
psycopg2-binary
Use Docker Compose to set up the PostgreSQL and Adminer services for database management.
docker-compose up -d
This will spin up PostgreSQL on port 5432 and Adminer on port 8080. You can access Adminer at http://localhost:8080 to manage the database.
Run the create_table.py script to create the table in PostgreSQL for storing the flow results.
python create_table.py
Set the Prefect API URL so flows report to your local Prefect server:
export PREFECT_API_URL="http://localhost:4200/api"
Run the llm_flow.py script to start the Prefect flow, which queries the LLM model and stores the result in PostgreSQL.
python llm_flow.py
You can use Adminer to connect to your PostgreSQL database and view the flow results stored in the flow_results table.
- Purpose: Sets up two services—PostgreSQL for storing the flow results and Adminer for managing the database.
- How it works: It defines the PostgreSQL database with user credentials, creates a volume for persistent storage, and exposes the necessary ports for interaction.
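A minimal compose file matching this description might look like the following. The service names, image tags, and credentials are assumptions, not taken from the project:

```yaml
# docker-compose.yml sketch: PostgreSQL + Adminer (credentials are placeholders)
services:
  postgres:
    image: postgres:16
    environment:
      POSTGRES_USER: postgres
      POSTGRES_PASSWORD: postgres
      POSTGRES_DB: postgres
    ports:
      - "5432:5432"
    volumes:
      - pgdata:/var/lib/postgresql/data   # persistent storage

  adminer:
    image: adminer
    ports:
      - "8080:8080"

volumes:
  pgdata:
```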
- Purpose: Lists all dependencies required for running the project.
- How it works: When you run pip install -r requirements.txt, all listed libraries, including Prefect, OpenAI, SQLAlchemy, and psycopg2-binary, are installed.
- Purpose: Creates the flow_results table in PostgreSQL.
- How it works: It uses SQLAlchemy to define the database schema and creates the table when executed.
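A sketch of what create_table.py might look like. The column names and credentials are assumptions; in the actual project the schema likely comes from models.py rather than being redefined here:

```python
# create_table.py sketch: define the flow_results schema and create it.
from sqlalchemy import (Column, DateTime, Integer, MetaData, String, Table,
                        Text, create_engine)

metadata = MetaData()

# Assumed columns -- the real project may differ.
flow_results = Table(
    "flow_results", metadata,
    Column("id", Integer, primary_key=True),
    Column("flow_name", String(255)),
    Column("state", String(50)),
    Column("result", Text),
    Column("created_at", DateTime),
)

# Placeholder credentials matching a local docker-compose PostgreSQL.
DATABASE_URL = "postgresql+psycopg2://postgres:postgres@localhost:5432/postgres"

def create_tables(engine) -> None:
    """Create any missing tables; existing ones are left untouched."""
    metadata.create_all(engine)
```

Running it against the compose database would then be `create_tables(create_engine(DATABASE_URL))`; `create_all` is idempotent, so re-running the script is safe.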
- Purpose: Defines the schema of the flow_results table and establishes the database connection.
- How it works: SQLAlchemy is used to map the FlowResult class to the flow_results table in PostgreSQL. It also defines the connection string used to connect to the database.
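A minimal sketch of such a models.py, assuming column names and docker-compose default credentials (both are guesses, not the project's actual values):

```python
# models.py sketch: SQLAlchemy model and connection setup.
from datetime import datetime, timezone

from sqlalchemy import Column, DateTime, Integer, String, Text, create_engine
from sqlalchemy.orm import declarative_base, sessionmaker

Base = declarative_base()

class FlowResult(Base):
    """One row per finished flow run (column names are assumptions)."""
    __tablename__ = "flow_results"

    id = Column(Integer, primary_key=True)
    flow_name = Column(String(255), nullable=False)
    state = Column(String(50), nullable=False)   # e.g. "Completed" or "Failed"
    result = Column(Text)                        # LLM response or error message
    created_at = Column(DateTime, default=lambda: datetime.now(timezone.utc))

# Placeholder credentials matching a local docker-compose PostgreSQL.
DATABASE_URL = "postgresql+psycopg2://postgres:postgres@localhost:5432/postgres"

def make_session_factory(url: str = DATABASE_URL) -> sessionmaker:
    """Build an engine for `url` and return a session factory bound to it."""
    return sessionmaker(bind=create_engine(url))
```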
- Purpose: Contains hooks that are triggered when a flow is completed or fails.
- How it works: The hooks extract the flow result and store it in PostgreSQL via the FlowResult model in models.py.
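Prefect calls flow hooks with the signature `(flow, flow_run, state)`. A sketch of the hook logic, with the PostgreSQL write abstracted behind a `store` callable (in the project, that callable would be an INSERT through the FlowResult model; the factory shape here is an illustration, not the project's actual code):

```python
# hooks.py sketch: persist the terminal state of a Prefect flow run.
def make_hook(store):
    """Build a Prefect on_completion/on_failure hook.

    `store(flow_name, state_name, payload)` is whatever persists the row
    (in the project: an INSERT into flow_results via SQLAlchemy).
    Prefect invokes the returned hook as hook(flow, flow_run, state).
    """
    def hook(flow, flow_run, state):
        try:
            # raise_on_failure=False returns the exception for failed runs
            # instead of raising it, so one hook serves both outcomes.
            payload = str(state.result(raise_on_failure=False))
        except Exception as exc:
            payload = f"error: {exc}"
        store(flow.name, state.name, payload)
    return hook
```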
- Purpose: Defines the Prefect flow that sends queries to the LLM and stores the result.
- How it works: This file contains a flow with a task that queries the LLM (Ollama); the result is stored in PostgreSQL by the hooks defined in hooks.py.
- Flow Execution: A flow (defined in llm_flow.py) queries the LLM with a user-defined question.
- Flow Result: The LLM returns a response that is processed and logged.
- Result Storage: Hooks trigger on completion or failure of the flow and store the response (or error message) in the flow_results table in PostgreSQL.
- Result Management: The results can be viewed and managed using Adminer.