Commit 07fb77b

ignaciovalle20, agtk-nachov, coderabbitai[bot], and JPAmorin authored
Feature/upload documents (#17)
* Implement document management API with upload, retrieval, and deletion features
  - Added FastAPI endpoints for uploading, retrieving, listing, and deleting documents.
  - Integrated MinIO for file storage and PostgreSQL for metadata management.
  - Implemented RabbitMQ for message publishing upon document upload.
  - Enhanced logging for better traceability and error handling.
  - Updated project dependencies for database and messaging support.
* Remove RagManager
* Remove unused RabbitMQ and MinIO service files along with the document worker and related worker initialization. This cleanup eliminates legacy code that is no longer needed in the project.
* Ensure sender_type enum is created only if it does not exist in the database to prevent errors during initialization.
* Update .gitignore
  Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
* fix all coderabbit comments
* Enhance database and RabbitMQ URL handling by URL-encoding credentials to support special characters. Updated related docstrings for clarity.
* Remove deprecated PostgreSQL configuration and initialization scripts from DocsManager and RAGManager. This cleanup eliminates legacy database setup files that are no longer in use.
* Added trivial coderabbit suggestions.
* Refactor environment variable configuration in docker-compose.yml and update README.md to reflect changes. Environment variables for PostgreSQL, RabbitMQ, and MinIO are now set to default values from a .env file, enhancing security and flexibility. Added detailed instructions for initial setup and environment variable management in README.md.
* Refactor database connection management by renaming the module from `database_connection` to `db_connection` and updating import statements accordingly. Removed the old `database_connection.py` file and consolidated database initialization and session management in the new `db_connection.py` file.
* Refactor DocumentChunk model by updating import statements, enhancing the created_at timestamp default to use datetime.utcnow, and reorganizing the class structure for clarity. Removed redundant code and ensured proper indexing for database performance.
* Implement pagination for document listing endpoint and add corresponding response schema. The `list_documents` function now accepts `limit` and `offset` parameters, returning a paginated response with total document count.
* Update DocumentChunk model by removing redundant index from primary key definition for clarity and consistency.
* Refactor Docker setup and database connection management
* Add blank lines for improved readability in multiple files including Dockerfile, SQL initialization, and various Python modules.
* Refactor database connection management in chatMessage.py by replacing the session maker with a direct database connection function. This simplifies the code and improves readability.

---------

Co-authored-by: Nacho V <ignacio.valle@agrotoken.io>
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
Co-authored-by: JPAmorin <juanpabloamorinjusto@gmail.com>
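
The commit message states that `list_documents` now accepts `limit` and `offset` and returns a paginated response with the total document count. The diff for that endpoint is not part of this excerpt, so the following is only a minimal in-memory sketch of that contract. `Page` and `paginate` are hypothetical names introduced for illustration; the real endpoint presumably applies the same window in a database query rather than on a Python list.

```python
from dataclasses import dataclass
from typing import Generic, List, Sequence, TypeVar

T = TypeVar("T")


@dataclass
class Page(Generic[T]):
    """Hypothetical paginated response: one window of items plus the total count."""
    items: List[T]
    total: int
    limit: int
    offset: int


def paginate(rows: Sequence[T], limit: int = 20, offset: int = 0) -> Page[T]:
    """Return the slice rows[offset:offset + limit] together with len(rows)."""
    if limit < 1:
        raise ValueError("limit must be >= 1")
    if offset < 0:
        raise ValueError("offset must be >= 0")
    window = list(rows[offset:offset + limit])
    return Page(items=window, total=len(rows), limit=limit, offset=offset)


if __name__ == "__main__":
    documents = [f"doc-{i}" for i in range(10)]
    page = paginate(documents, limit=3, offset=8)
    print(page.items, page.total)
```

Returning `total` alongside the window lets a client compute how many pages remain without issuing a second request.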
1 parent 1dbcbe5 commit 07fb77b

29 files changed: +1845 -436 lines changed

.env.example

Lines changed: 21 additions & 0 deletions
@@ -0,0 +1,21 @@
+# =============================================================================
+# DOCKER COMPOSE CONFIGURATION
+# =============================================================================
+# These values are used by docker-compose.yml to configure the infrastructure services
+
+# PostgreSQL Configuration
+POSTGRES_USER=postgres
+POSTGRES_PASSWORD=postgres
+POSTGRES_DB=vectordb
+POSTGRES_PORT=5432
+
+# RabbitMQ Configuration
+RABBITMQ_USER=guest
+RABBITMQ_PASSWORD=guest
+RABBITMQ_HOST=rabbitmq
+RABBITMQ_PORT=5672
+RABBITMQ_MANAGEMENT_PORT=15672
+
+# MinIO Configuration (S3 compatible storage)
+MINIO_ROOT_USER=minioadmin
+MINIO_ROOT_PASSWORD=minioadmin
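
The commit message mentions URL-encoding the database and RabbitMQ credentials so that special characters are supported. The code that does this is not shown in this excerpt, so the following is a hedged sketch of the idea using the variable names from the file above. `build_postgres_url` and `build_rabbitmq_url` are hypothetical helpers, `POSTGRES_HOST` is assumed to default to `localhost` (it is not defined in this file), and the `postgresql://` and `amqp://` schemes are assumptions about which drivers the project uses.

```python
import os
from typing import Mapping
from urllib.parse import quote


def build_postgres_url(env: Mapping[str, str]) -> str:
    """Assemble a PostgreSQL URL, percent-encoding the user and password."""
    # safe="" makes quote() also encode "/" so it cannot break the URL structure.
    user = quote(env.get("POSTGRES_USER", "postgres"), safe="")
    password = quote(env.get("POSTGRES_PASSWORD", "postgres"), safe="")
    host = env.get("POSTGRES_HOST", "localhost")  # assumed default
    port = env.get("POSTGRES_PORT", "5432")
    database = env.get("POSTGRES_DB", "vectordb")
    return f"postgresql://{user}:{password}@{host}:{port}/{database}"


def build_rabbitmq_url(env: Mapping[str, str]) -> str:
    """Assemble an AMQP URL, percent-encoding the user and password."""
    user = quote(env.get("RABBITMQ_USER", "guest"), safe="")
    password = quote(env.get("RABBITMQ_PASSWORD", "guest"), safe="")
    host = env.get("RABBITMQ_HOST", "localhost")
    port = env.get("RABBITMQ_PORT", "5672")
    # %2F is the percent-encoded default virtual host "/".
    return f"amqp://{user}:{password}@{host}:{port}/%2F"


if __name__ == "__main__":
    print(build_postgres_url(os.environ))
    print(build_rabbitmq_url(os.environ))
```

Without the encoding step, a password containing `@`, `:` or `/` would be parsed as part of the host or path instead of the credentials.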

.gitignore

Lines changed: 217 additions & 0 deletions
@@ -1 +1,218 @@
+# Byte-compiled / optimized / DLL files
+__pycache__/
+*.py[cod]
+*$py.class
+
+# C extensions
+*.so
+
+# Distribution / packaging
+.Python
+build/
+develop-eggs/
+dist/
+downloads/
+eggs/
+.eggs/
+lib/
+lib64/
+parts/
+sdist/
+var/
+wheels/
+share/python-wheels/
+*.egg-info/
+.installed.cfg
+*.egg
+MANIFEST
+
+# PyInstaller
+# Usually these files are written by a python script from a template
+# before PyInstaller builds the exe, so as to inject date/other infos into it.
+*.manifest
+*.spec
+
+# Installer logs
+pip-log.txt
+pip-delete-this-directory.txt
+
+# Unit test / coverage reports
+htmlcov/
+.tox/
+.nox/
+.coverage
+.coverage.*
+.cache
+nosetests.xml
+coverage.xml
+*.cover
+*.py.cover
+.hypothesis/
+.pytest_cache/
+cover/
+
+# Translations
+*.mo
+*.pot
+
+# Django stuff:
+*.log
+local_settings.py
+db.sqlite3
+db.sqlite3-journal
+
+# Flask stuff:
+instance/
+.webassets-cache
+
+# Scrapy stuff:
+.scrapy
+
+# Sphinx documentation
+docs/_build/
+
+# PyBuilder
+.pybuilder/
+target/
+
+# Jupyter Notebook
+.ipynb_checkpoints
+
+# IPython
+profile_default/
+ipython_config.py
+
+# pyenv
+# For a library or package, you might want to ignore these files since the code is
+# intended to run in multiple environments; otherwise, check them in:
+# .python-version
+
+# pipenv
+# According to pypa/pipenv#598, it is recommended to include Pipfile.lock in version control.
+# However, in case of collaboration, if having platform-specific dependencies or dependencies
+# having no cross-platform support, pipenv may install dependencies that don't work, or not
+# install all needed dependencies.
+# Pipfile.lock
+
+# UV
+# Similar to Pipfile.lock, it is generally recommended to include uv.lock in version control.
+# This is especially recommended for binary packages to ensure reproducibility, and is more
+# commonly ignored for libraries.
+# uv.lock
+
+# poetry
+# Similar to Pipfile.lock, it is generally recommended to include poetry.lock in version control.
+# This is especially recommended for binary packages to ensure reproducibility, and is more
+# commonly ignored for libraries.
+# https://python-poetry.org/docs/basic-usage/#commit-your-poetrylock-file-to-version-control
+# poetry.lock
+# poetry.toml
+
+# pdm
+# Similar to Pipfile.lock, it is generally recommended to include pdm.lock in version control.
+# pdm recommends including project-wide configuration in pdm.toml, but excluding .pdm-python.
+# https://pdm-project.org/en/latest/usage/project/#working-with-version-control
+# pdm.lock
+# pdm.toml
+.pdm-python
+.pdm-build/
+
+# pixi
+# Similar to Pipfile.lock, it is generally recommended to include pixi.lock in version control.
+# pixi.lock
+# Pixi creates a virtual environment in the .pixi directory, just like venv module creates one
+# in the .venv directory. It is recommended not to include this directory in version control.
+.pixi
+
+# PEP 582; used by e.g. github.com/David-OConnor/pyflow and github.com/pdm-project/pdm
+__pypackages__/
+
+# Celery stuff
+celerybeat-schedule
+celerybeat.pid
+
+# Redis
+*.rdb
+*.aof
+*.pid
+
+# RabbitMQ
+mnesia/
+rabbitmq/
+rabbitmq-data/
+
+# ActiveMQ
+activemq-data/
+
+# SageMath parsed files
+*.sage.py
+
+# Environments
 .env
+.envrc
+.venv
+env/
+venv/
+ENV/
+env.bak/
+venv.bak/
+
+# Spyder project settings
+.spyderproject
+.spyproject
+
+# Rope project settings
+.ropeproject
+
+# mkdocs documentation
+/site
+
+# mypy
+.mypy_cache/
+.dmypy.json
+dmypy.json
+
+# Pyre type checker
+.pyre/
+
+# pytype static type analyzer
+.pytype/
+
+# Cython debug symbols
+cython_debug/
+
+# PyCharm
+# JetBrains specific template is maintained in a separate JetBrains.gitignore that can
+# be found at https://github.com/github/gitignore/blob/main/Global/JetBrains.gitignore
+# and can be added to the global gitignore or merged into this file. For a more nuclear
+# option (not recommended) you can uncomment the following to ignore the entire idea folder.
+# .idea/
+
+# Abstra
+# Abstra is an AI-powered process automation framework.
+# Ignore directories containing user credentials, local state, and settings.
+# Learn more at https://abstra.io/docs
+.abstra/
+
+# Visual Studio Code
+# Visual Studio Code specific template is maintained in a separate VisualStudioCode.gitignore
+# that can be found at https://github.com/github/gitignore/blob/main/Global/VisualStudioCode.gitignore
+# and can be added to the global gitignore or merged into this file. However, if you prefer,
+# you could uncomment the following to ignore the entire vscode folder
+# .vscode/
+
+# Ruff stuff:
+.ruff_cache/
+
+# PyPI configuration file
+.pypirc
+
+# Marimo
+marimo/_static/
+marimo/_lsp/
+__marimo__/
+
+# Streamlit
+.streamlit/secrets.toml
+.env
+

Dockerfile.pgvector

Lines changed: 5 additions & 0 deletions
@@ -0,0 +1,5 @@
+FROM pgvector/pgvector:pg16
+COPY ./init.sql /docker-entrypoint-initdb.d/init.sql
+
+
+

DocsManager/.env.example

Lines changed: 47 additions & 0 deletions
@@ -0,0 +1,47 @@
+# PostgreSQL Configuration
+POSTGRES_USER=postgres
+POSTGRES_PASSWORD=postgres
+POSTGRES_DB=vectordb
+POSTGRES_HOST=localhost
+POSTGRES_PORT=5432
+
+# RabbitMQ Configuration
+RABBITMQ_USER=guest
+RABBITMQ_PASSWORD=guest
+RABBITMQ_HOST=localhost
+RABBITMQ_PORT=5672
+RABBITMQ_MANAGEMENT_PORT=15672
+
+# MinIO Configuration
+MINIO_ENDPOINT=localhost:9000
+MINIO_ACCESS_KEY=minioadmin
+MINIO_SECRET_KEY=minioadmin
+MINIO_BUCKET=goland-bucket
+MINIO_USE_SSL=false
+
+# OpenAI Configuration
+OPENAI_API_KEY=your-openai-api-key-here
+
+# Chunking Configuration
+CHUNK_SIZE=1000
+CHUNK_OVERLAP=200
+
+# =============================================================================
+# IMPORTANT NOTES
+# =============================================================================
+# 1. For local development (app running outside Docker):
+# - Set POSTGRES_HOST=localhost
+# - Set RABBITMQ_HOST=localhost
+# - Set MINIO_ENDPOINT=localhost:9000
+# - Set MINIO_USE_SSL=false
+#
+# 2. For app running inside Docker:
+# - Set POSTGRES_HOST=postgres
+# - Set RABBITMQ_HOST=rabbitmq
+# - Set MINIO_ENDPOINT=minio:9000
+#
+# 3. For production/external services:
+# - Use proper external endpoints (e.g., https://minio.yoursite.com)
+# - Set MINIO_USE_SSL=true
+# - Use strong, random passwords
+# - Add your actual OPENAI_API_KEY
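
The `CHUNK_SIZE` and `CHUNK_OVERLAP` settings above suggest an overlapping sliding-window split, but this excerpt does not show how the project actually chunks documents. The following is a minimal sketch under the assumption that both values are measured in characters; `chunk_text` is a hypothetical helper, and the real worker may instead split by tokens or use a library splitter.

```python
import os
from typing import List


def chunk_text(text: str, chunk_size: int = 1000, chunk_overlap: int = 200) -> List[str]:
    """Split text into windows of chunk_size characters that share chunk_overlap characters."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must satisfy 0 <= chunk_overlap < chunk_size")

    step = chunk_size - chunk_overlap  # how far each new window advances
    chunks: List[str] = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reaches the end of the text
    return chunks


if __name__ == "__main__":
    # Read the settings the same way the .env file names them (defaults match the example).
    size = int(os.environ.get("CHUNK_SIZE", "1000"))
    overlap = int(os.environ.get("CHUNK_OVERLAP", "200"))
    pieces = chunk_text("lorem ipsum " * 300, size, overlap)
    print(len(pieces), [len(p) for p in pieces])
```

The overlap means text near a window boundary appears in two consecutive chunks, which is the usual motivation for a nonzero `CHUNK_OVERLAP` in retrieval pipelines.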

DocsManager/app/api/__init__.py

Lines changed: 4 additions & 0 deletions
@@ -0,0 +1,4 @@
+# API package
+
+
+
Lines changed: 4 additions & 0 deletions
@@ -0,0 +1,4 @@
+# Routes package
+
+
+
