Commit 9d4671b

Merge pull request #11 from dataforgoodfr/kotaemon-dev-good-2
RAG system with Kotaemon inside

The "kotaemon" project is a git subtree shared with other teams and contains:
- libs/kotaemon & libs/ktem, forked with the full metadata ingestion system (UI metadata management in progress)
- libs/pipeline_blocks, a DataForGood Python package for developing common blocks for all teams, linked to the kotaemon ecosystem

The 'rag_system' folder contains:
- the 'kotaemon' folder, a git subtree folder (see above)
- a 'pipeline_scripts' folder with custom scripts for the ingestion/retrieval pipelines
- a quick Docker Compose setup with all the services needed
- documentation...
2 parents e074775 + d813c89 commit 9d4671b

434 files changed: +58862 / -3 lines

.gitignore

Lines changed: 5 additions & 1 deletion
@@ -164,4 +164,8 @@ dmypy.json
 cython_debug/
 
 # Precommit hooks: ruff cache
-.ruff_cache
+.ruff_cache
+
+# the flow stuff
+.theflow
+kotaemon-custom/kotaemon/ktem_app_data

.pre-commit-config.yaml

Lines changed: 3 additions & 2 deletions
@@ -1,12 +1,13 @@
 repos:
 - repo: https://github.com/charliermarsh/ruff-pre-commit
   # Ruff version.
-  rev: "v0.2.1"
+  rev: "v0.11.0"
   hooks:
     - id: ruff
       args: [ --fix ]
 - repo: https://github.com/pre-commit/pre-commit-hooks
-  rev: v4.3.0
+  rev: v5.0.0
   hooks:
     - id: check-merge-conflict
     - id: mixed-line-ending
+      exclude: 'rag_system/kotaemon/libs/kotaemon/.*|rag_system/kotaemon/libs/ktem/.*'
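The `exclude` value added to the `mixed-line-ending` hook is a regular expression; pre-commit matches it against repository-relative file paths with Python's `re.search`. The sketch below replays that check to show that only the forked `libs/kotaemon` and `libs/ktem` trees are skipped. `hook_runs_on` is a hypothetical helper and the sample paths are illustrative, not taken from the commit.

```python
import re

# Pattern copied from the mixed-line-ending hook above.
EXCLUDE = r"rag_system/kotaemon/libs/kotaemon/.*|rag_system/kotaemon/libs/ktem/.*"
_exclude_re = re.compile(EXCLUDE)


def hook_runs_on(path: str) -> bool:
    """Return True if the hook would still check `path` (i.e. it is not excluded).

    pre-commit applies `exclude` with re.search on the repo-relative path,
    so the pattern is not anchored and may match anywhere in the string.
    """
    return _exclude_re.search(path) is None


if __name__ == "__main__":
    # Illustrative paths only.
    for p in [
        "rag_system/kotaemon/libs/kotaemon/kotaemon/base.py",
        "rag_system/kotaemon/libs/ktem/ktem/main.py",
        "rag_system/kotaemon/libs/pipeline_blocks/setup.py",
        "rag_system/pipeline_scripts/ingest.py",
    ]:
        print(f"{p}: {'checked' if hook_runs_on(p) else 'excluded'}")
```

Because the expression is not anchored with `^`, any path that merely contains `rag_system/kotaemon/libs/ktem/` would be excluded too; prefixing each alternative with `^` would pin it to the repository root.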

rag_system/.gitattributes

Lines changed: 1 addition & 0 deletions
@@ -0,0 +1 @@
+kotaemon/libs/taxonomy/* merge=union
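`merge=union` selects git's built-in union merge driver, which keeps the lines from both sides instead of leaving conflict markers (the order of the added lines may then need checking by hand). Per the gitattributes documentation, this pattern is resolved relative to `rag_system/` (where this file lives), `*` never matches `/`, and a pattern that matches a directory does not apply to the files inside it. So only files sitting directly in `taxonomy/` get the union driver; a `kotaemon/libs/taxonomy/**` pattern would also cover nested files. The sketch below emulates that rule for this single pattern; `gitattributes_glob_to_regex` is a hypothetical helper that ignores `**`, `?` and character classes, and `git check-attr merge -- <path>` remains the authoritative check.

```python
import re


def gitattributes_glob_to_regex(pattern: str) -> re.Pattern:
    """Hypothetical helper: translate a gitattributes pattern containing a '/'
    and only '*' wildcards into an anchored regex.

    Follows the documented rule that '*' matches anything except '/'.
    Does not handle '**', '?', character classes or leading-slash anchoring.
    """
    literal_chunks = [re.escape(chunk) for chunk in pattern.split("*")]
    return re.compile("^" + "[^/]*".join(literal_chunks) + "$")


# Pattern from rag_system/.gitattributes; paths are relative to rag_system/.
UNION_RULE = gitattributes_glob_to_regex("kotaemon/libs/taxonomy/*")


def uses_union_merge(path: str) -> bool:
    return UNION_RULE.match(path) is not None


if __name__ == "__main__":
    for p in [
        "kotaemon/libs/taxonomy/labels.csv",      # directly inside taxonomy/
        "kotaemon/libs/taxonomy/sub/labels.csv",  # nested: not covered
        "kotaemon/libs/ktem/app.py",              # outside taxonomy/
    ]:
        print(p, "->", "merge=union" if uses_union_merge(p) else "default merge")
```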

rag_system/.gitignore

Lines changed: 3 additions & 0 deletions
@@ -0,0 +1,3 @@
+.theflow/*
+ollama
+qdrant_data

rag_system/README.md

Lines changed: 53 additions & 0 deletions
@@ -0,0 +1,53 @@
+#### Kotaemon Project: a DATA4GOOD git subtree folder
+
+
+The 'kotaemon' folder is a git subtree folder that can be synchronized with the common 'root' project: https://github.com/dataforgoodfr/kotaemon#
+
+This 'root' project hosts all the common / generic tools built by Data4Good.
+
+When you change code in your 'local' project, any changes made inside the 'kotaemon' folder (the code shared with the 'root' project) can be pushed back to contribute to the 'root' subtree project, through an intermediary branch such as "13_ecoskills_version".
+
+Conversely, you can regularly pull changes from the 'root' project (its main branch or your own branch).
+
+
+#### Synchronisation commands
+
+To pull the latest changes of your branch (or of the MAIN branch) from the common 'root' project: https://github.com/dataforgoodfr/kotaemon#
+
+(In practice, it is usually the MAIN branch changes that you want to pull; the changes on your own branch should already be shared through your team's GitHub repository for the project.)
+
+```git subtree pull --prefix=rag_system/kotaemon https://github.com/dataforgoodfr/kotaemon.git [MY_BRANCH or MAIN] --squash```
+
+Contributing to the "root" project:
+
+To contribute through your branch:
+
+```git subtree push --prefix=rag_system/kotaemon https://github.com/dataforgoodfr/kotaemon.git [MY_BRANCH]```
+
+Replace [MY_BRANCH] with your branch name: 13_ecoskills_version, 13_sufficiency_version, ...
+
+If the changes are generic enough, open a Pull Request against the MAIN branch on GitHub.
+
+Be careful! All changes must be generic so that they suit every project (or be explicitly named as '_example').
+
+Please exclude the "taxonomy" libs from your Pull Request!
+
+
+###### Little Tips
+
+Locally, you can create git aliases for the commands above. For example, for the pull command (here with the main branch of the subtree project):
+
+```git config --global alias.st-[MY_BRANCH or MAIN]-pull 'subtree pull --prefix=rag_system/kotaemon https://github.com/dataforgoodfr/kotaemon.git [MY_BRANCH or MAIN] --squash' ```
+
+and for the push command:
+
+```git config --global alias.st-[MY_BRANCH or MAIN]-push 'subtree push --prefix=rag_system/kotaemon https://github.com/dataforgoodfr/kotaemon.git [MY_BRANCH] ' ```
+
+You can then use these aliases:
+
+
+```git st-[MY_BRANCH]-pull ```
+
+or
+
+```git st-[MY_BRANCH]-push ```
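As an alternative to the git aliases, the same two commands can be wrapped in a small Python script, for instance to print them before running them. This is only a sketch under assumptions: `subtree_command`, `taxonomy_paths` and `run` are hypothetical helpers, the prefix and repository URL are the ones from the README, and options are placed before the repository argument (the usual `git subtree` form). `taxonomy_paths` is a simple pre-PR check for the "exclude the taxonomy libs" rule: pass it the output of `git diff --name-only` and it returns the files located under a `taxonomy` directory.

```python
import subprocess
from typing import Iterable, List

# Values taken from the README commands above.
SUBTREE_PREFIX = "rag_system/kotaemon"
ROOT_REPO = "https://github.com/dataforgoodfr/kotaemon.git"


def subtree_command(action: str, branch: str) -> List[str]:
    """Build the `git subtree pull|push` argument list used in the README."""
    if action not in ("pull", "push"):
        raise ValueError(f"unsupported action: {action!r}")
    if not branch:
        raise ValueError("branch must not be empty")
    cmd = ["git", "subtree", action, f"--prefix={SUBTREE_PREFIX}"]
    if action == "pull":
        cmd.append("--squash")  # the README squashes upstream history on pull
    return cmd + [ROOT_REPO, branch]


def taxonomy_paths(changed: Iterable[str]) -> List[str]:
    """Return the changed files located under a 'taxonomy' directory."""
    return [path for path in changed if "taxonomy" in path.split("/")[:-1]]


def run(action: str, branch: str, dry_run: bool = True) -> None:
    """Print the command (dry run) or execute it from the repository root."""
    cmd = subtree_command(action, branch)
    if dry_run:
        print(" ".join(cmd))
    else:
        subprocess.run(cmd, check=True)


if __name__ == "__main__":
    run("pull", "13_ecoskills_version")
    run("push", "13_ecoskills_version")
```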

rag_system/docker-compose.yml

Lines changed: 56 additions & 0 deletions
@@ -0,0 +1,56 @@
+services:
+  kotaemon:
+    build:
+      context: ./kotaemon
+      target: full
+    pull_policy: if_not_present
+    entrypoint: ["/bin/sh", "-c", "pip install -e /app/taxonomy && tail -f /dev/null"]
+    environment:
+      GRADIO_SERVER_NAME: 0.0.0.0
+      GRADIO_SERVER_PORT: 7860
+      OLLAMA_DEPLOYMENT: docker
+      VECTOR_STORE_DEPLOYMENT: docker
+      # PDF_FOLDER:
+    ports:
+      - '7860:7860'
+    volumes:
+      - './kotaemon/flowsettings.py:/app/flowsettings.py'
+      - './kotaemon/libs:/app/libs'
+      - './kotaemon/ktem_app_data:/app/ktem_app_data'
+      - './pipeline_scripts/:/app/pipeline_scripts'
+      - './taxonomy/:/app/taxonomy'
+    depends_on:
+      - ollama
+      - qdrant
+
+
+  ollama:
+    image : ollama/ollama
+    container_name: ollama
+    volumes:
+      - './ollama/:/root/.ollama'
+    ports:
+      - '11434:11434'
+    deploy:
+      resources:
+        reservations:
+          devices:
+            - driver: nvidia
+              count: all
+              capabilities: [gpu]
+
+  qdrant:
+    image: qdrant/qdrant:latest
+    restart: always
+    container_name: qdrant
+    ports:
+      - 6333:6333
+      - 6334:6334
+    expose:
+      - 6333
+      - 6334
+      - 6335
+    volumes:
+      - ./qdrant_data:/qdrant/storage
+
+
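Once `docker compose up -d` has started the stack, a quick way to confirm the services are reachable is to request each published port. This is a sketch under assumptions: it runs on the host (inside the `kotaemon` container, the service names `ollama` and `qdrant` would replace `localhost`), and it treats an HTTP 200 on `/` as healthy, which Ollama and Qdrant's REST API return on their root path. Note that, as written, the `kotaemon` entrypoint only installs the taxonomy package and then idles (`tail -f /dev/null`), so port 7860 answers only after the Gradio app has been started inside the container. `is_up` and `check_stack` are hypothetical helpers.

```python
import http.client
import urllib.request
from typing import Dict

# Host ports published in rag_system/docker-compose.yml.
SERVICES = {
    "kotaemon (Gradio UI)": 7860,
    "ollama": 11434,
    "qdrant (REST API)": 6333,
}


def is_up(url: str, timeout: float = 2.0) -> bool:
    """Return True if `url` answers with HTTP 200 within `timeout` seconds."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (OSError, http.client.HTTPException):
        # URLError/HTTPError and socket timeouts are OSError subclasses;
        # HTTPException covers malformed responses from a half-started service.
        return False


def check_stack(host: str = "localhost") -> Dict[str, bool]:
    """Probe the root path of every published service on `host`."""
    return {name: is_up(f"http://{host}:{port}/") for name, port in SERVICES.items()}


if __name__ == "__main__":
    for name, ok in check_stack().items():
        print(f"{name:<22} {'up' if ok else 'DOWN'}")
```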

rag_system/kotaemon/.commitlintrc

Lines changed: 37 additions & 0 deletions
@@ -0,0 +1,37 @@
+{
+  "extends": ["@commitlint/config-conventional"],
+  "defaultIgnores": true,
+  "rules": {
+    "body-leading-blank": [1, "always"],
+    "body-max-line-length": [2, "always", 100],
+    "footer-leading-blank": [1, "always"],
+    "footer-max-line-length": [2, "always", 10000],
+    "header-max-length": [2, "always", 200],
+    "subject-case": [
+      2,
+      "never",
+      []
+    ],
+    "subject-empty": [2, "never"],
+    "subject-full-stop": [2, "never", "."],
+    "type-case": [2, "always", "lower-case"],
+    "type-empty": [2, "never"],
+    "type-enum": [
+      2,
+      "always",
+      [
+        "build",
+        "chore",
+        "ci",
+        "docs",
+        "feat",
+        "fix",
+        "perf",
+        "refactor",
+        "revert",
+        "style",
+        "test"
+      ]
+    ]
+  }
+}
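To get a feel for which commit headers these rules accept, here is a rough Python approximation of a subset of them (`type-enum`, `type-case`, `type-empty`, `subject-empty`, `subject-full-stop`, `header-max-length`). It is not commitlint: the header regex is only modelled loosely on the Conventional Commits `type(scope)!: subject` form, and the `Merge ` prefix check merely stands in for `defaultIgnores: true`, which lets commitlint skip messages such as merge commits. `lint_header` is a hypothetical helper.

```python
import re
from typing import List

ALLOWED_TYPES = {
    "build", "chore", "ci", "docs", "feat", "fix",
    "perf", "refactor", "revert", "style", "test",
}
HEADER_MAX_LENGTH = 200

# Simplified Conventional Commits header: type(scope)!: subject
HEADER_RE = re.compile(r"^(?P<type>\w*)(?:\((?P<scope>.*)\))?!?: (?P<subject>.*)$")


def lint_header(header: str) -> List[str]:
    """Return the names of the violated rules (an empty list means accepted)."""
    if header.startswith("Merge "):
        return []  # rough stand-in for defaultIgnores (merge commits are skipped)
    errors: List[str] = []
    if len(header) > HEADER_MAX_LENGTH:
        errors.append("header-max-length")
    match = HEADER_RE.match(header)
    commit_type = match.group("type") if match else ""
    subject = match.group("subject").strip() if match else ""
    if not commit_type:
        errors.append("type-empty")
    else:
        if commit_type != commit_type.lower():
            errors.append("type-case")
        if commit_type not in ALLOWED_TYPES:
            errors.append("type-enum")
    if not subject:
        errors.append("subject-empty")
    elif subject.endswith("."):
        errors.append("subject-full-stop")
    return errors


if __name__ == "__main__":
    for h in ["feat(rag): add qdrant vector store", "Feat: add x", "fix: typo.", "update readme"]:
        print(f"{h!r}: {lint_header(h) or 'ok'}")
```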

rag_system/kotaemon/.dockerignore

Lines changed: 15 additions & 0 deletions
@@ -0,0 +1,15 @@
+.github/
+.git/
+.mypy_cache/
+__pycache__/
+ktem_app_data/
+env/
+.pre-commit-config.yaml
+.commitlintrc
+.gitignore
+.gitattributes
+README.md
+*.zip
+*.sh
+
+!/launch.sh
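Here `*.sh` drops the shell scripts at the root of the build context, and the later `!/launch.sh` exception re-includes `launch.sh`. In a `.dockerignore`, the last matching line decides, a leading `/` makes no difference, and excluding a directory excludes everything under it. The sketch below is a simplified model of those rules (single-segment `*` and `?` only; no `**` or character classes), handy for checking which files reach the image build; it is not Docker's implementation, and `parse_rules` / `is_excluded` are hypothetical helpers.

```python
import re
from typing import List, Tuple

# Contents of rag_system/kotaemon/.dockerignore from this commit.
DOCKERIGNORE = """\
.github/
.git/
.mypy_cache/
__pycache__/
ktem_app_data/
env/
.pre-commit-config.yaml
.commitlintrc
.gitignore
.gitattributes
README.md
*.zip
*.sh

!/launch.sh
"""


def _to_regex(pattern: str) -> re.Pattern:
    """Translate one pattern; '*' and '?' never cross a '/'."""
    pattern = pattern.strip("/")  # '/launch.sh' and 'launch.sh' are equivalent
    out = []
    for ch in pattern:
        if ch == "*":
            out.append("[^/]*")
        elif ch == "?":
            out.append("[^/]")
        else:
            out.append(re.escape(ch))
    return re.compile("".join(out))


def parse_rules(text: str) -> List[Tuple[bool, re.Pattern]]:
    """Return (is_exception, regex) pairs in file order, skipping blanks and comments."""
    rules = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        rules.append((line.startswith("!"), _to_regex(line.lstrip("!"))))
    return rules


def is_excluded(path: str, rules: List[Tuple[bool, re.Pattern]]) -> bool:
    """Last matching rule wins; a rule also matches if it matches a parent directory."""
    parts = path.strip("/").split("/")
    prefixes = ["/".join(parts[: i + 1]) for i in range(len(parts))]
    excluded = False
    for is_exception, regex in rules:
        if any(regex.fullmatch(prefix) for prefix in prefixes):
            excluded = not is_exception
    return excluded


RULES = parse_rules(DOCKERIGNORE)

if __name__ == "__main__":
    for p in ["launch.sh", "install.sh", "ktem_app_data/users.db", "libs/ktem/ktem/main.py"]:
        print(f"{p:<28} {'ignored' if is_excluded(p, RULES) else 'sent to the build'}")
```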

rag_system/kotaemon/.env.example

Lines changed: 55 additions & 0 deletions
@@ -0,0 +1,55 @@
+# this is an example .env file, use it to create your own .env file and place it in the root of the project
+
+# settings for OpenAI
+OPENAI_API_BASE=https://api.openai.com/v1
+OPENAI_API_KEY=<YOUR_OPENAI_KEY>
+OPENAI_CHAT_MODEL=gpt-4o-mini
+OPENAI_EMBEDDINGS_MODEL=text-embedding-3-large
+
+# settings for Azure OpenAI
+AZURE_OPENAI_ENDPOINT=
+AZURE_OPENAI_API_KEY=
+OPENAI_API_VERSION=2024-02-15-preview
+AZURE_OPENAI_CHAT_DEPLOYMENT=gpt-35-turbo
+AZURE_OPENAI_EMBEDDINGS_DEPLOYMENT=text-embedding-ada-002
+
+# settings for Cohere
+COHERE_API_KEY=<COHERE_API_KEY>
+
+# settings for local models
+LOCAL_MODEL=qwen2.5:7b
+LOCAL_MODEL_EMBEDDINGS=nomic-embed-text
+
+# settings for GraphRAG
+GRAPHRAG_API_KEY=<YOUR_OPENAI_KEY>
+GRAPHRAG_LLM_MODEL=gpt-4o-mini
+GRAPHRAG_EMBEDDING_MODEL=text-embedding-3-small
+
+# set to true if you want to use customized GraphRAG config file
+USE_CUSTOMIZED_GRAPHRAG_SETTING=false
+
+# settings for Azure DI
+AZURE_DI_ENDPOINT=
+AZURE_DI_CREDENTIAL=
+
+# settings for Adobe API
+# get free credential at https://acrobatservices.adobe.com/dc-integration-creation-app-cdn/main.html?api=pdf-extract-api
+# also run: pip install "pdfservices-sdk@git+https://github.com/niallcm/pdfservices-python-sdk.git@bump-and-unfreeze-requirements"
+PDF_SERVICES_CLIENT_ID=
+PDF_SERVICES_CLIENT_SECRET=
+
+# settings for PDF.js
+PDFJS_VERSION_DIST="pdfjs-4.0.379-dist"
+
+# variable for authentication method selection
+# for authentication with google leave empty
+# for authentication with keycloak :
+# AUTHENTICATION_METHOD="KEYCLOAK"
+
+AUTHENTICATION_METHOD=
+
+# settings for keycloak
+KEYCLOAK_SERVER_URL=
+KEYCLOAK_CLIENT_ID=
+KEYCLOAK_REALM=
+KEYCLOAK_CLIENT_SECRET=
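The header comment says to copy this file to `.env` at the project root and fill it in. A quick sanity check before starting the app is to list the keys that are still empty or still hold a `<...>` placeholder, and to make sure the `KEYCLOAK_*` settings are filled when `AUTHENTICATION_METHOD` is set to `KEYCLOAK`. The sketch below uses a minimal hand-written parser (plain `KEY=value` lines, optional surrounding quotes, `#` comments) rather than a dedicated library; `parse_env` and `missing_settings` are hypothetical helpers, and which keys count as required is an assumption to adapt to the providers you actually use.

```python
import re
from pathlib import Path
from typing import Dict, List

PLACEHOLDER = re.compile(r"^<[^>]+>$")  # e.g. <YOUR_OPENAI_KEY>
KEYCLOAK_KEYS = ["KEYCLOAK_SERVER_URL", "KEYCLOAK_CLIENT_ID", "KEYCLOAK_REALM", "KEYCLOAK_CLIENT_SECRET"]


def parse_env(text: str) -> Dict[str, str]:
    """Minimal .env parser: KEY=value lines, '#' comments, optional surrounding quotes."""
    values: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        value = value.strip()
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "\"'":
            value = value[1:-1]
        values[key.strip()] = value
    return values


def missing_settings(env: Dict[str, str], required: List[str]) -> List[str]:
    """Return required keys that are absent, empty, or still a <...> placeholder."""
    keys = list(required)
    if env.get("AUTHENTICATION_METHOD", "").upper() == "KEYCLOAK":
        keys += KEYCLOAK_KEYS
    return [k for k in keys if not env.get(k) or PLACEHOLDER.match(env[k])]


if __name__ == "__main__":
    env_file = Path(".env")
    if env_file.exists():
        env = parse_env(env_file.read_text(encoding="utf-8"))
        # Example choice: OpenAI for chat plus a local Ollama model.
        print(missing_settings(env, ["OPENAI_API_KEY", "OPENAI_CHAT_MODEL", "LOCAL_MODEL"]))
    else:
        print("no .env file found; copy .env.example to .env first")
```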

rag_system/kotaemon/.gitattributes

Lines changed: 1 addition & 0 deletions
@@ -0,0 +1 @@
+*.bat text eol=crlf
