Skip to content

Commit a7eb6f4

Browse files
committed
feat: Add source harvester for various data sources
- Introduced a new configuration file `config.example.yaml` for source harvester settings. - Implemented fetchers for Azure DevOps Wiki, GitHub repositories, HTML documents, and Hugging Face datasets. - Created a base fetcher class to standardize fetching logic across different sources. - Added a pipeline to manage the fetching process and store the results in a JSONL format. - Developed a training matrix generator for Hugging Face datasets, including scoring and prioritization logic. - Included example YAML and Markdown files for configuration and output formats. - Established a SQLite database for state management of fetched documents. - Updated dependencies in `requirements.txt` and `pyproject.toml` for necessary libraries.
1 parent 3d6ccfc commit a7eb6f4

37 files changed

Lines changed: 2374 additions & 380 deletions

.continue/agents/new-config.yaml

Lines changed: 31 additions & 18 deletions
Original file line numberDiff line numberDiff line change
@@ -3,27 +3,31 @@ version: 1.0.0
33
schema: v1
44

55
models:
6-
- name: "Remote Agent (codestral, no-tools)"
6+
- name: "Remote Agent (gemma4)"
77
provider: ollama
88
apiBase: "http://192.168.178.106:11434"
9-
model: "codestral:latest"
9+
model: "gemma4:latest"
10+
capabilities:
11+
- tool_use
12+
- image_input
1013
roles:
1114
- chat
1215
chatOptions:
1316
baseAgentSystemMessage: |
1417
You are a senior coding agent for ThemisDB.
1518
Prioritize correctness, small safe patches, and verification steps.
16-
This model does not support tool calling on this Ollama instance.
17-
If a task requires external tools, clearly state the limitation and propose the safest fallback.
19+
Use tools when available and keep responses grounded in retrieved context.
1820
completionOptions:
1921
temperature: 0.2
2022
topP: 0.9
2123
maxTokens: 4096
2224

23-
- name: "Remote Coding (codestral)"
25+
- name: "Remote Coding (qwen2.5-coder-14b)"
2426
provider: ollama
2527
apiBase: "http://192.168.178.106:11434"
26-
model: "codestral:latest"
28+
model: "qwen2.5-coder:14b"
29+
capabilities:
30+
- tool_use
2731
roles:
2832
- edit
2933
- apply
@@ -33,6 +37,19 @@ models:
3337
topP: 0.9
3438
maxTokens: 4096
3539

40+
- name: "Remote Deep Coding (deepseek-coder-v2-16b)"
41+
provider: ollama
42+
apiBase: "http://192.168.178.106:11434"
43+
model: "deepseek-coder-v2:16b"
44+
roles:
45+
- edit
46+
- apply
47+
- summarize
48+
completionOptions:
49+
temperature: 0.1
50+
topP: 0.9
51+
maxTokens: 6144
52+
3653
- name: "Remote Autocomplete (codestral)"
3754
provider: ollama
3855
apiBase: "http://192.168.178.106:11434"
@@ -46,11 +63,11 @@ models:
4663
onlyMyCode: true
4764

4865
# Prepared profiles (activate after model download + quick validation)
49-
# 1) Agent candidate (enable only if /api/chat tools test succeeds)
50-
# - name: "Remote Agent (gemma4)"
66+
# 1) Additional agent candidate
67+
# - name: "Remote Agent (qwen2.5-coder-14b)"
5168
# provider: ollama
5269
# apiBase: "http://192.168.178.106:11434"
53-
# model: "gemma4:latest"
70+
# model: "qwen2.5-coder:14b"
5471
# capabilities:
5572
# - tool_use
5673
# roles:
@@ -64,11 +81,11 @@ models:
6481
# topP: 0.9
6582
# maxTokens: 4096
6683

67-
# 2) Strong coding profile
68-
# - name: "Remote Coding (qwen2.5-coder-14b)"
84+
# 2) Fallback coding profile
85+
# - name: "Remote Coding (codestral)"
6986
# provider: ollama
7087
# apiBase: "http://192.168.178.106:11434"
71-
# model: "qwen2.5-coder:14b"
88+
# model: "codestral:latest"
7289
# roles:
7390
# - edit
7491
# - apply
@@ -78,18 +95,14 @@ models:
7895
# topP: 0.9
7996
# maxTokens: 4096
8097

81-
# 3) Deep reasoning coding profile
82-
# - name: "Remote Deep Coding (deepseek-coder-v2-16b)"
98+
# 3) Deep coding fallback profile
99+
# - name: "Remote Deep Coding Fallback (deepseek-coder-v2-16b)"
83100
# provider: ollama
84101
# apiBase: "http://192.168.178.106:11434"
85102
# model: "deepseek-coder-v2:16b"
86103
# roles:
87104
# - edit
88105
# - apply
89-
# completionOptions:
90-
# temperature: 0.1
91-
# topP: 0.9
92-
# maxTokens: 6144
93106

94107
# 4) Fast fallback chat/autocomplete profile
95108
# - name: "Remote Fast (llama3.1-8b)"
Lines changed: 25 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,25 @@
1+
# ThemisDB as Copilot Backend: Graph-RAG + MCP + Re-Ranking
2+
3+
## Scope
4+
- Create a practical architecture analysis for integrating ThemisDB as coding-assistant backend in VS Code/Copilot.
5+
- Include current state (IST), target state (SOLL), gap analysis, and implementation deep-dive.
6+
- Assume ThemisDB can host local inference models (llama.cpp family, CodeLlama, DeepSeek, Gemma).
7+
8+
## Inputs Used
9+
- ARCHITECTURE.md
10+
- docs/architecture/FEATURE_FLAGS_REFERENCE.md
11+
- include/llama_cpp/*
12+
- include/search/llm_reranker.h + README
13+
- include/rag/reranker.h + src/rag/reranker.cpp
14+
- tools/copilot-ollama-router/*
15+
- audit/docs/implementation-history/LLM_LORA_QLORA_INTEGRATION_AUDIT.md
16+
17+
## Deliverable Outline
18+
1. Executive summary
19+
2. IST architecture map
20+
3. SOLL architecture map
21+
4. SOLL-IST comparison table
22+
5. Deep-dive by capability
23+
6. Security and governance model
24+
7. Rollout phases and acceptance criteria
25+
8. Risks and mitigations

deployment/docker-compose.yml

Lines changed: 53 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,53 @@
1+
version: '3.8'
2+
3+
services:
4+
# 1. Database Service
5+
db:
6+
image: postgres:14-alpine
7+
container_name: themisdb_postgres
8+
environment:
9+
POSTGRES_USER: user
10+
POSTGRES_PASSWORD: password
11+
POSTGRES_DB: themisdb
12+
volumes:
13+
# Persist data volume for the database
14+
- db_data:/var/lib/postgresql/data
15+
expose:
16+
- "5432"
17+
18+
# 2. Authentication Service (Handles JWT generation and user auth)
19+
auth_api:
20+
build: ./services/auth_api # Path relative to the root of the project, or within a subfolder Dockerfile should expect
21+
container_name: themisdb_auth_service
22+
environment:
23+
DATABASE_URL: postgresql://user:password@db:5432/themisdb
24+
JWT_SECRET_KEY: ${JWT_SECRET_KEY} # Loaded from .env and passed through docker-compose
25+
ALGORITHM: HS256
26+
depends_on:
27+
- db
28+
ports:
29+
# Expose port for external testing, if needed. Usually best left internal.
30+
- "8001:8001"
31+
volumes:
32+
# Optional: For hot reloading during development
33+
- ./services/auth_api:/app
34+
35+
# 3. Main API Gateway / Gateway Service (Handles request routing and protected routes)
36+
main_gateway:
37+
build: . # Assuming main project structure is the root directory
38+
container_name: themisdb_main_gateway
39+
environment:
40+
DATABASE_URL: postgresql://user:password@db:5432/themisdb
41+
JWT_SECRET_KEY: ${JWT_SECRET_KEY}
42+
AUTH_SERVICE_URL: http://auth_api:8001 # Use service name as hostname
43+
depends_on:
44+
- auth_api
45+
- db
46+
ports:
47+
# Expose the main facing port
48+
- "80:80"
49+
volumes:
50+
- .:/app # Assuming this maps the entire codebase for development
51+
52+
volumes:
53+
db_data:

0 commit comments

Comments
 (0)