Commit 5aabfc5

Initial commit
0 parents  commit 5aabfc5

100 files changed

Lines changed: 18645 additions & 0 deletions

.dockerignore

Lines changed: 11 additions & 0 deletions
@@ -0,0 +1,11 @@
# NOTE: It seems surprising, but do not ignore .git.
# The command `natural-language-geocoding init` that runs in the image expects
# to be run from a git repository.
.pytest_cache
.python-version
.ruff_cache
.venv
Dockerfile
Dockerfile.*
cdktf.out
dist

.env.template

Lines changed: 23 additions & 0 deletions
@@ -0,0 +1,23 @@
# AWS_PROFILE=kmnlp
AWS_ACCOUNT_ID=SET_ME
# For local testing
FRONTEND_ENDPOINT=http://localhost:5173

# GDAL warp path
GDAL_WARP_BIN_PATH="/usr/bin/gdalwarp"

# If set, will prefer Geocode API
GEOCODE_INDEX_REGION=us-east-1
GEOCODE_INDEX_HOST=localhost

# Only used if GEOCODE_INDEX_* vars are not set
NOMINATIM_USER_AGENT="e84_nl_geocoding"

DASK_ADDRESS=tcp://127.0.0.1:8786

# If you want to enable langfuse logging. Get this from the secret or the UI.
# LANGFUSE_HOST=...
# LANGFUSE_SECRET_KEY=...
# LANGFUSE_PUBLIC_KEY=...

# ALLOW_LOCAL_DASK_CLUSTER=true
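
The comments on the `GEOCODE_INDEX_*` and `NOMINATIM_USER_AGENT` variables imply a precedence rule between two geocoding backends. The following is a minimal Python sketch of that rule, not code from this commit. It assumes that both `GEOCODE_INDEX_REGION` and `GEOCODE_INDEX_HOST` must be set before the Geocode API is preferred (the template does not say whether one alone is enough). The function name `choose_geocoder_backend` and the returned labels are hypothetical.

```python
import os
from collections.abc import Mapping


def choose_geocoder_backend(env: Mapping[str, str] = os.environ) -> str:
    """Return a label for the backend the settings select (labels are hypothetical)."""
    region = env.get("GEOCODE_INDEX_REGION", "").strip()
    host = env.get("GEOCODE_INDEX_HOST", "").strip()
    # Assumption: both variables are required before the index is preferred.
    if region and host:
        return "geocode_index"
    # Otherwise fall back to Nominatim, which uses NOMINATIM_USER_AGENT.
    return "nominatim"
```

Passing the environment as a mapping keeps the rule easy to unit test without modifying the real process environment.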

.github/copilot-instructions.md

Lines changed: 198 additions & 0 deletions
@@ -0,0 +1,198 @@
# KMNLP API - Copilot Instructions

## Architecture Overview

This is a FastAPI-based chat service that provides streaming responses via LLM agents for NOAA Knowledge Management NLP. The service implements a multi-user chat system with SQLite persistence and integrates with the `llm-agent` private dependency for orchestration.

### Key Components

- **FastAPI app** (`app/main.py`): RESTful chat API with CORS, user management, and streaming responses
- **Database layer** (`app/db/`): SQLite-based persistence with Protocol interface for user/chat storage
- **Agent orchestration**: Uses `OrchestrationAgent` from private `llm-agent` library for data search/analysis
- **Type system** (`app/types.py`): Comprehensive Pydantic models with discriminated unions for chat content
- **Utilities** (`app/utils/`): Agent factories, response streaming, and plan management

## Development Workflows

### Local Development

```bash
# FIRST: Activate virtual environment (required for all commands below)
source .venv/bin/activate

# Start development server (hot reload enabled)
fastapi dev app/main.py

# Run all linting checks
scripts/lint.sh  # includes shellcheck, ruff, pyright, format check

# Run tests with verbose output
scripts/test.sh  # pytest with -vv -rA --log-cli-level=INFO

# Run integration tests separately
scripts/integration_tests.sh  # pytest -m integration with verbose logging

# Build everything (package + docker images)
scripts/build.sh  # builds both api and dask containers
```

### Testing Patterns

- **Integration tests**: Marked with `@pytest.mark.integration`, require `-m integration` flag
- **Default behavior**: Only unit tests run unless explicitly marked for integration
- **Test structure**: `tests/unit/` and `tests/integration/` mirror the `app/` structure
- **API testing**: Uses `TestClient` from FastAPI for endpoint testing

## Dependency Management

### Python Environment

- **Package manager**: `uv` (modern pip replacement)
- **Python version**: 3.12+ required
- **Virtual environment**: **ALWAYS** activate with `source .venv/bin/activate` before running any scripts or commands
- **Private dependency**: `llm-agent` from GitLab SSH/HTTPS with CI tokens
- **Lock file**: `uv.lock` must be kept in sync with `pyproject.toml`

### Key Dependencies

- `fastapi[standard]>=0.116.1` - Web framework with built-in async support
- `llm-agent[data-processing]` - Private orchestration agent library
- `natural-language-geocoding` - Geocoding functionality for natural language queries
- `e84-geoai-common` - Shared LLM message types and content models (transitive dependency via llm-agent)

## Code Quality Standards

### Linting Configuration

- **Ruff**: Line length 100, Google docstring convention, ALL rules with specific ignores
- **Pyright**: Strict type checking with comprehensive error reporting
- **Pre-commit**: Hooks for automated quality checks

### Critical Ignores

```python
# In tests/: Allow assert statements (S101), missing docstrings (D1)
# In __init__.py: Allow unused imports (F401) for re-exports
# Project-wide: No TODO author requirements (TD002/TD003)
```

## API Patterns

### Authentication

- **Header-based**: `x-user-id` header required for all authenticated endpoints
- **User management**: In-memory user storage with UUID-based chat sessions
- **Error handling**: Custom HTTPException subclasses (`MissingUserIdError`, etc.)

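
The header check described above can be sketched without the web framework. This is an illustration under stated assumptions, not the service's implementation. `MissingUserIdError` is named in these instructions, but here it is a plain `Exception` stand-in rather than an `HTTPException` subclass. The 401 status code, the error message, and the helper `require_user_id` are all hypothetical.

```python
from collections.abc import Mapping


class MissingUserIdError(Exception):
    """Stand-in for the HTTPException subclass named above."""

    status_code = 401  # assumption: the source does not state which status is used

    def __init__(self) -> None:
        super().__init__("Missing required 'x-user-id' header")


def require_user_id(headers: Mapping[str, str]) -> str:
    """Return the caller's user id, or raise if the header is absent or blank."""
    # HTTP header names are case-insensitive, so normalise before the lookup.
    normalised = {name.lower(): value for name, value in headers.items()}
    user_id = normalised.get("x-user-id", "").strip()
    if not user_id:
        raise MissingUserIdError
    return user_id
```

Raising a dedicated exception type keeps the endpoint bodies free of repeated header checks and lets one handler map the error to a response.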
### Streaming Responses

```python
# Pattern for chat endpoints
@app.post("/chat/{chat_id}", response_class=StreamingResponse)
async def post_to_chat(...) -> StreamingResponse:
    # Convert to LLMMessage with TextContent + CachePointContent
    # Stream via get_agent_response_content_stream()
    # Return application/x-ndjson with ChatResponse objects
```

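
To make the `application/x-ndjson` format concrete, here is a small standard-library sketch of newline-delimited JSON streaming. It does not reproduce `get_agent_response_content_stream()`. The helper names `to_ndjson` and `collect` and the sample payload fields are assumptions made for illustration.

```python
import asyncio
import json
from collections.abc import AsyncIterator, Iterable
from typing import Any


async def to_ndjson(items: Iterable[dict[str, Any]]) -> AsyncIterator[str]:
    """Yield one compact JSON document per line (newline-delimited JSON)."""
    for item in items:
        # Each record sits on its own line, so a client can parse the
        # response incrementally instead of waiting for the full body.
        yield json.dumps(item, separators=(",", ":")) + "\n"
        await asyncio.sleep(0)  # hand control back to the event loop


async def collect(items: Iterable[dict[str, Any]]) -> list[str]:
    """Drain the stream into a list, as a client reading line by line would."""
    return [line async for line in to_ndjson(items)]


lines = asyncio.run(
    collect(
        [
            {"content_type": "text", "text": "Hello"},
            {"content_type": "text", "text": "world"},
        ]
    )
)
```

In FastAPI, an async generator of this shape can be passed to `StreamingResponse(..., media_type="application/x-ndjson")`.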
### Type Safety

- **Discriminated unions**: `chat_response_content_types` uses `content_type` discriminator
- **Pydantic validation**: All request/response models with comprehensive field validation

## Container Architecture

### Multi-stage Build

```dockerfile
# base: Python + uv + dependencies
# build: Package building stage
# api: FastAPI service (port 8000)
# dask: Distributed computing image
```

### CI/CD Integration

- **Local builds**: Use SSH keys for private repo access
- **CI builds**: Use GitLab CI tokens with HTTPS git URL rewriting
- **Environment detection**: `BUILD_ENV_IS_CI` flag controls auth method

## Agent Integration

### Tool Integration

- **Data search**: `data_search_tool_with_tracking`
- **Data analysis**: `data_analysis_tool_with_tracking`
- **Response shaping**: `agent_plan_tool`, `text_response_tool`, `end_turn_tool`
- **Plan management**: Custom `PlanManager` for step tracking

## File Organization

### Core Structure

```
app/
├── main.py              # FastAPI app, routes, middleware
├── types.py             # Pydantic models, type definitions
├── db/
│   ├── __init__.py      # Database module exports
│   ├── interface.py     # UserDatabase Protocol interface
│   └── sqlite_db.py     # SQLite implementation with user/chat tables
└── utils/
    ├── utils.py         # Agent factories, streaming utilities
    └── plan_manager.py  # Plan/step management logic
```

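
The split between `interface.py` and `sqlite_db.py` follows the usual `typing.Protocol` pattern: callers depend on a structural interface, and the SQLite class satisfies it without inheriting from it. The sketch below illustrates that pattern only. `UserDatabase` is named in the tree above, but its methods, the `SqliteUserDatabase` class, the `chats` table schema, and `count_chats` are assumptions.

```python
import sqlite3
import uuid
from typing import Protocol


class UserDatabase(Protocol):
    """Structural interface; the method names are illustrative assumptions."""

    def create_chat(self, user_id: str) -> str: ...

    def list_chats(self, user_id: str) -> list[str]: ...


class SqliteUserDatabase:
    """Satisfies UserDatabase structurally, without inheriting from it."""

    def __init__(self, path: str = ":memory:") -> None:
        self._conn = sqlite3.connect(path)
        self._conn.execute(
            "CREATE TABLE IF NOT EXISTS chats "
            "(chat_id TEXT PRIMARY KEY, user_id TEXT NOT NULL)"
        )

    def create_chat(self, user_id: str) -> str:
        chat_id = str(uuid.uuid4())
        with self._conn:  # commits on success, rolls back on error
            self._conn.execute(
                "INSERT INTO chats (chat_id, user_id) VALUES (?, ?)",
                (chat_id, user_id),
            )
        return chat_id

    def list_chats(self, user_id: str) -> list[str]:
        rows = self._conn.execute(
            "SELECT chat_id FROM chats WHERE user_id = ? ORDER BY rowid",
            (user_id,),
        )
        return [row[0] for row in rows]


def count_chats(db: UserDatabase, user_id: str) -> int:
    """Depends only on the Protocol, so any conforming backend works."""
    return len(db.list_chats(user_id))
```

Under strict Pyright, passing a `SqliteUserDatabase` where a `UserDatabase` is expected type-checks only if every Protocol method is present with a compatible signature.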
### Testing Mirror

```
tests/
├── conftest.py          # Pytest configuration, integration marker handling
├── unit/app/            # Mirrors app/ structure for unit tests
└── integration/app/     # Integration tests requiring network calls
```

## Common Gotchas

1. **Integration tests**: Default pytest run skips them; use `-m integration` explicitly
2. **Private repo access**: Ensure SSH keys configured locally or CI tokens in GitLab
3. **UV lock sync**: Run `uv lock --check` to validate lock file consistency
4. **Type checking**: Pyright strict mode catches many runtime issues early
5. **Response streaming**: Use `application/x-ndjson` for chat streaming, not regular JSON

## Code Quality Requirements

**CRITICAL**: Before declaring any code changes complete, you MUST run and resolve ALL of the following:

### Mandatory Quality Checks

```bash
# FIRST: Activate virtual environment (required for all commands below)
source .venv/bin/activate

# 1. Fix all linting errors
scripts/lint.sh

# 2. Run tests to ensure functionality
scripts/test.sh

# 3. Format code properly
ruff format app/ tests/
```

### Quality Standards

- **Zero tolerance**: No ruff linting errors, pyright type errors, or shellcheck issues
- **Type safety**: All functions must have proper type annotations and pass strict pyright checks
- **Import organization**: Follow PEP 8 import ordering, remove unused imports
- **Docstrings**: All public functions/classes must have Google-style docstrings
- **Error handling**: Use proper exception chaining with `from err` or `from None`

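
The difference between `from err` and `from None` is easiest to see side by side. The example below is a generic illustration and is not taken from this codebase. `ConfigError`, `parse_settings`, and `lookup_port` are hypothetical names.

```python
import json


class ConfigError(Exception):
    """Hypothetical domain error used only for this illustration."""


def parse_settings(raw: str) -> dict[str, object]:
    """Parse a JSON object, wrapping decode failures in ConfigError."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as err:
        # `from err` keeps the original exception as __cause__ for debugging.
        raise ConfigError("Settings are not valid JSON") from err
    if not isinstance(data, dict):
        raise ConfigError("Settings must be a JSON object")
    return data


def lookup_port(settings: dict[str, object]) -> object:
    """Return the 'port' setting, hiding the low-level KeyError."""
    try:
        return settings["port"]
    except KeyError:
        # `from None` suppresses the KeyError when it adds nothing for the caller.
        raise ConfigError("Missing required setting 'port'") from None
```

Use `from err` when the underlying failure helps diagnosis, and `from None` when it would only add noise to the traceback.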
### Before Completion Checklist

- [ ] `scripts/lint.sh` passes with no errors
- [ ] All imports are used and properly organized
- [ ] Type annotations are complete and accurate
- [ ] No trailing whitespace or formatting issues
- [ ] Tests pass (when applicable)
- [ ] Code follows established patterns in the codebase
