Purpose: If the current team leads leave tomorrow, this document should allow the next person to pick up where they left off with zero handholding.
Audience: Team leads and senior contributors. For onboarding, see README.md. For dev workflow, see CONTRIBUTING.md. For task status, see TASK_TRACKER.md.
External Data (Wikipedia, NGO sites)
       │ Web scraping (Python)
       ▼
┌──────────────┐        ┌──────────────────┐
│  Saayam DB   │        │   GenAI Lambda   │
│ (PostgreSQL) │        │ (More_Org_GenAI  │
│              │        │   _Py_v3126)     │
└──────┬───────┘        └────────┬─────────┘
       └────────────┬────────────┘
                    ▼
      saayam-org-aggregator Lambda
     (merges sources, deduplicates)
                    │
                    ▼
      API Gateway → webapp frontend
Planned pipeline (not yet built): PostgreSQL → S3 Data Lake → Vectorize → Vector DB → AI Agent
The team still needs to write a functional/design spec for this pipeline before building it.
Only 7 of 40+ repos are actively developed:
webapp (React) ←── api (API Gateway) ←── mobileapp
                     │               │
                     ▼               ▼
                 volunteer        request
               (Java/Spring)  (Help requests)
                     │               │
                     ▼               ▼
                   database (PostgreSQL)
                             │
                             ▼
                       data (Python)  ← YOU ARE HERE
                             │
                             ▼
                 ai (Python/Flask, GenAI)

devsecops: CI/CD, infra (all teams)
| Team | Repo | How We Interact |
|---|---|---|
| GenAI / AI | ai | We invoke their Lambda. Future: we feed vectorized data to their agent. |
| Frontend | webapp | They consume our Lambda endpoints. #99 is a cross-team task. |
| Backend / API | api | They set up API Gateway routes to our Lambdas. |
| Database | database | We read from their DB. Coordinate for schema changes. |
| DevSecOps | devsecops | They manage AWS infra our Lambdas run on. |
| Product | prod | Defines what we build. See the MVP Pages wiki. |
Lambda that accepts a help request (subject, description, location, category) and fetches matching orgs from:
- Saayam DB — registered orgs (tagged "verified")
- GenAI Lambda — AI-suggested orgs (tagged "genai")
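It merges both lists, deduplicates them (the DB record wins when an org appears in both), and returns a single unified list. If one source is unavailable, it degrades gracefully instead of failing the whole request.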
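A minimal sketch of the merge step, assuming both sources have already been fetched as lists of dicts shaped like the Output example below. The helper names (`merge_orgs`, `_dedupe_key`) are hypothetical, and matching orgs on a normalized name is an assumption; the real aggregator may use a different key.

```python
from typing import Dict, Iterable, List

Org = Dict[str, str]


def _dedupe_key(org: Org) -> str:
    # Hypothetical match rule: org name, lower-cased, with whitespace collapsed.
    return " ".join(org.get("name", "").lower().split())


def merge_orgs(db_orgs: Iterable[Org], genai_orgs: Iterable[Org]) -> List[Org]:
    """Merge DB and GenAI results; the DB record wins when both share a key."""
    merged: Dict[str, Org] = {}
    # DB rows are added first, so a later GenAI duplicate is simply skipped.
    for source, orgs in (("verified", db_orgs), ("genai", genai_orgs)):
        for org in orgs:
            key = _dedupe_key(org)
            if key and key not in merged:  # nameless orgs can't be matched; drop them
                merged[key] = {**org, "source": source}
    # Dicts keep insertion order, so verified orgs come first in the result.
    return list(merged.values())
```

Graceful degradation falls out of this shape: if the GenAI call fails, the caller can pass an empty list and still return the verified orgs.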
Lambda: saayam-org-aggregator (us-east-1)
Input:
{
  "category": "Shelter",
  "subject": "Shelter",
  "description": "i need a place to stay",
  "location": "tampa"
}

Output:
{
  "statusCode": 200,
  "body": [
    {
      "name": "The Salvation Army Tampa",
      "location": "Tampa, FL",
      "contact": "(813) 223-1320",
      "email": "...",
      "web_url": "...",
      "mission": "...",
      "source": "..."
    }
  ]
}

Scrapes emergency contact numbers from Wikipedia → cleans with pandas → inserts into PostgreSQL via SQLAlchemy.
Files: src/scrapers/emergency_contacts/ — scraper.py, cleaner.py, loader.py.
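The cleaner.py and loader.py steps could look roughly like this sketch. The column names (`country`, `police`, `ambulance`, `fire`), the table name `emergency_contacts`, and the function names are hypothetical; check the real scraper output before reusing any of it.

```python
import pandas as pd
from sqlalchemy.engine import Engine


def clean_contacts(raw: pd.DataFrame) -> pd.DataFrame:
    """Normalize a scraped emergency-number table (hypothetical column names)."""
    df = raw.copy()
    df.columns = [str(c).strip().lower() for c in df.columns]
    number_cols = [c for c in ("police", "ambulance", "fire") if c in df.columns]
    for col in ["country", *number_cols]:
        df[col] = (
            df[col]
            .astype("string")
            .str.replace(r"\[\d+\]", "", regex=True)  # drop footnote markers like "[3]"
            .str.strip()
        )
    df = df.dropna(subset=["country"])
    df = df[df["country"] != ""]  # rows whose country cell was only a footnote
    return df.drop_duplicates(subset=["country"]).reset_index(drop=True)


def load_contacts(df: pd.DataFrame, engine: Engine, table: str = "emergency_contacts") -> int:
    """Append cleaned rows to `table` and return how many rows were written."""
    df.to_sql(table, engine, if_exists="append", index=False)
    return len(df)
```

In production the engine would point at the PostgreSQL database. An in-memory SQLite engine (`create_engine("sqlite://")`) is handy for trying the loader locally without AWS access, in line with the local-first policy.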
Country-specific scrapers for nonprofit listings: src/scrapers/ngo/afghanistan.py, india.py, malaysia.py. Each runs independently and produces a CSV; none is part of an automated pipeline yet.
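Until the NGO scrapers are wired into a pipeline, a small step like this hypothetical `combine_ngo_csvs` could merge their outputs into one file. It assumes each CSV is named after its country (for example `india.csv`) and that columns may differ between countries; columns missing for a country are left blank.

```python
import csv
from pathlib import Path
from typing import Iterable, List


def combine_ngo_csvs(csv_paths: Iterable[Path], out_path: Path) -> int:
    """Concatenate per-country CSVs into one file, tagging each row with its country."""
    rows: List[dict] = []
    fieldnames: List[str] = ["country"]
    for path in csv_paths:
        country = path.stem  # assumption: the file is named after the country
        with path.open(newline="", encoding="utf-8") as fh:
            reader = csv.DictReader(fh)
            for name in reader.fieldnames or []:
                if name not in fieldnames:
                    fieldnames.append(name)  # union of columns across countries
            for row in reader:
                rows.append({**row, "country": country})
    with out_path.open("w", newline="", encoding="utf-8") as fh:
        # restval fills missing columns; extrasaction drops malformed overflow fields.
        writer = csv.DictWriter(fh, fieldnames=fieldnames, restval="", extrasaction="ignore")
        writer.writeheader()
        writer.writerows(rows)
    return len(rows)
```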
src/translation/lang_detection.py — detects language with langdetect, translates to English with GoogleTranslator.
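A sketch of the detect-then-translate flow. It assumes `GoogleTranslator` comes from the `deep_translator` package (the original does not say which package), and the function name `to_english` is hypothetical. The detector and translator are injectable so the logic can be tested offline.

```python
from typing import Callable, Optional, Tuple


def to_english(
    text: str,
    detect: Optional[Callable[[str], str]] = None,
    translate: Optional[Callable[[str], str]] = None,
) -> Tuple[str, str]:
    """Return (detected ISO 639-1 language code, English text)."""
    if not text or not text.strip():
        return "unknown", text  # langdetect raises on empty input
    if detect is None:
        # Imported lazily so tests can pass a fake detector without the package.
        from langdetect import DetectorFactory
        from langdetect import detect as langdetect_detect

        DetectorFactory.seed = 0  # langdetect is non-deterministic by default
        detect = langdetect_detect
    lang = detect(text)
    if lang == "en":
        return lang, text  # already English; skip the network call
    if translate is None:
        # Assumption: GoogleTranslator is deep_translator's class.
        from deep_translator import GoogleTranslator

        translate = GoogleTranslator(source="auto", target="en").translate
    return lang, translate(text)
```

Pinning `DetectorFactory.seed` matters most for short help-request text, where langdetect's guesses can otherwise vary between runs.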
src/models/fraud_requests.py — SQLAlchemy model for fraud requests. Schema defined, no active detection logic built.
ETL architecture design (#57), Aurora schema for nonprofits (#56), AWS architecture doc (#55), IRS S3 Lambda (#60), Charity Navigator scraper (#62), IRS nonprofit categorization (#67). These informed the current pipeline, but IRS data was later dropped (see Decision Log).
| Service | Purpose | Access |
|---|---|---|
| Lambda | Serverless functions | Team leads only |
| S3 | Data lake, datasets | Team leads only |
| Aurora PostgreSQL | Primary database | Team leads only |
| API Gateway | Routes to Lambdas | API/DevSecOps team |
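Our Lambdas call each other directly with boto3. The sketch below wraps a synchronous `invoke` call and returns an empty list on failure, which is one way to get the graceful degradation described for the org aggregator. `invoke_lambda_json` and `fetch_genai_orgs` are hypothetical names, and the assumption that the GenAI Lambda returns a `{"statusCode": ..., "body": [...]}` object (like the aggregator's output) should be checked against the `ai` repo.

```python
import json
import logging
from typing import Any, Optional

logger = logging.getLogger(__name__)


def invoke_lambda_json(function_name: str, payload: dict, client: Optional[Any] = None) -> Any:
    """Synchronously invoke a Lambda and return its decoded JSON response."""
    if client is None:
        import boto3  # lazy import so the helper can be exercised with a stub client

        client = boto3.client("lambda", region_name="us-east-1")
    response = client.invoke(
        FunctionName=function_name,
        InvocationType="RequestResponse",  # synchronous: wait for the result
        Payload=json.dumps(payload).encode("utf-8"),
    )
    body = json.loads(response["Payload"].read())
    # An exception inside the called function does not make invoke() raise;
    # boto3 reports it through the FunctionError key instead.
    if "FunctionError" in response:
        raise RuntimeError(f"{function_name} failed: {body}")
    return body


def fetch_genai_orgs(help_request: dict, client: Optional[Any] = None) -> list:
    """Return GenAI-suggested orgs, or [] if the GenAI Lambda can't be used."""
    try:
        result = invoke_lambda_json("More_Org_GenAI_Py_v3126", help_request, client)
        # Assumption: the response looks like {"statusCode": ..., "body": [...]}.
        body = result.get("body", []) if isinstance(result, dict) else []
        if isinstance(body, str):  # some handlers JSON-encode the body
            body = json.loads(body)
    except Exception:  # broad on purpose: any failure degrades to "no GenAI results"
        logger.exception("GenAI Lambda call failed; continuing with DB results only")
        return []
    return body if isinstance(body, list) else []
```

The explicit `FunctionError` check is the easy part to miss: without it, a crashed GenAI Lambda would look like a successful call returning an error object.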
Invoking other Lambdas:
import boto3
client = boto3.client('lambda', region_name='us-east-1')
response = client.invoke(FunctionName='More_Org_GenAI_Py_v3126', ...)

| Date | Decision | Rationale |
|---|---|---|
| Feb 2026 | Dropped IRS data from org-aggregator | Too noisy, not useful for matching. Aggregator uses Saayam DB + GenAI only. |
| Feb 2026 | #99 is cross-team (data + webapp) | Lambda done. Frontend work remains — needs React-comfortable volunteer. |
| 2025 | Local-first development | AWS access cannot be given to all volunteers. Cost and security. |
| 2025 | Pair programming mandate | 97% churn means no single person should own a task alone. |
| Apr 2025 | Aurora PostgreSQL as primary DB | PostgreSQL compatible, managed AWS, scalable. |
- No tests: no unit tests have been written yet.
- No CI/CD: no GitHub Actions workflows for automated testing or linting.
- Stale issues (#80-90): need triage; reassign or close them.
If you are leaving the team lead role:
- Update this document and the README with any changes.
- Brief the incoming lead on active issues (see TASK_TRACKER.md).
- Introduce the new lead in the WhatsApp group.
- Transfer AWS access (coordinate with Rao and DevSecOps).
- Share any credentials not in the shared environment.
- Walk through deployed Lambdas on AWS — what, where, how configured.
- Review and hand off open PRs.
- Inform Rao about the transition.
- Update Key Contacts in the README.
- Close or reassign stale issues.
Last updated: February 2026 · Maintained by: Data Engineering Team Leads