Maintained by: Project Managers (Janhavi · Ruthwik)
Update frequency: Weekly — update statuses and assignments before each Tuesday team meeting.
| Issue | Title | Assigned To | Status | Notes |
|---|---|---|---|---|
| #102 | Read and Review Data Engineering Onboarding Documentation | — | 🔄 Open | Before being assigned any development work, all new Data Engineering volunteers must read and understand the team's documentation. This is your first task. |
| #78 | Beginner Task: Environment Check + Saayam Practice Dataset Exploration (Local Only) | — | 🔄 Open | Before being assigned any development work, all new Data Engineering volunteers must complete the beginner-friendly local exercise. |
| Issue | Title | Assigned To | Status | Notes |
|---|---|---|---|---|
| #103 | Daily Metrics Aggregation Lambda | Arjun2110exe, Keerthana19-p, Nitish0615, gauri-d, sanobarasna | 🔄 Open | |
| #104 | Homepage Metrics Display from S3 (Web App Team) | Neharik335, PallaviP31, Srilaxmi1616, Ujwalap910 | 🔄 Open | |
| #114 | Generate Synthetic CSV Data for "users" and "request" tables from Database Schema | AnushaDusakanti15, Sravy-Kolli, VeeraVSDeekshith, navyasri0820 | 🔄 Open | |
| #117 | Generate Synthetic CSV Data for "volunteers_assigned"and "volunteer_details" tables from Database Schema | jeevanbanoth, priyankarao89, trikotrsh, vamisaigarapati | 🔄 Open | |
| #118 | Generate Synthetic CSV Data for "volunteer_applications" and "user_skills" tables from Database Schema | emmax07, harshinianubrolu-12, sagarikapatha, shendu-95 | 🔄 Open | |
| #119 | Generate Synthetic CSV Data for "request_comments" and "volunteer_rating" tables from Database Schema | Nishu2000-hub, jeminmiyani, rohitsurya7393, sanobarsana | 🔄 Open | |
| #120 | Generate Synthetic CSV Data for "request_guest_details" and "req_add_info" tables from Database Schema | PoojaryAnusha98, Sindhu782, Slakkimsetty, gkswapna | 🔄 Open | |
| #121 | Generate Synthetic CSV Data for "fraud_requests" and "notifications" tables from Database Schema | AbhiBadola, Bhvnikirn, Gyashaswi, pulipakav1 | 🔄 Open |
Note: Many issues in the #80-90 range are stale — created by volunteers who are no longer active. These need triage (reassign or close).
| Issue | Title | Completed | Notes |
|---|---|---|---|
| #98 | Aggregate Organization Listings from Multiple Sources | Feb 2026 | Lambda deployed. See KNOWLEDGE_TRANSFER.md for specs. |
| #57 | ETL Architecture Design | 2025 | Informed current pipeline. |
| #56 | Aurora Schema for Nonprofits | 2025 | Schema defined. |
| #55 | AWS Architecture Document | 2025 | Informed current pipeline. |
| #62 | Charity Navigator Scraper | 2025 | Completed. |
| #60 | IRS S3 Lambda | 2025 | IRS data later dropped. |
| #67 | IRS Nonprofit Categorization | 2025 | IRS data later dropped. |
| — | Emergency Contact Data Pipeline | 2025 | Wikipedia → clean → PostgreSQL. |
| — | NGO Web Scrapers | 2025 | Afghanistan, India, Malaysia. CSVs, not yet automated. |
| — | Language Detection & Translation | 2025 | langdetect + GoogleTranslator. |
| — | Fraud Detection Model (Schema Only) | 2025 | SQLAlchemy model only, no detection logic. |
| #99 | Integrate Org Aggregator into Frontend | March 2026 | Cross-team (data + webapp). Lambda is done. Remaining work is React frontend integration. |
| #100 | Auto-Categorize Help Requests Using Lambda | March 2026 | LLM-based classification. |
| #84 | Super Admin Dashboard Application Analytics design : creation of Test data | March 2026 | |
| #89 | Application Analytics Dashboard implmentation in Python using plotly libaray | March 2026 | |
| #80 | Prepare and Categorize ProPublica Nonprofit Data | March 2026 | |
| #81 | Creation of Super Admin Dashboard Analytics | March 2026 | |
| #82 | Design of Application Analytics tab under Super Admin Dashboard | March 2026 | |
| #83 | Design Infrastructure analytics wiki page under ML repo | March 2026 | |
| #85 | Work on Infrastructure analytics dashboard under Super Admin Dashboard Analytics tab | March 2026 | |
| #87 | Use Sentiment Analysis to detect Depressive or Threatening Language | March 2026 | |
| #88 | Design of Google Analytics tab under Super Admin Dashboard | March 2026 | |
| #96 | Design of Consolidated metrics of application analytics on the Saayam Home Page | March 2026 |
| Priority | Item | Details | Status |
|---|---|---|---|
| 1 | RDBMS → S3 Data Lake | Periodic data export from Aurora to S3. Needs functional spec. | Not started |
| 2 | Data Vectorization | Generate embeddings from exported data. Tech selection needed. | Not started |
| 3 | Vector DB Setup | Evaluate Pinecone, Weaviate, pgvector, etc. | Not started |
| 4 | Saayam AI Agent | Agent that queries vector DB with Saayam's own data. | Not started |
| 5 | Content Safety | Sentiment analysis (#87), translation for content filtering (#86). | Stale |
| 6 | Analytics Dashboard | Super Admin analytics (#81-89). Data Analytics team scope. | Stale |
These align with Rao's long-term directive:
PostgreSQL (RDBMS) → S3 Data Lake → Vectorize → Vector DB → AI Agent
Last updated: March 2026