AI Berkeley Hackathon Project - Semantic drift detection using embedding-powered analysis of git history
./setup.sh demo
Shows semantic drift analysis on a generated repository.
./setup.sh full
# Open http://localhost:3000
Complete interface for analyzing any GitHub repository.
./setup.sh help

Modern software development moves fast. Features change, APIs evolve, and sometimes small commits create huge unexpected impacts, whether by breaking downstream functionality, causing subtle bugs, or violating intended product behavior. Teams lose track of why things were changed or when something started behaving differently.
DiffSense solves this by using embedding-powered semantic drift detection over git diffs, commit messages, issue tickets, and changelogs.
- You input a function, file, or API you want to audit
- The system retrieves its historical versions and compares the semantic meaning of changes over time via embeddings
- It generates clear, human-readable explanations of how and why that feature changed
./start.sh

Backend Setup:
cd backend
python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt
python main.py

Frontend Setup:
cd frontend
npm install
npm run dev

Demo Script:
cd backend
python demo.py

- Backend: Python FastAPI with ML pipeline
- Frontend: React with Recharts visualization
- ML Models:
  - CodeBERT for code embeddings
  - sentence-transformers for text embeddings
  - Hybrid embedding approach
- Git Analysis: GitPython for repository parsing
- API: RESTful endpoints with real-time analysis
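The hybrid embedding approach listed above can be sketched without any ML dependencies. This is a minimal illustration, not the project's actual `embedding_engine.py`: the function names are hypothetical, the vectors stand in for real CodeBERT and sentence-transformers outputs, and L2-normalizing each vector before mixing is an assumption added here so that the 0.7/0.3 weighting shown later in this README is not skewed by differing vector magnitudes.

```python
import math
from typing import List, Sequence


def l2_normalize(vec: Sequence[float]) -> List[float]:
    """Scale a vector to unit length; an all-zero vector is returned unchanged."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        return [0.0 for _ in vec]
    return [x / norm for x in vec]


def hybrid_embedding(
    code_vec: Sequence[float],
    text_vec: Sequence[float],
    code_weight: float = 0.7,
) -> List[float]:
    """Weighted mix of a code embedding and a text embedding (hypothetical helper)."""
    if len(code_vec) != len(text_vec):
        # A weighted sum only makes sense when both models emit the same size.
        raise ValueError("embeddings must share a dimensionality before mixing")
    if not 0.0 <= code_weight <= 1.0:
        raise ValueError("code_weight must be between 0 and 1")
    code_unit = l2_normalize(code_vec)
    text_unit = l2_normalize(text_vec)
    text_weight = 1.0 - code_weight
    return [code_weight * c + text_weight * t for c, t in zip(code_unit, text_unit)]
```

If the two models produce vectors of different sizes, one option is to project or pad them to a common size first; that step is outside this sketch.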
- Git Analyzer (git_analyzer.py) - Extract and parse git history
- Embedding Engine (embedding_engine.py) - Generate semantic embeddings
- Drift Detector (drift_detector.py) - Analyze semantic changes over time
- FastAPI Backend (main.py) - REST API for frontend integration
- React Frontend - Interactive visualization and analysis interface
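As a rough idea of what the Git Analyzer step involves, the sketch below parses commit history for one file into ordered records. It is an assumption-laden stand-in: the project uses GitPython, whereas this example parses the text output of plain `git log --follow --format=%H%x09%at%x09%s -- <path>`, and `CommitRecord` and `parse_git_log` are hypothetical names.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import List

# Assumed log format: full hash, author timestamp (Unix seconds), subject,
# separated by tab characters (%x09).
GIT_LOG_FORMAT = "%H%x09%at%x09%s"


@dataclass(frozen=True)
class CommitRecord:
    sha: str
    authored_at: datetime
    subject: str


def parse_git_log(output: str) -> List[CommitRecord]:
    """Turn tab-separated `git log` output into records sorted oldest first."""
    records = []
    for line in output.splitlines():
        if not line.strip():
            continue  # skip blank separator lines
        sha, timestamp, subject = line.split("\t", 2)
        authored_at = datetime.fromtimestamp(int(timestamp), tz=timezone.utc)
        records.append(CommitRecord(sha, authored_at, subject))
    # git log prints newest first; drift analysis wants chronological order.
    records.sort(key=lambda record: record.authored_at)
    return records
```

Each record's `sha` could then be used to fetch the file contents at that commit (for example with `git show <sha>:<path>`) before embedding.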
- Track how code meaning changes over time using AI embeddings
- Identify gradual vs sudden semantic shifts
- Measure cumulative drift from original implementation
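One way to tell gradual from sudden shifts is to compare each version both with its predecessor (step drift) and with the original version (cumulative drift). The sketch below assumes embeddings are plain lists of floats; `drift_profile` and the `sudden_threshold` default of 0.3 are illustrative choices, not values taken from the project.

```python
import math
from typing import Dict, List, Sequence, Union


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """1 - cosine similarity; raises if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    if norm == 0.0:
        raise ValueError("cosine distance is undefined for a zero vector")
    return 1.0 - dot / norm


def drift_profile(
    timeline: Sequence[Sequence[float]],
    sudden_threshold: float = 0.3,  # assumed cut-off, tune per repository
) -> List[Dict[str, Union[float, str]]]:
    """Describe every version after the first by step and cumulative drift."""
    if len(timeline) < 2:
        return []
    baseline = timeline[0]
    profile = []
    for previous, current in zip(timeline, timeline[1:]):
        step = cosine_distance(previous, current)
        profile.append(
            {
                "step_drift": step,
                "cumulative_drift": cosine_distance(baseline, current),
                "kind": "sudden" if step >= sudden_threshold else "gradual",
            }
        )
    return profile
```

A run of small step drifts with a steadily rising cumulative drift suggests gradual change, while a single large step drift marks a sudden shift.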
- ML-powered risk scoring for commits
- Predict potentially risky changes before they impact users
- Historical pattern analysis for risk assessment
- Visual drift timeline with commit details
- Identify significant change events
- Hover details with commit messages and metrics
- File-level: Analyze entire file evolution
- Function-level: Track specific function changes
- Repository-level: Overall project health metrics
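Function-level tracking needs the source of one function isolated from each historical version of a file. A minimal sketch for Python sources, using only the standard `ast` module, is shown here; `extract_function_source` is a hypothetical helper, and files in other languages (such as TypeScript) would need a different parser.

```python
import ast
from typing import Optional


def extract_function_source(file_source: str, function_name: str) -> Optional[str]:
    """Return the source text of the first function with the given name, if any."""
    tree = ast.parse(file_source)
    for node in ast.walk(tree):
        is_function = isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef))
        if is_function and node.name == function_name:
            # get_source_segment slices the original text using the node's
            # recorded start and end positions (Python 3.8+).
            return ast.get_source_segment(file_source, node)
    return None
```

Running this over each historical version of a file yields one snippet per commit, which can then be embedded and compared like a whole file.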
- Repository Input: Enter GitHub repository URL
- File Selection: Choose file or function to analyze
- Semantic Analysis: AI processes git history and generates embeddings
- Drift Visualization: Interactive timeline showing semantic changes
- Risk Assessment: Breaking change prediction with explanations
- Export Results: Summary reports and recommendations
- Detect undocumented breaking changes before a release
- Help new developers quickly catch up on why parts of the codebase evolved
- Trace regressions back to their origins, even in noisy or badly documented projects
- Risk assessment for releases
- Technical debt tracking over time
- API stability monitoring
- Contributor onboarding with feature evolution stories
- Impact analysis for proposed changes
- Documentation gap identification
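# Hybrid approach combining code and text semantics
# (pseudocode: both embeddings are assumed to share one dimensionality)
code_embedding = CodeBERT.encode(code_diff)
text_embedding = SentenceTransformer.encode(commit_message)
hybrid_embedding = 0.7 * code_embedding + 0.3 * text_embedding

# Semantic similarity tracking over time
import numpy as np

def cosine_similarity(a, b):
    """Cosine similarity between two 1-D embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def calculate_drift(embeddings_timeline):
    """Drift of each later version relative to the first (baseline) embedding."""
    baseline = embeddings_timeline[0]
    drift_scores = []
    for embedding in embeddings_timeline[1:]:
        similarity = cosine_similarity(baseline, embedding)
        drift_scores.append(1 - similarity)  # Higher = more drift
    return drift_scores

- Feature Engineering: Code metrics + semantic embeddings + commit metadata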
- Heuristic Model: Risk scoring based on drift patterns and change magnitude
- Contextual Analysis: Related issues, commit message sentiment, file importance
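To make the heuristic model above more concrete, here is a small sketch of a weighted risk score. Everything specific in it is an assumption for illustration: the `CommitFeatures` fields, the 0.6/0.25/0.15 weights, the 500-line cap, and the label thresholds are not taken from the project.

```python
from dataclasses import dataclass


@dataclass
class CommitFeatures:
    drift: float              # semantic drift vs. the previous version
    lines_changed: int        # added + deleted lines in the commit
    touches_public_api: bool  # hypothetical flag from file-importance analysis


def risk_score(features: CommitFeatures, max_lines: int = 500) -> float:
    """Combine features into a 0..1 score using assumed, hand-picked weights."""
    drift_part = min(max(features.drift, 0.0), 1.0)
    size_part = min(max(features.lines_changed, 0) / max_lines, 1.0)
    api_part = 1.0 if features.touches_public_api else 0.0
    return 0.6 * drift_part + 0.25 * size_part + 0.15 * api_part


def risk_label(score: float) -> str:
    """Map a score to a coarse label; thresholds are illustrative."""
    if score >= 0.6:
        return "high"
    if score >= 0.3:
        return "medium"
    return "low"
```

Keeping the weights in one place makes it easy to tune them against commits that are already known to have caused breakage.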
DiffSense/
├── README.md                 # This file
├── TECHNICAL_ROADMAP.md      # Detailed implementation guide
├── start.sh                  # One-command startup script
├── backend/
│   ├── main.py              # FastAPI server
│   ├── demo.py              # Standalone demo script
│   ├── requirements.txt     # Python dependencies
│   └── src/
│       ├── git_analyzer.py     # Git repository analysis
│       ├── embedding_engine.py # AI embedding generation
│       └── drift_detector.py   # Semantic drift detection
└── frontend/
    ├── package.json         # Node.js dependencies
    ├── vite.config.js       # Vite configuration
    ├── tailwind.config.js   # Tailwind CSS config
    └── src/
        ├── App.jsx              # Main React application
        └── components/
            ├── RepositoryCloner.jsx  # Repository input interface
            ├── DriftAnalyzer.jsx     # Main analysis interface
            ├── FileSelector.jsx      # File selection component
            ├── DriftSummary.jsx      # Analysis results summary
            └── DriftTimeline.jsx     # Interactive timeline chart
- Novel application of code embeddings for semantic drift detection
- Hybrid embedding approach combining code and natural language understanding
- Real-time git history analysis with visual feedback
- Addresses real pain points in software development
- Scalable to any git repository
- Immediate actionable insights for development teams
- Intuitive web interface with beautiful visualizations
- One-click repository analysis
- Interactive timeline exploration
- Clear risk assessments and explanations
- LLM Integration: Use Claude/GPT for natural language explanations
- Advanced ML: Train custom models for breaking change prediction
- Integration: GitHub Apps, VS Code extensions, CI/CD webhooks
- Collaboration: Team insights, change approval workflows
- Scale: Enterprise deployment, multi-repository analysis
- Quick Demo: ./start.sh, then open http://localhost:3000
- Standalone Demo: cd backend && python demo.py
- Example Repository: Try with https://github.com/microsoft/vscode
- Explore: Select a file like src/vs/editor/editor.api.ts
Built for the AI Berkeley Hackathon. Special thanks to the open-source community for the foundational tools that make this possible.
Ready to detect feature drift in your codebase? Let's get started!