SadeekFarhan21/DiffSense

DiffSense: Feature Drift Detector

๐Ÿ† AI Berkeley Hackathon Project - Semantic drift detection using embedding-powered analysis of git history

🚀 Quick Start for Judges

Instant Demo (2 minutes)

./setup.sh demo

Shows semantic drift analysis on a generated repository

Full Web Application (5 minutes)

./setup.sh full
# Open http://localhost:3000

Complete interface for analyzing any GitHub repository

Help & Options

./setup.sh help

🎯 The Problem

Modern software development moves fast. Features change, APIs evolve, and sometimes small commits have large, unexpected impacts: breaking downstream functionality, causing subtle bugs, or violating intended product behavior. Teams lose track of why things changed or when something started behaving differently.

💡 Solution: DiffSense

DiffSense addresses this with embedding-powered semantic drift detection over git diffs, commit messages, issue tickets, and changelogs.

→ You input a function, file, or API you want to audit
→ The system retrieves its historical versions and compares the semantic meaning of changes over time via embeddings
→ It generates clear, human-readable explanations of how and why that feature changed

🚀 Quick Start

Option 1: One-Command Start (Recommended)

./start.sh

Option 2: Manual Setup

Backend Setup:

cd backend
python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt
python main.py

Frontend Setup:

cd frontend
npm install
npm run dev

Demo Script:

cd backend
python demo.py

๐Ÿ—๏ธ Architecture

Technology Stack

  • Backend: Python FastAPI with ML pipeline
  • Frontend: React with Recharts visualization
  • ML Models:
    • CodeBERT for code embeddings
    • sentence-transformers for text embeddings
    • Hybrid embedding approach
  • Git Analysis: GitPython for repository parsing
  • API: RESTful endpoints with real-time analysis

Core Components

  1. Git Analyzer (git_analyzer.py) - Extract and parse git history
  2. Embedding Engine (embedding_engine.py) - Generate semantic embeddings
  3. Drift Detector (drift_detector.py) - Analyze semantic changes over time
  4. FastAPI Backend (main.py) - REST API for frontend integration
  5. React Frontend - Interactive visualization and analysis interface
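
As a rough illustration of what the Git Analyzer step involves, the sketch below lists the commits that touched one file and parses them into records. It shells out to the `git` CLI rather than using GitPython, and the names `CommitRecord`, `parse_log`, and `file_history` are hypothetical, not taken from `git_analyzer.py`.

```python
import subprocess
from dataclasses import dataclass

# Tab-separated fields: full hash, author date (strict ISO 8601), subject line.
LOG_FORMAT = "%H%x09%aI%x09%s"


@dataclass
class CommitRecord:
    sha: str
    date: str
    subject: str


def parse_log(output: str) -> list[CommitRecord]:
    """Parse `git log --format=LOG_FORMAT` output into records."""
    records = []
    for line in output.splitlines():
        if not line.strip():
            continue  # skip blank lines
        # Split on the first two tabs only, so tabs in the subject survive.
        sha, date, subject = line.split("\t", 2)
        records.append(CommitRecord(sha, date, subject))
    return records


def file_history(repo_path: str, file_path: str) -> list[CommitRecord]:
    """Return commits touching file_path, oldest first (follows renames)."""
    result = subprocess.run(
        ["git", "-C", repo_path, "log", "--follow",
         f"--format={LOG_FORMAT}", "--", file_path],
        capture_output=True, text=True, check=True,
    )
    # git log prints newest first; reverse so index 0 is the original version.
    return parse_log(result.stdout)[::-1]
```

Keeping the parsing separate from the subprocess call makes it possible to test the parser on canned output without a repository on disk.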

✨ Core Features

1. Semantic Drift Detection

  • Track how code meaning changes over time using AI embeddings
  • Identify gradual vs sudden semantic shifts
  • Measure cumulative drift from original implementation

2. Breaking Change Prediction

  • ML-powered risk scoring for commits
  • Predict potentially risky changes before they impact users
  • Historical pattern analysis for risk assessment

3. Interactive Timeline Visualization

  • Visual drift timeline with commit details
  • Identify significant change events
  • Hover details with commit messages and metrics

4. Multi-Level Analysis

  • File-level: Analyze entire file evolution
  • Function-level: Track specific function changes
  • Repository-level: Overall project health metrics
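
Function-level tracking needs the text of one function from each historical version of a file. The following is a minimal sketch for Python sources using the standard `ast` module; `extract_function` is a hypothetical helper, and other languages would need their own parser.

```python
import ast
from typing import Optional


def extract_function(source: str, name: str) -> Optional[str]:
    """Return the source text of a function called `name`, or None if absent."""
    tree = ast.parse(source)  # raises SyntaxError if the version does not parse
    for node in ast.walk(tree):
        is_function = isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef))
        if is_function and node.name == name:
            # Slice the original text using the node's recorded positions.
            return ast.get_source_segment(source, node)
    return None
```

Historical versions may contain syntax that the running interpreter cannot parse, so a caller would likely want to catch `SyntaxError` and fall back to file-level analysis for that commit.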

🎮 Demo Flow

  1. Repository Input: Enter GitHub repository URL
  2. File Selection: Choose file or function to analyze
  3. Semantic Analysis: AI processes git history and generates embeddings
  4. Drift Visualization: Interactive timeline showing semantic changes
  5. Risk Assessment: Breaking change prediction with explanations
  6. Export Results: Summary reports and recommendations

📊 Use Cases

For Development Teams

  • Detect undocumented breaking changes before a release
  • Help new developers quickly catch up on why parts of the codebase evolved
  • Trace regressions back to their origins, even in noisy or badly documented projects

For Project Managers

  • Risk assessment for releases
  • Technical debt tracking over time
  • API stability monitoring

For Open Source Maintainers

  • Contributor onboarding with feature evolution stories
  • Impact analysis for proposed changes
  • Documentation gap identification

๐Ÿ› ๏ธ Technical Implementation

Embedding Strategy

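# Hybrid approach combining code and text semantics (schematic pseudocode).
# The weighted sum requires both embeddings to have the same dimension.
code_embedding = CodeBERT.encode(code_diff)
text_embedding = SentenceTransformer.encode(commit_message)
hybrid_embedding = 0.7 * code_embedding + 0.3 * text_embedding  # 70% code, 30% text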
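
To make the blend concrete without loading any models, the sketch below applies the same 0.7/0.3 weighting to plain NumPy vectors. It checks that both inputs share a dimension and L2-normalizes them first so neither side dominates through sheer magnitude; the normalization and the `hybrid_embedding` helper are assumptions for illustration, not confirmed parts of `embedding_engine.py`.

```python
import numpy as np


def l2_normalize(v: np.ndarray) -> np.ndarray:
    """Scale a vector to unit length; leave an all-zero vector unchanged."""
    norm = np.linalg.norm(v)
    return v / norm if norm > 0 else v


def hybrid_embedding(code_vec, text_vec, code_weight=0.7):
    """Blend a code embedding and a text embedding of the same dimension."""
    code_vec = np.asarray(code_vec, dtype=float)
    text_vec = np.asarray(text_vec, dtype=float)
    if code_vec.shape != text_vec.shape:
        raise ValueError(f"dimension mismatch: {code_vec.shape} vs {text_vec.shape}")
    # Weighted sum of the unit vectors; the text weight is the remainder.
    return code_weight * l2_normalize(code_vec) + (1 - code_weight) * l2_normalize(text_vec)
```

If the two models produce vectors of different sizes, concatenating the weighted vectors is a common alternative to summing them.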

Drift Calculation

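# Semantic similarity tracking over time
import numpy as np

def cosine_similarity(a, b):
    """Cosine similarity between two 1-D vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def calculate_drift(embeddings_timeline):
    """Return the drift of each later version relative to the first one."""
    baseline = embeddings_timeline[0]
    drift_scores = []
    for embedding in embeddings_timeline[1:]:
        drift_scores.append(1 - cosine_similarity(baseline, embedding))  # Higher = more drift
    return drift_scores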

Breaking Change Prediction

  • Feature Engineering: Code metrics + semantic embeddings + commit metadata
  • Heuristic Model: Risk scoring based on drift patterns and change magnitude
  • Contextual Analysis: Related issues, commit message sentiment, file importance
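
To make the heuristic model concrete, the sketch below combines a drift score with two simple signals, change size and commit message keywords, into a risk score between 0 and 1. The features, weights, cap, and keyword list are all hypothetical choices for illustration; this README does not specify the actual scoring formula.

```python
from dataclasses import dataclass

# Assumed keyword list; a real system would tune or learn this.
RISKY_KEYWORDS = ("breaking", "remove", "deprecate", "rename")


@dataclass
class CommitFeatures:
    drift: float        # 1 - cosine similarity to the previous version
    lines_changed: int  # added + deleted lines
    message: str        # commit message


def risk_score(f: CommitFeatures) -> float:
    """Weighted blend of drift, change size, and keywords, clamped to [0, 1]."""
    drift_term = min(max(f.drift, 0.0), 1.0)
    size_term = min(f.lines_changed / 500.0, 1.0)  # saturates at 500 lines (assumed cap)
    message = f.message.lower()
    keyword_term = 1.0 if any(k in message for k in RISKY_KEYWORDS) else 0.0
    score = 0.6 * drift_term + 0.25 * size_term + 0.15 * keyword_term
    return round(min(max(score, 0.0), 1.0), 3)
```

Because every term is bounded and the weights sum to 1, each term's share of the final score can be reported alongside it.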

๐Ÿ“ Project Structure

DiffSense/
โ”œโ”€โ”€ README.md                 # This file
โ”œโ”€โ”€ TECHNICAL_ROADMAP.md      # Detailed implementation guide
โ”œโ”€โ”€ start.sh                  # One-command startup script
โ”œโ”€โ”€ backend/
โ”‚   โ”œโ”€โ”€ main.py              # FastAPI server
โ”‚   โ”œโ”€โ”€ demo.py              # Standalone demo script
โ”‚   โ”œโ”€โ”€ requirements.txt     # Python dependencies
โ”‚   โ””โ”€โ”€ src/
โ”‚       โ”œโ”€โ”€ git_analyzer.py     # Git repository analysis
โ”‚       โ”œโ”€โ”€ embedding_engine.py # AI embedding generation
โ”‚       โ””โ”€โ”€ drift_detector.py   # Semantic drift detection
โ””โ”€โ”€ frontend/
    โ”œโ”€โ”€ package.json         # Node.js dependencies
    โ”œโ”€โ”€ vite.config.js       # Vite configuration
    โ”œโ”€โ”€ tailwind.config.js   # Tailwind CSS config
    โ””โ”€โ”€ src/
        โ”œโ”€โ”€ App.jsx              # Main React application
        โ””โ”€โ”€ components/
            โ”œโ”€โ”€ RepositoryCloner.jsx  # Repository input interface
            โ”œโ”€โ”€ DriftAnalyzer.jsx     # Main analysis interface
            โ”œโ”€โ”€ FileSelector.jsx      # File selection component
            โ”œโ”€โ”€ DriftSummary.jsx      # Analysis results summary
            โ””โ”€โ”€ DriftTimeline.jsx     # Interactive timeline chart

🎯 Hackathon Demo Points

Technical Innovation

  • Novel application of code embeddings for semantic drift detection
  • Hybrid embedding approach combining code and natural language understanding
  • Real-time git history analysis with visual feedback

Practical Value

  • Addresses real pain points in software development
  • Scalable to any git repository
  • Immediate actionable insights for development teams

User Experience

  • Intuitive web interface with beautiful visualizations
  • One-click repository analysis
  • Interactive timeline exploration
  • Clear risk assessments and explanations

🔄 Future Enhancements

  • LLM Integration: Use Claude/GPT for natural language explanations
  • Advanced ML: Train custom models for breaking change prediction
  • Integration: GitHub Apps, VS Code extensions, CI/CD webhooks
  • Collaboration: Team insights, change approval workflows
  • Scale: Enterprise deployment, multi-repository analysis

๐Ÿƒโ€โ™‚๏ธ Getting Started for Judges

  1. Quick Demo: ./start.sh โ†’ Open http://localhost:3000
  2. Standalone Demo: cd backend && python demo.py
  3. Example Repository: Try with https://github.com/microsoft/vscode
  4. Explore: Select a file like src/vs/editor/editor.api.ts

๐Ÿค Team & Acknowledgments

Built for the AI Berkeley Hackathon. Special thanks to the open-source community for the foundational tools that make this possible.


Ready to detect feature drift in your codebase? Let's get started! 🚀
