PyGit is a lightweight implementation of Git's core functionality in Python. This project implements the fundamental concepts of version control including commits, branches, merging, and remote operations.
- Basic Version Control
  - Initialize repositories (init)
  - Stage changes (add)
  - Create commits (commit)
  - View status (status)
  - View history (log)
- Branch Management
  - Create branches (branch)
  - Switch branches (checkout)
  - Merge branches (merge)
  - View branch structure (k)
- Remote Operations
  - Clone repositories (clone)
  - Fetch changes (fetch)
  - Push changes (push)
- Low-level Operations
  - Hash objects (hash-object)
  - View object contents (cat-file)
  - Manipulate trees (write-tree, read-tree)

- Python 3.6 or higher
- Graphviz (for visualization features)
  - Windows: Download from https://graphviz.org/download/
  - Linux: sudo apt-get install graphviz
  - macOS: brew install graphviz
- Install Graphviz (see Requirements above)
- Clone the repository:
  git clone https://github.com/NyasimiPhilip/0-source-control-system.git
  cd pygit
- Install in development mode:
  pip install -e .
- Verify installation:
  pygit --help
Initialize a repository:
# Create project directory
mkdir my-project
cd my-project
# Initialize repository
pygit init
# Create and add files
echo "Hello, PyGit!" > hello.txt
pygit add hello.txt   # or: pygit add .
pygit commit -m "Initial commit"
# View status and history
pygit status
pygit log

# Create and switch branches
pygit branch feature
pygit checkout feature
pygit branch
# Make changes
echo "New feature" > feature.txt
pygit add feature.txt
pygit commit -m "Add feature"
# Merge changes
pygit checkout master
pygit merge feature

# Clone a repository
pygit clone /path/to/source/repo /path/to/destination
# Fetch and push
pygit fetch /path/to/remote
pygit push /path/to/remote master

pygit/
├── __init__.py
├── base.py # Core VCS functionality
├── cli.py # Command-line interface
├── commands.py # Command implementations
├── data.py # Data storage operations
├── diff.py # Diff and merge logic
├── parser.py # Command parsing
└── remote.py # Remote operations
- Objects are stored in .pygit/objects/ using SHA-1 hashes
- Supports blobs (files), trees (directories), and commits
- Staging area implemented in .pygit/index
- JSON format for simplicity
- Branches stored in .pygit/refs/heads/
- HEAD reference tracks current branch
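To make the JSON staging area concrete, here is a minimal sketch of an index that maps file paths to object IDs. The function names `read_index` and `stage`, and the flat path-to-ID layout, are assumptions made for illustration; the real format of .pygit/index may differ.

```python
import json
import os


def read_index(index_path):
    """Return the staged entries as a dict; an absent index means nothing is staged."""
    if not os.path.exists(index_path):
        return {}
    with open(index_path) as f:
        return json.load(f)


def stage(index_path, file_path, oid):
    """Record that file_path is staged with object ID oid (hypothetical helper)."""
    index = read_index(index_path)
    index[file_path] = oid
    with open(index_path, "w") as f:
        # Sorted keys keep the file stable and easy to inspect by hand.
        json.dump(index, f, indent=2, sort_keys=True)
    return index
```

Because the index is plain JSON, it can be inspected with any text editor, which fits the "JSON format for simplicity" goal.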
Both PyGit and Git use the SHA-1 hash algorithm to identify stored objects, deriving an identifier from each piece of content. This approach helps ensure data integrity and allows for efficient content retrieval. In both systems, objects are stored under their content hash, which means identical content always produces the same hash, regardless of where the file lives or when it was created.
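The content-addressing idea can be sketched in a few lines. This is an illustration under stated assumptions, not PyGit's actual code: the names `hash_object` and `get_object`, the `objects_dir` parameter, and the "type, NUL byte, content" layout are all hypothetical.

```python
import hashlib
import os


def hash_object(data, type_="blob", objects_dir=".pygit/objects"):
    """Store data under the SHA-1 of its contents and return the hex object ID."""
    # Assumed on-disk layout: object type, a NUL separator, then the raw bytes.
    obj = type_.encode() + b"\x00" + data
    oid = hashlib.sha1(obj).hexdigest()
    os.makedirs(objects_dir, exist_ok=True)
    with open(os.path.join(objects_dir, oid), "wb") as f:
        f.write(obj)
    return oid


def get_object(oid, objects_dir=".pygit/objects"):
    """Read an object back and return a (type, content) pair."""
    with open(os.path.join(objects_dir, oid), "rb") as f:
        obj = f.read()
    type_, _, content = obj.partition(b"\x00")
    return type_.decode(), content
```

Hashing the same bytes twice yields the same object ID, so storing identical content a second time simply rewrites the same file.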
Both implementations use a reference system to manage branches, tags, and the current state of the repository. References are essentially pointers to specific commits, allowing for easy navigation and tracking of the repository's history. This includes maintaining HEAD references, branch pointers, and the ability to create and switch between different references.
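As an illustration of references as pointers, the sketch below resolves HEAD through a branch to a commit ID. It assumes symbolic references are stored as text of the form `ref: refs/heads/master`, which is how Git writes its HEAD file; whether PyGit uses the same format is an assumption, and `write_ref` and `resolve_ref` are hypothetical names.

```python
import os


def write_ref(repo_dir, ref, value):
    """Write value (a commit ID or 'ref: <other ref>') to the named reference."""
    path = os.path.join(repo_dir, ref)
    os.makedirs(os.path.dirname(path), exist_ok=True)
    with open(path, "w") as f:
        f.write(value)


def resolve_ref(repo_dir, ref):
    """Follow symbolic references until a commit ID is reached; None if missing."""
    path = os.path.join(repo_dir, ref)
    if not os.path.isfile(path):
        return None
    with open(path) as f:
        value = f.read().strip()
    if value.startswith("ref: "):
        # Symbolic reference: HEAD names a branch rather than a commit.
        return resolve_ref(repo_dir, value[len("ref: "):])
    return value
```

With this layout, committing on a branch only needs to rewrite the branch file; HEAD keeps pointing at the branch and follows along automatically.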
PyGit and Git both represent the repository's history as a directed acyclic graph (DAG) of commits. Each commit points to its parent commit(s), creating a linear or branching history. This structure allows for tracking changes, understanding the evolution of the project, and supporting branching and merging operations.
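A commit graph of this kind can be walked with a simple breadth-first traversal. The sketch below models history as a plain dict from commit ID to a list of parent IDs, which is an assumption made for illustration rather than PyGit's internal representation; `iter_ancestors` is a hypothetical name.

```python
from collections import deque


def iter_ancestors(commits, start):
    """Yield start and every commit reachable from it, each exactly once."""
    seen = set()
    queue = deque([start])
    while queue:
        oid = queue.popleft()
        if oid in seen:
            continue  # A commit can be reached through more than one child.
        seen.add(oid)
        yield oid
        queue.extend(commits[oid])


# Example history: c2 and c3 both branch from c1, and m merges them.
history = {"c1": [], "c2": ["c1"], "c3": ["c1"], "m": ["c2", "c3"]}
print(list(iter_ancestors(history, "m")))  # ['m', 'c2', 'c3', 'c1']
```

The `seen` set is what keeps the walk correct on merge commits, where the same ancestor is reachable along several paths.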
Unlike Git, which offers a comprehensive set of version control features, PyGit is focused on core functionality. Git provides advanced features like:
- Interactive rebasing
- Stashing changes
- Sophisticated conflict resolution
- Submodule management
- Extensive branching strategies
Git supports multiple named remotes with complex remote management capabilities, including:
- Multiple remote repositories
- Different push and fetch URLs
- Advanced remote tracking
- Detailed remote branch management
In contrast, PyGit offers basic remote operations with limited remote management. It provides a simplified approach to working with remote repositories, focusing on the core concepts of fetching and pushing changes.
Git uses a sophisticated packfile system for:
- Compressing repository data
- Reducing storage space
- Improving network transfer efficiency
- Handling large repositories with many objects
PyGit stores objects as loose files without advanced compression techniques. This approach:
- Simplifies the implementation
- Increases storage requirements
- Reduces performance for large repositories
- Makes the internal storage mechanism more transparent and easier to understand
Git offers advanced merge capabilities:
- Automatic conflict detection
- Multiple merge strategies (recursive, octopus, etc.)
- Interactive merge conflict resolution
- Detailed conflict marking
PyGit implements only basic merging:
- Simple linear merge attempts
- Limited conflict handling
- Requires manual intervention for complex merge scenarios
- Focuses on demonstrating the fundamental merge concept
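One building block of even a basic merge is finding a common ancestor of the two branches. The sketch below shows one simple way to do that over a dict of commit ID to parent IDs; it is not taken from PyGit's diff.py, and the names `ancestors` and `merge_base` are hypothetical.

```python
from collections import deque


def ancestors(commits, start):
    """Return start and all commits reachable from it, in breadth-first order."""
    seen, order, queue = set(), [], deque([start])
    while queue:
        oid = queue.popleft()
        if oid not in seen:
            seen.add(oid)
            order.append(oid)
            queue.extend(commits[oid])
    return order


def merge_base(commits, ours, theirs):
    """Return the first ancestor of theirs that is also reachable from ours."""
    reachable = set(ancestors(commits, ours))
    for oid in ancestors(commits, theirs):
        if oid in reachable:
            return oid
    return None  # The two histories share no commits.
```

If the merge base equals `ours`, the other branch is strictly ahead and the merge can be a fast-forward; otherwise the base is the reference point for comparing both sides.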
PyGit uses .pygitignore files to determine which files and directories to ignore. Only the .pygit/ directory is ignored by default.
Create a .pygitignore file in your repository root:
# Create .pygitignore
touch .pygitignore
# Add patterns to ignore
echo "*.log" >> .pygitignore
echo "node_modules/" >> .pygitignore

The .pygitignore file supports several pattern formats:
- Directory patterns (ending with /):
  node_modules/   # Ignores the entire directory
  build/          # Ignores build directory
- File patterns with wildcards (*):
  *.log       # Ignores all .log files
  *.pyc       # Ignores Python compiled files
  test_*.py   # Ignores test files
- Exact matches:
  secret.txt    # Ignores specific file
  config.json   # Ignores specific file
- Comments and formatting:
  # This is a comment
  # Python files
  *.pyc
  __pycache__/
  # Build directories
  dist/
  build/
# Development
__pycache__/
*.pyc
*.pyo
.env
# Dependencies
node_modules/
venv/
.venv/
# Build outputs
dist/
build/
*.egg-info/
# Logs and databases
*.log
*.sqlite
# IDE specific
.vscode/
.idea/
*.swp
Note: Unlike Git, PyGit only ignores files that are explicitly listed in .pygitignore (except for the .pygit/ directory which is always ignored).
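The matching rules described above can be approximated with Python's standard `fnmatch` module. This is a sketch under assumptions, not PyGit's actual matcher: `parse_ignore` and `is_ignored` are hypothetical names, and the handling of directory patterns (matching any parent directory name) is one plausible reading of the rules.

```python
import fnmatch


def parse_ignore(text):
    """Return the patterns in a .pygitignore body, skipping blanks and comments."""
    patterns = []
    for line in text.splitlines():
        line = line.strip()
        if line and not line.startswith("#"):
            patterns.append(line)
    return patterns


def is_ignored(path, patterns):
    """Return True if path matches a pattern or lies inside .pygit/."""
    parts = path.replace("\\", "/").split("/")
    if ".pygit" in parts:
        return True  # The repository directory is always ignored.
    for pattern in patterns:
        if pattern.endswith("/"):
            # Directory pattern: compare against each parent directory name.
            name = pattern.rstrip("/")
            if any(fnmatch.fnmatchcase(part, name) for part in parts[:-1]):
                return True
        elif fnmatch.fnmatchcase(parts[-1], pattern) or fnmatch.fnmatchcase("/".join(parts), pattern):
            return True
    return False
```

`fnmatchcase` is used instead of `fnmatch` so that matching is case-sensitive on every platform.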
PyGit includes a comprehensive test suite covering core functionality. The tests are located in the test/ directory.
test/
├── __init__.py
├── run_tests.py # Test runner script
├── test_base.py # Core operations tests
├── test_branch.py # Branch operations tests
├── test_data.py # Data storage tests
├── test_diff.py # Diff functionality tests
├── test_ignore.py # Ignore pattern tests
├── test_remote.py # Remote operations tests
└── test_status.py # Status reporting tests
There are several ways to run the tests:
- Using the test runner script (recommended):
# From project root directory
python -m test.run_tests
# Or from test directory
cd test
python run_tests.py
- Using Python's unittest directly:
# Run all tests
python -m unittest discover test
# Run specific test file
python -m unittest test.test_base
python -m unittest test.test_branch
- Using pytest (optional):
# Install pytest first
pip install pytest
# Run all tests
pytest test/
# Run specific test file
pytest test/test_base.py

The test suite covers:
- Basic operations (init, add, commit)
- Branch management (create, checkout, switch)
- Remote operations (clone, fetch, push)
- Data storage (objects, refs, index)
- Diff functionality
- Ignore patterns
- Status reporting
- Working directory operations
Before running tests:
- Ensure you're in your virtual environment (if using one)
- Install the package in development mode:
  pip install -e .

Test output will show:
- Number of tests run
- Test results (pass/fail)
- Any errors or failures
- Test execution time
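For anyone adding tests, the sketch below shows one common way to isolate repository tests in a temporary directory with `unittest`. It does not call any PyGit function, since the project's internal API is not shown here; the class name `TempRepoTestCase` is hypothetical, and the existing tests in test/ may be organized differently.

```python
import os
import shutil
import tempfile
import unittest


class TempRepoTestCase(unittest.TestCase):
    """Run each test inside a fresh, empty working directory."""

    def setUp(self):
        self._old_cwd = os.getcwd()
        self.repo_dir = tempfile.mkdtemp()
        os.chdir(self.repo_dir)

    def tearDown(self):
        # Leave the directory before deleting it so cleanup works on all platforms.
        os.chdir(self._old_cwd)
        shutil.rmtree(self.repo_dir)

    def test_working_directory_starts_empty(self):
        self.assertEqual(os.listdir("."), [])

    def test_files_do_not_leak_between_tests(self):
        with open("hello.txt", "w") as f:
            f.write("Hello, PyGit!\n")
        self.assertEqual(os.listdir("."), ["hello.txt"])
```

A test file built on this base class is picked up by `python -m unittest discover test` as long as its name starts with `test`.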