Welcome to the Berry project! This document provides comprehensive guidelines for contributing to the high-performance Yarn lockfile parser.
Berry is a high-performance, zero-allocation parser for Yarn v3/v4 lockfiles, built with Rust and nom. The project focuses on:
- Performance: Sub-millisecond parsing for most lockfiles
- Memory Efficiency: Zero-allocation parsing with minimal heap usage
- Modularity: Clean architecture for WASM and Node.js integration
- Reliability: Comprehensive testing and benchmarking
```
crates/
├── berry-core/      # Main parser library
├── berry-test/      # Integration tests
├── berry-bench/     # Criterion microbenchmarks
├── berry-bench-bin/ # CLI benchmarking tool
└── node-bindings/   # Node.js bindings (planned)
```
- Rust: Latest stable version (1.70+)
- Cargo: With workspace support
- Git: For version control
```bash
# Clone the repository
git clone <repository-url>
cd berry

# Build all crates
cargo build --workspace

# Run tests
cargo test --workspace

# Check code quality
cargo clippy --workspace
```

Berry includes a comprehensive benchmarking system to ensure performance and detect regressions.
The benchmarking infrastructure consists of two main components:

- Criterion microbenchmarks (`crates/berry-bench/`)
  - Statistical benchmarking with confidence intervals
  - HTML reports
  - Regression detection
  - Memory usage tracking
- CLI benchmarking tool (`crates/berry-bench-bin/`)
  - Quick performance testing during development
  - Multiple fixture support
  - Regression detection
  - JSON output for CI integration
```bash
# Test a specific fixture
cargo run --bin berry-bench-bin -- -f minimal-berry.lock -v

# Test all working fixtures
cargo run --bin berry-bench-bin -- --all -r 10

# Get JSON output for CI integration
cargo run --bin berry-bench-bin -- --all --format json
```

```bash
# Run comprehensive Criterion benchmarks
cargo bench --package berry-bench

# Quick benchmark run
cargo bench --package berry-bench --bench parser_benchmarks -- --quick

# Generate HTML reports
cargo bench --package berry-bench -- --html
```

- Small fixtures (1-10 packages): `minimal-berry.lock`, `workspaces.yarn.lock`
- Medium fixtures (10-1000 packages): `yarn4-mixed-protocol.lock`, `auxiliary-packages.yarn.lock`
- Large fixtures (1000+ packages): `berry.lock`, `resolutions-patches.yarn.lock`
- Heap usage tracking: Physical and virtual memory measurement
- Zero-allocation validation: Verify no allocations during parsing
- Memory scaling: Correlation between file size and memory usage
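One way to validate the zero-allocation property is to wrap the system allocator in a counting allocator and assert that the counter does not move during the parsing phase. The sketch below is illustrative only, not the project's actual harness; `CountingAlloc` and `allocation_count` are hypothetical names, and the "parse" is a stand-in slice operation:

```rust
use std::alloc::{GlobalAlloc, Layout, System};
use std::sync::atomic::{AtomicUsize, Ordering};

// Wraps the system allocator and counts every allocation request.
struct CountingAlloc;

static ALLOCATIONS: AtomicUsize = AtomicUsize::new(0);

unsafe impl GlobalAlloc for CountingAlloc {
    unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
        ALLOCATIONS.fetch_add(1, Ordering::Relaxed);
        unsafe { System.alloc(layout) }
    }
    unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout) {
        unsafe { System.dealloc(ptr, layout) }
    }
}

#[global_allocator]
static GLOBAL: CountingAlloc = CountingAlloc;

fn allocation_count() -> usize {
    ALLOCATIONS.load(Ordering::Relaxed)
}

fn main() {
    let input = "lodash@npm:4.17.21";
    let before = allocation_count();
    // Slicing borrows from `input`, so this stand-in "parse" allocates nothing.
    let (name, version) = input.split_once('@').unwrap();
    assert_eq!(allocation_count(), before, "parsing phase allocated");
    // Building the final owned data structure is allowed to allocate.
    let owned = format!("{name} -> {version}");
    assert!(allocation_count() > before);
    assert_eq!(owned, "lodash -> npm:4.17.21");
}
```

A check like this can run in CI alongside the heap-usage numbers reported by the CLI tool.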
- Simple lockfiles: Basic dependency structures
- Mixed protocols: npm, workspace, and other protocols
- Resolutions: Complex resolution scenarios
- Patches: Patch protocol handling
- Parsing speed: < 1ms for small files (< 1KB), < 10ms for medium files (< 100KB), < 100ms for large files (< 1MB)
- Memory usage: Zero allocations during parsing phase, minimal allocations for final data structures
- Regression detection: Automated alerts for >5% performance degradation
```
Benchmark Results:
Fixture                        Size (bytes)  Mean (ms)  Min (ms)  Max (ms)  Heap (bytes)
------------------------------------------------------------------------------------------
minimal-berry.lock                     1152      0.132     0.131     0.133         20480
workspaces.yarn.lock                   2005      0.048     0.046     0.050          8192
auxiliary-packages.yarn.lock          40540      0.082     0.080     0.085         20480

Performance Analysis:
workspaces.yarn.lock performance looks normal (1.0x vs fastest)
minimal-berry.lock is 2.8x slower than workspaces.yarn.lock (potential regression)
```
```
fixture_parsing/minimal_berry
        time:   [6.1249 µs 6.2624 µs 6.2968 µs]
        change: [-3.4204% -0.9236% +1.4829%] (p = 0.85 > 0.05)
        No change in performance detected.

heap_usage/heap_small   time:   [1.2025 ms 1.2383 ms 1.2472 ms]
```
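For quick feedback during development, a mean/min/max report like the CLI tool's can be produced with a few lines of `std::time::Instant` timing. This is a minimal sketch, not the tool's actual implementation; the `bench` helper and the stand-in workload are made up for illustration:

```rust
use std::time::Instant;

/// Time `f` over `reps` runs; returns (mean, min, max) in microseconds.
fn bench<F: FnMut()>(mut f: F, reps: u32) -> (f64, f64, f64) {
    let mut samples = Vec::with_capacity(reps as usize);
    for _ in 0..reps {
        let start = Instant::now();
        f();
        samples.push(start.elapsed().as_secs_f64() * 1e6);
    }
    let mean = samples.iter().sum::<f64>() / samples.len() as f64;
    let min = samples.iter().cloned().fold(f64::INFINITY, f64::min);
    let max = samples.iter().cloned().fold(f64::NEG_INFINITY, f64::max);
    (mean, min, max)
}

fn main() {
    let input = "a: 1\nb: 2\nc: 3";
    let (mean, min, max) = bench(
        || {
            // Stand-in workload in place of a real parse call.
            let n = input.lines().filter(|l| l.contains(':')).count();
            assert_eq!(n, 3);
        },
        10,
    );
    assert!(min <= mean && mean <= max);
    println!("mean {mean:.3} µs  min {min:.3} µs  max {max:.3} µs");
}
```

Unlike Criterion, this gives no confidence intervals or outlier handling, so it suits quick iteration rather than regression gating.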
The benchmarking system tracks:
- Physical memory: Actual heap usage in bytes
- Virtual memory: Virtual memory allocation
- Allocation patterns: Zero-allocation validation
- Memory scaling: Correlation with file size
The system automatically detects:
- Performance regressions: >50% slower than baseline
- Statistical significance: p < 0.05 in Criterion tests
- Memory usage increases: Unexpected heap usage growth
- Zero-allocation violations: Unexpected allocations during parsing
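The baseline comparison behind the regression alerts reduces to a threshold check on the measured means. A sketch of the idea, with a hypothetical `is_regression` helper and invented numbers (the 50% tolerance matches the CLI tool's rule above; a stricter 5% threshold matches the CI target):

```rust
/// Flag a regression when the current mean exceeds the baseline mean by more
/// than `threshold` (e.g. 0.50 for a ">50% slower than baseline" rule).
fn is_regression(baseline_ms: f64, current_ms: f64, threshold: f64) -> bool {
    current_ms > baseline_ms * (1.0 + threshold)
}

fn main() {
    // Hypothetical measurements, not real baselines.
    assert!(!is_regression(0.132, 0.140, 0.50)); // ~6% slower: within tolerance
    assert!(is_regression(0.132, 0.210, 0.50)); // ~59% slower: flagged
    assert!(is_regression(0.132, 0.140, 0.05)); // stricter 5% CI threshold trips
}
```

Criterion's own change detection adds statistical significance (the p-value shown in its output) on top of a raw threshold like this.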
```bash
# Run all tests
cargo test --workspace

# Run specific crate tests
cargo test --package berry-core
cargo test --package berry-test

# Run with verbose output
cargo test --workspace -- --nocapture
```

- Individual parsing function tests
- Edge case handling
- Error condition validation
- End-to-end lockfile parsing
- Real fixture validation
- Cross-platform compatibility
- Performance regression detection
- Memory usage validation
- Statistical significance testing
```bash
# Format code
cargo fmt --workspace

# Check code quality
cargo clippy --workspace

# Check for security issues
cargo audit
```

- Feature commits: `feat: add multi-descriptor support`
- Bug fixes: `fix: resolve parsing issue with large fixtures`
- Performance: `perf: optimize dependency parsing`
- Documentation: `docs: update benchmarking guide`
- Tests: `test: add edge case validation`
- Create a feature branch: `git checkout -b feature/your-feature`
- Make changes: Follow the code style guidelines
- Add tests: Include unit and integration tests
- Run benchmarks: Ensure no performance regressions
- Update documentation: Update relevant docs
- Submit PR: Include detailed description and benchmark results
- Use borrowed strings: `&str` instead of `String` during parsing
- Avoid intermediate collections: Use `fold_many0` instead of `many0`
- Defer allocation: Only allocate when building final data structures
- Single-pass parsing: Parse everything in one go
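The borrowed-strings and deferred-allocation points above can be sketched in plain Rust: the parser returns `&str` slices into the input, so the only allocation is the final `Vec`. This is an illustration, not the project's parser; `parse_entries` is a hypothetical helper, and the `name@version` line format is simplified well past what a real Yarn lockfile looks like:

```rust
/// Parse one `name@version` pair per line, borrowing from `input`:
/// no `String` is allocated during parsing, only the final `Vec`.
fn parse_entries(input: &str) -> Vec<(&str, &str)> {
    input
        .lines()
        .filter_map(|line| line.split_once('@'))
        .collect()
}

fn main() {
    let input = "lodash@4.17.21\nreact@18.2.0";
    let entries = parse_entries(input);
    assert_eq!(entries, vec![("lodash", "4.17.21"), ("react", "18.2.0")]);
    // `entries` holds &str slices into `input`; nothing was copied.
}
```

With nom, the same shape falls out of `fold_many0` accumulating directly into the result instead of `many0` building an intermediate `Vec` per sub-parser.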
- Profile first: Use benchmarks to identify bottlenecks
- Measure impact: Always benchmark before and after changes
- Consider trade-offs: Performance vs. memory vs. complexity
- Document decisions: Explain optimization choices
- Premature optimization: Optimize only after profiling
- Ignoring benchmarks: Always run benchmarks before committing
- Memory leaks: Ensure proper cleanup in long-running scenarios
- Over-engineering: Keep solutions simple and maintainable
- Before changes: Run benchmarks to establish baseline
- During development: Use CLI tool for quick feedback
- Before commit: Run full benchmark suite
- After merge: Monitor for regressions
```yaml
# Example GitHub Actions workflow
name: Performance Benchmarks
on: [push, pull_request]
jobs:
  benchmark:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions-rs/toolchain@v1
        with:
          toolchain: stable
      - run: cargo bench --workspace
      - run: cargo run --bin berry-bench-bin -- --all
```

- Parsing failures: Check fixture format and parser logic
- Performance regressions: Compare with baseline benchmarks
- Memory issues: Use heap usage tracking to identify leaks
- Test failures: Check fixture availability and format
```bash
# Run with debug output
RUST_LOG=debug cargo test

# Profile with flamegraph
cargo install flamegraph
cargo flamegraph --bench parser_benchmarks

# Memory profiling
cargo run --bin berry-bench-bin -- -f large-fixture.lock -v
```

- Task List: Detailed development progress
- Benchmarking Plan: Comprehensive benchmarking strategy
- Nom Documentation: Parser combinator library
- Criterion Documentation: Benchmarking framework
- Issues: Use GitHub issues for bugs and feature requests
- Discussions: Use GitHub discussions for questions and ideas
- Benchmarks: Share benchmark results and performance analysis
- Contributions: Follow this guide for code contributions
Thank you for contributing to Berry! Your work helps make this parser faster, more reliable, and more useful for the community.