Skip to content

Latest commit

 

History

History
380 lines (286 loc) · 9.26 KB

File metadata and controls

380 lines (286 loc) · 9.26 KB

Contributing to ahocorasick

Thank you for considering contributing to ahocorasick! This document outlines the development workflow and guidelines.

Git Workflow (Git-Flow)

This project uses Git-Flow branching model for development.

Branch Structure

main                 # Production-ready code (tagged releases)
  └─ develop         # Integration branch for next release
       ├─ feature/*  # New features
       ├─ bugfix/*   # Bug fixes
       └─ hotfix/*   # Critical fixes from main

Branch Purposes

  • main: Production-ready code. Only releases are merged here.
  • develop: Active development branch. All features merge here first.
  • feature/*: New features. Branch from develop, merge back to develop.
  • bugfix/*: Bug fixes. Branch from develop, merge back to develop.
  • hotfix/*: Critical production fixes. Branch from main, merge to both main and develop.

Workflow Commands

Starting a New Feature

# Create feature branch from develop
git checkout develop
git pull origin develop
git checkout -b feature/my-new-feature

# Work on your feature...
git add .
git commit -m "feat: add my new feature"

# When done, merge back to develop
git checkout develop
git merge --squash feature/my-new-feature  # Squash merge for clean history
git commit -m "feat: my new feature (squashed)"
git branch -d feature/my-new-feature
git push origin develop

Fixing a Bug

# Create bugfix branch from develop
git checkout develop
git pull origin develop
git checkout -b bugfix/fix-issue-123

# Fix the bug...
git add .
git commit -m "fix: resolve issue #123"

# Merge back to develop
git checkout develop
git merge --squash bugfix/fix-issue-123  # Squash merge for clean history
git commit -m "fix: resolve issue #123 (squashed)"
git branch -d bugfix/fix-issue-123
git push origin develop

Creating a Release

# Create release branch from develop
git checkout develop
git pull origin develop
git checkout -b release/v0.2.0

# Update version numbers, CHANGELOG, etc.
git add .
git commit -m "chore: prepare release v0.2.0"

# Merge to main and tag
git checkout main
git merge --no-ff release/v0.2.0
git tag -a v0.2.0 -m "Release v0.2.0"

# Merge back to develop
git checkout develop
git merge --no-ff release/v0.2.0

# Delete release branch
git branch -d release/v0.2.0

# Push everything
git push origin main develop --tags

Hotfix (Critical Production Bug)

# Create hotfix branch from main
git checkout main
git pull origin main
git checkout -b hotfix/critical-bug

# Fix the bug...
git add .
git commit -m "fix: critical production bug"

# Merge to main and tag
git checkout main
git merge --no-ff hotfix/critical-bug
git tag -a v0.1.1 -m "Hotfix v0.1.1"

# Merge to develop
git checkout develop
git merge --no-ff hotfix/critical-bug

# Delete hotfix branch
git branch -d hotfix/critical-bug

# Push everything
git push origin main develop --tags

Commit Message Guidelines

Follow Conventional Commits specification:

<type>(<scope>): <description>

[optional body]

[optional footer]

Types

  • feat: New feature
  • fix: Bug fix
  • docs: Documentation changes
  • style: Code style changes (formatting, etc.)
  • refactor: Code refactoring
  • test: Adding or updating tests
  • chore: Maintenance tasks (build, dependencies, etc.)
  • perf: Performance improvements

Examples

feat: add Unicode property class support
fix: correct epsilon-closure in NFA compilation
docs: update README with performance benchmarks
refactor: simplify DFA state cache implementation
test: add fuzz tests for memchr edge cases
chore: update go.mod to Go 1.25.4
perf: optimize Teddy SIMD prefilter for AVX2

Code Quality Standards

Before Committing

  1. Format code:

    go fmt ./...
  2. Run linter:

    golangci-lint run
  3. Run tests:

    go test ./...
  4. Run tests with race detector:

    go test -race ./...
  5. All-in-one (use pre-release script):

    bash scripts/pre-release-check.sh

Pull Request Requirements

  • Code is formatted (go fmt ./...)
  • Linter passes (golangci-lint run - 0 issues)
  • All tests pass (go test ./...)
  • Race detector passes (go test -race ./...)
  • New code has tests (minimum 70% coverage)
  • Documentation updated (if applicable)
  • Commit messages follow conventions
  • No sensitive data (credentials, tokens, etc.)
  • Benchmarks added for performance-critical code

Development Setup

Prerequisites

  • Go 1.25 or later
  • golangci-lint
  • GCC or Clang (for race detector)
  • Optional: WSL2 with Go (for Windows users without GCC)

Install Dependencies

# Clone repository
git clone https://github.com/coregx/ahocorasick.git
cd ahocorasick

# Download dependencies
go mod download

# Install golangci-lint
# See: https://golangci-lint.run/welcome/install/

Running Tests

# Run all tests
go test ./...

# Run with coverage
go test -cover ./...

# Run with race detector
go test -race ./...

# Run benchmarks
go test -bench=. -benchmem ./simd/
go test -bench=. -benchmem ./prefilter/

Running Linter

# Run linter
golangci-lint run

# Run with verbose output
golangci-lint run -v

# Verify config
golangci-lint config verify

Project Structure

ahocorasick/
├── .github/              # GitHub workflows and templates
│   ├── CODEOWNERS       # Code ownership
│   └── workflows/       # CI/CD pipelines
├── ahocorasick.go        # Package entry point
├── automaton.go          # Search API (Find, IsMatch, FindAll)
├── builder.go            # Pattern builder
├── byteclasses.go        # Alphabet compression
├── match.go              # Match types and semantics
├── nfa.go                # Trie + failure links
├── *_test.go             # Tests and benchmarks
├── .golangci.yml         # Linter configuration
├── CHANGELOG.md          # Version history
├── CONTRIBUTING.md       # This file
├── LICENSE               # MIT License
└── README.md             # Main documentation

Adding New Features

  1. Check if issue exists, if not create one
  2. Discuss approach in the issue
  3. Create feature branch from develop
  4. Implement feature with tests
  5. Update documentation
  6. Run quality checks (bash scripts/pre-release-check.sh)
  7. Create pull request to develop
  8. Wait for code review
  9. Address feedback
  10. Merge when approved

Code Style Guidelines

General Principles

  • Follow Go conventions and idioms
  • Write self-documenting code
  • Add comments for complex logic (especially SIMD/assembly code)
  • Keep functions small and focused
  • Use meaningful variable names
  • Optimize for clarity first, performance second (except in hot paths)

Naming Conventions

  • Public types/functions: PascalCase (e.g., Compile, Memchr)
  • Private types/functions: camelCase (e.g., epsilonClosure, selectPrefilter)
  • Constants: PascalCase with context prefix (e.g., UseNFA, UseDFA)
  • Test functions: Test* (e.g., TestLazyDFAFind)
  • Benchmark functions: Benchmark* (e.g., BenchmarkMemchr)
  • Example functions: Example* (e.g., ExampleCompile)

Error Handling

  • Always check and handle errors
  • Use descriptive error variables (ErrCacheFull, ErrInvalidPattern)
  • Return errors immediately, don't wrap unnecessarily
  • Validate inputs before processing

Testing

  • Use table-driven tests when appropriate
  • Test both success and error cases
  • Include test data in testdata/ if needed
  • Test edge cases (empty input, boundaries, alignment)
  • Add fuzz tests for parsers and matchers
  • Compare with stdlib regexp for correctness
  • Benchmarks are mandatory for performance-critical code

Aho-Corasick Implementation Guidelines

Core Algorithm

  • Trie construction followed by failure link computation
  • BFS traversal for failure links (ensures correct order)
  • Match propagation along failure links for overlapping patterns
  • O(n + m) time complexity (n = text length, m = total pattern length)

Memory Efficiency

  • Use ByteClasses for alphabet compression
  • Consider contiguous state layout for cache efficiency
  • Minimize allocations in hot paths

Performance Validation

// Benchmarks should include:
// - Various input sizes (1KB, 64KB, 1MB)
// - Different pattern counts (4, 16, 64, 256)
// - Memory allocation tracking
// - Throughput calculation (MB/s)

func BenchmarkFind(b *testing.B) {
    ac, _ := NewBuilder().
        AddStrings(patterns).
        Build()

    b.SetBytes(int64(len(haystack)))
    for i := 0; i < b.N; i++ {
        _ = ac.Find(haystack, 0)
    }
}

Getting Help

  • Check existing issues and discussions
  • Ask questions in GitHub Issues
  • See coregex for integration examples

Performance Expectations

  • Multi-pattern matching: O(n) regardless of pattern count
  • Throughput: 100+ MB/s for typical workloads
  • Zero allocations in IsMatch hot path
  • Comparable to Rust's aho-corasick crate

License

By contributing, you agree that your contributions will be licensed under the MIT License.


Thank you for contributing to ahocorasick!