Skip to content

Latest commit

 

History

History
786 lines (620 loc) · 16.6 KB

File metadata and controls

786 lines (620 loc) · 16.6 KB

Refactoring Guidelines

Purpose: How to refactor large files and improve code structure
Audience: AI coding agents
Hard Limit: 2500 lines per file


When to Refactor

File Size Triggers

  • 2000 lines: Start planning refactoring
  • 2500 lines: MUST refactor before adding more code
  • 3000+ lines: Emergency - refactor immediately

Code Smell Triggers

  • Multiple unrelated responsibilities in one file
  • Repeated code patterns (DRY violations)
  • Difficult to test (too many dependencies)
  • Hard to understand (complex logic)
  • Frequent merge conflicts (hotspot file)

Refactoring Process

Step 1: Analyze Current Structure

Identify distinct responsibilities:

# Count lines per file
wc -l pkg/cypher/*.go | sort -n

# Find large files
find pkg -name "*.go" -exec wc -l {} \; | sort -n | tail -20

# Analyze file structure
grep "^func " pkg/cypher/executor.go | wc -l  # Count functions
grep "^type " pkg/cypher/executor.go | wc -l  # Count types

Example analysis of executor.go (3752 lines):

Functions by category:
- Query parsing: 15 functions (600 lines)
- Execution logic: 25 functions (1200 lines)
- Result formatting: 10 functions (400 lines)
- Caching: 8 functions (350 lines)
- Optimization: 12 functions (500 lines)
- Helpers: 20 functions (700 lines)

Conclusion: 6 distinct responsibilities → Split into 6 files

Step 2: Define Module Boundaries

Create clear separation of concerns:

executor.go (3752 lines) →

├── executor.go (800 lines)         # Core orchestration
│   - Execute() entry point
│   - Query routing
│   - Error handling
│   - Transaction management
│
├── parser.go (600 lines)           # Query parsing
│   - Parse Cypher queries
│   - Build AST
│   - Syntax validation
│
├── optimizer.go (500 lines)        # Query optimization
│   - Query planning
│   - Index selection
│   - Join ordering
│
├── cache.go (400 lines)            # Result caching
│   - Cache management
│   - Cache invalidation
│   - Cache statistics
│
├── formatter.go (300 lines)        # Result formatting
│   - Format results for Neo4j
│   - Type conversions
│   - Error formatting
│
└── helpers.go (700 lines)          # Shared utilities
    - Type conversions
    - Validation
    - Common operations

Step 3: Define Interfaces

Create clean contracts between modules:

// parser.go
type Parser interface {
    Parse(query string) (*AST, error)
    Validate(ast *AST) error
}

// optimizer.go
type Optimizer interface {
    Optimize(ast *AST) (*ExecutionPlan, error)
    EstimateCost(plan *ExecutionPlan) int64
}

// cache.go
type Cache interface {
    Get(key string) (interface{}, bool)
    Set(key string, value interface{})
    Invalidate(pattern string)
}

// formatter.go
type Formatter interface {
    FormatResult(rows [][]interface{}) (*Result, error)
    FormatError(err error) *Neo4jError
}

Step 4: Extract Modules

Move code to new files systematically:

// 1. Create new file: parser.go
package cypher

import (
    // ... imports
)

// Move parser-related types
type AST struct {
    // ... fields
}

type ParseError struct {
    // ... fields
}

// Move parser-related functions
func Parse(query string) (*AST, error) {
    // ... implementation
}

func validateSyntax(ast *AST) error {
    // ... implementation
}

// 2. Update executor.go to use new module
package cypher

import (
    // ... imports
)

type Executor struct {
    parser    Parser      // ← Use interface
    optimizer Optimizer   // ← Use interface
    cache     Cache       // ← Use interface
    formatter Formatter   // ← Use interface
}

func (e *Executor) Execute(ctx context.Context, query string) (*Result, error) {
    // Parse query
    ast, err := e.parser.Parse(query)
    if err != nil {
        return nil, e.formatter.FormatError(err)
    }
    
    // Optimize
    plan, err := e.optimizer.Optimize(ast)
    if err != nil {
        return nil, err
    }
    
    // Check cache
    if cached, ok := e.cache.Get(query); ok {
        return cached.(*Result), nil
    }
    
    // Execute plan
    rows, err := e.executePlan(ctx, plan)
    if err != nil {
        return nil, err
    }
    
    // Format result
    result, err := e.formatter.FormatResult(rows)
    if err != nil {
        return nil, err
    }
    
    // Cache result
    e.cache.Set(query, result)
    
    return result, nil
}

Step 5: Update Tests

Ensure all tests still pass:

# Run tests after each extraction
go test ./pkg/cypher -v

# Check coverage hasn't decreased
go test ./pkg/cypher -coverprofile=coverage.out
go tool cover -func=coverage.out

# Run race detector
go test ./pkg/cypher -race

Add tests for new interfaces:

// parser_test.go
func TestParser(t *testing.T) {
    parser := NewParser()
    
    t.Run("valid query", func(t *testing.T) {
        ast, err := parser.Parse("MATCH (n) RETURN n")
        assert.NoError(t, err)
        assert.NotNil(t, ast)
    })
    
    t.Run("syntax error", func(t *testing.T) {
        _, err := parser.Parse("INVALID QUERY")
        assert.Error(t, err)
    })
}

Step 6: Update Documentation

Document new module structure:

// Package cypher provides Neo4j-compatible Cypher query execution.
//
// The package is organized into several modules:
//
// **Core Execution** (executor.go):
//   - Query routing and orchestration
//   - Transaction management
//   - Error handling
//
// **Query Parsing** (parser.go):
//   - Cypher query parsing
//   - AST construction
//   - Syntax validation
//
// **Query Optimization** (optimizer.go):
//   - Query planning
//   - Index selection
//   - Cost estimation
//
// **Result Caching** (cache.go):
//   - Query result caching
//   - Cache invalidation
//   - Statistics tracking
//
// **Result Formatting** (formatter.go):
//   - Neo4j-compatible result formatting
//   - Type conversions
//   - Error formatting
//
// Example Usage:
//
//	executor := cypher.NewExecutor(storage)
//	result, err := executor.Execute(ctx, "MATCH (n) RETURN n", nil)
package cypher

Real-World Example

Before: executor.go (3752 lines)

// pkg/cypher/executor.go (3752 lines) ❌

package cypher

// Parsing types and functions (600 lines)
type AST struct { /* ... */ }
func Parse(query string) (*AST, error) { /* ... */ }
func validateSyntax(ast *AST) error { /* ... */ }
// ... 15 more parsing functions

// Execution types and functions (1200 lines)
type Executor struct { /* ... */ }
func (e *Executor) Execute(ctx context.Context, query string) (*Result, error) { /* ... */ }
func (e *Executor) executeMatch(ctx context.Context, match *Match) ([][]interface{}, error) { /* ... */ }
// ... 25 more execution functions

// Caching types and functions (350 lines)
type QueryCache struct { /* ... */ }
func (c *QueryCache) Get(key string) (interface{}, bool) { /* ... */ }
// ... 8 more caching functions

// Optimization types and functions (500 lines)
type QueryPlan struct { /* ... */ }
func optimizeQuery(ast *AST) (*QueryPlan, error) { /* ... */ }
// ... 12 more optimization functions

// Formatting types and functions (400 lines)
type ResultFormatter struct { /* ... */ }
func formatResult(rows [][]interface{}) (*Result, error) { /* ... */ }
// ... 10 more formatting functions

// Helper functions (700 lines)
func toInt64(v interface{}) (int64, error) { /* ... */ }
func toString(v interface{}) (string, error) { /* ... */ }
// ... 20 more helper functions

After: Split into focused modules ✅

// pkg/cypher/executor.go (800 lines) ✅
package cypher

// Executor orchestrates query execution using injected dependencies
type Executor struct {
    storage   storage.Engine
    parser    Parser
    optimizer Optimizer
    cache     Cache
    formatter Formatter
}

func NewExecutor(storage storage.Engine) *Executor {
    return &Executor{
        storage:   storage,
        parser:    NewParser(),
        optimizer: NewOptimizer(),
        cache:     NewQueryCache(),
        formatter: NewFormatter(),
    }
}

func (e *Executor) Execute(ctx context.Context, query string, params map[string]interface{}) (*Result, error) {
    // Parse
    ast, err := e.parser.Parse(query)
    if err != nil {
        return nil, err
    }
    
    // Optimize
    plan, err := e.optimizer.Optimize(ast)
    if err != nil {
        return nil, err
    }
    
    // Check cache
    cacheKey := e.cache.Key(query, params)
    if cached, ok := e.cache.Get(cacheKey); ok {
        return cached.(*Result), nil
    }
    
    // Execute
    rows, err := e.executePlan(ctx, plan, params)
    if err != nil {
        return nil, err
    }
    
    // Format
    result, err := e.formatter.FormatResult(rows)
    if err != nil {
        return nil, err
    }
    
    // Cache
    e.cache.Set(cacheKey, result)
    
    return result, nil
}

// ... core execution methods (25 functions, 700 lines)
// pkg/cypher/parser.go (600 lines) ✅
package cypher

// Parser parses Cypher queries into AST
type Parser interface {
    Parse(query string) (*AST, error)
    Validate(ast *AST) error
}

type parser struct {
    // ... fields
}

func NewParser() Parser {
    return &parser{}
}

// Parse converts Cypher query string to AST
func (p *parser) Parse(query string) (*AST, error) {
    // ... implementation
}

// ... parsing functions (15 functions, 600 lines)
// pkg/cypher/optimizer.go (500 lines) ✅
package cypher

// Optimizer optimizes query execution plans
type Optimizer interface {
    Optimize(ast *AST) (*ExecutionPlan, error)
    EstimateCost(plan *ExecutionPlan) int64
}

type optimizer struct {
    // ... fields
}

func NewOptimizer() Optimizer {
    return &optimizer{}
}

// ... optimization functions (12 functions, 500 lines)
// pkg/cypher/cache.go (400 lines) ✅
package cypher

// Cache manages query result caching
type Cache interface {
    Get(key string) (interface{}, bool)
    Set(key string, value interface{})
    Invalidate(pattern string)
    Stats() CacheStats
}

type queryCache struct {
    // ... fields
}

func NewQueryCache() Cache {
    return &queryCache{
        data: make(map[string]interface{}),
    }
}

// ... caching functions (8 functions, 400 lines)
// pkg/cypher/formatter.go (300 lines) ✅
package cypher

// Formatter formats query results for Neo4j compatibility
type Formatter interface {
    FormatResult(rows [][]interface{}) (*Result, error)
    FormatError(err error) *Neo4jError
}

type formatter struct {
    // ... fields
}

func NewFormatter() Formatter {
    return &formatter{}
}

// ... formatting functions (10 functions, 300 lines)
// pkg/cypher/helpers.go (700 lines) ✅
package cypher

// Shared utility functions used across modules

// toInt64 safely converts interface{} to int64
func toInt64(v interface{}) (int64, error) {
    // ... implementation
}

// toString safely converts interface{} to string
func toString(v interface{}) (string, error) {
    // ... implementation
}

// ... helper functions (20 functions, 700 lines)

Refactoring Patterns

Pattern 1: Extract Interface

Before:

type Executor struct {
    storage *storage.MemoryEngine  // ← Concrete type
}

After:

type Executor struct {
    storage storage.Engine  // ← Interface
}

Benefits:

  • Testability (can inject mocks)
  • Flexibility (can swap implementations)
  • Decoupling (no concrete dependencies)

Pattern 2: Extract Helper Package

Before:

// pkg/cypher/executor.go
func toInt64(v interface{}) (int64, error) { /* ... */ }
func toString(v interface{}) (string, error) { /* ... */ }
// ... used in multiple packages

After:

// pkg/convert/convert.go
package convert

func ToInt64(v interface{}) (int64, error) { /* ... */ }
func ToString(v interface{}) (string, error) { /* ... */ }

// pkg/cypher/executor.go
import "github.com/orneryd/nornicdb/pkg/convert"

value, err := convert.ToInt64(input)

Benefits:

  • Reusability across packages
  • Single source of truth
  • Easier to test

Pattern 3: Extract Configuration

Before:

type Executor struct {
    cacheSize      int
    cacheTimeout   time.Duration
    maxConcurrency int
    enableMetrics  bool
    // ... 20 more config fields
}

After:

// pkg/cypher/config.go
type Config struct {
    Cache struct {
        Size    int
        Timeout time.Duration
    }
    Execution struct {
        MaxConcurrency int
        EnableMetrics  bool
    }
}

// pkg/cypher/executor.go
type Executor struct {
    config Config
}

Benefits:

  • Grouped related settings
  • Easier to validate
  • Cleaner initialization

Pattern 4: Extract Strategy Pattern

Before:

func (e *Executor) execute(query string) (*Result, error) {
    if isSimpleQuery(query) {
        return e.executeSimple(query)
    } else if isComplexQuery(query) {
        return e.executeComplex(query)
    } else if isAggregationQuery(query) {
        return e.executeAggregation(query)
    }
    // ... many more conditions
}

After:

// pkg/cypher/strategy.go
type ExecutionStrategy interface {
    CanHandle(query *AST) bool
    Execute(ctx context.Context, query *AST) (*Result, error)
}

type SimpleStrategy struct { /* ... */ }
type ComplexStrategy struct { /* ... */ }
type AggregationStrategy struct { /* ... */ }

// pkg/cypher/executor.go
type Executor struct {
    strategies []ExecutionStrategy
}

func (e *Executor) execute(ctx context.Context, ast *AST) (*Result, error) {
    for _, strategy := range e.strategies {
        if strategy.CanHandle(ast) {
            return strategy.Execute(ctx, ast)
        }
    }
    return nil, errors.New("no strategy found")
}

Benefits:

  • Open/closed principle
  • Easy to add new strategies
  • Each strategy independently testable

Refactoring Checklist

Before refactoring:

  • Analyze file structure and responsibilities
  • Identify clear module boundaries
  • Define interfaces between modules
  • Plan extraction order (least dependent first)
  • Ensure full test coverage exists

During refactoring:

  • Extract one module at a time
  • Run tests after each extraction
  • Maintain or improve test coverage
  • Update documentation
  • Keep commits small and focused

After refactoring:

  • All tests pass
  • Coverage ≥90% maintained
  • No performance regression
  • Documentation updated
  • Code review completed

Common Pitfalls

Pitfall 1: Over-Abstraction

Bad:

// Too many layers of abstraction
type QueryExecutorFactoryBuilder interface {
    Build() QueryExecutorFactory
}

type QueryExecutorFactory interface {
    Create() QueryExecutor
}

type QueryExecutor interface {
    Execute(query string) (*Result, error)
}

Good:

// Simple and direct
type Executor interface {
    Execute(ctx context.Context, query string) (*Result, error)
}

Pitfall 2: Circular Dependencies

Bad:

// pkg/cypher/executor.go
import "github.com/orneryd/nornicdb/pkg/cypher/parser"

// pkg/cypher/parser/parser.go
import "github.com/orneryd/nornicdb/pkg/cypher"  // ← Circular!

Good:

// pkg/cypher/executor.go
import "github.com/orneryd/nornicdb/pkg/cypher/parser"

// pkg/cypher/parser/parser.go
// No import of parent package

Pitfall 3: Breaking Public API

Bad:

// BEFORE (public API)
func Execute(query string) (*Result, error)

// AFTER (breaking change) ❌
func ExecuteQuery(ctx context.Context, q string) (*QueryResult, error)

Good:

// BEFORE (public API)
func Execute(query string) (*Result, error)

// AFTER (backward compatible) ✅
func Execute(query string) (*Result, error) {
    return ExecuteContext(context.Background(), query)
}

func ExecuteContext(ctx context.Context, query string) (*Result, error) {
    // New implementation
}

Quick Reference

File Size Check

# Find files over 2000 lines
find pkg -name "*.go" -exec sh -c 'wc -l "$1" | awk "\$1 > 2000 {print}"' _ {} \;

# Count functions per file
for f in pkg/cypher/*.go; do 
    echo "$f: $(grep -c '^func ' $f) functions"
done

Refactoring Commands

# Before refactoring: Save baseline
go test ./... -coverprofile=before.out
go test ./pkg/cypher -bench=. -benchmem > before-bench.txt

# After refactoring: Compare
go test ./... -coverprofile=after.out
go test ./pkg/cypher -bench=. -benchmem > after-bench.txt

# Compare coverage
go tool cover -func=before.out | tail -1
go tool cover -func=after.out | tail -1

# Compare benchmarks
benchcmp before-bench.txt after-bench.txt

Remember: Refactoring is about improving structure without changing behavior. Always have tests to verify behavior is preserved.