Skip to content

soujanyanmbri/turbo-graphql

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

29 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Turbo-GraphQL

A high-performance GraphQL parser written in C++ with SIMD-accelerated lexing and a complete recursive descent parser.

Features

Performance

  • SIMD-Accelerated Lexer: AVX2/SSE4.2 optimized tokenization with 3-5x speedup
  • Fast Parsing: Complete GraphQL query parsing in microseconds
  • Zero-Copy Design: Efficient memory management with token arenas
  • Auto-Detection: Automatic CPU feature detection (AVX512, AVX2, SSE4.2, SSE2, NEON, Scalar)

Complete GraphQL Support

  • Operations: Query, Mutation, Subscription
  • Variables: Type definitions with non-null modifiers ($userId: ID!)
  • Arguments: Named arguments with all value types
  • Directives: @include, @skip, custom directives
  • Fragments: Named fragments and inline fragments (... on Type)
  • Selection Sets: Nested field selections with aliases
  • Value Types: Int, Float, String, Boolean, Null, Enum, List, Object
  • Comments: Single-line (#) and block comments (/* */)
  • String Types: Regular strings with escapes and block strings ("""...""")
  • Numbers: Integers, floats, scientific notation, negative numbers

Robust Error Handling

  • Graceful error recovery
  • Detailed error messages with position information
  • Infinite loop protection
  • Detection of unterminated strings/comments

Quick Start

Build

# Configure the project
cmake -B build

# Build the project
cmake --build build

# Run the parser with sample query
./build/graphql_parser

# Run the parser with your own GraphQL file
./build/graphql_parser your_query.graphql

# Run performance benchmark
./build/benchmark

# Run tests
./build/graphql_tests

Example Usage

Input Query:

query GetUser($userId: ID!) {
  user(id: $userId) {
    name
    email
    posts @include(if: true) {
      title
      content
    }
  }
}

Parsed Output:

[0] QUERY GetUser($userId: ID!) {
  Field: user(id: $userId)
    Field: name
    Field: email
    Field: posts @include(...)
      Field: title
      Field: content
}

📊 Performance

Real-world benchmarks on AVX2-capable CPU:

Lexer Performance (SIMD vs Scalar)

Query Size Avg Time Throughput Tokens Speedup vs Scalar*
26 bytes 2.1 µs 11 MB/s 8 ~1.5x
122 bytes 7.2 µs 16 MB/s 25 ~2x
406 bytes 17.7 µs 21 MB/s 61 ~3x
1.5 KB 46.6 µs 32 MB/s 167 ~4x
8 KB 445 µs 17 MB/s 1531 ~5x

*Based on test4.cpp reference: 5.23x average speedup with SIMD

End-to-End Performance (Lexing + Parsing)

Input Size Tokens Lexing Parsing Total Throughput
141 bytes 31 16 µs 63 µs 79 µs 1.70 MB/s
303 bytes 62 30 µs 104 µs 134 µs 2.16 MB/s

Key Insights:

  • SIMD overhead is negligible for queries >100 bytes
  • Throughput increases with input size (better SIMD utilization)
  • Production GraphQL queries (500+ bytes) see 3-5x performance gains
  • End-to-end parsing remains fast even with complex AST construction

🏗️ Architecture

Components

turbo-graphql/
├── include/
│   ├── ast/              # AST node definitions
│   ├── lexer/            # Tokenization
│   │   ├── lexer.h       # Main tokenizer
│   │   ├── character_classifier.h  # Fast char classification
│   │   └── keyword_classifier.h    # Keyword detection
│   ├── parser/           # Recursive descent parser
│   └── simd/             # SIMD implementations
│       ├── simd_detect.h    # CPU feature detection
│       ├── simd_factory.h   # Auto-select best SIMD
│       └── impl/            # AVX2, SSE, Scalar implementations
├── src/                  # Implementation files
└── tests/                # Unit tests

SIMD Strategy

The lexer uses SIMD intrinsics to process text in 32-byte chunks:

  1. Whitespace Skipping: Vectorized detection of spaces, tabs, newlines
  2. Identifier Scanning: Parallel character classification
  3. Number Parsing: SIMD range checks for digits
  4. String Processing: Fast escape sequence detection

Automatically falls back to scalar implementation when SIMD is unavailable.

🐛 Bug Fixes & Improvements

Recent Fixes

  1. Whitespace SIMD Loop: Fixed to correctly process multiple 32-byte chunks
  2. Block Comment Boundaries: Correctly handles */ at chunk boundaries
  3. Number Parsing: Added support for negative numbers and scientific notation
  4. String Handling: Implemented block strings ("""...""") and improved escape tracking
  5. Error Detection: Detects unterminated strings and comments
  6. Keyword Classification: Fixed id, int, float, string, boolean to be treated as identifiers, not keywords
  7. Parser Stability: Added infinite loop protection and graceful error recovery

Completed

  • ✅ SIMD-accelerated lexer with AVX2/SSE support
  • ✅ Complete recursive descent parser
  • ✅ Full GraphQL specification support
  • ✅ AST generation and visualization
  • ✅ Comprehensive error handling
  • ✅ Performance benchmarking

In Progress

  • 🚧 Query caching (LRU cache for repeated queries)
  • 🚧 String interning for memory optimization

Planned

  • Schema validation
  • Query execution engine
  • Type system implementation
  • Introspection support
  • Federation support
  • Subscription handling

🧪 Testing

Run the test suite:

./build/graphql_tests

Test with sample queries:

# Simple query
echo '{ user { id name } }' > /tmp/query.graphql
./build/graphql_parser /tmp/query.graphql

# Complex query with variables and fragments
./build/graphql_parser test_simple.graphql

Performance matters. Turbo-GraphQL brings SIMD acceleration to GraphQL parsing.

About

Experimental Graphql Parser for CPP.

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published