Releases: yohasebe/ruby-spacy

v0.4.0

23 Feb 08:38

What's New

Block-based OpenAI API

  • Language#with_openai: A new block-based API for OpenAI integration. Yields an OpenAIHelper instance that is configured once and reused for all calls within the block, making it efficient for batch processing with pipe.
  • Doc#linguistic_summary: Generates a JSON summary of spaCy's linguistic analysis (tokens, entities, noun chunks, sentences) that can be passed directly to LLMs as context. Sections and token attributes are fully customizable.
  • OpenAIHelper#chat: Convenient system:/user: shortcuts for building messages, with raw: option for full API response access.
  • OpenAIHelper#embeddings: Standalone embeddings method that accepts text directly.
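
The "configured once and reused" idea behind a block-based API can be sketched in plain Ruby. This is only an illustration of the pattern, not ruby-spacy's implementation: SketchLanguage, SketchHelper, with_helper, and the "example-model" name are hypothetical stand-ins, and no network call is made.

```ruby
# Hypothetical stand-in for a helper object; not ruby-spacy's OpenAIHelper.
class SketchHelper
  attr_reader :model, :calls

  def initialize(model:)
    @model = model
    @calls = 0
  end

  # Pretends to send a chat request: counts the call and echoes the input.
  def chat(system:, user:)
    @calls += 1
    "[#{model}] #{system} | #{user}"
  end
end

# Hypothetical stand-in for a language object; not Spacy::Language.
class SketchLanguage
  attr_reader :helpers_built

  def initialize
    @helpers_built = 0
  end

  # Builds the helper once, then yields it so every call in the block shares it.
  # Returns whatever the block returns.
  def with_helper(model:)
    @helpers_built += 1
    helper = SketchHelper.new(model: model)
    yield helper
  end
end

nlp = SketchLanguage.new
replies = nlp.with_helper(model: "example-model") do |ai|
  ["first text", "second text"].map do |text|
    ai.chat(system: "Summarize.", user: text)
  end
end
```

In this sketch the helper is constructed a single time no matter how many texts are processed inside the block, which is the property that makes the pattern attractive for batch work.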

Quality Improvements

  • Bug fix: Doc#ents now returns proper Span objects instead of raw Python objects
  • Security: Model name validation in Language#initialize to prevent injection
  • OpenAI: Temperature handling for o-series models, 429 retry with exponential backoff, client reuse, dimensions/response_format parameters, tool call depth limit
  • Code quality: Unified map usage across 17 methods, improved respond_to_missing? with py_hasattr?, optimized Doc#initialize retry loop
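
For readers unfamiliar with the "429 retry with exponential backoff" item, the general technique might look like the sketch below. It is an assumption-laden illustration rather than the code shipped in this release: RateLimitError, with_backoff, and the max_retries, base_delay, and sleeper parameters are all hypothetical names, and the sleeper is injectable so the example runs without real delays.

```ruby
# Hypothetical error class standing in for an HTTP 429 response.
class RateLimitError < StandardError; end

# Retries the block on RateLimitError, doubling the wait before each retry.
# The block receives the number of retries performed so far.
def with_backoff(max_retries: 3, base_delay: 1.0, sleeper: ->(s) { sleep(s) })
  attempt = 0
  begin
    yield attempt
  rescue RateLimitError
    # Give up and re-raise once the retry budget is exhausted.
    raise if attempt >= max_retries
    sleeper.call(base_delay * (2**attempt))
    attempt += 1
    retry
  end
end

# Usage: record the delays instead of sleeping, and fail the first two tries.
delays = []
result = with_backoff(sleeper: ->(s) { delays << s }) do |attempt|
  raise RateLimitError, "429 Too Many Requests" if attempt < 2
  "ok after #{attempt} retries"
end
```

Here the recorded delays grow as base_delay times a power of two (1.0, then 2.0 seconds), and the error is re-raised if the request still fails after max_retries retries.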

New Features

  • Token#idx for character offset access
  • Span#to_s for string representation
  • Language#memory_zone for spaCy 3.8+ memory management
  • PhraseMatcher support via Language#phrase_matcher
  • instance_variables_to_inspect for Ruby 4.0+ compatibility

Dependencies

  • Added base64 gem (no longer a default gem as of Ruby 3.4)
  • Added fiddle gem (no longer a default gem as of Ruby 4.0)

Tests

  • 24 new tests added (81 total)
  • All passing on Ruby 3.4.6 and Ruby 4.0.1

Example

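require "ruby-spacy"

# Load a spaCy model; this example also needs an OpenAI API key to be configured.
nlp = Spacy::Language.new("en_core_web_sm")
texts = ["The bank approved the loan.", "I sat on the river bank."]

# The helper is configured once and reused for every document in the block.
nlp.with_openai(model: "gpt-5-mini") do |ai|
  nlp.pipe(texts).each do |doc|
    # Pass spaCy's analysis to the LLM as JSON context.
    result = ai.chat(
      system: "Analyze using the linguistic data.",
      user: doc.linguistic_summary
    )
    puts result
  end
end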