Contents

The contents endpoint retrieves full page content, summaries, and highlights for a list of URLs. Results are returned from cache when available, with live crawling as a fallback.

Basic Usage

request = Exa::ContentsRequest.new( api_key: ENV[ 'EXA_API_KEY' ] )

urls = [ 'https://example.com/page1', 'https://example.com/page2' ]
response = request.submit( urls )

if response.success?
  response.result.each do | result |
    puts result.title
    puts result.text
  end
end

Options

Options can be passed as a hash or built using the DSL:

# using DSL
options = Exa::ContentsOptions.build do
  text { max_characters 2000 }
  highlights { num_sentences 3 }
end

# using hash
options = {
  text: { max_characters: 2000 },
  highlights: { num_sentences: 3 }
}

response = request.submit( urls, options )

Text Retrieval

options = Exa::ContentsOptions.build do
  text do
    max_characters 5000
    include_html_tags false
  end
end

| Option | Type | Description |
| --- | --- | --- |
| max_characters | integer | Maximum characters to retrieve |
| include_html_tags | boolean | Include HTML structure markers |

Highlights

options = Exa::ContentsOptions.build do
  highlights do
    num_sentences 3
    highlights_per_url 5
    query 'Focus on technical implementation'
  end
end

| Option | Type | Description |
| --- | --- | --- |
| num_sentences | integer | Sentences per snippet (min: 1) |
| highlights_per_url | integer | Snippets per result (min: 1) |
| query | string | Custom direction for LLM selection |
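Highlights and their scores come back as parallel arrays (see the highlights and highlight_scores accessors under Response), so snippets can be ranked with plain Ruby. A minimal sketch; the ranked_highlights helper is illustrative, not part of the gem:

```ruby
# Pair each highlight snippet with its score, highest-scoring first.
# highlights and highlight_scores are parallel arrays, so Array#zip aligns them.
def ranked_highlights( highlights, scores )
  highlights.zip( scores )
            .sort_by { | _snippet, score | -score }
end

pairs = ranked_highlights(
  [ 'Threads share memory.', 'Ractors isolate state.' ],
  [ 0.42, 0.87 ]
)
pairs.each { | snippet, score | puts format( '%.2f  %s', score, snippet ) }
```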

Summary

options = Exa::ContentsOptions.build do
  summary do
    query 'Summarize the key technical concepts'
  end
end

| Option | Type | Description |
| --- | --- | --- |
| query | string | Custom summarization directive |

Crawling Control

options = Exa::ContentsOptions.build do
  livecrawl :fallback
  livecrawl_timeout 15000
end

| Option | Type | Default | Description |
| --- | --- | --- | --- |
| livecrawl | symbol | :fallback | :never, :fallback, or :always |
| livecrawl_timeout | integer | 10000 | Timeout in milliseconds |

Livecrawl Modes

| Mode | Description |
| --- | --- |
| :never | Only return cached content |
| :fallback | Use cache if available, crawl if not |
| :always | Always fetch fresh content |
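Since a mistyped mode symbol would otherwise surface only as a failed request, it can be cheap to guard the value before building options. A small validation sketch in plain Ruby; the helper is illustrative, not part of the gem:

```ruby
# Valid livecrawl modes, per the table above.
LIVECRAWL_MODES = %i[ never fallback always ].freeze

# Raise early on a typo'd mode rather than sending a bad request.
def validate_livecrawl_mode( mode )
  unless LIVECRAWL_MODES.include?( mode )
    raise ArgumentError,
          "livecrawl must be one of #{ LIVECRAWL_MODES.inspect }, got #{ mode.inspect }"
  end
  mode
end

validate_livecrawl_mode( :fallback )
```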

Subpages

options = Exa::ContentsOptions.build do
  subpages 5
  subpage_target 'documentation'
end

| Option | Type | Description |
| --- | --- | --- |
| subpages | integer | Number of subpages to crawl |
| subpage_target | string | Term for targeting specific subpages |

Response

When the request succeeds, response.result is a ContentsResult object.

Result Accessors

| Accessor | Type | Description |
| --- | --- | --- |
| request_id | string | Unique request identifier |
| results | array | Array of ContentsResultItem objects |

ContentsResultItem Accessors

| Accessor | Type | Description |
| --- | --- | --- |
| id | string | Unique result identifier |
| url | string | The URL that was fetched |
| title | string | Page title |
| author | string | Author name |
| text | string | Page text (if requested) |
| highlights | array | Highlight snippets (if requested) |
| highlight_scores | array | Scores for each highlight |
| summary | string | Summary text (if requested) |
| image | string | Image URL |
| favicon | string | Favicon URL |
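Because text, highlights, and summary are only populated when requested, display code should treat them as optional. A nil-safe preview sketch in plain Ruby; the preview helper is illustrative, not part of the gem:

```ruby
# Build a one-line preview from whichever content fields were requested.
# Any of text / summary may be nil when that option was not sent.
def preview( title, text: nil, summary: nil, width: 80 )
  body = summary || text || '(no content requested)'
  line = "#{ title }: #{ body }"
  line.length > width ? "#{ line[ 0, width - 3 ] }..." : line
end

puts preview( 'Array', text: 'Arrays are ordered, integer-indexed collections.' )
```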

Success Check

response = Exa.contents( urls, options )

if response.success?
  result = response.result
  puts "Retrieved #{ result.count } pages"
  result.each do | item |
    puts "#{ item.title }: #{ item.text[ 0, 100 ] }..."
  end
end

Examples

Basic Content Retrieval

Exa.api_key ENV[ 'EXA_API_KEY' ]

urls = [
  'https://ruby-doc.org/core/Array.html',
  'https://ruby-doc.org/core/Hash.html'
]

response = Exa.contents( urls )

if response.success?
  response.result.each do | result |
    puts result.title
    puts result.text[ 0, 500 ]
    puts
  end
end

Content with Highlights

options = Exa::ContentsOptions.build do
  highlights do
    num_sentences 2
    highlights_per_url 3
    query 'Key methods and usage examples'
  end
end

response = Exa.contents( urls, options )

if response.success?
  response.result.each do | result |
    puts "=" * 60
    puts result.title
    puts "-" * 60
    result.highlights&.each do | highlight |
      puts "- #{ highlight }"
    end
    puts
  end
end

Content with Summaries

options = Exa::ContentsOptions.build do
  summary do
    query 'Summarize the main functionality and common use cases'
  end
end

response = Exa.contents( urls, options )

if response.success?
  response.result.each do | result |
    puts result.title
    puts result.summary
    puts
  end
end

Fresh Content with Live Crawling

options = Exa::ContentsOptions.build do
  livecrawl :always
  livecrawl_timeout 20000
  text { max_characters 10000 }
end

response = Exa.contents( urls, options )

Combined with Search

A common pattern is to search first, then retrieve full content for specific results:

# First, search for relevant pages
search_response = Exa.search( 'Ruby concurrency patterns', {
  num_results: 5
} )

if search_response.success?
  # Extract URLs from search results
  urls = search_response.result.map( &:url )

  # Fetch full content for those URLs
  contents_options = Exa::ContentsOptions.build do
    text { max_characters 5000 }
    summary { query 'Summarize the concurrency approach' }
  end

  contents_response = Exa.contents( urls, contents_options )

  if contents_response.success?
    contents_response.result.each do | result |
      puts result.title
      puts result.summary
      puts
    end
  end
end
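
For large search results or long URL lists, it can help to deduplicate and batch the URLs before fetching contents. A plain-Ruby sketch; the batch size of 20 is an arbitrary illustration, not a documented API limit:

```ruby
# Split a large URL list into deduplicated batches before fetching contents.
# The batch size here is illustrative; check the API docs for real limits.
def url_batches( urls, size: 20 )
  urls.uniq.each_slice( size ).to_a
end

all_urls = Array.new( 45 ) { | i | "https://example.com/page#{ i }" }

url_batches( all_urls ).each do | batch |
  # response = Exa.contents( batch, options )
end
```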