Find Similar

The find similar endpoint returns pages that are similar to a given URL. This is useful for discovering related content, finding alternative sources, or expanding research.

Basic Usage

request = Exa::FindSimilarRequest.new( api_key: ENV[ 'EXA_API_KEY' ] )
response = request.submit( 'https://www.ruby-lang.org/en/' )

if response.success?
  response.result.each do | result |
    puts result.title
    puts result.url
  end
end

Options

Options can be passed as a hash or built using the DSL:

# using DSL
options = Exa::FindSimilarOptions.build do
  num_results 10
  exclude_source_domain true
end

# using hash
options = { num_results: 10, exclude_source_domain: true }

response = request.submit( url, options )

Result Control

options = Exa::FindSimilarOptions.build do
  num_results 20
  exclude_source_domain true
end

| Option | Type | Default | Description |
| --- | --- | --- | --- |
| num_results | integer | 10 | Number of results to return (1-100) |
| exclude_source_domain | boolean | false | Exclude results from the source URL's domain |
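
When options are passed as a hash, the documented 1-100 range for num_results can be enforced before the request is sent. The following is a minimal sketch; `similar_options` is a hypothetical helper, not part of the gem, and clamping out-of-range values (rather than raising) is an assumption made for illustration.

```ruby
# Hypothetical helper (not part of the Exa gem): builds an options hash and
# keeps num_results inside the documented 1-100 range by clamping it.
def similar_options( num_results: 10, exclude_source_domain: false )
  {
    num_results: Integer( num_results ).clamp( 1, 100 ),
    exclude_source_domain: exclude_source_domain ? true : false
  }
end

# 250 is outside the documented range, so it is clamped to the upper bound.
options = similar_options( num_results: 250, exclude_source_domain: true )
puts options.inspect
```

The resulting hash can be passed as the second argument to request.submit in the same way as the hash shown under Options. Raising an ArgumentError instead of clamping may be preferable when an out-of-range value indicates a caller bug.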

Domain Filtering

options = Exa::FindSimilarOptions.build do
  include_domains [ 'github.com', 'gitlab.com' ]
  exclude_domains [ 'pinterest.com' ]
end

| Option | Type | Description |
| --- | --- | --- |
| include_domains | array | Only return results from these domains |
| exclude_domains | array | Exclude results from these domains |
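
Domain lists are often derived from URLs already on hand. The sketch below uses Ruby's standard URI library to extract the host from each URL; `domain_of` is a hypothetical helper, and stripping a leading "www." is an assumption about how the domains should be written, not documented behavior of the API.

```ruby
require 'uri'

# Hypothetical helper (not part of the Exa gem): returns the host of a URL,
# with a leading "www." removed. The "www." stripping is an assumption.
def domain_of( url )
  host = URI.parse( url ).host
  raise ArgumentError, "no host in #{ url.inspect }" if host.nil?
  host.sub( /\Awww\./, '' )
end

urls = [ 'https://github.com/rails/rails', 'https://www.pinterest.com/ideas/' ]

# Build a de-duplicated list of bare domains for the exclude_domains option.
excluded = urls.map { | url | domain_of( url ) }.uniq
options = { num_results: 10, exclude_domains: excluded }
puts options.inspect
```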

Date Filtering

options = Exa::FindSimilarOptions.build do
  start_published_date '2024-01-01T00:00:00Z'
  end_published_date '2024-12-31T23:59:59Z'
  start_crawl_date '2024-06-01T00:00:00Z'
  end_crawl_date '2024-12-31T23:59:59Z'
end

| Option | Type | Description |
| --- | --- | --- |
| start_published_date | string | Start of published date range (ISO 8601) |
| end_published_date | string | End of published date range (ISO 8601) |
| start_crawl_date | string | Start of crawl date range (ISO 8601) |
| end_crawl_date | string | End of crawl date range (ISO 8601) |
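
Rather than typing ISO 8601 strings by hand, they can be generated with Ruby's standard Time class. This is a sketch only: `year_range` is a hypothetical helper, and the hash keys are assumed to mirror the DSL option names in the same way as the hash shown under Options.

```ruby
require 'time' # adds Time#iso8601

# Hypothetical helper (not part of the Exa gem): returns the first and last
# second of a calendar year, in UTC, as ISO 8601 strings.
def year_range( year )
  first = Time.utc( year, 1, 1, 0, 0, 0 )
  last = Time.utc( year, 12, 31, 23, 59, 59 )
  [ first.iso8601, last.iso8601 ]
end

start_date, end_date = year_range( 2024 )

# Assumed hash form of the date options; the keys mirror the DSL names.
options = { start_published_date: start_date, end_published_date: end_date }
puts options.inspect
```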

Content Filtering

options = Exa::FindSimilarOptions.build do
  include_text [ 'programming' ]
  exclude_text [ 'deprecated' ]
end

| Option | Type | Description |
| --- | --- | --- |
| include_text | array | Required text in results |
| exclude_text | array | Forbidden text in results |

Content Retrieval

The contents block controls which content is retrieved with each result (text, highlights, or summaries) and how pages are crawled.

Text Options

options = Exa::FindSimilarOptions.build do
  contents do
    text do
      max_characters 1000
      include_html_tags false
    end
  end
end

Highlights Options

options = Exa::FindSimilarOptions.build do
  contents do
    highlights do
      num_sentences 3
      highlights_per_url 5
      query 'Focus on key features'
    end
  end
end

Summary Options

options = Exa::FindSimilarOptions.build do
  contents do
    summary do
      query 'Summarize the main topic'
    end
  end
end

Crawling Options

options = Exa::FindSimilarOptions.build do
  contents do
    livecrawl :fallback
    livecrawl_timeout 10000
  end
end

Response

When the request succeeds, response.result is a FindSimilarResult object.

Result Accessors

| Accessor | Type | Description |
| --- | --- | --- |
| request_id | string | Unique request identifier |
| results | array | Array of FindSimilarResultItem objects |

FindSimilarResultItem Accessors

| Accessor | Type | Description |
| --- | --- | --- |
| id | string | Unique result identifier |
| url | string | The URL of the result |
| title | string | Page title |
| score | float | Similarity score |
| published_date | string | Publication date (ISO 8601) |
| author | string | Author name |
| text | string | Page text (if requested) |
| highlights | array | Highlight snippets (if requested) |
| highlight_scores | array | Scores for each highlight |
| summary | string | Summary text (if requested) |
| image | string | Image URL |
| favicon | string | Favicon URL |
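
The accessors above can be combined to rank results and pick the strongest highlight for each. The sketch below runs without the gem or a network call by using a Struct stand-in (`SimilarItem`, hypothetical) that exposes a few of the same accessor names. It assumes that a higher score means greater similarity and that highlight_scores is parallel to highlights; neither is stated explicitly here.

```ruby
# Stand-in for FindSimilarResultItem so the sketch runs offline. Only the
# accessor names are taken from the table above; the Struct is hypothetical.
SimilarItem = Struct.new( :title, :url, :score, :highlights, :highlight_scores,
                          keyword_init: true )

# Sorts items from highest to lowest score (assumes higher means more similar).
def rank_by_score( items )
  items.sort_by { | item | -item.score.to_f }
end

# Returns the highlight with the highest score, or nil when no highlights
# were requested. Assumes highlight_scores[ i ] belongs to highlights[ i ].
def best_highlight( item )
  pairs = Array( item.highlights ).zip( Array( item.highlight_scores ) )
  scored = pairs.reject { | _text, score | score.nil? }
  best = scored.max_by { | _text, score | score }
  best && best.first
end

items = [
  SimilarItem.new( title: 'A', url: 'https://example.com/a', score: 0.71,
                   highlights: [ 'first', 'second' ], highlight_scores: [ 0.2, 0.9 ] ),
  SimilarItem.new( title: 'B', url: 'https://example.com/b', score: 0.88 )
]

rank_by_score( items ).each do | item |
  puts "#{ item.title } (#{ item.score }): #{ best_highlight( item ) || 'no highlights' }"
end
```

With real responses, the same helpers would be applied to the items in the results array instead of the stand-in objects.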

Examples

Find Similar Without Source Domain

Exa.api_key ENV[ 'EXA_API_KEY' ]

options = Exa::FindSimilarOptions.build do
  num_results 10
  exclude_source_domain true
end

response = Exa.find_similar( 'https://rubyonrails.org/', options )

if response.success?
  response.result.each do | result |
    puts "#{ result.title } - #{ result.url }"
  end
end

Find Similar with Summaries

options = Exa::FindSimilarOptions.build do
  num_results 5
  exclude_source_domain true
  contents do
    summary { query 'Describe what this resource offers' }
  end
end

response = Exa.find_similar( 'https://www.ruby-lang.org/en/', options )

if response.success?
  response.result.each do | result |
    puts "=" * 60
    puts result.title
    puts result.url
    puts "-" * 60
    puts result.summary
    puts
  end
end

Find Similar from Specific Domains

options = Exa::FindSimilarOptions.build do
  num_results 10
  include_domains [ 'github.com', 'gitlab.com' ]
  contents do
    text { max_characters 500 }
  end
end

response = Exa.find_similar( 'https://github.com/rails/rails', options )