ragcat

ragcat builds retrieval-augmented generation (RAG) stores from U.S. Fish and Wildlife Service ServCat references and the files attached to them, from local source files you have already downloaded, and from optional webpage URLs.

The package wraps a ServCat-first evidence workflow:

1. Search ServCat with user-supplied Quick Search terms or explicit reference IDs.
2. Download files attached to matching ServCat references.
3. Convert downloaded ServCat files, local files, and webpage URLs to Markdown with ragnar::read_as_markdown().
4. Screen extracted file text with user-supplied screening terms.
5. Build a DuckDB-backed ragnar store from verified sources.
6. Retrieve context or ask questions through ellmer, with structured answer exports.
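The screening step keeps or drops extracted text based on the screening terms. As a rough illustration only, the base-R sketch below assumes case-insensitive, literal (non-regex) matching in which a document passes if it contains any one of the terms. screen_text() is a hypothetical helper, not part of ragcat, and ragcat's actual matching rules may differ.

```r
# Hypothetical sketch of term screening; NOT ragcat's implementation.
# Assumption: a document passes if it contains at least one screening
# term, matched case-insensitively as a literal string.
screen_text <- function(texts, screening_terms) {
  stopifnot(is.character(texts), is.character(screening_terms))
  if (length(screening_terms) == 0) {
    # No terms supplied: keep every document.
    return(rep(TRUE, length(texts)))
  }
  lowered <- tolower(texts)
  # One logical vector per term: does each document contain that term?
  matches <- lapply(
    tolower(screening_terms),
    function(term) grepl(term, lowered, fixed = TRUE)
  )
  # Element-wise OR across terms: TRUE if any term matched.
  Reduce(`|`, matches)
}
```

For example, screen_text(texts, c("habitat", "stream flow")) returns one TRUE/FALSE value per document, which could be used to subset the extracted texts before they are chunked.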

Installation

You can install the development version of ragcat from GitHub with:

# install.packages("pak")
pak::pak("USFWS/ragcat")

Then load the package:

library(ragcat)

Minimal RAG store build

library(ragcat)

result <- build_rag_store(
  topic = "example_project",
  search_terms = c("watershed assessment", "habitat survey"),
  screening_terms = c("habitat", "survey", "stream flow", "water quality"),
  local_file_path = "data/local_sources",
  urls = c(
    "https://example.org/example-source-page"
  ),
  store_dir = "data/rag_store",
  store_location = "data/rag_store/rag_store.duckdb",
  embedding = "none",
  secure = FALSE,
  overwrite_store = TRUE,
  verbose = TRUE
)

Use embedding = "none" for a minimal example that needs no API credentials. For vector retrieval, use one of the supported embedding backends, such as embedding = "azure-openai-small", embedding = "openai-small", or embedding = "ollama-default".
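When the same script runs on machines with different credentials, one option is to choose the backend at run time. This is an assumption-laden sketch: choose_embedding() is a hypothetical helper, and the environment-variable names AZURE_OPENAI_API_KEY and OPENAI_API_KEY are guesses at what the Azure and OpenAI backends read, so confirm them against the ragcat and ragnar documentation.

```r
# Hypothetical helper: pick an embedding backend from available credentials.
# ASSUMPTION: the Azure and OpenAI backends read these environment variables.
choose_embedding <- function() {
  if (nzchar(Sys.getenv("AZURE_OPENAI_API_KEY"))) {
    "azure-openai-small"
  } else if (nzchar(Sys.getenv("OPENAI_API_KEY"))) {
    "openai-small"
  } else {
    # No credentials found: fall back to the no-credentials option.
    "none"
  }
}
```

You would then pass embedding = choose_embedding() to build_rag_store(). Because a vector store is generally tied to the embedding model it was built with, switching backends likely means rebuilding the store with overwrite_store = TRUE.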

Build from local files and webpage URLs only

If you do not want to search ServCat, omit search_terms or set it to character(). You can still build a store from local files and webpage URLs.

result <- build_rag_store(
  topic = "local_sources_example",
  local_file_path = "data/local_sources",
  urls = c("https://example.org/example-source-page"),
  screening_terms = c("habitat", "survey"),
  store_dir = "data/rag_store",
  store_location = "data/rag_store/rag_store.duckdb",
  embedding = "none"
)

Build from explicit ServCat reference IDs

If you already know the ServCat references you want to use, provide them with reference_ids.

result <- build_rag_store(
  topic = "reference_id_example",
  reference_ids = c(12345, 67890),
  screening_terms = c("habitat", "survey"),
  local_file_path = "data/local_sources",
  store_dir = "data/rag_store",
  store_location = "data/rag_store/rag_store.duckdb",
  embedding = "none",
  secure = FALSE,
  overwrite_store = TRUE
)
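If the reference IDs are tracked in a spreadsheet, a small helper can turn a CSV column into a clean ID vector. This is a hypothetical sketch: read_reference_ids(), the CSV layout, and the reference_id column name are assumptions, not part of ragcat.

```r
# Hypothetical helper: read ServCat reference IDs from a CSV column.
# ASSUMPTION: the file has a header row and a column named "reference_id".
read_reference_ids <- function(path, column = "reference_id") {
  ids <- utils::read.csv(path)[[column]]
  if (is.null(ids)) {
    stop("Column '", column, "' not found in ", path)
  }
  # Coerce to integer; non-numeric entries become NA and are dropped.
  ids <- suppressWarnings(as.integer(ids))
  sort(unique(ids[!is.na(ids)]))
}
```

For example, reference_ids = read_reference_ids("data/reference_ids.csv") would supply the IDs to build_rag_store(); the file path here is illustrative.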

Ask a question against a built store

response <- ask_rag_store(
  query = "Create an annotated bibliography by source for the available evidence.",
  topic = "example_project",
  store_location = "data/rag_store/rag_store.duckdb",
  prompt_file = "prompts/system_prompt.md",
  output_instructions_file = "prompts/output_instructions.md",
  top_k = 12,
  save_outputs = TRUE,
  output_dir = "results"
)

cat(response$summary)
response$structured
response$saved

Retrieve context without asking an LLM

chunks <- retrieve_rag_context(
  query = "What evidence is available for the project question?",
  store_location = "data/rag_store/rag_store.duckdb",
  top_k = 12
)

format_retrieved_context(chunks)

Prompt and output-instruction files

ask_rag_store() can use Markdown or text files for the system prompt and answer-formatting instructions.

response <- ask_rag_store(
  query = "Summarize the strongest and weakest evidence.",
  store_location = "data/rag_store/rag_store.duckdb",
  prompt_file = "prompts/system_prompt.md",
  output_instructions_file = "prompts/output_instructions.md"
)
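To get started with prompt_file and output_instructions_file, you can write starter files from R. The helper write_starter_prompts() and the prompt wording below are illustrative only; adapt both to your project.

```r
# Hypothetical helper: write illustrative starter prompt files.
# The wording is an example only, not ragcat's default prompt.
write_starter_prompts <- function(prompt_dir = "prompts") {
  dir.create(prompt_dir, showWarnings = FALSE, recursive = TRUE)
  system_path <- file.path(prompt_dir, "system_prompt.md")
  output_path <- file.path(prompt_dir, "output_instructions.md")

  writeLines(
    c(
      "You are assisting with a U.S. Fish and Wildlife Service evidence review.",
      "Answer only from the retrieved context and cite the source of each claim.",
      "If the context does not support an answer, say so."
    ),
    system_path
  )

  writeLines(
    c(
      "## Output format",
      "",
      "- Start with a short summary paragraph.",
      "- Then list each source with a one-sentence annotation."
    ),
    output_path
  )

  # Return the paths so they can be passed to ask_rag_store().
  invisible(c(system = system_path, output = output_path))
}
```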

If no prompt file is supplied, ragcat uses a saved system_prompt.txt beside the store when one is available, and otherwise falls back to the package default system prompt.
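The fallback order can be sketched as follows, assuming an explicit prompt_file takes precedence, then a system_prompt.txt in the same directory as the store file, then the package default. resolve_system_prompt() is a hypothetical helper that mirrors this described behavior; it is not ragcat's source code, and the default text is a placeholder.

```r
# Hypothetical sketch of system-prompt resolution; NOT ragcat's source.
# ASSUMED order: explicit prompt_file > saved system_prompt.txt > default.
resolve_system_prompt <- function(store_location,
                                  prompt_file = NULL,
                                  default_prompt = "<package default system prompt>") {
  read_text <- function(path) paste(readLines(path, warn = FALSE), collapse = "\n")

  if (!is.null(prompt_file)) {
    return(read_text(prompt_file))
  }
  # Look for a saved prompt in the directory that holds the store file.
  saved <- file.path(dirname(store_location), "system_prompt.txt")
  if (file.exists(saved)) {
    return(read_text(saved))
  }
  default_prompt
}
```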

Getting help

Contact a project maintainer for help with this repository.

Contribute

Contact the project maintainer for information about contributing to this repository.

Submit a GitHub Issue to report a bug or request a feature or enhancement.

This work is licensed under a Creative Commons Zero v1.0 Universal license.
