
About CrateDB


» Documentation | Releases | Issues | Source code | License | CrateDB | Community Forum | Bluesky

A high-level description of CrateDB, with cross-references to relevant resources, in the spirit of a curated knowledge backbone.

CrateDB is a distributed and scalable SQL database for storing and analyzing massive amounts of data in near real-time, even with complex queries. It is based on Lucene, inherits technologies from Elasticsearch, and is compatible with PostgreSQL.

What's inside

A workbench rig for information and knowledge management, aiming to compress content authoring and curation processes, nothing big.

Abstract

  • Structured documentation based on a basic and generic hierarchical outline.

  • Utility programs to parse YAML outline files and generate outputs (e.g., Markdown, llms-txt), supporting the authoring and production process.

  • Python API that offers selective access to documentation and knowledge resources through basic query primitives over the outline tree.

Applied

  • The ask subcommand uses llms-txt context files to answer questions about a topic domain that would otherwise yield incomprehensible, incomplete, or weak responses.

  • The compact Python API can be used by a Model Context Protocol (MCP) documentation server to acquire information about the relevant topic domain on demand.

Concrete

  • The outline file cratedb-outline.yaml indexes documents about what CrateDB is, what you can do with it, and how.

  • Context bundle files are published to the about/v1 folder. They can be used to provide better context for conversations about CrateDB, for example, by using the cratedb-about ask subcommand.

  • The documentation subsystem of the cratedb-mcp package uses the Python API to serve and consider relevant documentation resources within its data flow procedures. It selects relevant resources mostly based on the value of the description attribute of the outline data model.

Install

The authors recommend using the uv package manager. Alternative options are to use pipx or pip install --user.

From PyPI

uv tool install --upgrade 'cratedb-about[all]'

From Repository

uv tool install --upgrade 'cratedb-about[all] @ git+https://github.com/crate/about'

Usage

The cratedb-about package provides three subsystems.

  • Outline: Read and query outline files.
  • Bundle: Produce a context bundle from an outline file.
  • Query: Use context information for conversations with LLMs.

Outline

CLI

Convert the knowledge outline from the built-in cratedb-outline.yaml into Markdown format.

cratedb-about outline --format="markdown" > outline.md

Use the llms-txt format to directly generate llms-txt compatible output.

cratedb-about outline --format="llms-txt" > llms.txt

Use the --optional flag to include the "Optional" section for generating the llms-full.txt file.

cratedb-about outline --format="llms-txt" --optional > llms-full.txt
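To illustrate what the --optional flag changes, here is a minimal sketch of an llms-txt renderer. It follows the general llms.txt layout (a title heading, a quoted summary, and one heading per section with a list of links). The function name render_llms_txt, the tuple layout, and the example.org URLs are hypothetical, and this is not the renderer that cratedb-about uses.

```python
from typing import Dict, List, Tuple

# Hypothetical link record: (title, url, description).
Link = Tuple[str, str, str]


def render_llms_txt(
    title: str,
    summary: str,
    sections: Dict[str, List[Link]],
    optional: bool = False,
) -> str:
    """Render sections in llms-txt layout, skipping "Optional" unless requested."""
    lines = [f"# {title}", "", f"> {summary}", ""]
    for name, links in sections.items():
        # The "Optional" section is only emitted for the full variant.
        if name == "Optional" and not optional:
            continue
        lines.append(f"## {name}")
        lines.append("")
        for link_title, url, description in links:
            lines.append(f"- [{link_title}]({url}): {description}")
        lines.append("")
    return "\n".join(lines).rstrip() + "\n"


# Illustrative data only, not taken from cratedb-outline.yaml.
demo_sections = {
    "Docs": [("Introduction", "https://example.org/intro", "What it is")],
    "Optional": [("Appendix", "https://example.org/appendix", "Extra material")],
}
medium = render_llms_txt("Demo", "A demo outline.", demo_sections)
full = render_llms_txt("Demo", "A demo outline.", demo_sections, optional=True)
```

In this sketch, medium corresponds to the llms.txt variant and full to the llms-full.txt variant.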

Use a custom outline file on a local or remote filesystem.

cratedb-about outline --url https://github.com/crate/about/raw/refs/heads/main/src/cratedb_about/outline/cratedb-outline.yaml

This option requires the package's manyio extra, for example by installing cratedb-about[manyio]. With the extra installed, you can address resources on many filesystems through the filesystem-spec package. As an alternative to the --url option, you can use the ABOUT_OUTLINE_URL environment variable.

API

Use the Python API to retrieve individual sets of outline items, for example, by section name. The standard section names are: Docs, API, Examples, Optional.

from cratedb_about import CrateDbKnowledgeOutline

# Load information from the built-in YAML file.
outline = CrateDbKnowledgeOutline.load()

# Load information from a remote YAML file.
# outline = CrateDbKnowledgeOutline.load("http://example.org/outline.yaml")

# List available section names.
outline.get_section_names()

# Retrieve information about resources from the "Docs" and "Examples" sections.
outline.find_items(section_name="Docs").to_dict()
outline.find_items(section_name="Examples").to_dict()

# Convert outline into Markdown format.
outline.to_markdown()

# Convert outline into llms-txt format (medium).
outline.to_llms_txt()

# Convert outline into llms-txt format (full).
outline.to_llms_txt(optional=True)
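The primitives above can be combined. The following sketch collects the items of every section into one dictionary, using only the get_section_names(), find_items(), and to_dict() calls shown above. The helper name collect_sections is hypothetical, and the sketch assumes get_section_names() returns an iterable of strings.

```python
from typing import Any, Dict


def collect_sections(outline: Any) -> Dict[str, Any]:
    """Map each section name to the dictionary form of its items."""
    return {
        name: outline.find_items(section_name=name).to_dict()
        for name in outline.get_section_names()
    }


# Usage, assuming the cratedb-about package is installed:
#
#   from cratedb_about import CrateDbKnowledgeOutline
#   sections = collect_sections(CrateDbKnowledgeOutline.load())
```

Accepting any outline-like object keeps the helper easy to exercise with a stub in tests, without loading the real outline.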

Bundle

The Markdown file outline.md serves as the source for producing the llms.txt files. Generate multiple llms.txt files along with any auxiliary output files.

cratedb-about bundle --format=llm --outdir=./public_html

By default, the bundler will use the built-in cratedb-outline.yaml as input file. You can select an alternative input file using the --url option, or the ABOUT_OUTLINE_URL environment variable. The output directory can also be specified using the OUTDIR environment variable.
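For automated pipelines, the bundler can be driven from Python through environment variables. This is a minimal sketch: the helper names build_bundle_call and run_bundle are hypothetical, and it assumes that --outdir may be omitted when OUTDIR is set, as the paragraph above suggests.

```python
import os
import shutil
import subprocess
from typing import Dict, List, Mapping, Optional, Tuple


def build_bundle_call(
    outdir: str,
    outline_url: Optional[str] = None,
    base_env: Optional[Mapping[str, str]] = None,
) -> Tuple[List[str], Dict[str, str]]:
    """Return the command line and environment for a bundle run."""
    argv = ["cratedb-about", "bundle", "--format=llm"]
    env = dict(os.environ if base_env is None else base_env)
    # Assumption: OUTDIR replaces the --outdir option.
    env["OUTDIR"] = outdir
    if outline_url is not None:
        # Select an alternative outline file instead of the built-in one.
        env["ABOUT_OUTLINE_URL"] = outline_url
    return argv, env


def run_bundle(outdir: str, outline_url: Optional[str] = None) -> int:
    """Run the bundler and return its exit code."""
    if shutil.which("cratedb-about") is None:
        raise RuntimeError("The cratedb-about program was not found on PATH")
    argv, env = build_bundle_call(outdir, outline_url)
    return subprocess.run(argv, env=env, check=False).returncode
```

Separating the construction of the call from its execution lets a pipeline log or inspect the command before running it.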

Query

Ask questions about CrateDB from the command line.

CLI

export OPENAI_API_KEY=<YOUR_OPENAI_API_KEY>
cratedb-about ask "CrateDB does not seem to provide an AUTOINCREMENT feature?"

If you run out of questions, get inspired by the built-in library of standard questions.

cratedb-about list-questions

API

Use the Python API to ask questions about CrateDB.

from cratedb_about import CrateDbKnowledgeConversation

knowledge = CrateDbKnowledgeConversation()
knowledge.ask("CrateDB does not seem to provide an AUTOINCREMENT feature?")

Notes

  • To configure a different context file, use the ABOUT_CONTEXT_URL environment variable. It can be a remote URL or a path on the local filesystem. The default value is https://cdn.crate.io/about/v1/llms-full.txt.
  • Remote resources are cached for one hour (3600 seconds) to strike a balance between freshness and resource savings. Use the ABOUT_CACHE_TTL environment variable to change that value, in seconds.
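The two settings above can be pictured with a short sketch. It resolves ABOUT_CONTEXT_URL and ABOUT_CACHE_TTL with the defaults stated in the notes, and shows a simple time-based cache. The names resolve_settings and TtlCache are hypothetical, and this is an illustration of the behaviour, not the package's implementation.

```python
import os
import time
from typing import Callable, Dict, Mapping, Optional, Tuple

DEFAULT_CONTEXT_URL = "https://cdn.crate.io/about/v1/llms-full.txt"
DEFAULT_CACHE_TTL = 3600  # seconds


def resolve_settings(environ: Optional[Mapping[str, str]] = None) -> Tuple[str, int]:
    """Return (context URL, cache TTL in seconds) from the environment."""
    env = os.environ if environ is None else environ
    url = env.get("ABOUT_CONTEXT_URL", DEFAULT_CONTEXT_URL)
    ttl = int(env.get("ABOUT_CACHE_TTL", DEFAULT_CACHE_TTL))
    return url, ttl


class TtlCache:
    """Cache fetched resources and refetch them once the TTL has elapsed."""

    def __init__(
        self,
        ttl: int,
        fetch: Callable[[str], str],
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.ttl = ttl
        self.fetch = fetch
        self.clock = clock
        self._entries: Dict[str, Tuple[float, str]] = {}

    def get(self, url: str) -> str:
        now = self.clock()
        entry = self._entries.get(url)
        # Serve the cached copy while it is younger than the TTL.
        if entry is not None and now - entry[0] < self.ttl:
            return entry[1]
        value = self.fetch(url)
        self._entries[url] = (now, value)
        return value
```

A shorter TTL yields fresher content at the cost of more downloads; a longer one does the opposite.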

FAQ

  • Q: Seriously, how do I use this?

    A: As mentioned above, this repository includes content and a few utilities to manage the corresponding information. Users will directly use the produced llms.txt and llms-full.txt files. Developers will install the cratedb-about package to access outline information from their own programs, or to run parts of the production machinery on their own premises, either ad hoc or as part of automated pipelines.

  • Q: It looks like the knowledge base machinery is missing important information about CrateDB. I've asked it about matters of polymer sharding, and the answer wasn't very insightful.

    A: Well, we can understand your disappointment. To improve the situation, we are constantly curating content, and you can support the process by giving us hints about which fragments of information to include in the set of curated information. To learn about what this means, see also ABOUT-24.

Project Information

Acknowledgements

Kudos to the authors of all the many software components and technologies this project is building upon.

Contributing

The cratedb-about package is an open source project, and is managed on GitHub. Contributions of any kind are welcome and appreciated.

Status

The software is in the pre-alpha (planning) stage. Version pinning is strongly recommended, especially if you use it as a library.
