A high-level description of CrateDB, with cross-references to relevant resources in the spirit of a curated knowledge backbone.
CrateDB is a distributed and scalable SQL database for storing and analyzing massive amounts of data in near real-time, even with complex queries. It is based on Lucene, inherits technologies from Elasticsearch, and is compatible with PostgreSQL.
A workbench rig for information and knowledge management, aiming to compress content authoring and curation processes, nothing big.
- Structured documentation based on a basic and generic hierarchical outline.
- Utility programs to parse YAML outline files and generate outputs (e.g., Markdown, llms-txt), supporting the authoring and production process.
- Python API that offers selective access to documentation and knowledge resources by providing basic querying primitives to inquire elements from the outline tree.
- The ask subcommand uses llms-txt context files to answer questions about a topic domain that would otherwise yield incomprehensible, incomplete, or weak responses.
- The compact Python API can be used by a Model Context Protocol (MCP) documentation server to acquire information about the relevant topic domain on demand.
- The outline file cratedb-outline.yaml indexes documents about what CrateDB is, what you can do with it, and how.
- Context bundle files are published to the about/v1 folder. They can be used to provide better context for conversations about CrateDB, for example, by using the cratedb-about ask subcommand.
- The documentation subsystem of the cratedb-mcp package uses the Python API to serve and consider relevant documentation resources within its data flow procedures. It selects relevant resources mostly based on the value of the description attribute of the outline data model.
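To illustrate what description-based selection can look like, here is a minimal sketch that ranks items by keyword overlap with their description. The item shape (title, link, and description keys), the sample records, and the select_by_description helper are assumptions made for illustration; this does not reproduce how cratedb-mcp actually scores resources.

```python
from typing import Dict, List

# Hypothetical items; the real outline data model may use different fields.
ITEMS: List[Dict[str, str]] = [
    {"title": "SQL reference", "link": "https://example.org/sql",
     "description": "SQL syntax, data types, and functions"},
    {"title": "Clustering guide", "link": "https://example.org/cluster",
     "description": "Sharding, replication, and cluster scaling"},
    {"title": "Python tutorial", "link": "https://example.org/python",
     "description": "Connecting to the database from Python"},
]


def select_by_description(items: List[Dict[str, str]], query: str, limit: int = 5) -> List[Dict[str, str]]:
    """Return items whose description shares words with the query, best match first."""
    terms = {word.lower() for word in query.split()}
    scored = []
    for item in items:
        # Normalize description words: strip simple punctuation, lowercase.
        words = {w.strip(",.").lower() for w in item["description"].split()}
        score = len(terms & words)
        if score:
            scored.append((score, item))
    # Highest overlap first; ties keep their original order.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [item for _, item in scored[:limit]]


matches = select_by_description(ITEMS, "sharding replication")
print([item["title"] for item in matches])
```

A real implementation would likely use a more robust matching strategy; the point is only that a well-written description attribute is what makes an item discoverable.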
The authors recommend using the uv package manager. Alternative options are to use pipx or pip install --user.
uv tool install --upgrade 'cratedb-about[all]'
uv tool install --upgrade 'cratedb-about[all] @ git+https://github.com/crate/about'
The cratedb-about package provides three subsystems.
- Outline: Read and inquire outline files.
- Bundle: Produce a context bundle from an outline file.
- Query: Use context information for conversations with LLMs.
Convert the knowledge outline from the built-in cratedb-outline.yaml into Markdown format.
cratedb-about outline --format="markdown" > outline.md
Use the llms-txt format to directly generate llms-txt compatible output.
cratedb-about outline --format="llms-txt" > llms.txt
Use the --optional flag to include the "Optional" section for generating the llms-full.txt file.
cratedb-about outline --format="llms-txt" --optional > llms-full.txt
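The difference between the two outputs can be sketched as follows. This is a simplified illustration of the llms.txt layout (a top-level title, a blockquote summary, and sections of links); the render_llms_txt helper and the sample data are hypothetical and do not reflect the package's internal implementation.

```python
# Hypothetical outline data: section name -> list of (label, url, note).
SECTIONS = {
    "Docs": [("Reference", "https://example.org/reference", "The reference manual")],
    "Optional": [("Blog", "https://example.org/blog", "Background articles")],
}


def render_llms_txt(title, summary, sections, optional=False):
    """Render sections in llms.txt style, skipping "Optional" unless requested."""
    lines = [f"# {title}", "", f"> {summary}"]
    for name, items in sections.items():
        if name == "Optional" and not optional:
            continue
        lines += ["", f"## {name}", ""]
        lines += [f"- [{label}]({url}): {note}" for label, url, note in items]
    return "\n".join(lines) + "\n"


# Without the flag, the "Optional" section is left out.
medium = render_llms_txt("Example", "A sample outline.", SECTIONS)
# With the flag, every section is included.
full = render_llms_txt("Example", "A sample outline.", SECTIONS, optional=True)
print(full)
```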
Use a custom outline file on a local or remote filesystem.
cratedb-about outline --url https://github.com/crate/about/raw/refs/heads/main/src/cratedb_about/outline/cratedb-outline.yaml
When using this option, install the package with at least its manyio extra, for example cratedb-about[manyio]. After opting in, you can address resources on many filesystems through the excellent filesystem-spec package.
As an alternative to the --url option, you can also use the ABOUT_OUTLINE_URL environment variable.
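A program that wraps the command might resolve the outline location in a similar way. The sketch below assumes that an explicit argument wins over ABOUT_OUTLINE_URL, and that None means "use the built-in outline". The resolve_outline_url helper is hypothetical, and the actual precedence inside the package is not confirmed here.

```python
import os
from typing import Mapping, Optional


def resolve_outline_url(
    cli_url: Optional[str] = None,
    env: Optional[Mapping[str, str]] = None,
) -> Optional[str]:
    """Pick the outline location: explicit option first, then the environment."""
    env = os.environ if env is None else env
    if cli_url:
        return cli_url
    # An unset or empty variable falls through to None (assumed: built-in outline).
    return env.get("ABOUT_OUTLINE_URL") or None


print(resolve_outline_url("custom-outline.yaml"))
```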
Use the Python API to retrieve individual sets of outline items, for example, by section name. The standard section names are: Docs, API, Examples, Optional.
from cratedb_about import CrateDbKnowledgeOutline
# Load information from the built-in YAML file.
outline = CrateDbKnowledgeOutline.load()
# Load information from a remote YAML file.
# outline = CrateDbKnowledgeOutline.load("http://example.org/outline.yaml")
# List available section names.
outline.get_section_names()
# Retrieve information about resources from the "Docs" and "Examples" sections.
outline.find_items(section_name="Docs").to_dict()
outline.find_items(section_name="Examples").to_dict()
# Convert outline into Markdown format.
outline.to_markdown()
# Convert outline into llms-txt format (medium).
outline.to_llms_txt()
# Convert outline into llms-txt format (full).
outline.to_llms_txt(optional=True)
The Markdown file outline.md serves as the source for producing the llms.txt files. Generate multiple llms.txt files along with any auxiliary output files.
cratedb-about bundle --format=llm --outdir=./public_html
By default, the bundler will use the built-in cratedb-outline.yaml as input file. You can select an alternative input file using the --url option, or the ABOUT_OUTLINE_URL environment variable. The output directory can also be specified using the OUTDIR environment variable.
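For automated pipelines, the bundler can be driven from a small script. The following is a sketch only: it uses the command, the --format=llm option, and the OUTDIR and ABOUT_OUTLINE_URL variables described above, while the bundle_invocation and run_bundle helpers are hypothetical names introduced for illustration.

```python
import os
import subprocess
from typing import Dict, List, Optional, Tuple


def bundle_invocation(outdir: str, outline_url: Optional[str] = None) -> Tuple[List[str], Dict[str, str]]:
    """Build the command line and environment for a bundle run."""
    argv = ["cratedb-about", "bundle", "--format=llm"]
    env = dict(os.environ)
    env["OUTDIR"] = outdir
    if outline_url:
        env["ABOUT_OUTLINE_URL"] = outline_url
    return argv, env


def run_bundle(outdir: str, outline_url: Optional[str] = None) -> None:
    """Run the bundler; requires the cratedb-about package to be installed."""
    argv, env = bundle_invocation(outdir, outline_url)
    subprocess.run(argv, env=env, check=True)


argv, env = bundle_invocation("./public_html")
print(argv, env["OUTDIR"])
```

Keeping the command construction separate from its execution makes the configuration easy to inspect or test without invoking the tool.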
Ask questions about CrateDB from the command line.
export OPENAI_API_KEY=<YOUR_OPENAI_API_KEY>
cratedb-about ask "CrateDB does not seem to provide an AUTOINCREMENT feature?"
If you are running out of questions, get inspired by the standard library of questions.
cratedb-about list-questions
Use the Python API to ask questions about CrateDB.
from cratedb_about import CrateDbKnowledgeConversation
knowledge = CrateDbKnowledgeConversation()
knowledge.ask("CrateDB does not seem to provide an AUTOINCREMENT feature?")
- To configure a different context file, use the ABOUT_CONTEXT_URL environment variable. It can be a remote URL or a path on the local filesystem. The default value is https://cdn.crate.io/about/v1/llms-full.txt.
- Remote resources will be cached for one hour (3600 seconds) to strike a balance between freshness and resource saving. Use the ABOUT_CACHE_TTL environment variable to reconfigure that value in seconds.
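The caching behavior described above can be pictured with a small time-to-live cache. This is a minimal sketch under stated assumptions: the TtlCache class and cache_ttl helper are hypothetical, only the ABOUT_CACHE_TTL name and the 3600-second default come from the text, and the package's real caching layer may work differently.

```python
import os
import time
from typing import Any, Callable, Dict, Mapping, Optional, Tuple


def cache_ttl(env: Optional[Mapping[str, str]] = None, default: float = 3600.0) -> float:
    """Read the TTL in seconds from ABOUT_CACHE_TTL, falling back to one hour."""
    env = os.environ if env is None else env
    return float(env.get("ABOUT_CACHE_TTL", default))


class TtlCache:
    """Cache values per key and refetch them once they are older than the TTL."""

    def __init__(self, ttl: float, clock: Callable[[], float] = time.monotonic):
        self.ttl = ttl
        self.clock = clock
        self._entries: Dict[str, Tuple[float, Any]] = {}

    def get(self, key: str, fetch: Callable[[str], Any]) -> Any:
        now = self.clock()
        hit = self._entries.get(key)
        if hit is not None and now - hit[0] < self.ttl:
            return hit[1]  # Still fresh: serve the cached value.
        value = fetch(key)  # Missing or expired: fetch again and store.
        self._entries[key] = (now, value)
        return value


cache = TtlCache(ttl=cache_ttl({}))
text = cache.get("llms-full.txt", lambda key: f"content of {key}")
print(text)
```

A shorter TTL gives fresher context at the cost of more downloads; a longer one does the opposite.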
- Q: Seriously, how do I use this?
  A: As mentioned above, this repository includes content and a few utilities to manage corresponding information. Users will directly use the produced llms.txt and llms-full.txt files. Developers will install the cratedb-about package to access fundamental outline information from their own programs, or to invoke fragments of the production machinery on their premises, either ad hoc or by including it in automated pipelines.
- Q: It looks like the knowledge base machinery is missing important information about CrateDB. I've asked it about matters of polymer sharding, and the answer wasn't very insightful.
  A: Well, we can understand your disappointment. To improve the situation, we are constantly curating content, and you can support the process by giving us hints about which fragments of information to include in the set of curated information. To learn about what this means, see also ABOUT-24.
Kudos to the authors of all the many software components and technologies this project is building upon.
The cratedb-about package is an open source project, and is managed on GitHub. Contributions of any kind are welcome and appreciated.
The software is in the pre-alpha (planning) stage. Version pinning is strongly recommended, especially if you use it as a library.