ulbmuenster/dataasee

DatAasee (0.9)

DatAasee centralizes and interlinks distributed library / research metadata into an API‑first union catalog.

[Figure: DatAasee data flow schematic]

A Metadata-Lake for Libraries

Licenses: MIT (additionally CC-BY for openapi.yaml)

Function: Metadata-Lake, Metadata Catalog, Metadata Aggregator, Union Catalog

Audience: University Libraries, Research Libraries, Academic Libraries, Scientific Libraries

DatAasee is currently in the pilot stage and not yet production-ready.

Documentation

Getting Started (Test Deployment)

Quick Start (prepare a dedicated directory and run the following inside it):

$ wget https://raw.githubusercontent.com/ulbmuenster/dataasee/0.9/compose.yaml
$ mkdir -p -m 777 backup
$ DL_PASS=password1 DB_PASS=password2 docker compose up

The DL_PASS environment variable sets the password of the DatAasee admin user, which is required for the POST endpoints of the HTTP API. The DB_PASS environment variable sets the database root password used by the back-end.
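For anything beyond a throwaway test, placeholder values such as password1 are best avoided. A minimal sketch for generating random values for both variables could look as follows; the helper name gen_pass and the password length of 24 are assumptions for illustration and are not part of DatAasee.

```shell
#!/bin/sh
# Hypothetical helper (not part of DatAasee): print a random
# alphanumeric password of the requested length (default: 24).
gen_pass() {
  len="${1:-24}"
  LC_ALL=C tr -dc 'A-Za-z0-9' < /dev/urandom | head -c "$len"
}

# Generate both passwords and export them for a later docker compose call.
DL_PASS="$(gen_pass 24)"
DB_PASS="$(gen_pass 24)"
export DL_PASS DB_PASS

# Start the stack afterwards in the same shell, for example:
#   docker compose up
```

The values only exist in the current shell session, so DL_PASS should be noted somewhere safe, since it is needed later for the POST endpoints.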

Web: http://localhost:8000 (API: http://localhost:8343/api/v1/ )
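The containers may need some time before these addresses respond. A small polling loop, sketched here with wget, can wait for that; the helper name wait_for_url, the number of attempts, and the pause are assumptions, and a success status from the web front-end is only a rough indicator that the stack is up.

```shell
#!/bin/sh
# Hypothetical helper (not part of DatAasee): poll a URL until it
# answers with a success status.
# $1 = URL, $2 = maximum attempts (default 30), $3 = pause in seconds (default 2)
wait_for_url() {
  url="$1"; tries="${2:-30}"; pause="${3:-2}"
  i=0
  while [ "$i" -lt "$tries" ]; do
    # -q: quiet, -T 5: five second timeout, -O /dev/null: discard the body
    if wget -q -T 5 -O /dev/null "$url"; then
      return 0
    fi
    i=$((i + 1))
    sleep "$pause"
  done
  return 1
}

# Example usage:
#   wait_for_url http://localhost:8000 && echo "web front-end reachable"
```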

  • Depends on docker compose (>=2.37) and is compatible with docker and podman.
  • Deploying does not require cloning the repository; the compose.yaml file is sufficient.
  • See the Deploy Documentation for details.
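The version requirement can be checked before starting. The following is a sketch only: version_ge is a hypothetical helper that compares major and minor version numbers, and it is meant to be fed the output of docker compose version --short.

```shell
#!/bin/sh
# Hypothetical helper (not part of DatAasee): succeed if version $1 is
# at least version $2. Only the major and minor numbers are compared.
version_ge() {
  a="${1#v}"; b="${2#v}"                      # tolerate a leading "v"
  a_major="${a%%.*}"; a_rest="${a#*.}"; a_minor="${a_rest%%.*}"
  b_major="${b%%.*}"; b_rest="${b#*.}"; b_minor="${b_rest%%.*}"
  [ "$a_major" -gt "$b_major" ] ||
    { [ "$a_major" -eq "$b_major" ] && [ "$a_minor" -ge "$b_minor" ]; }
}

# Example usage:
#   version_ge "$(docker compose version --short)" 2.37 ||
#     echo "docker compose is older than 2.37" >&2
```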

API Cheat Sheet

Tech Stack Canvas

  • Setting: Many distributed data and metadata sources
  • Goals:
    • Centralize metadata
    • Interlinked metadata catalog
    • Super-index for bibliographic and research data
  • Features:
    • Interact through HTTP API (JSON)
    • Search by filter/facet, full-text, ingest-source, DOI
    • Custom queries via: SQL, OpenCypher, MQL, GraphQL, Redis
  • Frontend: Lowdefy (Optional)
  • Backend: Connect (formerly Benthos)
  • Data Storage: ArcadeDB (Graph Database)
  • Infrastructure: Compose (via Docker or Podman)
  • Deployment: (Public) Container Images from Harbor (at Uni Münster)
  • Monitoring: Container Logs (local logging driver)
  • Integrations:
    • Protocols: OAI-PMH (HTTP), S3 (HTTP), GET (HTTP), DatAasee (HTTP)
    • Encodings: XML (Plain-Text)
    • Formats: DataCite (XML), DC (XML), LIDO (XML), MARC (XML), MODS (XML)
  • Exports: DataCite (JSON), BibJSON (JSON)
  • Security: Privileged endpoints
  • Testing: check-jsonschema
  • Development: GitHub

Repository Contents

  • api/ API definition and message schemas
  • assets/ Logos and style definition
  • backend/ Processor pipeline and component definitions
  • container/ Dockerfiles
  • database/ Database initialization, schemas and enumerated data
  • docs/ Documentation of software, data and architecture
  • frontend/ Prototype frontend definition
  • tests/ Test definitions and data

Getting Started (Development)

Local Development (After a git clone)

  • Available make targets:
    • make setup Build development server container images
    • make start Start servers
    • make stop Stop servers
    • make reset Stop and start servers
    • make build Build release container images (pass REGISTRY= to set registry)
    • make empty Delete database backups
    • make logs Show backend processor logs (requires grep)
    • make peak Report peak database memory usage (requires grep)
    • make test Run tests (requires check-jsonschema, busybox, wget)
    • make tidy List violations of StrictYAML (requires yamllint)
    • make todo List inline TODOs in repo (requires grep)
  • Custom make variable: COMPOSE (set Compose implementation)
  • Open the development frontend in your browser for manual testing of the backend
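A typical development round trip chains several of these targets. The wrapper below is a sketch under assumptions: dev_cycle and MAKE_CMD are hypothetical names, and the value passed for COMPOSE (for example "podman compose") is a guess at the expected format, which the Makefile defines.

```shell
#!/bin/sh
# Hypothetical wrapper (not part of the repository): build, start, test,
# and stop the development servers in order.
# $1 optionally sets the COMPOSE make variable; its value format is an assumption.
# MAKE_CMD can replace make, e.g. MAKE_CMD=echo for a dry run.
dev_cycle() {
  compose="${1:-}"
  for target in setup start test stop; do
    if [ -n "$compose" ]; then
      "${MAKE_CMD:-make}" "$target" "COMPOSE=$compose" || return 1
    else
      "${MAKE_CMD:-make}" "$target" || return 1
    fi
  done
}

# Example usage (inside the cloned repository):
#   dev_cycle
#   dev_cycle "podman compose"
```

The wrapper returns at the first failing target, so after a failed make test the servers keep running and can be inspected with make logs before calling make stop.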

Contributors

tl;dr

DatAasee provides centralized Metasearch for distributed Metadata.