This file provides guidance to coding agents working with code in this repository.
NexusLIMS is an electron microscopy Laboratory Information Management System (LIMS) originally developed at NIST, now maintained by Datasophos. It automatically generates experimental records by extracting metadata from microscopy data files and harvesting information from reservation calendar systems like NEMO.
This is the backend repository. The frontend is at https://github.com/datasophos/NexusLIMS-CDCS.
This project uses uv for package management.
# Install dependencies
uv sync
# Add a dependency
uv add <package-name>
# Add a dev dependency
uv add --dev <package-name>
Tests should always be run with MPL comparison enabled.
# Run all tests with coverage (recommended)
./scripts/run_tests.sh
# Run a specific test file
uv run pytest --mpl --mpl-baseline-path=tests/files/figs tests/test_extractors.py
# Run a specific test
uv run pytest --mpl --mpl-baseline-path=tests/files/figs tests/test_extractors.py::TestClassName::test_method_name
# Generate matplotlib baseline figures for image comparison tests
./scripts/generate_mpl_baseline.sh
# Run all linting and formatting checks (recommended)
./scripts/run_lint.sh
# Or run individually:
uv run ruff format . --check
uv run ruff check nexusLIMS tests
# Auto-format code
uv run ruff format .
# Type checking
pyright
Always use --skip-tui-demos when building docs locally. TUI demo generation is slow and unnecessary for checking content.
# Build documentation (local)
./scripts/build_docs.sh --skip-tui-demos
# Build with strict mode (used in CI)
./scripts/build_docs.sh --strict --skip-tui-demos
# Watch mode for auto-rebuild during development
./scripts/build_docs.sh --watch --skip-tui-demos
Documentation will be written to ./_build.
# Run the record builder with full orchestration
nexuslims build-records
# Or using the module directly:
uv run python -m nexusLIMS.cli.process_records
# Run in dry-run mode
nexuslims build-records -n
# Run with verbose output
nexuslims build-records -vv
# Run the core record builder directly
uv run python -m nexusLIMS.builder.record_builder
- Database Layer (nexusLIMS/db/)
  - SQLite database tracks instruments and session logs through Alembic migrations
  - Main tables: instruments and session_log
  - models.py defines SQLModel ORM classes Instrument and SessionLog
  - enums.py defines enums EventType and RecordStatus
  - session_handler.py provides higher-level session utilities
- Harvesters (nexusLIMS/harvesters/)
  - Extract reservation and usage data from external systems
  - Primary harvester is NEMO in nemo/
  - SharePoint calendar support is deprecated
- Extractors (nexusLIMS/extractors/)
  - Plugin-based metadata extraction
  - Plugins live in extractors/plugins/
  - Instrument profiles live in extractors/plugins/profiles/
  - Preview generators live in extractors/plugins/preview_generators/
  - Extractors return a dict with an nx_meta key for NexusLIMS-specific metadata
- Record Builder (nexusLIMS/builder/record_builder.py)
  - Main orchestration entry point is process_new_records()
  - build_record() creates XML records conforming to the Nexus Experiment schema
- Schemas (nexusLIMS/schemas/)
  - activity.py contains AcquisitionActivity and file clustering logic
  - XML schema validation is performed against nexus-experiment.xsd
- CDCS Integration (cdcs.py)
  - Uploads records to the NexusLIMS CDCS frontend
  - Uses credentials and configuration from environment-driven app config
Record Building Process
- NEMO harvester polls for new or ended reservations
- Harvester creates session_log entries
- Record builder finds sessions that are ready to build
- Files are found using GNU find
- Files are clustered into Acquisition Activities
- Metadata is extracted
- XML is built and validated
- Record is uploaded to CDCS
File Finding Strategy
- Controlled by NX_FILE_STRATEGY
- exclusive: only files with known extractors
- inclusive: all files, with basic metadata for unknowns
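The difference between the two strategies can be sketched as a simple filter. This is only an illustration: KNOWN_EXTENSIONS and select_files are hypothetical names rather than part of the NexusLIMS API, the extension set is a placeholder, and the real file finder relies on GNU find rather than Python-side filtering.

```python
from pathlib import Path

# Hypothetical set of extensions that have a dedicated extractor (placeholder values).
KNOWN_EXTENSIONS = {".dm3", ".dm4", ".tif", ".ser"}


def select_files(paths: list[Path], strategy: str) -> list[Path]:
    """Filter candidate files according to an NX_FILE_STRATEGY-style value."""
    if strategy == "inclusive":
        # Keep everything; unknown types would only receive basic metadata.
        return list(paths)
    if strategy == "exclusive":
        # Keep only files whose extension has a known extractor.
        return [p for p in paths if p.suffix.lower() in KNOWN_EXTENSIONS]
    raise ValueError(f"unknown file strategy: {strategy!r}")
```

With exclusive, a stray notes.txt in the session directory would be dropped; with inclusive, it would be kept alongside the recognized microscopy files.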
Environment variables are loaded from a .env file. See .env.example.
Critical paths:
- NX_INSTRUMENT_DATA_PATH: read-only mount of centralized instrument data
- NX_DATA_PATH: writable parallel directory for metadata and previews
- NX_DB_PATH: SQLite database path
- NX_LOG_PATH: optional directory for logs, defaults under NX_DATA_PATH
- NX_RECORDS_PATH: optional directory for XML records, defaults under NX_DATA_PATH
- NX_LOCAL_PROFILES_PATH: optional directory for site-specific instrument profiles
NEMO integration:
- Supports multiple NEMO instances via NX_NEMO_ADDRESS_N and NX_NEMO_TOKEN_N
- Optional timezone and datetime format overrides may be set per instance
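To make the numbered naming convention concrete, the sketch below pairs each NX_NEMO_ADDRESS_N with its NX_NEMO_TOKEN_N from a plain mapping. It is an illustration only: nemo_instances is a hypothetical helper, raising on a missing token is an assumption, and inside the application this lookup belongs to nexusLIMS/config.py rather than to calling code.

```python
import re
from collections.abc import Mapping

_ADDRESS_KEY = re.compile(r"^NX_NEMO_ADDRESS_(\d+)$")


def nemo_instances(env: Mapping[str, str]) -> dict[int, tuple[str, str]]:
    """Pair each NX_NEMO_ADDRESS_N with its NX_NEMO_TOKEN_N, keyed by N."""
    instances: dict[int, tuple[str, str]] = {}
    for key, address in env.items():
        match = _ADDRESS_KEY.match(key)
        if match is None:
            continue  # not a NEMO address variable
        suffix = match.group(1)
        token = env.get(f"NX_NEMO_TOKEN_{suffix}")
        if token is None:
            # Assumed behavior: an address without a token is a configuration error.
            raise ValueError(f"{key} has no matching NX_NEMO_TOKEN_{suffix}")
        instances[int(suffix)] = (address, token)
    # Return the instances ordered by their numeric suffix.
    return dict(sorted(instances.items()))
```

Passing the mapping in as an argument keeps the helper easy to test with a literal dict.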
CDCS authentication:
- NX_CDCS_TOKEN
- NX_CDCS_URL
Sessions progress through session_log.record_status:
- WAITING_FOR_END
- TO_BE_BUILT
- COMPLETED
- ERROR
- NO_FILES_FOUND
- NO_CONSENT
- NO_RESERVATION
NX_FILE_DELAY_DAYS controls the retry window for NO_FILES_FOUND sessions.
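A minimal sketch of how such a retry window could be evaluated, assuming the window is measured from the end of the session. The function name and that assumption are illustrative; the actual logic lives in the record builder and may differ.

```python
from datetime import datetime, timedelta


def within_retry_window(session_end: datetime, now: datetime, delay_days: float) -> bool:
    """Return True while a session with no files is still inside the retry window."""
    # delay_days plays the role of NX_FILE_DELAY_DAYS in this sketch.
    return now - session_end < timedelta(days=delay_days)
```

Under this reading, a session that ended one day ago with a two-day delay would be searched again on the next run, while one that ended three days ago would not.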
Each instrument in instruments must specify:
- harvester: nemo or sharepoint
- filestore_path: relative to NX_INSTRUMENT_DATA_PATH
- timezone
- For NEMO-backed instruments, api_url matching NEMO tool names
- Uses pytest with pytest-mpl for image comparison tests
- Test fixtures set up mock databases and environments
- Many test files are .tar.gz archives extracted during test setup
- Coverage reports are generated in tests/coverage/
- Ruff is used for formatting and linting
- Pyright is configured for type checking
- NumPy-style docstrings are preferred
- Changelog content is managed by towncrier
- When adding a feature or making a significant change, create a changelog blurb in docs/changes
- Follow the instructions in docs/changes/README.rst
- When preparing or cutting a release in Codex, use the nexuslims-release skill
Never use os.getenv() or os.environ directly for application configuration access outside nexusLIMS/config.py.
# Wrong
import os
path = os.getenv("NX_DATA_PATH")
# Correct
from nexusLIMS import config
path = config.NX_DATA_PATH
Why this rule exists:
- centralizes configuration management
- provides validation and defaults
- makes testing easier
- keeps configuration access consistent
The only exception is nexusLIMS/config.py, which is responsible for reading environment variables and exposing validated module-level attributes.
- See docs/reference/textual_testing_reference.md for Textual testing patterns used in this repo
- See .claude/notes/zeroing-compressed-tiff-files.md for the TIFF zeroing workflow referenced by past work in this repo
- When creating archive files on macOS, use COPYFILE_DISABLE=1 so macOS metadata files are not included
Supports Python 3.11 and 3.12 only, as defined in pyproject.toml.
- This is a fork maintained by Datasophos, not affiliated with NIST
- Original NIST documentation may be outdated: https://pages.nist.gov/NexusLIMS
- When adding new file format support, create an extractor plugin in nexusLIMS/extractors/plugins/
- When customizing instrument behavior, create an InstrumentProfile in extractors/plugins/profiles/ or in the directory pointed to by NX_LOCAL_PROFILES_PATH
- HyperSpy is used extensively for reading and processing microscopy data
- The project structure mirrors the data structure: NX_DATA_PATH parallels NX_INSTRUMENT_DATA_PATH
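The parallel layout can be illustrated with a small path mapping. This is a sketch under assumptions: parallel_data_path is a hypothetical helper, the two roots would come from config.NX_INSTRUMENT_DATA_PATH and config.NX_DATA_PATH, and how NexusLIMS names the derived metadata and preview files is not shown here.

```python
from pathlib import Path


def parallel_data_path(instrument_file: Path, instrument_root: Path, data_root: Path) -> Path:
    """Map a file under the instrument root to the same relative location under the data root."""
    # relative_to raises ValueError if the file is not located under instrument_root.
    relative = instrument_file.relative_to(instrument_root)
    return data_root / relative
```

Because only the root changes, anything written for a raw file can be located again from the raw file's relative path alone.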
See docs/writing_extractor_plugins.md for detailed guidance.
Quick reference:
- Create a class in nexusLIMS/extractors/plugins/ with:
  - name
  - priority
  - supported_extensions
  - supports(context: ExtractionContext) -> bool
  - extract(context: ExtractionContext) -> dict[str, Any]
- Return a dict with an nx_meta key containing:
  - DatasetType
  - Data Type
  - Creation Time
- The registry auto-discovers plugins on first use
Key patterns:
- use priority-based selection
- use supports() for content sniffing beyond extension checks
- check context.instrument for instrument-specific behavior
- handle missing or corrupted files gracefully
- add tests under tests/unit/test_extractors/
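The quick reference and key patterns can be combined into a rough plugin skeleton. Treat it as a sketch, not the project's API: the ExtractionContext below is a local stand-in (only context.instrument is confirmed by this guide; file_path is an assumed attribute name), the ".example" format and its magic bytes are made up, and the priority value, the extension format, and the nx_meta values are placeholders. See docs/writing_extractor_plugins.md for the real interface.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from pathlib import Path
from typing import Any


@dataclass
class ExtractionContext:
    """Stand-in for the real context class, so this sketch is self-contained."""

    file_path: Path  # assumed attribute name
    instrument: Any = None


class ExampleExtractor:
    """Hypothetical plugin for a made-up '.example' file format."""

    name = "example_extractor"
    priority = 100  # placeholder value
    supported_extensions = {"example"}  # placeholder format (no leading dot)

    def supports(self, context: ExtractionContext) -> bool:
        path = context.file_path
        if path.suffix.lower().lstrip(".") not in self.supported_extensions:
            return False
        try:
            # Content sniffing: check made-up magic bytes, not just the extension.
            with path.open("rb") as handle:
                return handle.read(7) == b"EXAMPLE"
        except OSError:
            # Missing or unreadable files are simply not supported.
            return False

    def extract(self, context: ExtractionContext) -> dict[str, Any]:
        try:
            mtime = context.file_path.stat().st_mtime
            created = datetime.fromtimestamp(mtime, tz=timezone.utc).isoformat()
        except OSError:
            created = None  # degrade gracefully instead of raising
        return {
            "nx_meta": {
                "DatasetType": "Misc",  # placeholder value
                "Data Type": "Unknown",  # placeholder value
                "Creation Time": created,
            }
        }
```

A real plugin would also branch on context.instrument where instrument-specific behavior is needed.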