This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
NexusLIMS is an electron microscopy Laboratory Information Management System (LIMS) originally developed at NIST, now maintained by Datasophos. It automatically generates experimental records by extracting metadata from microscopy data files and harvesting information from reservation calendar systems (like NEMO).
This is the backend repository. The frontend is at NexusLIMS-CDCS.
This project uses uv for package management (recently migrated from Poetry):
# Install dependencies
uv sync
# Add a dependency
uv add <package-name>
# Add a dev dependency
uv add --dev <package-name>
Tests should always be run with mpl comparison:
# Run all tests with coverage (recommended)
./scripts/run_tests.sh
# Run a specific test file
uv run pytest --mpl --mpl-baseline-path=tests/files/figs tests/test_extractors.py
# Run a specific test
uv run pytest --mpl --mpl-baseline-path=tests/files/figs tests/test_extractors.py::TestClassName::test_method_name
# Generate matplotlib baseline figures for image comparison tests
./scripts/generate_mpl_baseline.sh
# Run all linting and formatting checks (recommended)
./scripts/run_lint.sh
# Or run individually:
uv run ruff format . --check # Check formatting
uv run ruff check nexusLIMS tests # Run linting
# Auto-format code
uv run ruff format .
# Type checking (Pyright is configured)
pyright
Always use --skip-tui-demos when building docs locally; TUI demo generation is slow and unnecessary for checking content.
# Build documentation (local — always add --skip-tui-demos)
./scripts/build_docs.sh --skip-tui-demos
# Build with strict mode (treat warnings as errors - used in CI)
./scripts/build_docs.sh --strict --skip-tui-demos
# Watch mode for auto-rebuild during development
./scripts/build_docs.sh --watch --skip-tui-demos
# Documentation will be in ./_build directory
# Run the record builder with full orchestration (recommended for production)
# Includes file locking, timestamped logging, and email notifications
nexuslims build-records
# Or using the module directly:
uv run python -m nexusLIMS.cli.process_records
# Run in dry-run mode (find files without building records)
nexuslims build-records -n
# Run with verbose output
nexuslims build-records -vv
# Run the core record builder directly (minimal logging, no locking)
uv run python -m nexusLIMS.builder.record_builder
- Database Layer (nexusLIMS/db/)
  - SQLite database tracks instruments and session logs (managed with Alembic migrations)
  - Two main tables: instruments (instrument config) and session_log (session tracking)
  - models.py defines SQLModel ORM classes (Instrument and SessionLog)
  - enums.py defines type-safe enums (EventType and RecordStatus)
  - session_handler.py provides higher-level session management utilities (Session class, get_sessions_to_build())
  - Sessions have states (defined in RecordStatus enum): WAITING_FOR_END, TO_BE_BUILT, COMPLETED, ERROR, NO_FILES_FOUND, NO_CONSENT, NO_RESERVATION
- Harvesters (nexusLIMS/harvesters/)
  - Extract reservation/usage data from external systems
  - NEMO harvester (nemo/): Primary harvester for NEMO lab management system
  - SharePoint harvester (sharepoint_calendar.py): Deprecated
  - Harvesters create ReservationEvent objects with session metadata
- Extractors (nexusLIMS/extractors/)
  - Extract metadata from microscopy file formats using a plugin-based architecture
  - Plugin system (extractors/plugins/):
    - basic_metadata.py: Extracts file system metadata (size, timestamps, etc.)
    - quanta_tif.py: FEI/Thermo .tif format extractor
    - digital_micrograph.py: Gatan .dm3/.dm4 format extractor
    - edax_spc_map.py: EDAX .spc/.msa spectrum format extractor
    - fei_emi.py: FEI TIA .ser/.emi format extractor
    - Plugins auto-discovered via discover_plugins() function
  - Instrument profiles (extractors/plugins/profiles/):
    - Customize metadata extraction for specific instruments without modifying core extractors
    - Supports both built-in profiles (shipped with codebase) and local profiles (external directory)
    - Local profiles loaded from NX_LOCAL_PROFILES_PATH environment variable
    - Profiles can add static metadata, parse/transform fields, add warnings, override extractors
  - Preview generators (extractors/plugins/preview_generators/):
    - Generate thumbnail images from microscopy data files
    - Plugin-based system with image and text preview generators
  - Each extractor returns dict with nx_meta key containing NexusLIMS-specific metadata
  - base.py defines base classes: ExtractorPlugin, InstrumentProfile, PreviewGeneratorPlugin
- Record Builder (nexusLIMS/builder/record_builder.py)
  - Main orchestrator: process_new_records() is the entry point
  - build_record() creates XML records conforming to Nexus Experiment schema
  - Workflow:
    - Query database for sessions TO_BE_BUILT
    - Find files by modification time within session window
    - Cluster files into Acquisition Activities using KDE
    - Extract metadata from each file
    - Build XML record
    - Upload to CDCS frontend
- Schemas (nexusLIMS/schemas/)
  - activity.py: AcquisitionActivity class and file clustering logic
  - cluster_filelist_mtimes(): Uses scikit-learn KDE to find temporal gaps in file creation
  - XML schema validation against nexus-experiment.xsd
- CDCS Integration (cdcs.py)
  - Uploads records to NexusLIMS CDCS frontend
  - Uses credentials from environment variables
  - CDCS REST API Query Endpoint (rest/data/query/):
    - Defined in core_main_app repository
    - POST request parameters:
      - query (required): MongoDB-style query object (e.g., {} for all, {"root.element.value": 2} for filtering)
      - templates: List of template objects [{"id": "template_id"}]
      - workspaces: List of workspace objects [{"id": "workspace_id"}] (use [{"id": "None"}] for private)
      - title: Filter by document title string
      - all: Boolean ("true"/"false") to disable pagination
      - xpath: XPath expression to extract specific content (e.g., "/ns:root/@element")
      - namespaces: Namespace mappings for XPath (e.g., {"ns": "http://example.com/ns"})
      - options: Query options dict (can include VISIBILITY_OPTION)
      - order_by_field: Comma-separated field names for sorting
    - URL parameter: page for pagination (e.g., ?page=2)
    - Multiple filters can be combined in a single query
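The query parameters listed above can be assembled as in the following minimal sketch. The function name `build_query_request` and its keyword arguments are hypothetical and are not part of cdcs.py. The source does not say how the body is encoded or how the token is sent, so the sketch only builds the URL and payload and sends nothing.

```python
from urllib.parse import urlencode, urljoin


def build_query_request(base_url, query, *, title=None, templates=None,
                        workspaces=None, all_results=False, page=None):
    """Return (url, payload) for a POST to rest/data/query/ (hypothetical helper)."""
    url = urljoin(base_url.rstrip("/") + "/", "rest/data/query/")
    if page is not None:
        # Pagination is a URL parameter, not part of the POST body.
        url += "?" + urlencode({"page": page})
    payload = {"query": query}  # the only required parameter
    if title is not None:
        payload["title"] = title
    if templates:
        payload["templates"] = [{"id": t} for t in templates]
    if workspaces:
        payload["workspaces"] = [{"id": w} for w in workspaces]
    if all_results:
        payload["all"] = "true"  # documented as a "true"/"false" string
    return url, payload
```

Several filters can be combined in one call, for example `build_query_request(url, {}, title="Session", templates=["abc"], page=2)`.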
Record Building Process:
- NEMO harvester polls API for new/ended reservations
- Harvester creates session_log entries with START/END events
- Record builder finds sessions TO_BE_BUILT
- Files are found using GNU find (via gnu_find_files_by_mtime)
- Files clustered into Acquisition Activities by temporal analysis
- Metadata extracted from each file
- XML record built and validated
- Record uploaded to CDCS
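The temporal clustering step can be illustrated with a standard-library sketch. It swaps scikit-learn for a hand-rolled Gaussian KDE, and the names `gaussian_kde` and `cluster_mtimes`, the fixed 60-second bandwidth, and the grid size are all assumptions for illustration. The real cluster_filelist_mtimes() may choose its bandwidth and boundaries differently.

```python
import math


def gaussian_kde(samples, bandwidth):
    """Return a function that evaluates a Gaussian KDE of `samples` at x."""
    norm = len(samples) * bandwidth * math.sqrt(2.0 * math.pi)

    def density(x):
        return sum(math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples) / norm

    return density


def cluster_mtimes(mtimes, bandwidth=60.0, grid_points=500):
    """Group modification times (seconds) by splitting at KDE local minima."""
    times = sorted(mtimes)
    if not times:
        return []
    if times[0] == times[-1]:
        return [times]
    density = gaussian_kde(times, bandwidth)
    step = (times[-1] - times[0]) / (grid_points - 1)
    grid = [times[0] + i * step for i in range(grid_points)]
    values = [density(x) for x in grid]
    # An interior grid point is a boundary when the density dips there.
    boundaries = [
        grid[i]
        for i in range(1, grid_points - 1)
        if values[i] < values[i - 1] and values[i] <= values[i + 1]
    ]
    clusters, current, next_boundary = [], [], 0
    for t in times:
        # Close the current cluster each time we step past a boundary.
        while next_boundary < len(boundaries) and t > boundaries[next_boundary]:
            if current:
                clusters.append(current)
                current = []
            next_boundary += 1
        current.append(t)
    clusters.append(current)
    return clusters
```

Each returned group would correspond to one Acquisition Activity. A smaller bandwidth splits at shorter pauses, while a larger one merges nearby bursts of files.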
File Finding Strategy:
- Controlled by NX_FILE_STRATEGY env var
- exclusive: Only files with known extractors
- inclusive: All files (with basic metadata for unknowns)
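The difference between the two strategies can be sketched in plain Python. The helper `select_files` is hypothetical: the real builder delegates the time-window search to GNU find, whereas this sketch filters an in-memory list of (path, mtime) pairs.

```python
from pathlib import PurePosixPath


def select_files(files, start, end, strategy, known_extensions):
    """Filter (path, mtime) pairs by session window and file strategy."""
    if strategy not in ("exclusive", "inclusive"):
        raise ValueError(f"unknown file strategy: {strategy!r}")
    selected = []
    for path, mtime in files:
        if not (start <= mtime <= end):
            continue  # outside the session window
        extension = PurePosixPath(path).suffix.lower().lstrip(".")
        if strategy == "exclusive" and extension not in known_extensions:
            continue  # no extractor knows this format
        selected.append(path)
    return selected
```

With `inclusive`, unknown formats stay in the result and would receive only basic file-system metadata.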
Environment variables are loaded from .env file (see .env.example):
Critical paths:
- NX_INSTRUMENT_DATA_PATH: Read-only mount of centralized instrument data
- NX_DATA_PATH: Writable parallel directory for metadata/previews
- NX_DB_PATH: SQLite database path
- NX_LOG_PATH (optional): Directory for application logs (defaults to NX_DATA_PATH/logs/)
- NX_RECORDS_PATH (optional): Directory for generated XML records (defaults to NX_DATA_PATH/records/)
- NX_LOCAL_PROFILES_PATH (optional): Directory for site-specific instrument profiles (loaded in addition to built-in profiles)
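As a rough illustration of the "parallel directory" idea, the sketch below maps a file under the read-only instrument root to the matching location under the writable data root. The function `parallel_data_path` and the `.json` suffix are assumptions for illustration; the actual naming of metadata and preview files is not specified here.

```python
from pathlib import PurePosixPath


def parallel_data_path(file_path, instrument_root, data_root, suffix=".json"):
    """Mirror a path under instrument_root into data_root (hypothetical helper)."""
    # relative_to() raises ValueError if file_path is not under instrument_root.
    relative = PurePosixPath(file_path).relative_to(instrument_root)
    return PurePosixPath(data_root) / relative.with_name(relative.name + suffix)
```

In real code the two roots would come from config.NX_INSTRUMENT_DATA_PATH and config.NX_DATA_PATH rather than being passed as literals.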
NEMO integration:
- Supports multiple NEMO instances via NX_NEMO_ADDRESS_N, NX_NEMO_TOKEN_N pattern
- Optional timezone/datetime format overrides per instance
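The numbered-suffix naming pattern can be illustrated as follows. The helper `collect_nemo_instances` is hypothetical and takes a plain mapping purely to show how address and token keys pair up. In the codebase these values are read through nexusLIMS.config, never directly from the environment.

```python
import re

_ADDRESS_KEY = re.compile(r"^NX_NEMO_ADDRESS_(\d+)$")


def collect_nemo_instances(settings):
    """Pair each NX_NEMO_ADDRESS_N with its NX_NEMO_TOKEN_N (illustrative only)."""
    instances = {}
    for key, address in settings.items():
        match = _ADDRESS_KEY.match(key)
        if match is None:
            continue
        suffix = match.group(1)
        token = settings.get(f"NX_NEMO_TOKEN_{suffix}")
        if not token:
            raise ValueError(f"{key} has no matching NX_NEMO_TOKEN_{suffix}")
        instances[int(suffix)] = {"address": address, "token": token}
    # Sort by instance number so iteration order is predictable.
    return dict(sorted(instances.items()))
```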
CDCS authentication:
- NX_CDCS_TOKEN: API token for CDCS uploads
- NX_CDCS_URL: Target CDCS instance URL
Sessions progress through states in session_log.record_status:
- WAITING_FOR_END: Session started but not ended
- TO_BE_BUILT: Session ended, needs record generation
- COMPLETED: Record successfully built and uploaded
- ERROR: Record building failed
- NO_FILES_FOUND: No files found (may retry if within delay window)
- NO_CONSENT: A user did not consent to have their data harvested for the session
- NO_RESERVATION: There was no matching reservation found for this session
NX_FILE_DELAY_DAYS controls retry window for NO_FILES_FOUND sessions. Record builder continues searching until delay expires.
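The retry window amounts to a simple time comparison, sketched below. The function name `should_retry_file_search` is hypothetical, and measuring the delay from the session end time is an assumption; the record builder may anchor the window differently.

```python
from datetime import datetime, timedelta, timezone


def should_retry_file_search(session_end, now, file_delay_days):
    """True while an empty session is still inside the retry window."""
    return now - session_end < timedelta(days=file_delay_days)


# Example: a session that ended one day ago, with a two-day delay window.
session_end = datetime(2025, 1, 1, tzinfo=timezone.utc)
still_retrying = should_retry_file_search(
    session_end, session_end + timedelta(days=1), file_delay_days=2
)
```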
Each instrument in instruments table must specify:
- harvester: "nemo" or "sharepoint"
- filestore_path: Relative to NX_INSTRUMENT_DATA_PATH
- timezone: For proper datetime handling
- NEMO-specific: api_url matching NEMO tool names
- Uses pytest with pytest-mpl for image comparison tests
- Test fixtures in tests/conftest.py set up mock database/environments
- Many test data files are .tar.gz archives (extracted during test setup)
- Coverage reports generated in tests/coverage/
- Black formatting (88 char line length)
- isort for import sorting (Black profile)
- Ruff for linting (extensive rule set in pyproject.toml)
- Pylint with custom configuration
- NumPy-style docstrings
- Changelog content is managed by the towncrier package
- Whenever adding a feature or making a significant change, create a corresponding changelog blurb in docs/changes at commit time
- When creating these blurbs, follow the instructions in docs/changes/README.rst
NEVER use os.getenv() or os.environ directly for configuration.
All environment variable access MUST go through the nexusLIMS.config module:
# ❌ WRONG - Do not do this
import os
path = os.getenv("NX_DATA_PATH")
# ✅ CORRECT - Always do this
from nexusLIMS import config
path = config.NX_DATA_PATH
Why this rule exists:
- Centralizes configuration management in one place
- Provides type safety and validation
- Enables proper default values and error handling
- Makes testing easier (can mock config module)
- Ensures consistent behavior across the codebase
The only exception: The nexusLIMS/config.py module itself, which is responsible for reading environment variables and exposing them as module-level attributes.
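The testing benefit can be sketched with unittest.mock. To stay self-contained, the example uses a SimpleNamespace named `config` as a stand-in for the nexusLIMS.config module, and `records_dir` is a hypothetical function. In the real test suite the patch target would be the actual config module, and the fixtures in tests/conftest.py may already handle this.

```python
import types
from unittest import mock

# Stand-in for `from nexusLIMS import config` (assumption, for illustration).
config = types.SimpleNamespace(NX_DATA_PATH="/srv/nexuslims/data")


def records_dir():
    """Code under test reads the setting from config, never from os.environ."""
    return f"{config.NX_DATA_PATH}/records"


def test_records_dir_uses_config():
    # Patch one attribute; mock restores the original value on exit.
    with mock.patch.object(config, "NX_DATA_PATH", "/tmp/test-data"):
        assert records_dir() == "/tmp/test-data/records"
    assert records_dir() == "/srv/nexuslims/data/records"
```

Because every caller goes through one object, a single patch changes the value everywhere, with no need to manipulate process environment variables.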
Additional technical documentation for specific tasks:
- Textual Testing Reference: Complete guide for testing Textual TUI applications. Covers the Pilot API, keyboard/mouse simulation, snapshot testing, synchronization patterns, and NexusLIMS-specific testing examples. Essential reference for developing and testing the instrument manager and other TUI components.
- Zeroing Compressed TIFF Files: Binary patching method for zeroing out LZW-compressed TIFF image data while preserving all metadata and file structure. Use when you need to create test fixtures or anonymized data files.
- Creating archive files: When creating an archive file with test files (or for any other purpose), ensure that MacOS hidden files (like .DS_Store), MacOS resource forks, or others do not end up in the archive. Always use COPYFILE_DISABLE=1 when creating archives on MacOS.
- NEMO Reference Document: The docs/reference/NEMO_Reference.md document was generated from the official NEMO Feature Manual. To update it:
  - Download the latest PDF: https://nemo.nist.gov/public/NEMO_Feature_Manual.pdf
  - Extract text: pdftotext NEMO_Feature_Manual.pdf docs/reference/NEMO_Feature_Manual.txt
  - Parse relevant sections and format into markdown at docs/reference/NEMO_Reference.md
  - Focus on: Usage Events, Pre/Post Usage Questions, Reservation Questions, Dynamic Form Fields, API Access
  - Include NexusLIMS-specific integration notes about the three-tier metadata fallback strategy
Supports Python 3.11 and 3.12 only (as specified in pyproject.toml).
- This is a fork maintained by Datasophos, not affiliated with NIST
- Original NIST documentation may be outdated: https://pages.nist.gov/NexusLIMS
- When adding new file format support: Create an ExtractorPlugin subclass in extractors/plugins/ - it will be auto-discovered
- When customizing instrument behavior: Create an InstrumentProfile in extractors/plugins/profiles/ (built-in) or in the directory specified by NX_LOCAL_PROFILES_PATH (local/site-specific)
- HyperSpy is used extensively for reading/processing microscopy data
- The project structure mirrors the data structure: NX_DATA_PATH parallels NX_INSTRUMENT_DATA_PATH
The NexusLIMS extractor system uses a plugin-based architecture for automatic discovery and registration of metadata extractors. To add support for a new file format:
See docs/writing_extractor_plugins.md for detailed guidance.
Quick reference:
- Create a class in nexusLIMS/extractors/plugins/ with:
  - name (str): Unique identifier
  - priority (int): Selection priority (higher wins, 0-1000)
  - supported_extensions (set[str] | None): Extensions like {"xyz"} or None for wildcard
  - supports(context: ExtractionContext) -> bool: Determine if extractor handles file
  - extract(context: ExtractionContext) -> dict[str, Any]: Extract metadata
- Return dict with "nx_meta" key containing:
  - "DatasetType": "Image", "Spectrum", "SpectrumImage", "Diffraction", or "Misc"
  - "Data Type": Descriptive string (e.g., "SEM_Imaging")
  - "Creation Time": ISO-8601 timestamp with timezone
- The registry auto-discovers your plugin on first use
Example:
from nexusLIMS.extractors.base import ExtractionContext

class MyFormatExtractor:
    name = "my_format_extractor"
    priority = 100
    supported_extensions = {"xyz"}

    def supports(self, context: ExtractionContext) -> bool:
        return context.file_path.suffix.lower() == ".xyz"

    def extract(self, context: ExtractionContext) -> dict:
        return {
            "nx_meta": {
                "DatasetType": "Image",
                "Data Type": "SEM_Imaging",
                "Creation Time": "2025-12-16T12:34:56+00:00",
                # ... additional fields
            }
        }
Key patterns:
- Priority selection: Higher priority extractors are tried first
- Content sniffing: Use supports() for format validation beyond extension
- Instrument-specific: Check context.instrument for instrument-specific behavior
- Error handling: Gracefully handle missing/corrupted files, don't raise exceptions
- Testing: Add tests to tests/unit/test_extractors/ (see existing extractors for patterns)
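Two of the patterns above, priority selection and graceful error handling, can be sketched in a self-contained form. `Context` is a hypothetical stand-in for ExtractionContext, `select_extractor` is not the registry's real API, and the `"warnings"` key and `"File Size"` field are illustrative assumptions rather than part of the documented nx_meta contract.

```python
from dataclasses import dataclass
from pathlib import Path


@dataclass
class Context:
    """Hypothetical stand-in for ExtractionContext."""
    file_path: Path


class XyzExtractor:
    name = "xyz_extractor"
    priority = 100
    supported_extensions = {"xyz"}

    def supports(self, context):
        return context.file_path.suffix.lower() == ".xyz"

    def extract(self, context):
        try:
            size = context.file_path.stat().st_size
        except OSError as exc:
            # Missing or unreadable file: return minimal metadata, do not raise.
            return {"nx_meta": {"DatasetType": "Misc", "Data Type": "Unknown",
                                "warnings": [str(exc)]}}
        return {"nx_meta": {"DatasetType": "Image", "Data Type": "SEM_Imaging",
                            "File Size": size}}


class FallbackExtractor:
    name = "fallback"
    priority = 0
    supported_extensions = None  # wildcard

    def supports(self, context):
        return True

    def extract(self, context):
        return {"nx_meta": {"DatasetType": "Misc", "Data Type": "Unknown"}}


def select_extractor(extractors, context):
    """Try extractors from highest to lowest priority; return the first match."""
    for extractor in sorted(extractors, key=lambda e: e.priority, reverse=True):
        if extractor.supports(context):
            return extractor
    return None
```

A low-priority wildcard extractor acts as the safety net here, so a file with an unknown extension still gets a result.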
Always use context7 when I need library/API documentation. This means you should automatically use the Context7 MCP tools to resolve library id and get library docs without me having to explicitly ask.