Documentation and unit tests #32

Open
andrewjpage wants to merge 48 commits into quadram-institute-bioscience:master from andrewjpage:master

@andrewjpage

Collaborator

These changes add documentation, comments, and unit tests to Socru. They don't change the underlying code. Everything was generated by Copilot (AI), so all the usual warnings apply.

Copilot AI and others added 30 commits October 31, 2025 15:00
Co-authored-by: andrewjpage <24151+andrewjpage@users.noreply.github.com>
Co-authored-by: andrewjpage <24151+andrewjpage@users.noreply.github.com>
Co-authored-by: andrewjpage <24151+andrewjpage@users.noreply.github.com>
…docs

Add comprehensive test suite and documentation with 93% coverage
Co-authored-by: andrewjpage <24151+andrewjpage@users.noreply.github.com>
…rrnap.py

Co-authored-by: andrewjpage <24151+andrewjpage@users.noreply.github.com>
…, and Database modules

Co-authored-by: andrewjpage <24151+andrewjpage@users.noreply.github.com>
…Fragments

Co-authored-by: andrewjpage <24151+andrewjpage@users.noreply.github.com>
… and Dif

Co-authored-by: andrewjpage <24151+andrewjpage@users.noreply.github.com>
Co-authored-by: andrewjpage <24151+andrewjpage@users.noreply.github.com>
Add comprehensive inline documentation to all Python modules and scripts
…ff, remove Travis CI

- Add pyproject.toml as primary build config with setuptools backend,
  project metadata, console_scripts entry points, dev dependencies,
  pytest and ruff configuration
- Add socru/cli.py with proper entry point functions for all 6 CLI commands
- Add GitHub Actions CI workflow with Python 3.9-3.12 matrix and conda
  for bioinformatics dependencies (barrnap, blast)
- Add .pre-commit-config.yaml with ruff linter and formatter hooks
- Fix Dockerfile: use miniconda3 slim image, pin Python 3.11, fix
  git+git:// to git+https:// protocol
- Remove .travis.yml (defunct for open source, targeted EOL Python 3.6)
- Remove nose test dependency from setup.py

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…op iterator, double filter call, wrong attribute name, unused imports

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…lists, fix resource leaks

- Replace all subprocess shell=True calls with argument lists across
  Barrnap.py, Blast.py, Database.py, Socru.py, SocruCreate.py, ShrinkDatabase.py
- Replace shell redirects (>, >>, |) with Python file I/O and capture_output
- Replace gunzip/gzip shell commands with Python gzip module
- Replace shell sort pipe with Python sorted() for BLAST output
- Fix mkstemp fd leaks: close fd immediately after mkstemp in all files
- Fix Database.__del__: store tmpdir separately, rmtree the directory not db_prefix
- Add ignore_errors=True to all shutil.rmtree calls in __del__ methods
- Remove unused import time from Socru.py and SocruCreate.py
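
For reviewers, the pattern behind the subprocess and mkstemp bullets can be sketched as below. This is a minimal illustration, not the code in this PR: `run_to_file` and `make_temp_path` are hypothetical helper names, and the Python interpreter stands in for barrnap/blastn so the sketch runs anywhere.

```python
import os
import subprocess
import sys
import tempfile


def run_to_file(args, output_path):
    """Run a command given as an argument list and write its stdout to a file.

    This replaces a shell redirect such as ``cmd > output_path``.
    """
    # No shell=True: the arguments are passed to the program without shell parsing.
    result = subprocess.run(args, capture_output=True, text=True, check=True)
    with open(output_path, "w") as handle:
        handle.write(result.stdout)
    return output_path


def make_temp_path(suffix=""):
    """Create a temporary file and close its descriptor straight away."""
    fd, path = tempfile.mkstemp(suffix=suffix)
    os.close(fd)  # mkstemp returns an open fd; leaving it open leaks it
    return path


# Stand-in command: the current Python interpreter rather than barrnap/blastn.
out_path = make_temp_path(suffix=".txt")
run_to_file([sys.executable, "-c", "print('hello')"], out_path)
with open(out_path) as handle:
    contents = handle.read()
os.remove(out_path)
```

Passing a list avoids quoting problems with filenames that contain spaces or shell metacharacters, and `check=True` turns a non-zero exit status into a `CalledProcessError`.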

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…rror messages

- Replace deprecated pkg_resources with importlib.resources and importlib.metadata
  across Schemas.py, SocruCreate.py, and all 6 CLI scripts
- Add FASTA input validation in Fasta.py: file existence, empty file, no sequences,
  and small contig warnings
- Replace sys.exit() with proper exceptions (FileNotFoundError, FileExistsError)
  in Socru.py and SocruCreate.py library code
- Add CLI-layer exception handling in scripts/socru and scripts/socru_create
- Improve error messages in Schemas.py to include species name and path
- Add stderr warnings in DnaA.py and Dif.py when BLAST fails to locate markers
- Improve ValidateFragments.py messages to include genome filename
- Add tests for Fasta validation (nonexistent file, empty file, invalid content)
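
The FASTA checks listed above could look roughly like this. It is a sketch under assumptions: `validate_fasta` is a hypothetical name, it counts header lines instead of using Biopython, and the choice of `ValueError` for empty or sequence-free files is mine (the commit only names `FileNotFoundError` and `FileExistsError`).

```python
import os
import tempfile


def validate_fasta(path):
    """Return the number of sequence records, raising if the file is unusable."""
    if not os.path.isfile(path):
        raise FileNotFoundError(f"FASTA file not found: {path}")
    if os.path.getsize(path) == 0:
        # Assumption: ValueError; the real module may use a different exception.
        raise ValueError(f"FASTA file is empty: {path}")
    with open(path) as handle:
        n_records = sum(1 for line in handle if line.startswith(">"))
    if n_records == 0:
        raise ValueError(f"No sequences found in FASTA file: {path}")
    return n_records


with tempfile.TemporaryDirectory() as tmp:
    good = os.path.join(tmp, "good.fa")
    with open(good, "w") as handle:
        handle.write(">contig1\nACGT\n>contig2\nGGCC\n")
    n_records = validate_fasta(good)

    empty = os.path.join(tmp, "empty.fa")
    open(empty, "w").close()
    try:
        validate_fasta(empty)
        empty_error = None
    except ValueError as exc:
        empty_error = str(exc)

    try:
        validate_fasta(os.path.join(tmp, "missing.fa"))
        missing_raised = False
    except FileNotFoundError:
        missing_raised = True
```

Raising exceptions in library code and catching them at the CLI layer keeps the library usable from other Python programs, which is the point of the `sys.exit()` bullet above.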

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
… improve error messages

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…Ter labels, and quality badge

New SvgGenomePlot module generates publication-quality SVG circular genome
diagrams with colored fragment arcs, hatch overlays for reversed fragments,
operon direction triangles, origin/terminus markers, tick marks, quality
badge, and a color legend. Integrated into PlotProfile (create_svg method)
and the CLI via --output_svg flag.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…ence scoring, QC flags

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…y badges, and expandable details

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…anagement

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…ualization

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…gment coverage, test isolation

- Fragment_test: add tests for num_bases, output_filename, operon_direction_str
  with forward/reverse operons, multiple coordinate ranges, reversed_frag behavior
- BlastResult_test: new file covering field parsing, is_forward for both
  orientations, tab-delimited __str__ output, and roundtrip consistency
- Operon_test: new file covering creation, __str__ format, attribute mutability
- GATProfile_test: edge cases for empty/single/all-reversed fragments, unknown
  "?" fragments, double-inversion identity, deterministic orientation_binary,
  profile matching, order extraction; hypothesis stubs as comments
- TypeGenerator_test: novel profile order not in DB, unknown fragments,
  empty profile, invalid fragment quality
- PlotProfile_test: use tempfile.mkdtemp with tearDown cleanup instead of
  writing to cwd
- ProfileGenerator_test: copy database to temp dir with tearDown cleanup
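
The test-isolation change in the last two bullets follows a common unittest pattern, sketched here with a placeholder test. The class name and the file written are hypothetical; only the `tempfile.mkdtemp` plus `tearDown` cleanup idea comes from the commit.

```python
import os
import shutil
import tempfile
import unittest


class TempDirTestCase(unittest.TestCase):
    def setUp(self):
        # Each test gets its own scratch directory instead of writing to cwd.
        self.tmpdir = tempfile.mkdtemp()

    def tearDown(self):
        # Runs even when the test fails, so no files are left behind.
        shutil.rmtree(self.tmpdir, ignore_errors=True)

    def test_writes_inside_tmpdir(self):
        out = os.path.join(self.tmpdir, "plot.txt")
        with open(out, "w") as handle:
            handle.write("placeholder")
        self.assertTrue(os.path.exists(out))


def run_suite():
    """Run the example test case programmatically and return the result."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(TempDirTestCase)
    return unittest.TextTestRunner(verbosity=0).run(suite)
```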

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
… chart

Introduce three new modules for batch-level analysis and visualization:
- BatchStats: aggregate statistics (type distribution, quality summary,
  mean confidence, flag summary, outlier detection) across multiple results
- SvgFragmentQuality: horizontal bar chart of per-fragment BLAST identity
  with color-coded quality thresholds and dashed bars for unknowns
- SvgTypeDistribution: bar chart of GS type frequencies with optional
  quality-stacked bars and percentage labels

Includes 37 tests covering all public APIs and edge cases.
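
As a rough idea of what the aggregate statistics involve, here is a minimal sketch. `summarise_batch` and the dictionary keys `gs_type` and `confidence` are hypothetical; the real BatchStats API and field names may differ.

```python
from collections import Counter
from statistics import mean


def summarise_batch(results):
    """Summarise a list of dicts with hypothetical keys 'gs_type' and 'confidence'."""
    type_counts = Counter(r["gs_type"] for r in results)
    # Skip results without a confidence score rather than treating them as 0.
    confidences = [r["confidence"] for r in results if r["confidence"] is not None]
    return {
        "n_results": len(results),
        "type_distribution": dict(type_counts),
        "mean_confidence": mean(confidences) if confidences else None,
    }


example = [
    {"gs_type": "GS1.0", "confidence": 1.0},
    {"gs_type": "GS1.0", "confidence": 0.5},
    {"gs_type": "GS2.3", "confidence": None},
]
summary = summarise_batch(example)
```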

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Migrate all diagnostic print() and sys.stderr.write() calls to Python's
logging module across 12 source files. Configure logging in both entry
points (scripts/socru and cli.py) based on --verbose flag. Primary
tab-delimited output and user-facing table prints remain as print().
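
A minimal sketch of configuring logging from a `--verbose` flag follows. The `configure_logging` helper, the chosen levels (DEBUG vs WARNING), and the message format are assumptions; the commit only states that logging is configured from `--verbose`.

```python
import argparse
import logging

logger = logging.getLogger("socru.example")  # hypothetical logger name


def configure_logging(verbose):
    """Set the root logger level from a verbosity flag and return the level."""
    level = logging.DEBUG if verbose else logging.WARNING
    # force=True replaces any handlers installed by an earlier basicConfig call.
    logging.basicConfig(
        level=level, format="%(levelname)s %(name)s: %(message)s", force=True
    )
    return level


parser = argparse.ArgumentParser()
parser.add_argument("--verbose", action="store_true")
args = parser.parse_args(["--verbose"])

level = configure_logging(args.verbose)
logger.debug("diagnostics go through logging")  # emitted on stderr when verbose
print("sample\tGS1.0")  # primary tab-delimited output stays on stdout via print()
```

Keeping primary output on `print()` means piping results to a file is unaffected by the logging level.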

Update README.md: replace broken Travis CI badge with GitHub Actions
placeholder, note Python 3.9+ requirement, add Output Formats and CLI
Options sections documenting --output_json, --output_svg, --output_html,
and switch testing instructions to pytest.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
… after Results.py bug fix

- Fix 1-based GFF start coordinate to 0-based in Barrnap.parse_barrnap_output
- Update expected output files for SocruCreate, SocruRebuild, SocruUpdate tests
- All 290 tests now pass including integration tests with barrnap
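
The coordinate fix can be illustrated as follows. GFF uses 1-based, inclusive coordinates, so converting to a 0-based start means subtracting one from the start column. The helper name and the example line are hypothetical, and treating the end as half-open (unchanged) is an assumption about the convention used downstream.

```python
def gff_to_zero_based(gff_line):
    """Parse one GFF feature line and return (seqid, start0, end, strand).

    GFF columns 4 and 5 are 1-based and inclusive; start0 is 0-based.
    """
    fields = gff_line.rstrip("\n").split("\t")
    seqid = fields[0]
    start = int(fields[3])
    end = int(fields[4])
    strand = fields[6]
    # With a 0-based start and unchanged end, the pair works as a Python slice.
    return seqid, start - 1, end, strand


line = "contig1\tbarrnap\trRNA\t101\t1642\t0\t+\t.\tName=16S_rRNA"
seqid, start0, end, strand = gff_to_zero_based(line)
feature_length = end - start0
```

With this convention `end - start0` equals the feature length (`1642 - 101 + 1`), which is an easy sanity check for off-by-one errors.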

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
… argparse

Introduce SocruConfig and SocruCreateConfig dataclasses that allow constructing
Socru and SocruCreate directly from typed config objects instead of requiring
fabricated argparse Namespace objects. Both classes retain backward compatibility
via from_options() class methods and isinstance dispatch in __init__.
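
The dataclass-plus-dispatch approach described above might look like this in outline. `ExampleConfig`, `ExampleRunner`, and every field name are hypothetical stand-ins; the actual SocruConfig fields are not listed in this commit message.

```python
import argparse
from dataclasses import dataclass, fields


@dataclass
class ExampleConfig:
    # Hypothetical fields for illustration only.
    input_files: list
    species: str
    threads: int = 1
    verbose: bool = False

    @classmethod
    def from_options(cls, options):
        """Build a config from an argparse Namespace, ignoring unknown attributes."""
        names = [f.name for f in fields(cls)]
        return cls(**{n: getattr(options, n) for n in names if hasattr(options, n)})


class ExampleRunner:
    def __init__(self, config_or_options):
        # isinstance dispatch keeps old Namespace-based callers working.
        if isinstance(config_or_options, ExampleConfig):
            self.config = config_or_options
        else:
            self.config = ExampleConfig.from_options(config_or_options)


options = argparse.Namespace(
    input_files=["a.fa"], species="Example_species", threads=2, verbose=False, unrelated=1
)
from_namespace = ExampleRunner(options).config
direct = ExampleRunner(
    ExampleConfig(input_files=["a.fa"], species="Example_species", threads=2)
).config
```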

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
andrewjpage and others added 18 commits March 26, 2026 12:13
…A_DIR

Introduces a DatabaseManager class that provides a unified interface for
discovering species databases from both bundled package data and a
user-configurable data directory (~/.socru/data/ or SOCRU_DATA_DIR env var).

- DatabaseManager supports list, locate, install, and inspect operations
- Schemas.database_directory() now falls back to DatabaseManager
- socru_species gains --detailed flag for fragment/type info
- 16 new tests covering all DatabaseManager functionality
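
A sketch of the directory resolution described above, assuming the environment variable takes precedence over `~/.socru/data/`. The function names and the search order are assumptions; the commit does not say whether bundled or user databases win when both exist.

```python
import os
import tempfile
from pathlib import Path


def user_data_dir(environ=None):
    """Return the user data directory, honouring SOCRU_DATA_DIR if it is set."""
    environ = os.environ if environ is None else environ
    override = environ.get("SOCRU_DATA_DIR")
    if override:
        return Path(override)
    return Path.home() / ".socru" / "data"


def locate_species(species, search_dirs):
    """Return the first directory named `species` found in search_dirs, else None."""
    for directory in search_dirs:
        candidate = Path(directory) / species
        if candidate.is_dir():
            return candidate
    return None


with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "Example_species").mkdir()
    data_dir = user_data_dir({"SOCRU_DATA_DIR": tmp})
    found = locate_species("Example_species", [data_dir])
    missing = locate_species("Other_species", [data_dir])
```

Passing the environment as a parameter keeps the function easy to test without mutating `os.environ`.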

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add type annotations to all function signatures and class attributes across
13 core modules: Fragment, BlastResult, Operon, GATProfile, TypeGenerator,
ValidateFragments, Profiles, FilterBlast, Fasta, FragmentFiles, PlotProfile,
Results, and ProfileGenerator. Uses `from __future__ import annotations` for
forward reference support. Adds PEP 561 py.typed marker file.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…to main pipeline

- Add novelty_assessment field to AnalysisResult (Optional[dict], default None)
- Wire NoveltyDetector into run_analysis() for novel profiles (assesses whether
  novel arrangements are likely real or artifactual)
- Add batch statistics computation and JSON output when processing multiple files
- Add _generate_batch_outputs() method producing type_distribution.svg,
  confidence_heatmap.svg, synteny.svg, per-assembly fragment_quality SVGs,
  and batch_stats.json
- Add --output_dir CLI option for batch visualization directory
- Add 13 integration tests covering novelty-in-result, batch stats consumption,
  and batch SVG generation with real AnalysisResult data

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Replaces the empty __init__.py with a proper module that exports all public
classes, functions, and data models so users can write `from socru import Socru`
instead of reaching into submodules. Includes an import test to guard against
future regressions.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…E.md

- Rewrite Dockerfile using condaforge/miniforge3 base with mamba, pinned
  versions for barrnap, blast, and Python 3.11
- Add .dockerignore to exclude dev artifacts from Docker context
- Add CLAUDE.md with project context for AI-assisted development
- Add package docstring and public API exports to socru/__init__.py

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Remove 53 unused imports (F401)
- Fix import ordering in all modules (I001)
- Remove trailing whitespace from blank lines (W291, W293)
- Fix tab indentation issues (W191)
- Add missing newlines at end of files (W292)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Concise, accurate docs reflecting the modernized codebase:
- README: quick start, output formats, library usage, CLI reference
- docs/: installation, user guide, tutorial, API reference, developer guide
- CLAUDE.md: developer quick reference with module layout
- Issue templates: bioinformatics-specific bug report, streamlined feature request

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Introduce snapshot/golden-file testing infrastructure that catches
unexpected changes to visualization and report output formats. Covers
all five SVG generators (genome plot, synteny, fragment quality, type
distribution, confidence heatmap), JSON serialization of AnalysisResult,
and the HTML report generator with deterministic datetime mocking.
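
The core of golden-file testing is a comparison against a stored copy of the expected output. The helper below is a hypothetical sketch of that idea, not the infrastructure added in this commit; in particular, writing the snapshot on first run is a design choice of the sketch.

```python
import tempfile
from pathlib import Path


def check_snapshot(name, actual, snapshot_dir, update=False):
    """Compare `actual` with the stored snapshot; create it if missing or on update."""
    path = Path(snapshot_dir) / name
    if update or not path.exists():
        path.write_text(actual)
        return True
    return path.read_text() == actual


with tempfile.TemporaryDirectory() as snapshot_dir:
    svg = "<svg><circle r='10'/></svg>"
    created = check_snapshot("genome.svg", svg, snapshot_dir)  # first run writes the file
    unchanged = check_snapshot("genome.svg", svg, snapshot_dir)  # identical output matches
    changed = check_snapshot("genome.svg", svg + " ", snapshot_dir)  # any drift is reported
```

Output that embeds a timestamp will never match its snapshot, which is why the HTML report tests mock the datetime.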

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…ed input handling

- Add ToolCheck module with MissingToolError and check_tool/check_all_tools
  for early detection of missing barrnap/BLAST+ executables
- Call check_all_tools() at start of Socru and SocruCreate constructors
- Wrap subprocess.run calls in Barrnap and Blast with try/except for
  CalledProcessError (with context logging) and FileNotFoundError
- Harden FilterBlast.readin_results to skip malformed/blank BLAST lines
  with warnings instead of crashing
- Add warning when blastn produces no output
- Initialize cleanup lists before check_all_tools so __del__ never fails
- Add ToolCheck_test.py and ErrorHandling_test.py (12 new tests)
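
An early tool check of this kind can be built on `shutil.which`. The names `MissingToolError`, `check_tool`, and `check_all_tools` come from the commit message, but the signatures, base class, default tool list, and messages below are assumptions.

```python
import shutil


class MissingToolError(RuntimeError):
    """Raised when a required external executable is not on PATH."""


def check_tool(name):
    """Return the full path of an executable, or raise MissingToolError."""
    path = shutil.which(name)
    if path is None:
        raise MissingToolError(f"Required tool '{name}' was not found on PATH")
    return path


def check_all_tools(names=("barrnap", "blastn", "makeblastdb")):
    """Report every missing tool at once instead of failing on the first."""
    missing = [n for n in names if shutil.which(n) is None]
    if missing:
        raise MissingToolError("Missing required tools: " + ", ".join(missing))


try:
    check_tool("definitely-not-a-real-tool-xyz")
    missing_message = None
except MissingToolError as exc:
    missing_message = str(exc)
```

Failing in the constructor gives users one clear message up front instead of a `FileNotFoundError` partway through a run.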

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add 48 new tests across two test modules:

- EndToEnd_test.py: 22 tests exercising the full Socru pipeline (barrnap +
  BLAST + type assignment) including JSON/SVG/HTML output generation, batch
  analysis with batch_stats, context manager cleanup, SocruConfig dataclass
  usage, novelty assessment, and SocruCreate database creation/reuse.

- OutputFormats_test.py: 26 tests verifying all output modules work correctly
  with realistic AnalysisResult data, covering JSON roundtrip serialization,
  HTML report generation, SVG genome plot/synteny/heatmap/fragment quality/
  type distribution rendering, and BatchStats computation.

Also fix a bug in HtmlReport._detail_row where None blast_identity values
caused a TypeError during HTML report generation for fragments with no BLAST
match.
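
The class of bug fixed here is formatting a `None` as a number. A minimal sketch of a guard is shown below; `format_identity` and the "n/a" placeholder are hypothetical, not the actual `_detail_row` code.

```python
def format_identity(blast_identity):
    """Format a BLAST percent identity for display, tolerating missing values."""
    if blast_identity is None:
        # f"{None:.1f}" raises TypeError, so handle the no-match case first.
        return "n/a"
    return f"{blast_identity:.1f}%"
```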

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…recations

- Rename TestOptions to MockOptions in 5 test files to avoid
  PytestCollectionWarning (pytest tried to collect them as test classes)
- Suppress BiopythonDeprecationWarning in Fasta_test.py for the FASTA
  comment handling deprecation triggered by invalid test input
- Fix Socru.cleanup() to use getattr for dirs_to_cleanup, preventing
  PytestUnraisableExceptionWarning when __init__ raises before setting
  the attribute
- Add pkg_resources DeprecationWarning filter in pyproject.toml for
  third-party warnings we cannot fix
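
The `getattr` fix for `cleanup()` addresses a general Python pitfall: `__del__` still runs on an object whose `__init__` raised. The sketch below uses a hypothetical `Worker` class to show the pattern; only the `dirs_to_cleanup` attribute name comes from the commit.

```python
import os
import shutil
import tempfile


class Worker:
    def __init__(self, fail=False):
        if fail:
            raise RuntimeError("setup failed before dirs_to_cleanup was set")
        self.dirs_to_cleanup = [tempfile.mkdtemp()]

    def cleanup(self):
        # getattr with a default avoids AttributeError on a half-built object.
        for directory in getattr(self, "dirs_to_cleanup", []):
            shutil.rmtree(directory, ignore_errors=True)

    def __del__(self):
        self.cleanup()


worker = Worker()
scratch = worker.dirs_to_cleanup[0]
existed_before = os.path.isdir(scratch)
worker.cleanup()

# Simulate an object whose __init__ never ran to completion.
partial = Worker.__new__(Worker)
partial.cleanup()  # no AttributeError
```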

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…I options wired

- Add comprehensive serialization tests verifying AnalysisResult round-trips
  through JSON with all fields populated (novelty_assessment, qc_flags,
  fragments, operons)
- Add test verifying SocruConfig fields match argparse CLI options
- Add test verifying from_options() maps all CLI dests correctly
- Standardize None handling for file paths (use `is not None` consistently)
- Add output_dir coverage to SocruConfig tests
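
A round-trip test of the kind described in the first bullet can be sketched with a small dataclass. `ExampleResult` and most of its fields are hypothetical; only `novelty_assessment` (an `Optional[dict]` defaulting to `None`) and `qc_flags` are named in this PR.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Optional


@dataclass
class ExampleResult:
    filename: str
    gs_type: str
    qc_flags: list = field(default_factory=list)
    novelty_assessment: Optional[dict] = None


def to_json(result):
    """Serialize a result to a JSON string with stable key order."""
    return json.dumps(asdict(result), sort_keys=True)


def from_json(text):
    """Rebuild a result from the JSON produced by to_json."""
    return ExampleResult(**json.loads(text))


plain = ExampleResult(filename="a.fa", gs_type="GS1.0")
populated = ExampleResult(
    filename="b.fa",
    gs_type="GS0.0",
    qc_flags=["low_identity"],
    novelty_assessment={"likely_real": False, "reason": "fragmented assembly"},
)
```

Testing both the all-defaults and all-populated cases catches fields that serialize but are silently dropped on load.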

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Unit tests (379): run on Python 3.9-3.12 matrix, no conda needed, ~2s
- Integration tests (30): run with barrnap+BLAST via conda, single Python, ~50s
- Both jobs run in parallel on GitHub Actions
- Add pytest 'integration' marker via conftest.py auto-detection
- Unit job uses setup-python (fast), integration job uses setup-miniconda

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…_tests.sh

- setup.py superseded by pyproject.toml
- scripts/ superseded by socru/cli.py entry points (now with all new CLI options)
- VERSION file superseded by pyproject.toml version + importlib.metadata
- MANIFEST.in only needed for setup.py
- run_tests.sh superseded by pytest
- CI simplified to Python 3.12 only (removed 3.9/3.10/3.11 matrix)
- cli.py updated with --output_json, --output_svg, --output_html, --output_dir,
  --detailed, context managers, error handling, importlib.metadata

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…ense classifier

- Remove setuptools-scm from build-system requires (not used, causes missing config error)
- Remove License classifier (superseded by PEP 639 license expression already in pyproject.toml)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…841 unused vars, F541 f-string, E741 ambiguous name

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…n unit tests

- Add Blast_test, Database_test, Dif_test, DnaA_test, ProfileGenerator_test to
  integration module list (they need makeblastdb/blastn on PATH)
- Mock ToolCheck in ErrorHandling_test and SocruConfig_test so they run
  without barrnap/BLAST in the unit test job
- Unit: 371 tests, Integration: 38 tests, Total: 409

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

2 participants