Documentation and unit tests #32
Open
andrewjpage wants to merge 48 commits into
Co-authored-by: andrewjpage <24151+andrewjpage@users.noreply.github.com>
…docs Add comprehensive test suite and documentation with 93% coverage
Co-authored-by: andrewjpage <24151+andrewjpage@users.noreply.github.com>
…rrnap.py Co-authored-by: andrewjpage <24151+andrewjpage@users.noreply.github.com>
…, and Database modules Co-authored-by: andrewjpage <24151+andrewjpage@users.noreply.github.com>
…Fragments Co-authored-by: andrewjpage <24151+andrewjpage@users.noreply.github.com>
… and Dif Co-authored-by: andrewjpage <24151+andrewjpage@users.noreply.github.com>
Co-authored-by: andrewjpage <24151+andrewjpage@users.noreply.github.com>
Add comprehensive inline documentation to all Python modules and scripts
…ff, remove Travis CI

- Add pyproject.toml as primary build config with setuptools backend, project metadata, console_scripts entry points, dev dependencies, pytest and ruff configuration
- Add socru/cli.py with proper entry point functions for all 6 CLI commands
- Add GitHub Actions CI workflow with Python 3.9-3.12 matrix and conda for bioinformatics dependencies (barrnap, blast)
- Add .pre-commit-config.yaml with ruff linter and formatter hooks
- Fix Dockerfile: use miniconda3 slim image, pin Python 3.11, fix git+git:// to git+https:// protocol
- Remove .travis.yml (defunct for open source, targeted EOL Python 3.6)
- Remove nose test dependency from setup.py

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…op iterator, double filter call, wrong attribute name, unused imports Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…lists, fix resource leaks

- Replace all subprocess shell=True calls with argument lists across Barrnap.py, Blast.py, Database.py, Socru.py, SocruCreate.py, ShrinkDatabase.py
- Replace shell redirects (>, >>, |) with Python file I/O and capture_output
- Replace gunzip/gzip shell commands with Python gzip module
- Replace shell sort pipe with Python sorted() for BLAST output
- Fix mkstemp fd leaks: close fd immediately after mkstemp in all files
- Fix Database.__del__: store tmpdir separately, rmtree the directory not db_prefix
- Add ignore_errors=True to all shutil.rmtree calls in __del__ methods
- Remove unused import time from Socru.py and SocruCreate.py

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
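The shell-free pattern this commit describes can be sketched as follows. This is a minimal illustration under assumptions, not the code from Barrnap.py or Blast.py: the helper names `run_to_file`, `make_temp_path`, and `decompress` are hypothetical, and the command is a stand-in Python one-liner rather than a real barrnap or blastn call.

```python
import gzip
import os
import shutil
import subprocess
import sys
import tempfile


def run_to_file(args, output_path):
    """Run a command from an argument list (no shell) and write stdout to a file.

    This replaces a shell redirect such as `cmd > output_path`.
    """
    result = subprocess.run(args, capture_output=True, text=True, check=True)
    with open(output_path, "w") as handle:
        handle.write(result.stdout)
    return result.stdout


def make_temp_path(suffix=""):
    """Create a temporary file and close its descriptor straight away.

    tempfile.mkstemp returns an open file descriptor; leaving it open leaks it.
    """
    fd, path = tempfile.mkstemp(suffix=suffix)
    os.close(fd)
    return path


def decompress(gz_path, out_path):
    """Decompress a gzip file with the gzip module instead of calling gunzip."""
    with gzip.open(gz_path, "rb") as src, open(out_path, "wb") as dst:
        shutil.copyfileobj(src, dst)


if __name__ == "__main__":
    out = make_temp_path(".txt")
    # Hypothetical command: the Python interpreter stands in for an external tool.
    run_to_file([sys.executable, "-c", "print('hello')"], out)
    print(open(out).read().strip())
    os.remove(out)
```

Passing an argument list means filenames containing spaces or shell metacharacters are never interpreted by a shell.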
…rror messages

- Replace deprecated pkg_resources with importlib.resources and importlib.metadata across Schemas.py, SocruCreate.py, and all 6 CLI scripts
- Add FASTA input validation in Fasta.py: file existence, empty file, no sequences, and small contig warnings
- Replace sys.exit() with proper exceptions (FileNotFoundError, FileExistsError) in Socru.py and SocruCreate.py library code
- Add CLI-layer exception handling in scripts/socru and scripts/socru_create
- Improve error messages in Schemas.py to include species name and path
- Add stderr warnings in DnaA.py and Dif.py when BLAST fails to locate markers
- Improve ValidateFragments.py messages to include genome filename
- Add tests for Fasta validation (nonexistent file, empty file, invalid content)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
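A rough sketch of the two ideas in this commit: version lookup through `importlib.metadata`, and library code that raises exceptions while only the CLI layer decides the exit status. The function names `package_version`, `require_input`, and `main` are hypothetical and are not taken from Socru.py or the CLI scripts.

```python
import sys
from importlib.metadata import PackageNotFoundError, version
from pathlib import Path


def package_version(name):
    """Look up an installed package's version without pkg_resources."""
    try:
        return version(name)
    except PackageNotFoundError:
        return "unknown"


def require_input(path):
    """Library-level check: raise instead of calling sys.exit()."""
    candidate = Path(path)
    if not candidate.is_file():
        raise FileNotFoundError(f"Input file not found: {path}")
    if candidate.stat().st_size == 0:
        raise ValueError(f"Input file is empty: {path}")
    return candidate


def main(argv):
    """CLI layer: translate exceptions into a message and an exit status."""
    try:
        for path in argv:
            require_input(path)
    except (FileNotFoundError, ValueError) as exc:
        print(f"error: {exc}", file=sys.stderr)
        return 1
    return 0


if __name__ == "__main__":
    sys.exit(main(sys.argv[1:]))
```

Keeping `sys.exit()` out of library code lets callers and tests catch the error instead of having the interpreter terminate.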
… improve error messages Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…Ter labels, and quality badge

New SvgGenomePlot module generates publication-quality SVG circular genome diagrams with colored fragment arcs, hatch overlays for reversed fragments, operon direction triangles, origin/terminus markers, tick marks, quality badge, and a color legend. Integrated into PlotProfile (create_svg method) and the CLI via --output_svg flag.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…ence scoring, QC flags Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…y badges, and expandable details Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…anagement Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…ualization Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…gment coverage, test isolation

- Fragment_test: add tests for num_bases, output_filename, operon_direction_str with forward/reverse operons, multiple coordinate ranges, reversed_frag behavior
- BlastResult_test: new file covering field parsing, is_forward for both orientations, tab-delimited __str__ output, and roundtrip consistency
- Operon_test: new file covering creation, __str__ format, attribute mutability
- GATProfile_test: edge cases for empty/single/all-reversed fragments, unknown "?" fragments, double-inversion identity, deterministic orientation_binary, profile matching, order extraction; hypothesis stubs as comments
- TypeGenerator_test: novel profile order not in DB, unknown fragments, empty profile, invalid fragment quality
- PlotProfile_test: use tempfile.mkdtemp with tearDown cleanup instead of writing to cwd
- ProfileGenerator_test: copy database to temp dir with tearDown cleanup

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
… chart

Introduce three new modules for batch-level analysis and visualization:

- BatchStats: aggregate statistics (type distribution, quality summary, mean confidence, flag summary, outlier detection) across multiple results
- SvgFragmentQuality: horizontal bar chart of per-fragment BLAST identity with color-coded quality thresholds and dashed bars for unknowns
- SvgTypeDistribution: bar chart of GS type frequencies with optional quality-stacked bars and percentage labels

Includes 37 tests covering all public APIs and edge cases.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Migrate all diagnostic print() and sys.stderr.write() calls to Python's logging module across 12 source files. Configure logging in both entry points (scripts/socru and cli.py) based on --verbose flag. Primary tab-delimited output and user-facing table prints remain as print().

Update README.md: replace broken Travis CI badge with GitHub Actions placeholder, note Python 3.9+ requirement, add Output Formats and CLI Options sections documenting --output_json, --output_svg, --output_html, and switch testing instructions to pytest.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
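One way the --verbose wiring described here might look is sketched below. The mapping of --verbose to DEBUG (and WARNING otherwise), the logger name, and the helper names are assumptions for illustration; the commit does not state which levels the entry points use.

```python
import argparse
import logging

# Hypothetical logger name; library modules would log through loggers like this.
logger = logging.getLogger("socru_example")


def parse_args(argv):
    parser = argparse.ArgumentParser(description="logging configuration sketch")
    parser.add_argument("--verbose", action="store_true", help="emit diagnostic messages")
    return parser.parse_args(argv)


def configure_logging(verbose):
    """Configure the root logger once, at the entry point."""
    level = logging.DEBUG if verbose else logging.WARNING
    # force=True replaces any handlers installed by an earlier basicConfig call.
    logging.basicConfig(
        level=level,
        format="%(levelname)s %(name)s: %(message)s",
        force=True,
    )
    return level


if __name__ == "__main__":
    options = parse_args(["--verbose"])
    configure_logging(options.verbose)
    logger.debug("diagnostic detail goes to the log (stderr by default)")
    print("primary\ttab-delimited\toutput stays on stdout")
```

Because logging writes to stderr by default, diagnostics stay out of the tab-delimited results on stdout.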
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
… after Results.py bug fix

- Fix 1-based GFF start coordinate to 0-based in Barrnap.parse_barrnap_output
- Update expected output files for SocruCreate, SocruRebuild, SocruUpdate tests
- All 290 tests now pass including integration tests with barrnap

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
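For context on the coordinate fix: GFF uses 1-based, inclusive start and end positions, whereas Python slicing is 0-based and half-open. The sketch below shows the usual conversion (subtract one from the start, keep the end). The function `gff_line_to_interval` and the sample line are hypothetical; the actual parsing in Barrnap.parse_barrnap_output is not shown in this PR text.

```python
def gff_line_to_interval(line):
    """Convert one GFF feature line to a 0-based, half-open interval.

    GFF columns: seqid, source, type, start, end, score, strand, phase, attributes.
    """
    fields = line.rstrip("\n").split("\t")
    seqid = fields[0]
    start = int(fields[3]) - 1  # 1-based inclusive -> 0-based inclusive
    end = int(fields[4])        # 1-based inclusive end equals 0-based exclusive end
    strand = fields[6]
    return seqid, start, end, strand


if __name__ == "__main__":
    # Made-up example line, for illustration only.
    example = "contig1\tbarrnap\trRNA\t1\t1500\t0\t+\t.\tName=16S_rRNA"
    seqid, start, end, strand = gff_line_to_interval(example)
    sequence = "A" * 2000
    # The slice covers exactly the feature's bases.
    print(seqid, start, end, strand, len(sequence[start:end]))
```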
… argparse

Introduce SocruConfig and SocruCreateConfig dataclasses that allow constructing Socru and SocruCreate directly from typed config objects instead of requiring fabricated argparse Namespace objects. Both classes retain backward compatibility via from_options() class methods and isinstance dispatch in __init__.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
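The config-object pattern described in this commit might be sketched as below. `ExampleConfig`, `ExampleRunner`, and their fields are hypothetical stand-ins; the real SocruConfig fields and the exact dispatch logic in Socru.__init__ are not shown in this PR text.

```python
import argparse
from dataclasses import dataclass, fields
from typing import List, Optional


@dataclass
class ExampleConfig:
    """Hypothetical typed configuration; field names are illustrative only."""

    input_files: List[str]
    species: str
    output_json: Optional[str] = None
    verbose: bool = False

    @classmethod
    def from_options(cls, options):
        """Build a config from an argparse-style object, ignoring unknown attributes."""
        values = {
            f.name: getattr(options, f.name)
            for f in fields(cls)
            if hasattr(options, f.name)
        }
        return cls(**values)


class ExampleRunner:
    def __init__(self, config):
        # isinstance dispatch: accept a typed config or a legacy Namespace.
        if not isinstance(config, ExampleConfig):
            config = ExampleConfig.from_options(config)
        self.config = config


if __name__ == "__main__":
    direct = ExampleRunner(ExampleConfig(input_files=["a.fa"], species="Example_species"))
    legacy = ExampleRunner(argparse.Namespace(input_files=["a.fa"], species="Example_species"))
    print(direct.config == legacy.config)
```

Library users can then construct the runner without building an argparse Namespace by hand, while existing CLI code keeps working.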
…A_DIR

Introduces a DatabaseManager class that provides a unified interface for discovering species databases from both bundled package data and a user-configurable data directory (~/.socru/data/ or SOCRU_DATA_DIR env var).

- DatabaseManager supports list, locate, install, and inspect operations
- Schemas.database_directory() now falls back to DatabaseManager
- socru_species gains --detailed flag for fragment/type info
- 16 new tests covering all DatabaseManager functionality

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
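A minimal sketch of how a user data directory with an environment override could be resolved. The functions `user_data_dir` and `locate_database` are hypothetical, and the search order (first matching directory wins) is an assumption; the DatabaseManager API itself is not shown in this PR text.

```python
import os
from pathlib import Path


def user_data_dir(environ=None):
    """Return the user data directory: SOCRU_DATA_DIR if set, else ~/.socru/data."""
    environ = os.environ if environ is None else environ
    override = environ.get("SOCRU_DATA_DIR")
    if override:
        return Path(override)
    return Path.home() / ".socru" / "data"


def locate_database(species, search_dirs):
    """Return the first directory named after the species, or None if absent.

    Assumption: each database lives in a subdirectory named after the species.
    """
    for directory in search_dirs:
        candidate = Path(directory) / species
        if candidate.is_dir():
            return candidate
    return None


if __name__ == "__main__":
    print(user_data_dir({"SOCRU_DATA_DIR": "/data/socru"}))
    print(user_data_dir({}))
```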
Add type annotations to all function signatures and class attributes across 13 core modules: Fragment, BlastResult, Operon, GATProfile, TypeGenerator, ValidateFragments, Profiles, FilterBlast, Fasta, FragmentFiles, PlotProfile, Results, and ProfileGenerator. Uses `from __future__ import annotations` for forward reference support. Adds PEP 561 py.typed marker file. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…to main pipeline

- Add novelty_assessment field to AnalysisResult (Optional[dict], default None)
- Wire NoveltyDetector into run_analysis() for novel profiles (assesses whether novel arrangements are likely real or artifactual)
- Add batch statistics computation and JSON output when processing multiple files
- Add _generate_batch_outputs() method producing type_distribution.svg, confidence_heatmap.svg, synteny.svg, per-assembly fragment_quality SVGs, and batch_stats.json
- Add --output_dir CLI option for batch visualization directory
- Add 13 integration tests covering novelty-in-result, batch stats consumption, and batch SVG generation with real AnalysisResult data

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Replaces the empty __init__.py with a proper module that exports all public classes, functions, and data models so users can write `from socru import Socru` instead of reaching into submodules. Includes an import test to guard against future regressions. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…E.md

- Rewrite Dockerfile using condaforge/miniforge3 base with mamba, pinned versions for barrnap, blast, and Python 3.11
- Add .dockerignore to exclude dev artifacts from Docker context
- Add CLAUDE.md with project context for AI-assisted development
- Add package docstring and public API exports to socru/__init__.py

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Remove 53 unused imports (F401)
- Fix import ordering in all modules (I001)
- Remove trailing whitespace from blank lines (W291, W293)
- Fix tab indentation issues (W191)
- Add missing newlines at end of files (W292)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Concise, accurate docs reflecting the modernized codebase:

- README: quick start, output formats, library usage, CLI reference
- docs/: installation, user guide, tutorial, API reference, developer guide
- CLAUDE.md: developer quick reference with module layout
- Issue templates: bioinformatics-specific bug report, streamlined feature request

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Introduce snapshot/golden-file testing infrastructure that catches unexpected changes to visualization and report output formats. Covers all five SVG generators (genome plot, synteny, fragment quality, type distribution, confidence heatmap), JSON serialization of AnalysisResult, and the HTML report generator with deterministic datetime mocking. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
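As a rough illustration of golden-file testing, the helper below compares generated text against a stored snapshot and can rewrite the snapshot on request. The name `assert_matches_snapshot` and the choice to create a missing snapshot on first run are assumptions, not the infrastructure added in this PR.

```python
from pathlib import Path


def assert_matches_snapshot(actual, snapshot_path, update=False):
    """Compare text against a stored golden file.

    When update is True, or the snapshot does not exist yet, the current
    output is written as the new golden file instead of being compared.
    """
    path = Path(snapshot_path)
    if update or not path.exists():
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_text(actual, encoding="utf-8")
        return
    expected = path.read_text(encoding="utf-8")
    assert actual == expected, f"Output differs from snapshot {path}"
```

Output that embeds the current time has to be made deterministic first, for example by mocking the clock as the commit describes for the HTML report. Silently creating a missing snapshot is convenient locally but can hide a forgotten golden file in CI, so a stricter variant might fail instead.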
…ed input handling

- Add ToolCheck module with MissingToolError and check_tool/check_all_tools for early detection of missing barrnap/BLAST+ executables
- Call check_all_tools() at start of Socru and SocruCreate constructors
- Wrap subprocess.run calls in Barrnap and Blast with try/except for CalledProcessError (with context logging) and FileNotFoundError
- Harden FilterBlast.readin_results to skip malformed/blank BLAST lines with warnings instead of crashing
- Add warning when blastn produces no output
- Initialize cleanup lists before check_all_tools so __del__ never fails
- Add ToolCheck_test.py and ErrorHandling_test.py (12 new tests)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
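An early tool check of this kind can be built on `shutil.which`. The sketch below reuses the names MissingToolError, check_tool, and check_all_tools from the commit message, but the signatures, the default tool list, and the error wording are assumptions rather than the actual ToolCheck module.

```python
import shutil


class MissingToolError(RuntimeError):
    """Raised when a required external executable is not on PATH."""


def check_tool(name):
    """Return the full path of an executable, or raise MissingToolError."""
    path = shutil.which(name)
    if path is None:
        raise MissingToolError(f"Required executable '{name}' was not found on PATH")
    return path


def check_all_tools(names=("barrnap", "blastn", "makeblastdb")):
    """Check every tool and report all missing ones in a single error."""
    found = {}
    missing = []
    for name in names:
        path = shutil.which(name)
        if path is None:
            missing.append(name)
        else:
            found[name] = path
    if missing:
        raise MissingToolError("Missing required executables: " + ", ".join(missing))
    return found
```

Reporting every missing tool at once saves users from fixing their environment one error at a time.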
Add 48 new tests across two test modules:

- EndToEnd_test.py: 22 tests exercising the full Socru pipeline (barrnap + BLAST + type assignment) including JSON/SVG/HTML output generation, batch analysis with batch_stats, context manager cleanup, SocruConfig dataclass usage, novelty assessment, and SocruCreate database creation/reuse.
- OutputFormats_test.py: 26 tests verifying all output modules work correctly with realistic AnalysisResult data, covering JSON roundtrip serialization, HTML report generation, SVG genome plot/synteny/heatmap/fragment quality/type distribution rendering, and BatchStats computation.

Also fix a bug in HtmlReport._detail_row where None blast_identity values caused a TypeError during HTML report generation for fragments with no BLAST match.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…recations

- Rename TestOptions to MockOptions in 5 test files to avoid PytestCollectionWarning (pytest tried to collect them as test classes)
- Suppress BiopythonDeprecationWarning in Fasta_test.py for the FASTA comment handling deprecation triggered by invalid test input
- Fix Socru.cleanup() to use getattr for dirs_to_cleanup, preventing PytestUnraisableExceptionWarning when __init__ raises before setting the attribute
- Add pkg_resources DeprecationWarning filter in pyproject.toml for third-party warnings we cannot fix

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
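The getattr fix addresses a general Python pitfall: `__del__` still runs on an object whose `__init__` raised part-way through, so attributes set later in `__init__` may not exist. The class below is a hypothetical stand-in for Socru that shows the defensive pattern.

```python
import os
import shutil
import tempfile


class Worker:
    """Hypothetical class owning temporary directories."""

    def __init__(self, fail=False):
        if fail:
            # Raised before dirs_to_cleanup is assigned.
            raise ValueError("initialisation failed")
        self.dirs_to_cleanup = [tempfile.mkdtemp()]

    def cleanup(self):
        # getattr with a default tolerates a partially initialised object.
        for directory in getattr(self, "dirs_to_cleanup", []):
            shutil.rmtree(directory, ignore_errors=True)

    def __del__(self):
        self.cleanup()


if __name__ == "__main__":
    worker = Worker()
    path = worker.dirs_to_cleanup[0]
    worker.cleanup()
    print(os.path.exists(path))
    try:
        Worker(fail=True)  # __del__ runs on the half-built object without error
    except ValueError:
        pass
```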
…I options wired

- Add comprehensive serialization tests verifying AnalysisResult round-trips through JSON with all fields populated (novelty_assessment, qc_flags, fragments, operons)
- Add test verifying SocruConfig fields match argparse CLI options
- Add test verifying from_options() maps all CLI dests correctly
- Standardize None handling for file paths (use `is not None` consistently)
- Add output_dir coverage to SocruConfig tests

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Unit tests (379): run on Python 3.9-3.12 matrix, no conda needed, ~2s
- Integration tests (30): run with barrnap+BLAST via conda, single Python, ~50s
- Both jobs run in parallel on GitHub Actions
- Add pytest 'integration' marker via conftest.py auto-detection
- Unit job uses setup-python (fast), integration job uses setup-miniconda

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
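Marker auto-detection in a conftest.py can be done with pytest's `pytest_configure` and `pytest_collection_modifyitems` hooks. The sketch below assumes detection by test module name; the set of module names and the helper `is_integration` are illustrative, and the project's real conftest.py may detect integration tests differently.

```python
from pathlib import Path

# Illustrative list of test modules that need external tools on PATH.
INTEGRATION_MODULES = {"EndToEnd_test", "Blast_test", "Database_test"}


def is_integration(nodeid):
    """Decide from a pytest node id whether the test lives in an integration module."""
    module_path = nodeid.split("::", 1)[0]
    return Path(module_path).stem in INTEGRATION_MODULES


def pytest_configure(config):
    # Register the marker so pytest does not warn about an unknown mark.
    config.addinivalue_line(
        "markers", "integration: test requires external tools such as barrnap or BLAST+"
    )


def pytest_collection_modifyitems(config, items):
    # Tag matching tests during collection, so no per-test decorators are needed.
    for item in items:
        if is_integration(item.nodeid):
            item.add_marker("integration")
```

With the marker applied, the two CI jobs could select their tests with `pytest -m "not integration"` and `pytest -m integration` respectively.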
…_tests.sh

- setup.py superseded by pyproject.toml
- scripts/ superseded by socru/cli.py entry points (now with all new CLI options)
- VERSION file superseded by pyproject.toml version + importlib.metadata
- MANIFEST.in only needed for setup.py
- run_tests.sh superseded by pytest
- CI simplified to Python 3.12 only (removed 3.9/3.10/3.11 matrix)
- cli.py updated with --output_json, --output_svg, --output_html, --output_dir, --detailed, context managers, error handling, importlib.metadata

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…ense classifier

- Remove setuptools-scm from build-system requires (not used, causes missing config error)
- Remove License classifier (superseded by PEP 639 license expression already in pyproject.toml)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…841 unused vars, F541 f-string, E741 ambiguous name Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…n unit tests

- Add Blast_test, Database_test, Dif_test, DnaA_test, ProfileGenerator_test to integration module list (they need makeblastdb/blastn on PATH)
- Mock ToolCheck in ErrorHandling_test and SocruConfig_test so they run without barrnap/BLAST in the unit test job
- Unit: 371 tests, Integration: 38 tests, Total: 409

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
These changes add documentation, comments, and unit tests to Socru. They don't change the underlying code. Everything was generated by Copilot (AI), so all the usual warnings apply.