Commit 95a3085

Improving test coverage and documentation
1 parent f8d86e6 commit 95a3085

19 files changed: +5375 additions, -97 deletions

CLAUDE.md

Lines changed: 53 additions & 28 deletions
@@ -5,8 +5,8 @@ Technical guidance for Claude Code when working with the PrePrimer codebase.
 ## Current State (v0.2.0)
 
 **Codebase Metrics:**
-- 12,853+ total lines of Python code across 49+ files
-- 250+ tests with comprehensive coverage including external scheme validation
+- 19,985 total lines of Python code across 59 files
+- 581 tests with 96.90% comprehensive coverage including external scheme validation
 - Plugin-based architecture with security and performance focus
 - Documentation: 16 organized files in 3-tier structure
 
@@ -41,11 +41,19 @@ pip install -e ".[dev]"
 
 ### Testing Commands
 ```bash
-# Run all tests (250+ total with comprehensive coverage)
+# Run all tests (581 total with 96.90% comprehensive coverage)
 python -m pytest
 
+# Run comprehensive test suites (new coverage-focused tests)
+python -m pytest tests/test_security_comprehensive.py -v # Security validation (38 tests)
+python -m pytest tests/test_main_api_comprehensive.py -v # Main API testing (12 tests)
+python -m pytest tests/test_converter_comprehensive_gaps.py -v # Core converter (9 tests)
+python -m pytest tests/test_exceptions_comprehensive.py -v # Exception system (25 tests)
+python -m pytest tests/test_registry_comprehensive.py -v # Registry system (16 tests)
+python -m pytest tests/test_artic_parser_comprehensive.py -v # ARTIC parser (22 tests)
+python -m pytest tests/test_sts_writer_comprehensive.py -v # STS writer (12 tests)
+
 # Run specific categories
-python -m pytest tests/test_security.py -v # Security validation
 python -m pytest tests/test_benchmarks.py -v # Performance benchmarks
 python -m pytest tests/test_property_based.py -v # Property-based testing
 python -m pytest tests/test_integration.py -v # End-to-end testing
@@ -59,13 +67,14 @@ python scripts/run_mutation_tests.py # Test quality assessmen
 ```
 
 **Test Categories:**
-- Property-based (12): Automated input generation with Hypothesis
-- Benchmarks (23): Performance validation and regression detection
-- Integration (12+): End-to-end workflow testing
-- Security (18): Input validation and vulnerability prevention
-- Topology (20): Circular genome coordinate handling and detection
-- External validation: Real-world schemes from PrimerSchemes Labs repository
-- Unit tests: Core functionality across all components
+- **Comprehensive Coverage Tests (134)**: Security (38), Main API (12), Converter (9), Exceptions (25), Registry (16), Parser edge cases (22), Writer coverage (12)
+- **Property-based (12)**: Automated input generation with Hypothesis
+- **Benchmarks (23)**: Performance validation and regression detection
+- **Integration (12+)**: End-to-end workflow testing
+- **Topology (20)**: Circular genome coordinate handling and detection
+- **External validation**: Real-world schemes from PrimerSchemes Labs repository
+- **Unit tests**: Core functionality across all components
+- **Total: 581 tests with 96.90% coverage**
 
 ### Code Quality
 ```bash
@@ -271,10 +280,12 @@ info.save("info.json")
 ### Development Focus Areas
 
 **Near-term (v0.2.x):**
-- Maintain comprehensive test coverage (currently 250+ tests with external validation)
-- Continue security hardening and input validation improvements
+- Maintain exceptional test coverage (currently 581 tests with 96.90% coverage)
+- **Completed**: Comprehensive security hardening with 100% security module coverage
+- **Completed**: Main API entry point testing achieving 100% coverage
+- **Completed**: Core infrastructure testing (converter, registry, exceptions) with 95-100% coverage
 - Performance monitoring with large-scale datasets (validated up to 2,500+ amplicons)
-- Documentation maintenance following ecosystem integration updates
+- Documentation maintenance following comprehensive testing improvements
 
 **Medium-term (v0.3.x):**
 - Windows support investigation (Unicode encoding challenges)
@@ -303,26 +314,40 @@ info.save("info.json")
 - Comprehensive error handling with informative validation messages
 - JSON metadata support: Olivar configurations and primal-page info.json schema
 
-**Test Data Structure:**
+**Test Suite Structure:**
 ```
-tests/test_data/
-├── datasets/ # Internal test datasets
-│ ├── small/ # COVID-19: 5 amplicons (fast testing)
-│ ├── medium/ # ASFV: 80 amplicons (performance testing)
-│ └── mitochondrial/ # Human mito: 8 amplicons (circular genome testing)
-└── external_schemes/ # Real-world validation schemes
-  ├── yale-tb/ # Mycobacterium tuberculosis: 2,564 amplicons
-  ├── yale-west-nile-virus/ # West Nile Virus: 38 amplicons
-  ├── varvamp-hav/ # Hepatitis A with degenerate primers
-  ├── nCoV-2019-V532/ # ARTIC SARS-CoV-2 V5.3.2: 96 amplicons
-  └── olivar-mitochondrial/ # Olivar-generated: 15 amplicons
+tests/
+├── test_*_comprehensive.py # Comprehensive coverage test suites (134 tests)
+│ ├── test_security_comprehensive.py # Security validation (38 tests, 100% coverage)
+│ ├── test_main_api_comprehensive.py # Main API entry point (12 tests, 100% coverage)
+│ ├── test_converter_comprehensive_gaps.py # Core converter edge cases (9 tests, 99.34% coverage)
+│ ├── test_exceptions_comprehensive.py # Exception system (25 tests, 95.67% coverage)
+│ ├── test_registry_comprehensive.py # Registry system (16 tests, 96.97% coverage)
+│ ├── test_artic_parser_comprehensive.py # ARTIC parser edge cases (22 tests, 97.22% coverage)
+│ └── test_sts_writer_comprehensive.py # STS writer complete coverage (12 tests, 100% coverage)
+├── test_data/ # Test datasets
+│ ├── datasets/ # Internal test datasets
+│ │ ├── small/ # COVID-19: 5 amplicons (fast testing)
+│ │ ├── medium/ # ASFV: 80 amplicons (performance testing)
+│ │ └── mitochondrial/ # Human mito: 8 amplicons (circular genome testing)
+│ └── external_schemes/ # Real-world validation schemes
+│   ├── yale-tb/ # Mycobacterium tuberculosis: 2,564 amplicons
+│   ├── yale-west-nile-virus/ # West Nile Virus: 38 amplicons
+│   ├── varvamp-hav/ # Hepatitis A with degenerate primers
+│   ├── nCoV-2019-V532/ # ARTIC SARS-CoV-2 V5.3.2: 96 amplicons
+│   └── olivar-mitochondrial/ # Olivar-generated: 15 amplicons
+└── [other test categories...] # 447 additional tests across all other categories
 ```
 Each dataset includes cross-format consistency with realistic biological data. The mitochondrial datasets specifically test circular genome coordinate wrapping, while external schemes validate real-world compatibility with official primer design tools and repositories.
 
 ### Quality Standards
 
-- **Test Coverage**: Maintain >95% across all modules
-- **Security**: All file operations require security validation
+- **Test Coverage**: **Achieved 96.90% across 581 comprehensive tests**
+- Security Module: 100% coverage with comprehensive edge case testing
+- Main API: 100% coverage of primary user-facing functions
+- Core Infrastructure: 95-100% coverage (converter 99.34%, registry 96.97%, exceptions 95.67%)
+- Parser/Writer System: 95-100% coverage with comprehensive error path testing
+- **Security**: All file operations require security validation (100% security module coverage)
 - **Performance**: Document benchmarks for performance-critical paths
 - **Documentation**: Keep aligned with actual implementation (recently reorganized)
 - **Real-world Validation**: Continuous testing with official schemes from PrimerSchemes Labs repository

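The CLAUDE.md changes state both per-suite counts and aggregate totals. This Python sketch checks them against each other: it sums the seven comprehensive suites and subtracts that sum from the reported total. It uses only figures that appear in the diff. The variable names are illustrative and are not part of the PrePrimer repository.

```python
# Per-suite test counts exactly as listed in the CLAUDE.md diff.
# (Illustrative data layout; not a file in the repository.)
comprehensive_suites = {
    "test_security_comprehensive.py": 38,
    "test_main_api_comprehensive.py": 12,
    "test_converter_comprehensive_gaps.py": 9,
    "test_exceptions_comprehensive.py": 25,
    "test_registry_comprehensive.py": 16,
    "test_artic_parser_comprehensive.py": 22,
    "test_sts_writer_comprehensive.py": 12,
}

TOTAL_TESTS = 581  # total reported throughout the diff

# Sum of the seven coverage-focused suites.
comprehensive_total = sum(comprehensive_suites.values())

# Tests that belong to every other category.
other_tests = TOTAL_TESTS - comprehensive_total

print(f"comprehensive suites: {comprehensive_total}")  # 134
print(f"other categories: {other_tests}")  # 447
```

Both results match the figures in the diff: the "(134)" in the Test Categories list and the "447 additional tests" entry in the test-suite tree.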
README.md

Lines changed: 34 additions & 33 deletions
@@ -1,8 +1,8 @@
 # PrePrimer
 
-A comprehensive, extensible primer scheme converter for tiled amplicon sequencing applications with support for linear and circular genome architectures.
+A primer scheme converter for tiled amplicon sequencing applications supporting linear and circular genome architectures.
 
-PrePrimer enables seamless interconversion between primer scheme formats used in genomic sequencing workflows, including VarVAMP, ARTIC, and Olivar formats. The software incorporates topology-aware processing for both linear and circular genomes, ensuring accurate coordinate handling for diverse biological targets including viral, bacterial, and organellar genomes.
+PrePrimer facilitates format conversion between primer schemes used in genomic sequencing workflows, including VarVAMP, ARTIC, and Olivar formats. The software includes topology detection for linear and circular genomes, with coordinate system handling for viral, bacterial, and organellar targets.
 
 ## Version 0.2.0 Features
 
@@ -15,7 +15,7 @@ PrePrimer enables seamless interconversion between primer scheme formats used in
 - **Security Implementation**: Comprehensive input validation, path sanitization, and secure file operations
 - **Real-world Validation**: Tested with official schemes from PrimerSchemes Labs repository
 - **Command-line Interface**: Intuitive commands with automatic format detection and validation
-- **Comprehensive Testing**: 250+ tests including external validation and property-based testing
+- **Testing Coverage**: 581 tests with 96.90% coverage including external validation and property-based testing
 - **Performance Optimization**: Efficient processing validated with datasets up to 2,500+ amplicons
 
 ## Supported Formats
@@ -39,7 +39,7 @@ PrePrimer enables seamless interconversion between primer scheme formats used in
 - **Standards Compliance**: Full adherence to primal-page specifications and ecosystem standards
 - **Coordinate Systems**: Proper conversion between 0-based BED and 1-based coordinate systems
 
-The software maintains complete bidirectional conversion fidelity across all implemented formats, preserving data integrity and biological accuracy throughout the conversion process.
+The software maintains bidirectional conversion compatibility across implemented formats, preserving data integrity during conversion.
 
 ## Installation
 
@@ -64,19 +64,19 @@ python -m pytest
 
 ### Security Implementation
 
-PrePrimer incorporates security measures for safe file processing:
-- Path validation to prevent directory traversal vulnerabilities
-- Input sanitization with configurable file size limitations
-- Secure file operations with automatic resource cleanup
-- Comprehensive logging for security event monitoring
+PrePrimer includes security measures for file processing:
+- Path validation to prevent directory traversal
+- Input sanitization with configurable file size limits
+- Secure file operations with resource cleanup
+- Security event logging
 
 ### Performance Characteristics
 
-- Validated processing capabilities for datasets up to 2,500+ amplicons (Yale TB whole genome)
-- Linear computational complexity O(n) scaling with dataset size
-- Memory utilization: approximately 50MB baseline, scaling efficiently for large datasets
-- Sub-second processing for typical viral genome schemes (≤500 amplicons)
-- Topology detection and coordinate conversion with minimal computational overhead
+- Tested with datasets up to 2,500 amplicons (Yale TB whole genome)
+- Linear computational complexity O(n) scaling
+- Memory usage: approximately 50MB baseline
+- Processing time under 1 second for schemes with ≤500 amplicons
+- Efficient topology detection and coordinate conversion
 
 ## Quick Start
 
@@ -141,7 +141,7 @@ preprimer convert --input primers.tsv --output-dir output/ \
 --output-formats artic fasta sts --prefix MyVirus
 ```
 
-## 🧬 **Use Cases**
+## Use Cases
 
 ### **Viral Genome Sequencing Workflows**
 
@@ -177,9 +177,9 @@ preprimer info suspicious_primers.tsv
 preprimer convert --input primers.tsv --output-dir /tmp --validate-only
 ```
 
-## 🏗️ **Architecture**
+## Architecture
 
-PrePrimer 0.2.0 features a completely refactored, extensible architecture:
+PrePrimer implements a plugin-based architecture:
 
 ```
 preprimer/
@@ -205,22 +205,24 @@ preprimer/
 └── cli.py # Modern command-line interface
 ```
 
-### **Key Features**
+### Key Features
 
-- **🧬 Topology-aware**: Automatic detection and handling of circular genome architectures
-- **🔌 Plugin Architecture**: Extensible parser and writer system with auto-registration
-- **🛡️ Standards Compliance**: Full adherence to primal-page specifications and ecosystem standards
-- **🧪 IUPAC Support**: Complete degenerate nucleotide handling for variant-aware primer designs
-- **🔍 Auto-detection**: Intelligent format detection based on content analysis and metadata
-- **📊 Data Integrity**: Standardized data models preserving biological accuracy across conversions
-- **⚙️ Flexible Configuration**: JSON-based configuration with primal-page info.json support
-- **🔐 Security-first**: Comprehensive input validation and secure file operations
+- **Topology Detection**: Automatic detection and handling of circular genome architectures
+- **Plugin Architecture**: Extensible parser and writer system with auto-registration
+- **Standards Compliance**: Adherence to primal-page specifications
+- **IUPAC Support**: Degenerate nucleotide handling for variant-aware primer designs
+- **Format Detection**: Automatic format detection based on content analysis
+- **Data Models**: Standardized data structures for conversion accuracy
+- **Configuration**: JSON-based configuration with primal-page info.json support
+- **Security**: Input validation and secure file operations
 
-## 🤝 **Contributing**
+For detailed architecture documentation, see [docs/developer/architecture.md](docs/developer/architecture.md).
 
-We welcome contributions! PrePrimer is designed to be easily extensible.
+## Contributing
 
-### **Adding New Formats**
+Contributions are welcome. PrePrimer is designed for extensibility.
+
+### Adding New Formats
 
 1. **Create a Parser** (for input formats):
 ```python
@@ -241,7 +243,7 @@ We welcome contributions! PrePrimer is designed to be easily extensible.
 writer_registry.register(MyWriter)
 ```
 
-### **Development Setup**
+### Development Setup
 ```bash
 git clone https://github.com/FOI-Bioinformatics/preprimer.git
 cd preprimer
@@ -256,11 +258,11 @@ python -m pytest
 python -m pytest tests/test_refactored_system.py -v
 ```
 
-## 📄 **License**
+## License
 
 PrePrimer is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.
 
-## 🙏 **Acknowledgments**
+## Acknowledgments
 
 - Original PrePrimer codebase foundation
 - [VarVAMP](https://github.com/jonas-fuchs/varVAMP) - SADDLE algorithm for variant-aware primer design
@@ -273,4 +275,3 @@ PrePrimer is licensed under the MIT License - see the [LICENSE](LICENSE) file fo
 
 ---
 
-**PrePrimer 0.2.0 - Modern primer scheme conversion made easy! 🧬✨**

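Two parts of this diff concern coordinate handling: the README's Coordinate Systems bullet (0-based BED versus 1-based coordinates) and the circular-genome test datasets. This Python sketch shows the standard relationship between BED intervals, which are 0-based and half-open, and 1-based, fully closed intervals. It also shows how an interval that crosses the origin of a circular genome can be split into two linear pieces. The function names are hypothetical, and this is an illustration under those assumptions, not PrePrimer's implementation.

```python
from typing import List, Tuple

Interval = Tuple[int, int]


def bed_to_one_based(start: int, end: int) -> Interval:
    """Convert a 0-based, half-open BED interval [start, end) into a
    1-based, closed interval [start + 1, end]; both cover the same bases."""
    if start < 0 or end <= start:
        raise ValueError(f"invalid BED interval: [{start}, {end})")
    return start + 1, end


def one_based_to_bed(start: int, end: int) -> Interval:
    """Inverse of bed_to_one_based: 1-based closed -> 0-based half-open."""
    if start < 1 or end < start:
        raise ValueError(f"invalid 1-based interval: [{start}, {end}]")
    return start - 1, end


def split_circular(start: int, end: int, genome_length: int) -> List[Interval]:
    """Split a 0-based, half-open interval on a circular genome into one or
    two linear BED intervals. An interval that runs past the origin is
    written with end > genome_length (hypothetical convention)."""
    if not (0 <= start < genome_length) or end <= start:
        raise ValueError("start must lie on the genome and end must exceed start")
    if end - start > genome_length:
        raise ValueError("interval is longer than the genome")
    if end <= genome_length:
        return [(start, end)]  # no wrap needed
    # Segment up to the origin, then the remainder wrapped to position 0.
    return [(start, genome_length), (0, end - genome_length)]


if __name__ == "__main__":
    print(bed_to_one_based(0, 10))  # (1, 10)
    print(split_circular(16560, 16580, 16569))  # [(16560, 16569), (0, 11)]
```

Consider a 16,569 bp genome, the length of the human mitochondrial reference sequence. A 20-base primer starting at 0-based position 16560 splits into 9 bases before the origin and 11 after. This is the kind of wrapping case the mitochondrial datasets are described as testing.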
docs/README.md

Lines changed: 1 addition & 1 deletion
@@ -28,7 +28,7 @@ This directory contains the complete documentation for PrePrimer, a comprehensiv
 
 ### System Specifications
 - **[Security Implementation](technical/security.md)** - Comprehensive security features and validation
-- **[Testing Framework](technical/testing.md)** - Extensive testing methodology (250+ tests with external validation)
+- **[Testing Framework](technical/testing.md)** - Extensive testing methodology (581 tests with 96.90% coverage)
 - **[Platform Compatibility](technical/compatibility.md)** - Platform support, topology handling, and ecosystem integration
 
 ## Developer Documentation

docs/technical/testing.md

Lines changed: 5 additions & 5 deletions
@@ -1,14 +1,14 @@
 # Testing Framework
 
-PrePrimer implements a comprehensive testing framework with 226 tests across multiple methodologies to ensure code quality, performance, and reliability.
+PrePrimer implements a testing framework with 581 tests across multiple methodologies to ensure code quality, performance, and reliability.
 
 ## Testing Overview
 
 ### Test Statistics
-- **Total Tests**: 226 implemented tests
-- **Success Rate**: 225 passing, 1 skipped (99.6% success rate)
-- **Test Categories**: 5 distinct testing methodologies
-- **Coverage**: Core functionality, security features, and performance validation
+- **Total Tests**: 581 implemented tests
+- **Coverage**: 96.90% code coverage
+- **Test Categories**: Multiple testing methodologies
+- **Scope**: Core functionality, security features, performance validation, and comprehensive edge case testing
 
 ### Test Architecture
 ```

docs/user-guide/installation.md

Lines changed: 10 additions & 10 deletions
@@ -1,24 +1,24 @@
-# 💿 Installation Guide
+# Installation Guide
 
-Complete installation instructions for PrePrimer on all supported platforms.
+Installation instructions for PrePrimer on supported platforms.
 
-## 🎯 **Prerequisites**
+## Prerequisites
 
-### **System Requirements**
+### System Requirements
 - **Python**: 3.8 or later
 - **Operating System**: Linux or macOS only
 - **Memory**: 512 MB RAM minimum (2 GB recommended for large files)
 - **Storage**: 100 MB free space
 
-> ⚠️ **Windows Support**: Windows is not currently supported due to Unicode character encoding limitations. Consider using WSL2 on Windows.
+> Note: Windows is not currently supported due to Unicode character encoding limitations. WSL2 may provide an alternative for Windows users.
 
 ### **Python Dependencies**
 PrePrimer automatically installs these required packages:
 - `pydantic>=2.0` - Data validation and settings management
 - `pyyaml>=6.0` - YAML configuration file support
 - `click>=8.0` - Command-line interface framework
 
-**Development dependencies** (installed with `[dev]` option):
+Development dependencies (installed with `[dev]` option):
 - `pytest>=7.0` - Test framework
 - `pytest-cov>=4.0` - Coverage reporting
 - `pytest-benchmark>=4.0` - Performance benchmarking
@@ -29,11 +29,11 @@ PrePrimer automatically installs these required packages:
 - `flake8>=6.0` - Linting
 - `mypy>=1.0` - Type checking
 
-## 🚀 **Installation Methods**
+## Installation Methods
 
-### **Method 1: From Source (Recommended)**
+### Method 1: From Source (Recommended)
 
-This is the most up-to-date installation method:
+Installation from source provides the most current version:
 
 ```bash
 # 1. Clone the repository
@@ -47,7 +47,7 @@ pip install -e .
 preprimer --version
 ```
 
-### **Method 2: Direct pip Install (Coming Soon)**
+### Method 2: Direct pip Install (Coming Soon)
 
 ```bash
 # When available on PyPI
