Draft
Implements full Excel file processing functionality for nf-schema, addressing the need for
direct Excel workbook support without manual CSV conversion.
## Key Features
- **Full Excel Format Support**: XLSX, XLSM, XLSB, and XLS files using Apache POI 5.4.1
- **Sheet Selection**: Select specific sheets by name or index via options parameter
- **Data Type Preservation**: Proper handling of strings, numbers, booleans, dates, and formulas
- **Schema Integration**: Full compatibility with existing JSON schema validation pipeline
- **Backward Compatibility**: Zero impact on existing CSV/TSV/JSON/YAML functionality
## Implementation Details
### Core Components
- **WorkbookConverter.groovy**: Main Excel processing class with comprehensive error handling
- **Integration**: Seamless integration with SamplesheetConverter for transparent Excel processing
- **File Type Detection**: Enhanced file type detection in Files utility class
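As a rough illustration of extension-based detection for the four Excel formats listed above, the sketch below checks a file name against a fixed set of extensions. It is not the code in the plugin's Files utility class; the class and method names (`ExcelExtensionSketch`, `extensionOf`, `isExcel`) are hypothetical and chosen only for this example.

```java
import java.util.Locale;
import java.util.Set;

// Hypothetical helper for illustration only; not the plugin's actual Files utility.
final class ExcelExtensionSketch {
    // The four Excel extensions named in this description.
    private static final Set<String> EXCEL_EXTENSIONS = Set.of("xlsx", "xlsm", "xlsb", "xls");

    // Returns the lower-cased text after the last dot, or "" when there is no extension.
    static String extensionOf(String fileName) {
        int dot = fileName.lastIndexOf('.');
        if (dot < 0 || dot == fileName.length() - 1) {
            return "";
        }
        return fileName.substring(dot + 1).toLowerCase(Locale.ROOT);
    }

    // True when the extension is one of the Excel extensions above (case-insensitive).
    static boolean isExcel(String fileName) {
        return EXCEL_EXTENSIONS.contains(extensionOf(fileName));
    }

    public static void main(String[] args) {
        System.out.println(isExcel("samplesheet.xlsx"));
        System.out.println(isExcel("samplesheet.csv"));
    }
}
```

Matching on the extension alone keeps the check cheap, but it trusts the file name; whether the plugin also inspects file contents for Excel files is not stated here.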
### Architecture
- **Clean Separation**: Excel processing handled in dedicated WorkbookConverter class
- **Configuration Integration**: Uses existing ValidationConfig for consistent error handling
- **Modular Design**: Separated header processing, row processing, and cell value extraction
### New Dependencies
- Apache POI 5.4.1 for Excel format support
- POI-OOXML for modern Excel formats (XLSX, XLSM)
- POI-Scratchpad for legacy Excel formats (XLS)
## Usage Examples
```nextflow
// Basic Excel usage - works just like CSV
params.input = "samplesheet.xlsx"
params.schema = "assets/schema_input.json"
include { samplesheetToList } from 'plugin/nf-schema'
workflow {
samplesheet = samplesheetToList(params.input, params.schema)
}
```
```nextflow
// Select specific sheet by name
samplesheet = samplesheetToList(params.input, params.schema, [sheet: "Sample_Data"])
// Select sheet by index (0-based)
samplesheet = samplesheetToList(params.input, params.schema, [sheet: 0])
```
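The `sheet` option above accepts either a name or a 0-based index. A minimal sketch of how such an option could be resolved against a workbook's sheet names follows. The class and method names are hypothetical, the fallback to the first sheet when no option is given is an assumption, and the real WorkbookConverter may behave differently.

```java
import java.util.List;

// Hypothetical resolver for illustration only; not the plugin's WorkbookConverter.
final class SheetSelectorSketch {
    // Maps a sheet option (Integer index or sheet name) to a 0-based index into sheetNames.
    static int resolveSheetIndex(List<String> sheetNames, Object sheet) {
        if (sheetNames.isEmpty()) {
            throw new IllegalArgumentException("Workbook has no sheets");
        }
        if (sheet == null) {
            return 0; // Assumption: default to the first sheet when no option is given.
        }
        if (sheet instanceof Integer) {
            int index = (Integer) sheet;
            if (index < 0 || index >= sheetNames.size()) {
                throw new IllegalArgumentException(
                    "Sheet index " + index + " is out of range (0 to " + (sheetNames.size() - 1) + ")");
            }
            return index;
        }
        // Anything else is treated as a sheet name and matched exactly.
        int index = sheetNames.indexOf(sheet.toString());
        if (index < 0) {
            throw new IllegalArgumentException(
                "Sheet '" + sheet + "' not found; available sheets: " + sheetNames);
        }
        return index;
    }

    public static void main(String[] args) {
        List<String> names = List.of("Metadata", "Sample_Data");
        System.out.println(resolveSheetIndex(names, "Sample_Data"));
        System.out.println(resolveSheetIndex(names, 0));
    }
}
```

Listing the available sheet names in the error message is a design choice that helps users spot a misspelled sheet name quickly.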
## Testing
- WorkbookConverter unit tests with comprehensive error handling scenarios
- File type detection tests for all Excel formats
- Integration tests planned for full workflow validation
## Impact
- **User Experience**: Users can work directly with Excel files from data analysts/collaborators
- **Workflow Simplification**: Eliminates manual CSV conversion step
- **Data Fidelity**: Preserves original data types and formatting
- **Enterprise Ready**: Supports common Excel formats used in research/industry
🤖 Generated with [Claude Code](https://claude.ai/code)
Co-Authored-By: Claude <noreply@anthropic.com>
2510d12 to c716966
nvnieuwk (Collaborator) reviewed on Sep 16, 2025 and left a comment:
This is impressive! Can you add some more tests, though? It seems like this has a lot of logic behind it, and I want to be sure everything works as expected.

      if ( commaCount == tabCount ){
    -     log.error("Could not derive file type from ${file}. Please specify the file extension (CSV, TSV, YML, YAML and JSON are supported).".toString())
    +     log.error("Could not derive file type from ${file}. Please specify the file extension (CSV, TSV, YML, YAML, JSON, and Excel formats are supported).".toString())

Collaborator
Maybe also specify exactly which Excel formats are supported?
Summary
Implements comprehensive Excel file processing functionality for nf-schema, addressing GitHub issue #177.
Users can now use Excel workbooks (XLSX, XLSM, XLSB, XLS) directly without manual conversion to CSV format.
Core Components
Utils.castToType() method that was converting typed data to null
Closes #177