Merged
36 changes: 36 additions & 0 deletions .github/workflows/deploy-docs.yml
Original file line number Diff line number Diff line change
@@ -0,0 +1,36 @@
name: Deploy Documentation

on:
  push:
    branches: ["main"]
  workflow_dispatch:

permissions:
  contents: write

jobs:
  deploy-docs:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
        with:
          fetch-depth: 0

      - name: Set up Python 3.10
        uses: actions/setup-python@v4
        with:
          python-version: "3.10"

      - name: Cache pip dependencies
        uses: actions/cache@v3
        with:
          path: ~/.cache/pip
          key: ${{ runner.os }}-pip-docs-${{ hashFiles('requirements-docs.txt') }}
          restore-keys: |
            ${{ runner.os }}-pip-docs-

      - name: Install documentation dependencies
        run: pip install -r requirements-docs.txt

      - name: Build and deploy documentation
        run: 'mkdocs gh-deploy --force --message "docs: deploy documentation [skip ci]"'
4 changes: 2 additions & 2 deletions .github/workflows/python-app.yml
@@ -2,9 +2,9 @@ name: Python application

on:
  push:
-    branches: [ main, develop ]
+    branches: [ main, dev ]
  pull_request:
-    branches: [ main, develop ]
+    branches: [ main, dev ]

jobs:
  build:
56 changes: 54 additions & 2 deletions benchmarks/quant-pxd007683-tmt-vs-lfq/README.md
@@ -81,6 +81,58 @@ Both methods accurately detect fold changes. TMT performs slightly better for sm

---

## Running the Benchmark

The benchmark includes comprehensive analysis scripts that answer three core scientific questions.

> **IMPORTANT: DirectLFQ Dependency**
>
> When using DirectLFQ quantification in benchmarks, **always use the external directlfq package** directly rather than any fallback implementation. This ensures reproducibility and uses the official Mann Lab algorithm.
>
> ```bash
> pip install "mokume[directlfq]"
> ```

### Quick Start

```bash
cd benchmarks/quant-pxd007683-tmt-vs-lfq/scripts

# Download data first
python 00_download_data.py

# Run complete benchmark
python run_benchmark.py

# Or run individual analyses
python 01_grid_search_methods.py # Grid search over methods
python 02_variance_decomposition.py # PCA and variance analysis
python 03_fold_change_accuracy.py # Fold-change accuracy
python 04_stability_metrics.py # CV analysis
python 05_cross_technology_correlation.py # LFQ vs TMT correlation
python 06_generate_report.py # Generate summary report
```

### Scientific Questions Addressed

| Question | Script | Metrics |
|----------|--------|---------|
| Q1: Absolute expression stability | `04_stability_metrics.py` | CV within conditions |
| Q2: Technical vs biological variance | `02_variance_decomposition.py` | PCA, silhouette score |
| Q3: Fold-change accuracy | `03_fold_change_accuracy.py` | RMSE, compression ratio, FPR |
| Cross-technology agreement | `05_cross_technology_correlation.py` | Pearson/Spearman r |

### Output

Results are saved to:
- `results/` - CSV files with metrics
- `figures/` - PNG plots
- `results/BENCHMARK_REPORT.md` - Comprehensive summary

See [ROADMAP-SCIENTIFIC-BENCHMARKING.md](ROADMAP-SCIENTIFIC-BENCHMARKING.md) for the full scientific framework.

---

<details>
<summary><strong>mokume vs MaxQuant Comparison</strong></summary>

@@ -180,8 +232,8 @@ The `median` method uses batch processing (20 samples at a time), reducing memor

Datasets searched with SAGE, COMET, MSGF+, combined with ConsensusID, filtered at 1% protein and PSM FDR.

-### Notebook
+### Batch Correction Example

-See `notebooks/mokume-batch-correction-example.ipynb` for interactive analysis.
+For batch correction examples, see `../batch-quartet-multilab/notebooks/mokume-batch-correction-example.ipynb`.

</details>
160 changes: 160 additions & 0 deletions benchmarks/quant-pxd007683-tmt-vs-lfq/results/BENCHMARK_REPORT.md
@@ -0,0 +1,160 @@
# PXD007683 Benchmark Report

Comprehensive analysis of TMT vs LFQ quantification using mokume.

> **Note**: When using DirectLFQ quantification, always use the external directlfq package
> directly (`pip install "mokume[directlfq]"`) rather than any fallback implementation.
> This ensures reproducibility and uses the official Mann Lab algorithm.

---

## Q1: Absolute Expression Stability

**Question**: Which method combination provides the most stable absolute expression values?

### Best Methods by Within-Condition CV

| Technology | Best Method | CV (median) | % Proteins with CV < 20% |
|------------|-------------|-------------|--------------------------|
| TMT | maxlfq | 0.029 | 99.7% |
| LFQ | maxlfq | 0.074 | 86.1% |

### Key Finding

TMT consistently shows lower CV than LFQ across all quantification methods.
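
The within-condition CV metric can be sketched as follows. This is a minimal illustration, not the code in `04_stability_metrics.py`: it assumes CV is the sample standard deviation divided by the mean of linear-scale intensities within each condition, averaged across conditions per protein. The function name, the toy table, and the sample-to-condition mapping are all hypothetical.

```python
import pandas as pd


def within_condition_cv(intensities, sample_conditions):
    """Per-protein CV (std / mean) inside each condition.

    intensities: proteins x samples table of linear-scale intensities.
    sample_conditions: mapping of sample name -> condition label.
    """
    by_condition = {}
    for sample, condition in sample_conditions.items():
        by_condition.setdefault(condition, []).append(sample)

    cv = {}
    for condition, samples in by_condition.items():
        block = intensities[samples]
        cv[condition] = block.std(axis=1, ddof=1) / block.mean(axis=1)
    return pd.DataFrame(cv)


# Toy intensities (illustrative values, not benchmark data).
toy = pd.DataFrame(
    {
        "A1": [100.0, 50.0], "A2": [110.0, 50.0], "A3": [90.0, 50.0],
        "B1": [200.0, 80.0], "B2": [210.0, 120.0], "B3": [190.0, 100.0],
    },
    index=["P1", "P2"],
)
conditions = {"A1": "A", "A2": "A", "A3": "A", "B1": "B", "B2": "B", "B3": "B"}

cv = within_condition_cv(toy, conditions)
per_protein_cv = cv.mean(axis=1)                # average CV across conditions
median_cv = per_protein_cv.median()             # analogous to "CV (median)"
pct_below_20 = 100.0 * (per_protein_cv < 0.20).mean()
print(cv.round(3))
print(f"median CV = {median_cv:.4f}, proteins with CV < 20% = {pct_below_20:.1f}%")
```

Computing CV on log-transformed intensities would give different numbers, so the scale should be stated alongside any reported CV.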

## Q2: Technical vs Biological Variance

**Question**: How much variance is explained by condition (biology) vs technology (technical)?

### Variance Decomposition

| Technology | % Condition | % Residual | Silhouette |
|------------|-------------|------------|------------|
| TMT | 1.0% | 99.0% | N/A |
| LFQ | 0.9% | 99.1% | N/A |

### Key Finding

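Condition (yeast spike-in level) explains only about 1% of the variance in both technologies (1.0% for TMT, 0.9% for LFQ); the remaining ~99% is residual.
This is consistent with a design in which only the spiked-in yeast proteins change between conditions while the human background stays constant.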
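
One common way to obtain a "% Condition" figure is the between-group share of the total sum of squares (eta squared). The sketch below assumes that definition; the report does not state how `02_variance_decomposition.py` computes it, and the function name and toy values are hypothetical.

```python
import numpy as np


def condition_variance_fraction(values, labels):
    """Fraction of total variance explained by group labels (eta squared)."""
    values = np.asarray(values, dtype=float)
    labels = np.asarray(labels)
    grand_mean = values.mean()
    ss_total = np.sum((values - grand_mean) ** 2)
    ss_between = 0.0
    for group in np.unique(labels):
        members = values[labels == group]
        ss_between += members.size * (members.mean() - grand_mean) ** 2
    return float(ss_between / ss_total)


# Toy values for one protein across six samples (not benchmark data).
toy_values = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
toy_labels = ["a", "a", "a", "b", "b", "b"]

fraction = condition_variance_fraction(toy_values, toy_labels)
print(f"% Condition = {100 * fraction:.1f}%, % Residual = {100 * (1 - fraction):.1f}%")
```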

## Q3: Fold-Change Accuracy

**Question**: How accurately do we detect expected fold-changes?

Ground truth: Yeast proteins spiked at 10%, 5%, 3.3% ratios
- 10% vs 3.3% → expected 3.0-fold (log2 = 1.58)
- 10% vs 5% → expected 2.0-fold (log2 = 1.0)
- 5% vs 3.3% → expected 1.5-fold (log2 = 0.58)
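
The expected values above follow directly from the spike-in ratios. A short check, assuming 3.3% stands for exactly one third of 10% (the variable and function names are illustrative):

```python
import math
from itertools import combinations

# Yeast spike-in fractions; 3.3% is taken as 10% / 3 (assumption).
spike_in = {"10%": 0.10, "5%": 0.05, "3.3%": 0.10 / 3}


def expected_log2_fc(high, low):
    """log2 of the ratio between two spike-in fractions."""
    return math.log2(high / low)


expected = {
    f"{a} vs {b}": expected_log2_fc(spike_in[a], spike_in[b])
    for a, b in combinations(spike_in, 2)
}
for name, value in expected.items():
    print(f"{name}: {2 ** value:.1f}-fold (log2 = {value:.2f})")
```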

### Yeast Protein Fold-Change Accuracy

| Technology | Comparison | Expected | Observed | Compression | RMSE |
|------------|------------|----------|----------|-------------|------|
| TMT | QY_10pct_vs_QY_3pct_yeast | 1.58 | 1.57 | 0.99 | 0.325 |
| TMT | QY_10pct_vs_QY_5pct_yeast | 1.00 | 1.13 | 1.13 | 0.228 |
| TMT | QY_5pct_vs_QY_3pct_yeast | 0.58 | 0.44 | 0.75 | 0.199 |
| LFQ | QY_10pct_vs_QY_3pct_yeast | 1.58 | 1.56 | 0.99 | 0.665 |
| LFQ | QY_10pct_vs_QY_5pct_yeast | 1.00 | 1.03 | 1.03 | 0.522 |
| LFQ | QY_5pct_vs_QY_3pct_yeast | 0.58 | 0.53 | 0.90 | 0.398 |
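
The Compression and RMSE columns can be read with the following sketch. It assumes that "Observed" is the median per-protein log2 fold-change, that compression is observed divided by expected, and that RMSE is taken over per-protein deviations from the expected value. These definitions are assumptions, not taken from `03_fold_change_accuracy.py`, and the toy values are made up.

```python
import numpy as np


def compression_ratio(observed_log2fc, expected_log2fc):
    """Median observed log2 fold-change divided by the expected value.

    Values below 1 indicate ratio compression, values above 1 overestimation.
    """
    return float(np.median(observed_log2fc)) / expected_log2fc


def rmse(observed_log2fc, expected_log2fc):
    """Root-mean-square error of per-protein log2 fold-changes."""
    diff = np.asarray(observed_log2fc, dtype=float) - expected_log2fc
    return float(np.sqrt(np.mean(diff ** 2)))


# Toy per-protein log2 fold-changes for a 2-fold comparison (not benchmark data).
toy_log2fc = np.array([0.8, 1.0, 1.2, 0.9, 1.1])
toy_compression = compression_ratio(toy_log2fc, expected_log2fc=1.0)
toy_rmse = rmse(toy_log2fc, expected_log2fc=1.0)
print(f"compression = {toy_compression:.2f}, RMSE = {toy_rmse:.3f}")
```

Under these definitions a method can have a compression near 1 and still a large RMSE, which is the pattern the LFQ rows show.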

### Human Protein False Positive Rate

Human proteins should show no change (log2 FC = 0)

| Technology | Comparison | FP Rate (\|log2FC\| > 1) |
|------------|------------|------------------------|
| TMT | QY_10pct_vs_QY_3pct_human | 0.0% |
| TMT | QY_10pct_vs_QY_5pct_human | 0.0% |
| TMT | QY_5pct_vs_QY_3pct_human | 0.0% |
| LFQ | QY_10pct_vs_QY_3pct_human | 1.5% |
| LFQ | QY_10pct_vs_QY_5pct_human | 1.8% |
| LFQ | QY_5pct_vs_QY_3pct_human | 1.1% |
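
A minimal sketch of the false positive rate in the table: the fraction of human proteins whose absolute log2 fold-change exceeds the threshold. Dropping missing values before counting is an assumption, and the function name and toy values are hypothetical.

```python
import numpy as np


def false_positive_rate(log2fc, threshold=1.0):
    """Fraction of quantified proteins with |log2FC| above the threshold."""
    values = np.asarray(log2fc, dtype=float)
    values = values[~np.isnan(values)]  # ignore proteins without a fold-change
    if values.size == 0:
        return float("nan")
    return float(np.mean(np.abs(values) > threshold))


# Toy human log2 fold-changes (not benchmark data); expected value is 0.
toy_human = np.array([0.05, -0.10, 1.30, 0.00, np.nan, -0.40, 0.20, -1.20])
toy_fpr = false_positive_rate(toy_human)
print(f"FP rate = {100 * toy_fpr:.1f}%")
```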

### Key Finding

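TMT shows ratio compression (observed fold-change < expected), a known phenomenon, only in the smallest comparison (5% vs 3.3%, compression 0.75); it slightly overestimates the 2-fold comparison (1.13).
LFQ stays closer to the expected ratios (compression 0.90 to 1.03) but is more variable, with roughly twice the RMSE of TMT in every comparison.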

## Cross-Technology Correlation (TMT vs LFQ)

**Question**: How well do TMT and LFQ agree on protein abundances?

### Overall Correlation

- **Pearson r**: 0.7924
- **Spearman r**: 0.7923
- Common proteins: 5954
- Matched samples: 11

### Per-Protein Correlation

- Median correlation: 0.0622
- Proteins with r > 0.8: 438 (7.9%)
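
A high overall correlation can coexist with a low per-protein correlation. The simulation below is a sketch under stated assumptions, not the benchmark data: each protein has a baseline abundance shared by both technologies, and sample-to-sample variation is independent noise. The overall correlation is then driven by the wide abundance range across proteins, while the per-protein correlation across the 11 samples stays near zero.

```python
import numpy as np

rng = np.random.default_rng(0)
n_proteins, n_samples = 200, 11

# Shared per-protein baseline (log scale) plus independent noise per technology.
baseline = rng.normal(20.0, 3.0, size=(n_proteins, 1))
tmt = baseline + rng.normal(0.0, 0.3, size=(n_proteins, n_samples))
lfq = baseline + rng.normal(0.0, 0.3, size=(n_proteins, n_samples))


def pearson(x, y):
    """Pearson correlation between two arrays, flattened."""
    return float(np.corrcoef(np.ravel(x), np.ravel(y))[0, 1])


# Overall: every (protein, sample) value pooled into one correlation.
overall_r = pearson(tmt, lfq)

# Per-protein: one correlation per protein across its samples.
per_protein_r = np.array([pearson(tmt[i], lfq[i]) for i in range(n_proteins)])

print(f"overall r = {overall_r:.3f}")
print(f"median per-protein r = {np.median(per_protein_r):.3f}")
```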

### Key Finding

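TMT and LFQ agree well on overall protein abundance (Pearson r ≈ 0.79 across 5954 common proteins), but per-protein correlation across samples is weak (median r = 0.06).
The overall agreement therefore mainly reflects a shared ranking of protein abundances rather than consistent sample-to-sample changes.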

## Recommendations

Based on the comprehensive benchmark analysis:

### For Absolute Quantification (Q1)

- **TMT**: Use `maxlfq` (CV = 0.029)
- **LFQ**: Use `maxlfq` (CV = 0.074)

### For Differential Expression (Q2-Q3)

- For **small fold-changes** (< 2-fold): Prefer TMT (lower CV, higher precision)
- For **large fold-changes** (> 2-fold): TMT and LFQ both recover the expected ratio (compression 0.99 at 3-fold); TMT has the lower RMSE
- Apply **batch correction** when combining experiments

### For Cross-Experiment Integration

- Normalize using `median` or `hierarchical` methods
- Consider technology as a batch effect when combining TMT and LFQ

## Figures

See the `figures/` directory for:

- `9_tissues-boxplot.png`
- `9_tissues-density.png`
- `PXD007683-11samples-density.png`
- `PXD007683-LFQ-11samples-ibaq-ibaqpy-and-maxquant.png`
- `PXD007683-LFQ-11samples-ibaq-vs-maxquant-density.png`
- `PXD007683-LFQ-11samples-no_cov.png`
- `PXD007683-LFQ-ibaq-ibaqpy-and-maxquant.png`
- `PXD007683-LFQ-ibaq-vs-maxquant-density.png`
- `PXD007683-LFQ-no_cov.png`
- `PXD007683-TMTvsLFQ-boxplot.png`
- `PXD007683-TMTvsLFQ-density.png`
- `PXD019909-11samples-density.png`
- `PXD019909-TMTvsLFQ-density.png`
- `cross_tech_overall.png`
- `cross_tech_per_protein.png`
- `cross_tech_per_sample.png`
- `cv_by_condition_lfq.png`
- `cv_by_condition_tmt.png`
- `cv_distribution_lfq.png`
- `cv_distribution_tmt.png`
- `cv_method_comparison.png`
- `fold_change_comparison.png`
- `fold_change_lfq.png`
- `fold_change_tmt.png`
- `method_mean_cv_016999_lfq.png`
- `method_mean_cv_lfq.png`
- `method_mean_cv_tmt.png`
- `method_per_p_cv_016999_lfq.png`
- `method_per_p_cv_lfq.png`
- `method_per_p_cv_tmt.png`
- `missing_peptides_by_sample.png`
- `missing_value_016999_lfq.png`
- `pca_combined.png`
- `pca_lfq.png`
- `pca_tmt.png`
- `per_protein_cv.png`
@@ -0,0 +1,2 @@
metric,n_proteins,n_samples,pearson_r,spearman_r
overall,5954,11,0.7924137053599063,0.7922632409282845