Commit 3314837

committed
paper self-references
1 parent 1870813 commit 3314837

1 file changed

Lines changed: 24 additions & 23 deletions

paper/paper.md
@@ -97,7 +97,7 @@ The system consists of four main components:
9797

9898
## Generator Architectures
9999

100-
The generator network can be configured with different backbone types, each providing a unique trade-off between complexity, receptive field, and textural detail (see Table~\ref{tab:arch} in ~\ref{app:components}).
100+
The generator network can be configured with different backbone types, each providing a unique trade-off between complexity, receptive field, and textural detail (see Table [@tbl-arch] in [@app-components]).
101101

102102
The `Generator` class provides a unified implementation of SR backbones that share a common convolutional structure while differing in their internal residual block design.
103103
The module is initialized with a `model_type` flag selecting one of `res`, `rcab`, `rrdb`, `lka`, `esrgan`, or `cgan`, each drawn either from a shared registry of block factories or from a dedicated ESRGAN implementation.
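
The registry idea described above can be illustrated with a minimal, framework-free sketch. Everything in it is an assumption made for illustration: the names `BLOCK_REGISTRY`, `make_res_block`, `make_rcab_block`, and `build_body` are hypothetical and are not taken from the package, and the factories return strings rather than real layers so the lookup logic can be read in isolation.

```python
from typing import Callable, Dict, List

# Hypothetical block factories: each returns a text label instead of a real layer,
# so the selection logic can be shown without a deep-learning framework.
def make_res_block(channels: int) -> str:
    return f"ResidualBlock({channels})"

def make_rcab_block(channels: int) -> str:
    return f"RCAB({channels})"

# Hypothetical registry mapping a `model_type` flag to a block factory.
BLOCK_REGISTRY: Dict[str, Callable[[int], str]] = {
    "res": make_res_block,
    "rcab": make_rcab_block,
}

def build_body(model_type: str, channels: int, n_blocks: int) -> List[str]:
    """Look up the factory for `model_type` and stack `n_blocks` blocks."""
    try:
        factory = BLOCK_REGISTRY[model_type]
    except KeyError:
        known = sorted(BLOCK_REGISTRY)
        raise ValueError(f"unknown model_type {model_type!r}; expected one of {known}") from None
    return [factory(channels) for _ in range(n_blocks)]

print(build_body("rcab", 64, 2))
```

With this pattern, adding a backbone means registering one more factory, while the surrounding convolutional structure stays untouched.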
@@ -129,7 +129,7 @@ where $s$ is a residual scaling factor. This mechanism maintains spatial coheren
129129

130130
## Discriminator Architectures
131131

132-
The discriminator can be selected to prioritize either global consistency or fine local realism. The different architectures and their purposes are outlined in Table~\ref{tab:disc} in ~\ref{app:components}. Three discriminator variants are implemented to complement the different generator types: a global `Discriminator`, a local `PatchGANDiscriminator`, and the deeper `ESRGANDiscriminator`. All are built from shared convolutional blocks with LeakyReLU activations and instance normalization.
132+
The discriminator can be selected to prioritize either global consistency or fine local realism. The different architectures and their purposes are outlined in Table [@tbl-disc] in [@app-components]. Three discriminator variants are implemented to complement the different generator types: a global `Discriminator`, a local `PatchGANDiscriminator`, and the deeper `ESRGANDiscriminator`. All are built from shared convolutional blocks with LeakyReLU activations and instance normalization.
133133

134134
The standard discriminator follows the original SRGAN \cite{ledig2017photo} design and evaluates the realism of the entire super-resolved image and the actual HR image. It stacks a sequence of strided convolutional layers with progressively increasing feature channels, an adaptive average pooling layer to a fixed spatial size, and two fully connected layers producing a scalar real/fake score. This 'global' discriminator promotes coherent large-scale structure and overall photorealism.
135135

@@ -141,7 +141,7 @@ Together, these architectures allow users to select the appropriate adversarial
141141

142142
# Training Features
143143

144-
Training stability is improved through several built-in mechanisms that address common issues of adversarial optimization (summarized in Table~\ref{tab:train}, ~\ref{app:components}). These are configured in the `Training` section of the YAML `config` file.
144+
Training stability is improved through several built-in mechanisms that address common issues of adversarial optimization (summarized in Table [@tbl-train], [@app-components]). These are configured in the `Training` section of the YAML `config` file.
145145

146146
## General Training Optimizations
147147
Several additional methods contribute to stable adversarial optimization. Label smoothing replaces hard discriminator targets (1 for real, 0 for fake) with softened values such as 0.9 and 0.1, preventing overconfidence and promoting smoother gradients. A short generator warmup phase allows $G$ to learn basic low-frequency structure before adversarial feedback is introduced, often combined with a linear or cosine learning-rate ramp to avoid abrupt updates. The discriminator holdback delays $D$ updates for the first few epochs so that $G$ can stabilise; when enabled, $D$ also follows a short warmup schedule to balance learning rates. Finally, both optimisers employ adaptive scheduling via `ReduceLROnPlateau`, lowering the learning rate when progress stagnates. These implementations mitigate divergence and improve convergence stability in adversarial training. All of these techniques can be configured from the `config` file as the unified entry-point.
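
To make the effect of label smoothing concrete, the following sketch computes the discriminator's binary cross-entropy against the softened targets 0.9 and 0.1 mentioned above. It is written in plain Python for readability and is not the package's implementation; the function names and the `eps` clamp are assumptions.

```python
import math
from typing import Sequence

def smoothed_target(is_real: bool, real_value: float = 0.9, fake_value: float = 0.1) -> float:
    """Soft target instead of the hard labels 1 (real) and 0 (fake)."""
    return real_value if is_real else fake_value

def bce(prob: float, target: float, eps: float = 1e-7) -> float:
    """Binary cross-entropy for one probability; `eps` keeps log() finite."""
    p = min(max(prob, eps), 1.0 - eps)
    return -(target * math.log(p) + (1.0 - target) * math.log(1.0 - p))

def discriminator_loss(d_real: Sequence[float], d_fake: Sequence[float]) -> float:
    """Mean BCE over D's outputs on real HR and generated SR samples."""
    real_terms = [bce(p, smoothed_target(True)) for p in d_real]
    fake_terms = [bce(p, smoothed_target(False)) for p in d_fake]
    return (sum(real_terms) + sum(fake_terms)) / (len(real_terms) + len(fake_terms))

# With a 0.9 target, a prediction of 0.9 is cheaper than an overconfident 0.999.
print(bce(0.9, 0.9) < bce(0.999, 0.9))
```

Because the per-sample loss is minimised at the softened target rather than at 1 or 0, the discriminator gains nothing from pushing its outputs to the extremes, which is the overconfidence the paragraph refers to.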
@@ -166,12 +166,12 @@ where $\hat{y}_{\text{SR}}$ denotes the final super-resolved output produced by
166166

167167
## Loss Functions
168168

169-
Each loss term (see Table~\ref{tab:loss} in ~\ref{app:components}) can be weighted independently, allowing users to balance spectral accuracy and perceptual realism. Typical configurations combine L1, Perceptual, and Adversarial losses, optionally augmented by SAM and TV for multispectral consistency and smoothness. The overall objective is a weighted sum of these terms defined in the `Training.Losses ` section of the configuration. A detailed description of the internal training and validation metrics logged alongside these losses is given in ~\ref{app:metrics}.
169+
Each loss term (see Table [@tbl-loss] in [@app-components]) can be weighted independently, allowing users to balance spectral accuracy and perceptual realism. Typical configurations combine L1, Perceptual, and Adversarial losses, optionally augmented by SAM and TV for multispectral consistency and smoothness. The overall objective is a weighted sum of these terms defined in the `Training.Losses ` section of the configuration. A detailed description of the internal training and validation metrics logged alongside these losses is given in [@app-metrics].
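
As a worked illustration of the weighted sum, the sketch below combines an L1 term with a spectral angle (SAM) term for a single pixel's band vector. It is a simplified stand-in rather than the package's code: the weight keys `l1` and `sam` and the helper names are hypothetical, and real training operates on image tensors rather than Python lists.

```python
import math
from typing import Dict, Sequence

def l1_loss(pred: Sequence[float], target: Sequence[float]) -> float:
    """Mean absolute error across bands."""
    return sum(abs(p - t) for p, t in zip(pred, target)) / len(pred)

def sam_loss(pred: Sequence[float], target: Sequence[float], eps: float = 1e-8) -> float:
    """Spectral angle in radians between two per-pixel band vectors."""
    dot = sum(p * t for p, t in zip(pred, target))
    norm = math.sqrt(sum(p * p for p in pred)) * math.sqrt(sum(t * t for t in target))
    cos = max(-1.0, min(1.0, dot / (norm + eps)))  # clamp against rounding error
    return math.acos(cos)

def total_loss(terms: Dict[str, float], weights: Dict[str, float]) -> float:
    """Weighted sum over every loss term that has a configured weight."""
    return sum(weight * terms[name] for name, weight in weights.items())

pred, target = [0.2, 0.4, 0.6], [0.1, 0.2, 0.3]   # same spectral shape, twice as bright
terms = {"l1": l1_loss(pred, target), "sam": sam_loss(pred, target)}
print(total_loss(terms, {"l1": 1.0, "sam": 0.5}))
```

In this example the prediction is a scaled copy of the target, so the SAM term is close to zero while the L1 term is not. This is why the two terms are complementary: L1 penalises radiometric error, SAM penalises changes in spectral shape.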
170170

171171

172172
# Limitations
173-
Super-resolution techniques, including those implemented in *Remote-Sensing-SRGAN*, can enhance apparent spatial detail but can never substitute for true high-resolution observations acquired by native sensors.
174-
While *Remote-Sensing-SRGAN* provides a stable and extensible foundation for GAN-based super-resolution in remote sensing, several limitations remain. First, the framework focuses on the engineering and reproducibility aspects of model development rather than achieving state-of-the-art quantitative performance. It is therefore intended as a research and benchmarking blueprint, not as an optimized production model. Second, although the modular configuration system greatly simplifies experimentation, users are still responsible for ensuring proper data preprocessing, radiometric normalization, and accurate LR–HR alignment, factors that strongly influence training stability and reconstruction quality. Third, adversarial optimization in multispectral domains remains sensitive to dataset size and diversity; small or unbalanced datasets may still yield mode collapse or spectral inconsistencies despite the provided stabilization mechanisms. Finally, the current release does not include native uncertainty estimation or automatic hyperparameter tuning; these remain open areas for future extension.
173+
Super-resolution techniques, including those implemented in OpenSR-SRGAN, can enhance apparent spatial detail but can never substitute for true high-resolution observations acquired by native sensors.
174+
While OpenSR-SRGAN provides a stable and extensible foundation for GAN-based super-resolution in remote sensing, several limitations remain. First, the framework focuses on the engineering and reproducibility aspects of model development rather than achieving state-of-the-art quantitative performance. It is therefore intended as a research and benchmarking blueprint, not as an optimized production model. Second, although the modular configuration system greatly simplifies experimentation, users are still responsible for ensuring proper data preprocessing, radiometric normalization, and accurate LR–HR alignment, factors that strongly influence training stability and reconstruction quality. Third, adversarial optimization in multispectral domains remains sensitive to dataset size and diversity; small or unbalanced datasets may still yield mode collapse or spectral inconsistencies despite the provided stabilization mechanisms. Finally, the current release does not include native uncertainty estimation or automatic hyperparameter tuning; these remain open areas for future extension.
175175

176176
# Licensing and Availability
177177
`OpenSR-SRGAN` is licensed under the Apache-2.0 license, with all source code hosted in the [ESAOpenSR/OpenSR-SRGAN](https://github.com/ESAOpenSR/SRGAN) GitHub repository. In the spirit of open science and collaboration, we welcome feature requests, bug reports and fixes, and general questions through the repository. A reproducible notebook is permanently hosted on [Google Colab](https://colab.research.google.com/drive/16W0FWr6py1J8P4po7JbNDMaepHUM97yL?usp=sharing).
@@ -182,7 +182,7 @@ This work has been supported by the European Space Agency (ESA) $\Phi$-Lab, with
182182
# Appendix
183183
## Appendix A – Architecture and Training Components
184184

185-
**Table A1. Implemented generator types and their characteristics.**
185+
**Table A1. Implemented generator types and their characteristics.** {#tbl-arch}
186186

187187
| **Generator Type** | **Description** |
188188
|:-------------------|:----------------|
@@ -194,7 +194,7 @@ This work has been supported by the European Space Agency (ESA) $\Phi$-Lab, with
194194
| `cgan` [@cgan]| Stochastic Conditional Generator with `NoiseResBlock`. |
195195

196196

197-
**Table A2. Implemented discriminator types and their purposes.**
197+
**Table A2. Implemented discriminator types and their purposes.** {#tbl-disc}
198198

199199
| **Discriminator Type** | **Description** |
200200
|:-----------------------|:----------------|
@@ -203,7 +203,7 @@ This work has been supported by the European Space Agency (ESA) $\Phi$-Lab, with
203203
| `esrgan` [@rrdb] | ESRGAN discriminator with configurable base channels and linear head size to complement RRDB generators. |
204204

205205

206-
**Table A3. Implemented training features for stable adversarial optimization.**
206+
**Table A3. Implemented training features for stable adversarial optimization.** {#tbl-train}
207207

208208
| **Feature** | **Description** |
209209
|:-------------|:----------------|
@@ -220,7 +220,7 @@ This work has been supported by the European Space Agency (ESA) $\Phi$-Lab, with
220220
| `Training.gpus` | Enables distributed data-parallel training when multiple GPU indices are listed, scaling training efficiently via PyTorch Lightning. |
221221

222222

223-
**Table A4. Supported loss components and configuration parameters.**
223+
**Table A4. Supported loss components and configuration parameters.** {#tbl-loss}
224224

225225
| **Loss Type** | **Description** |
226226
|:---------------|:----------------|
@@ -235,18 +235,19 @@ This work has been supported by the European Space Agency (ESA) $\Phi$-Lab, with
235235

236236
During training, scalar metrics are continuously logged in **Weights & Biases**. These indicators quantify loss dynamics, adversarial balance, and stability. Table B1 summarises the most relevant internal metrics tracked by *OpenSR-SRGAN*.
237237

238-
**Table B1. Key internal metrics tracked during training and validation.**
238+
**Table B1. Key internal metrics tracked during training and validation.** {#tbl-metrics}
239239

240240
| **Metric** | **Description and Expected Behaviour** |
241-
|:------------|:--------------------------------------|
242-
| `training/<br>pretrain_phase` | Binary flag indicating whether generator-only warm-up is active. Remains 1 during pretraining and switches to 0 once adversarial learning begins. |
243-
| `discriminator/<br>adversarial_loss` | Binary cross-entropy loss separating real HR from generated SR samples. Decreases below ~0.7 during stable co-training; large oscillations may indicate imbalance. |
244-
| `discriminator/<br>D(y)_prob` | Mean discriminator confidence that ground-truth HR inputs are real. Should rise toward 0.8–1.0 and stay high when *D* is healthy. |
245-
| `discriminator/<br>D(G(x))_prob` | Mean discriminator confidence that generated SR outputs are real. Starts near 0 and climbs toward 0.4–0.6 as *G* improves realism. |
241+
|:-----------|:--------------------------------------|
242+
| `training/`<br/>`pretrain_phase` | Binary flag indicating whether generator-only warm-up is active. Remains 1 during pretraining and switches to 0 once adversarial learning begins. |
243+
| `discriminator/`<br/>`adversarial_loss` | Binary cross-entropy loss separating real HR from generated SR samples. Decreases below ~0.7 during stable co-training; large oscillations may indicate imbalance. |
244+
| `discriminator/`<br/>`D(y)_prob` | Mean discriminator confidence that ground-truth HR inputs are real. Should rise toward 0.8–1.0 and stay high when *D* is healthy. |
245+
| `discriminator/`<br/>`D(G(x))_prob` | Mean discriminator confidence that generated SR outputs are real. Starts near 0 and climbs toward 0.4–0.6 as *G* improves realism. |
246246
| `generator/content_loss` | Weighted content component of the generator objective (e.g., L1 or spectral loss). Dominant during pretraining; gradually decreases over time. |
247-
| `generator/<br>total_loss` | Full generator objective combining content and adversarial terms. Tracks `content_loss` early, then stabilises once the adversarial weight ramps up. |
248-
| `training/<br>adv_loss_weight` | Current adversarial weight applied to the generator loss. Stays at 0 during pretrain and linearly ramps to its configured maximum value. |
249-
| `validation/<br>DISC_adversarial_loss` | Discriminator loss on validation batches. Should roughly mirror the training curve; strong divergence may signal overfitting or instability. |
247+
| `generator/`<br/>`total_loss` | Full generator objective combining content and adversarial terms. Tracks `content_loss` early, then stabilises once the adversarial weight ramps up. |
248+
| `training/`<br/>`adv_loss_weight` | Current adversarial weight applied to the generator loss. Stays at 0 during pretrain and linearly ramps to its configured maximum value. |
249+
| `validation/`<br/>`DISC_adversarial_loss` | Discriminator loss on validation batches. Should roughly mirror the training curve; strong divergence may signal overfitting or instability. |
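
The behaviour described for `training/adv_loss_weight` (zero during pretraining, then a linear ramp to the configured maximum) can be sketched as a small schedule function. The function and parameter names are assumptions, as is the maximum weight of `1e-3`; the step counts in the usage lines follow the "Pretrain 150k steps; Ramp 50k steps" schedule listed in Table C1.

```python
def adv_loss_weight(step: int, pretrain_steps: int, ramp_steps: int, max_weight: float) -> float:
    """Adversarial weight at `step`: 0 in pretraining, then a linear ramp to `max_weight`."""
    if step < pretrain_steps:
        return 0.0  # generator-only warm-up, no adversarial feedback yet
    if ramp_steps <= 0:
        return max_weight  # no ramp configured: switch on at full weight
    progress = min(1.0, (step - pretrain_steps) / ramp_steps)
    return max_weight * progress

# Hypothetical maximum weight of 1e-3 with the Table C1 step counts.
for step in (0, 150_000, 175_000, 200_000, 300_000):
    print(step, adv_loss_weight(step, 150_000, 50_000, 1e-3))
```

Plotted against the step count, this schedule is flat at zero, rises linearly over the ramp window, and then stays constant at its maximum.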
250+
250251

251252

252253
## Appendix C – Experimental Configuration and Quantitative Results
@@ -266,9 +267,9 @@ EMA (β = 0.999) stabilises validation.
266267

267268
Qualitative results show sharper fields, buildings, and roads compared to bicubic upsampling, with minimal spectral distortion (Figure C1).
268269

269-
![False-color visual comparison for 4× RGB SR on SEN2NAIP. Left to right: LR input, model output, HR reference.](figures/rgb_example.png)
270+
![False-color visual comparison for 4× RGB SR on SEN2NAIP. Left to right: LR input, model output, HR reference.](figures/rgb_example.png){#fig-exp1}
270271

271-
**Table C1. Configuration summary for the SEN2NAIP RGB experiment.**
272+
**Table C1. Configuration summary for the SEN2NAIP RGB experiment.** {#tbl-exp1config}
272273

273274
| **Parameter** | **Setting** |
274275
|:---------------|:------------|
@@ -279,7 +280,7 @@ Qualitative results show sharper fields, buildings, and roads compared to bicubi
279280
| Training schedule | Pretrain 150k steps; Ramp 50k steps; EMA β = 0.999 |
280281
| Hardware | Dual A100 (DDP), 16-bit precision |
281282

282-
**Table C2. Validation performance of the SEN2NAIP RGB experiment (4×).**
283+
**Table C2. Validation performance of the SEN2NAIP RGB experiment (4×).** {#tbl-exp1results}
283284

284285
| **Model** | **PSNR↑** | **SSIM↑** | **LPIPS↓** | **SAM↓** |
285286
|:-----------|:----------:|:----------:|:-----------:|:----------:|
@@ -296,7 +297,7 @@ A PatchGAN discriminator ensures local realism; EMA is disabled.
296297
**Performance:** mid-20 dB PSNR, SSIM ≈ 0.7–0.75, low SAM values.
297298
Figure C2 shows sharper edges and preserved spectral structure relative to bicubic interpolation.
298299

299-
![Visual comparison for 8× multispectral SR (6-band Sentinel-2). Left to right: LR input, model output, HR reference.](figures/swir_example.png)
300+
![Visual comparison for 8× multispectral SR (6-band Sentinel-2). Left to right: LR input, model output, HR reference.](figures/swir_example.png){#fig-exp2}
300301

301302
**Table C3. Configuration summary for the 6-band Sentinel-2 experiment.**
302303
