paper/paper.md
24 additions & 23 deletions
@@ -97,7 +97,7 @@ The system consists of four main components:
## Generator Architectures
-The generator network can be configured with different backbone types, each providing a unique trade-off between complexity, receptive field, and textural detail (see Table~\ref{tab:arch} in ~\ref{app:components}).
+The generator network can be configured with different backbone types, each providing a unique trade-off between complexity, receptive field, and textural detail (see Table[@tbl-arch] in [@app-components]).
The `Generator` class provides a unified implementation of SR backbones that share a common convolutional structure while differing in their internal residual block design.
The module is initialized with a `model_type` flag selecting one of `res`, `rcab`, `rrdb`, `lka`, `esrgan`, `cgan`, each drawn from a shared registry of block factories or dedicated ESRGAN implementation.
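The registry-based selection described above can be pictured with a minimal sketch. This is not the repository's code: the `BLOCK_FACTORIES` dictionary, the `build_backbone` helper, and every block name except `NoiseResBlock` are hypothetical, and the factories return description strings rather than real layers.

```python
from typing import Callable, Dict, List

# Hypothetical registry keyed by the `model_type` values named in the text.
# Each factory returns a description string standing in for a real block.
BLOCK_FACTORIES: Dict[str, Callable[[int], str]] = {
    "res": lambda channels: f"ResidualBlock({channels})",
    "rcab": lambda channels: f"RCAB({channels})",
    "rrdb": lambda channels: f"RRDB({channels})",
    "lka": lambda channels: f"LKABlock({channels})",
    "cgan": lambda channels: f"NoiseResBlock({channels})",
}


def build_backbone(model_type: str, channels: int = 64, n_blocks: int = 4) -> List[str]:
    """Return the list of blocks selected by `model_type` (illustrative only)."""
    if model_type == "esrgan":
        # The text mentions a dedicated ESRGAN implementation outside the registry.
        return [f"ESRGANGenerator({channels})"]
    if model_type not in BLOCK_FACTORIES:
        raise ValueError(f"unknown model_type: {model_type!r}")
    factory = BLOCK_FACTORIES[model_type]
    return [factory(channels) for _ in range(n_blocks)]
```

In this sketch, `build_backbone("rcab", channels=32, n_blocks=2)` yields two `RCAB(32)` entries, while an unrecognised flag fails early with a `ValueError` instead of silently falling back to a default.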
@@ -129,7 +129,7 @@ where $s$ is a residual scaling factor. This mechanism maintains spatial coheren
## Discriminator Architectures
-The discriminator can be selected to prioritize either global consistency or fine local realism. The different architectures and their purposes are outlined in Table~\ref{tab:disc} in ~\ref{app:components}. Three discriminator variants are implemented to complement the different generator types: a global `Discriminator`, a local `PatchGANDiscriminator`, and the deeper `ESRGANDiscriminator`. All are built from shared convolutional blocks with LeakyReLU activations and instance normalization.
+The discriminator can be selected to prioritize either global consistency or fine local realism. The different architectures and their purposes are outlined in Table[@tbl-disc] in [@app-components]. Three discriminator variants are implemented to complement the different generator types: a global `Discriminator`, a local `PatchGANDiscriminator`, and the deeper `ESRGANDiscriminator`. All are built from shared convolutional blocks with LeakyReLU activations and instance normalization.
The standard discriminator follows the original SRGAN \cite{ledig2017photo} design and evaluates the realism of the entire super-resolved image and the actual HR image. It stacks a sequence of strided convolutional layers with progressively increasing feature channels, an adaptive average pooling layer to a fixed spatial size, and two fully connected layers producing a scalar real/fake score. This 'global' discriminator promotes coherent large-scale structure and overall photorealism.
@@ -141,7 +141,7 @@ Together, these architectures allow users to select the appropriate adversarial
# Training Features
-Training stability is improved through several built-in mechanisms that address common issues of adversarial optimization (summarized in Table~\ref{tab:train}, ~\ref{app:components}). These are configured in the `Training` section of the YAML `config` file.
+Training stability is improved through several built-in mechanisms that address common issues of adversarial optimization (summarized in Table[@tbl-train], [@app-components]). These are configured in the `Training` section of the YAML `config` file.
## General Training Optimizations
Several additional methods contribute to stable adversarial optimization. Label smoothing replaces hard discriminator targets (1 for real, 0 for fake) with softened values such as 0.9 and 0.1, preventing overconfidence and promoting smoother gradients. A short generator warmup phase allows $G$ to learn basic low-frequency structure before adversarial feedback is introduced, often combined with a linear or cosine learning-rate ramp to avoid abrupt updates. The discriminator holdback delays $D$ updates for the first few epochs so that $G$ can stabilise; when enabled, $D$ also follows a short warmup schedule to balance learning rates. Finally, both optimisers employ adaptive scheduling via `ReduceLROnPlateau`, lowering the learning rate when progress stagnates. These implementations mitigate divergence and improve convergence stability in adversarial training. All of these techniques can be configured from the `config` file as the unified entry-point.
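Two of the mechanisms above, label smoothing and a linear ramp after the warmup phase, are simple enough to sketch in plain Python. The function names, argument names, and step counts below are assumptions made for illustration; they are not the project's configuration keys or its implementation.

```python
def smoothed_target(is_real: bool, smoothing: float = 0.1) -> float:
    """Soft discriminator target: 0.9 for real and 0.1 for fake when smoothing=0.1."""
    return 1.0 - smoothing if is_real else smoothing


def linear_ramp(step: int, start_step: int, ramp_steps: int, max_value: float) -> float:
    """Return 0 before `start_step`, then rise linearly to `max_value` over `ramp_steps`."""
    if step < start_step:
        # Warmup phase: the ramped quantity (e.g. an adversarial weight) stays off.
        return 0.0
    if ramp_steps <= 0:
        # No ramp requested: jump straight to the final value.
        return max_value
    fraction = min(1.0, (step - start_step) / ramp_steps)
    return max_value * fraction
```

With a hypothetical warmup of 100 steps and a ramp of 50 steps, `linear_ramp(125, 100, 50, 0.01)` sits halfway up the ramp, and any step at or beyond 150 returns the full `0.01`.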
@@ -166,12 +166,12 @@ where $\hat{y}_{\text{SR}}$ denotes the final super-resolved output produced by
## Loss Functions
-Each loss term (see Table~\ref{tab:loss} in ~\ref{app:components}) can be weighted independently, allowing users to balance spectral accuracy and perceptual realism. Typical configurations combine L1, Perceptual, and Adversarial losses, optionally augmented by SAM and TV for multispectral consistency and smoothness. The overall objective is a weighted sum of these terms defined in the `Training.Losses ` section of the configuration. A detailed description of the internal training and validation metrics logged alongside these losses is given in ~\ref{app:metrics}.
+Each loss term (see Table[@tbl-loss] in [@app-components]) can be weighted independently, allowing users to balance spectral accuracy and perceptual realism. Typical configurations combine L1, Perceptual, and Adversarial losses, optionally augmented by SAM and TV for multispectral consistency and smoothness. The overall objective is a weighted sum of these terms defined in the `Training.Losses ` section of the configuration. A detailed description of the internal training and validation metrics logged alongside these losses is given in [@app-metrics].
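The weighted-sum objective described in the Loss Functions paragraph can be illustrated with a small, dependency-free sketch. The term names, the weights, and the helper functions are hypothetical; only the idea of a per-term weight and the use of a spectral angle (SAM) come from the text.

```python
import math
from typing import Dict, Sequence


def spectral_angle(a: Sequence[float], b: Sequence[float]) -> float:
    """Angle in radians between two per-pixel spectra (the quantity behind SAM)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("spectral angle is undefined for a zero vector")
    # Clamp to [-1, 1] so rounding error cannot push acos out of its domain.
    cosine = max(-1.0, min(1.0, dot / (norm_a * norm_b)))
    return math.acos(cosine)


def total_loss(terms: Dict[str, float], weights: Dict[str, float]) -> float:
    """Weighted sum of loss terms; a term without a configured weight contributes 0."""
    return sum(weights.get(name, 0.0) * value for name, value in terms.items())
```

For example, `total_loss({"l1": 0.2, "adv": 0.5}, {"l1": 1.0, "adv": 0.01})` is dominated by the L1 term, which mirrors the usual practice of keeping the adversarial weight small relative to the content weight.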
# Limitations
-Super-resolution techniques, including those implemented in *Remote-Sensing-SRGAN*, can enhance apparent spatial detail but can never substitute for true high-resolution observations acquired by native sensors.
-While *Remote-Sensing-SRGAN* provides a stable and extensible foundation for GAN-based super-resolution in remote sensing, several limitations remain. First, the framework focuses on the engineering and reproducibility aspects of model development rather than achieving state-of-the-art quantitative performance. It is therefore intended as a research and benchmarking blueprint, not as an optimized production model. Second, although the modular configuration system greatly simplifies experimentation, users are still responsible for ensuring proper data preprocessing, radiometric normalization, and accurate LR–HR alignment, factors that strongly influence training stability and reconstruction quality. Third, adversarial optimization in multispectral domains remains sensitive to dataset size and diversity; small or unbalanced datasets may still yield mode collapse or spectral inconsistencies despite the provided stabilization mechanisms. Finally, the current release does not include native uncertainty estimation or automatic hyperparameter tuning; these remain open areas for future extension.
+Super-resolution techniques, including those implemented in OpenSR-SRGAN, can enhance apparent spatial detail but can never substitute for true high-resolution observations acquired by native sensors.
+While OpenSR-SRGAN provides a stable and extensible foundation for GAN-based super-resolution in remote sensing, several limitations remain. First, the framework focuses on the engineering and reproducibility aspects of model development rather than achieving state-of-the-art quantitative performance. It is therefore intended as a research and benchmarking blueprint, not as an optimized production model. Second, although the modular configuration system greatly simplifies experimentation, users are still responsible for ensuring proper data preprocessing, radiometric normalization, and accurate LR–HR alignment, factors that strongly influence training stability and reconstruction quality. Third, adversarial optimization in multispectral domains remains sensitive to dataset size and diversity; small or unbalanced datasets may still yield mode collapse or spectral inconsistencies despite the provided stabilization mechanisms. Finally, the current release does not include native uncertainty estimation or automatic hyperparameter tuning; these remain open areas for future extension.
# Licensing and Availability
`OpenSR-SRGAN` is licensed under the Apache-2.0 license, with all source code stored at [ESAOpenSR/OpenSR-SRGAN](https://github.com/ESAOpenSR/SRGAN) Github repository. In the spirit of open science and collaboration, we encourage feature requests and updates, bug fixes and reports, as well as general questions and concerns via direct interaction with the repository. A reproducible notebook is permanently hosted on [Google Colab](https://colab.research.google.com/drive/16W0FWr6py1J8P4po7JbNDMaepHUM97yL?usp=sharing).
@@ -182,7 +182,7 @@ This work has been supported by the European Space Agency (ESA) $\Phi$-Lab, with
# Appendix
## Appendix A – Architecture and Training Components
-**Table A1. Implemented generator types and their characteristics.**
+**Table A1. Implemented generator types and their characteristics.** {#tbl-arch}
|**Generator Type**|**Description**|
|:-------------------|:----------------|
@@ -194,7 +194,7 @@ This work has been supported by the European Space Agency (ESA) $\Phi$-Lab, with
|`cgan`[@cgan]| Stochastic Conditional Generator with `NoiseResBlock`. |
-**Table A2. Implemented discriminator types and their purposes.**
+**Table A2. Implemented discriminator types and their purposes.** {#tbl-disc}
|**Discriminator Type**|**Description**|
|:-----------------------|:----------------|
@@ -203,7 +203,7 @@ This work has been supported by the European Space Agency (ESA) $\Phi$-Lab, with
|`esrgan`[@rrdb]| ESRGAN discriminator with configurable base channels and linear head size to complement RRDB generators. |
-**Table A3. Implemented training features for stable adversarial optimization.**
+**Table A3. Implemented training features for stable adversarial optimization.** {#tbl-train}
|**Feature**|**Description**|
|:-------------|:----------------|
@@ -220,7 +220,7 @@ This work has been supported by the European Space Agency (ESA) $\Phi$-Lab, with
|`Training.gpus`| Enables distributed data-parallel training when multiple GPU indices are listed, scaling training efficiently via PyTorch Lightning. |
-**Table A4. Supported loss components and configuration parameters.**
+**Table A4. Supported loss components and configuration parameters.** {#tbl-loss}
|**Loss Type**|**Description**|
|:---------------|:----------------|
@@ -235,18 +235,19 @@ This work has been supported by the European Space Agency (ESA) $\Phi$-Lab, with
During training, scalar metrics are continuously logged in **Weights & Biases**. These indicators quantify loss dynamics, adversarial balance, and stability. Table B1 summarises the most relevant internal metrics tracked by *OpenSR-SRGAN*.
-**Table B1. Key internal metrics tracked during training and validation.**
+**Table B1. Key internal metrics tracked during training and validation.** {#tbl-metrics}
|**Metric**|**Description and Expected Behaviour**|
-|`training/<br>pretrain_phase`| Binary flag indicating whether generator-only warm-up is active. Remains 1 during pretraining and switches to 0 once adversarial learning begins. |
-|`discriminator/<br>adversarial_loss`| Binary cross-entropy loss separating real HR from generated SR samples. Decreases below ~0.7 during stable co-training; large oscillations may indicate imbalance. |
-|`discriminator/<br>D(y)_prob`| Mean discriminator confidence that ground-truth HR inputs are real. Should rise toward 0.8–1.0 and stay high when *D* is healthy. |
-|`discriminator/<br>D(G(x))_prob`| Mean discriminator confidence that generated SR outputs are real. Starts near 0 and climbs toward 0.4–0.6 as *G* improves realism. |
+|`training/`<br/>`pretrain_phase`| Binary flag indicating whether generator-only warm-up is active. Remains 1 during pretraining and switches to 0 once adversarial learning begins. |
+|`discriminator/`<br/>`adversarial_loss`| Binary cross-entropy loss separating real HR from generated SR samples. Decreases below ~0.7 during stable co-training; large oscillations may indicate imbalance. |
+|`discriminator/`<br/>`D(y)_prob`| Mean discriminator confidence that ground-truth HR inputs are real. Should rise toward 0.8–1.0 and stay high when *D* is healthy. |
+|`discriminator/`<br/>`D(G(x))_prob`| Mean discriminator confidence that generated SR outputs are real. Starts near 0 and climbs toward 0.4–0.6 as *G* improves realism. |
|`generator/content_loss`| Weighted content component of the generator objective (e.g., L1 or spectral loss). Dominant during pretraining; gradually decreases over time. |
-|`generator/<br>total_loss`| Full generator objective combining content and adversarial terms. Tracks `content_loss` early, then stabilises once the adversarial weight ramps up. |
-|`training/<br>adv_loss_weight`| Current adversarial weight applied to the generator loss. Stays at 0 during pretrain and linearly ramps to its configured maximum value. |
-|`validation/<br>DISC_adversarial_loss`| Discriminator loss on validation batches. Should roughly mirror the training curve; strong divergence may signal overfitting or instability. |
+|`generator/`<br/>`total_loss`| Full generator objective combining content and adversarial terms. Tracks `content_loss` early, then stabilises once the adversarial weight ramps up. |
+|`training/`<br/>`adv_loss_weight`| Current adversarial weight applied to the generator loss. Stays at 0 during pretrain and linearly ramps to its configured maximum value. |
+|`validation/`<br/>`DISC_adversarial_loss`| Discriminator loss on validation batches. Should roughly mirror the training curve; strong divergence may signal overfitting or instability. |
+
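The "~0.7" threshold quoted for the discriminator loss in Table B1 has a simple reading: a discriminator that outputs 0.5 for every sample has a binary cross-entropy of ln 2 ≈ 0.693. The sketch below checks this, assuming the logged value is the mean of the real and fake BCE terms; that averaging convention and the function names are assumptions, not something the diff states.

```python
import math


def bce(prob_real: float, target: float) -> float:
    """Binary cross-entropy for one predicted probability and one target in [0, 1]."""
    eps = 1e-12
    p = min(max(prob_real, eps), 1.0 - eps)  # keep log() away from 0
    return -(target * math.log(p) + (1.0 - target) * math.log(1.0 - p))


def discriminator_loss(d_real: float, d_fake: float) -> float:
    """Mean of the real-sample and fake-sample BCE terms (assumed convention)."""
    return 0.5 * (bce(d_real, 1.0) + bce(d_fake, 0.0))


chance_level = discriminator_loss(0.5, 0.5)  # equals ln 2, about 0.693
healthy = discriminator_loss(0.9, 0.4)       # D(y) high, D(G(x)) near 0.4
```

Under this assumption, the ranges in the table (D(y) near 0.8–1.0, D(G(x)) near 0.4–0.6) give a loss below the chance level, which is consistent with the "decreases below ~0.7" description.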
## Appendix C – Experimental Configuration and Quantitative Results
@@ -266,9 +267,9 @@ EMA (β = 0.999) stabilises validation.
Qualitative results show sharper fields, buildings, and roads compared to bicubic upsampling, with minimal spectral distortion (Figure C1).
-
+{#fig-exp1}

-**Table C1. Configuration summary for the SEN2NAIP RGB experiment.**
+**Table C1. Configuration summary for the SEN2NAIP RGB experiment.** {#tbl-exp1config}
|**Parameter**|**Setting**|
|:---------------|:------------|
@@ -279,7 +280,7 @@ Qualitative results show sharper fields, buildings, and roads compared to bicubi
| Training schedule | Pretrain 150k steps; Ramp 50k steps; EMA β = 0.999 |
| Hardware | Dual A100 (DDP), 16-bit precision |
-**Table C2. Validation performance of the SEN2NAIP RGB experiment (4×).**
+**Table C2. Validation performance of the SEN2NAIP RGB experiment (4×).** {#tbl-exp1results}
@@ -296,7 +297,7 @@ A PatchGAN discriminator ensures local realism; EMA is disabled.
**Performance:** mid-20 dB PSNR, SSIM ≈ 0.7–0.75, low SAM values.
Figure C2 shows sharper edges and preserved spectral structure relative to bicubic interpolation.
-
+{#fig-exp2}
**Table C3. Configuration summary for the 6-band Sentinel-2 experiment.**