ECLIPSE-Lab
.gitignore | Lines changed: 1 addition & 0 deletions
materials_genomics/01_intro/unit1_content_50slides.md | Lines changed: 14 additions & 7 deletions
materials_genomics/01_intro/unit1_plan.md | Lines changed: 33 additions & 0 deletions
materials_genomics/02_crystal_structure_fundamentals/unit2_content_50slides.md | Lines changed: 16 additions & 2 deletions
materials_genomics/02_crystal_structure_fundamentals/unit2_plan.md | Lines changed: 34 additions & 0 deletions
materials_genomics/03_materials_databases/unit3_plan.md | Lines changed: 34 additions & 0 deletions
materials_genomics/04_classical_descriptors/unit4_plan.md | Lines changed: 35 additions & 0 deletions
materials_genomics/05_graph_based_rep/unit5_plan.md | Lines changed: 33 additions & 0 deletions
materials_genomics/06_local_atomic_envs/unit6_plan.md | Lines changed: 33 additions & 0 deletions
materials_genomics/07_regression_and_generalization_in_materials_data/unit7_plan.md | Lines changed: 34 additions & 0 deletions
diff --git a/.gitignore b/.gitignore
@@ -178,3 +178,4 @@ cython_debug/
 .pypirc
 
 /.quarto/
+.worktrees/
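diff --git a/materials_genomics/01_intro/unit1_content_50slides.md b/materials_genomics/01_intro/unit1_content_50slides.md
@@ -3,16 +3,23 @@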
 ## Unit theme
 **Materials data as a design space: from datasets to discovery loops**
 
-## Core source mapping (book-priority aligned)
-- **Neuer (2024)**: model framing, explainability, criticism of pure black-box use.
-- **Sandfeld (2024)**: Ch. 2.1–2.3 (data science in materials, domain knowledge, data→information→knowledge).
-- **McClarren (2021)**: Ch. 1 (ML landscape; validation basics; Bayesian view).
-- **Murphy (2012)**: Ch. 1 (task definitions, model selection language).
-- **Bishop (2006)**: Ch. 1 (probabilistic framing and model-selection mindset).
+## Book-backed content summary (for this unit)
+- Materials genomics treats composition and crystal structure as a searchable design space rather than a fixed list of known compounds.
+- Databases and simulations provide candidate structures, target properties, and method metadata that together define the learning problem.
+- Model choice in scientific discovery must remain tied to domain knowledge, uncertainty, and explainability rather than pure benchmark accuracy.
+- Materials discovery tasks combine regression, ranking, classification, and screening logic within one validation-aware workflow.
+- Provenance, dataset bias, and leakage determine whether a discovery claim is scientifically defensible.
+
+## Source anchors used
+- Neuer 1.1-1.3
+- Sandfeld 2.1-2.3
+- McClarren Ch1
+- Murphy Ch1
+- Bishop Ch1
 
 ---
 
-## Slide-by-slide content (target: 50)
+## 50-slide scaffold
 
 ### Block A — Why materials genomics? (Slides 1–8)
 1. **Title + course role in program**
 
diff --git a/materials_genomics/01_intro/unit1_plan.md b/materials_genomics/01_intro/unit1_plan.md
@@ -39,3 +39,36 @@ By the end of this unit, students can:
 ## Assessment alignment
 - Written exam: conceptual precision (not coding trivia)
 - Students should be able to defend model/data choices scientifically.
+
+## Required chapter files
+- Neuer:
+  - `neuer-machine-learning-for-engineers/markdown/01-data-as-the-basis-of-models.qmd` (1.1-1.3)
+- Sandfeld:
+  - `sandfeld-materials-data-science/markdown/04-part-i-introduction-and-foundations.qmd` (2.1-2.3)
+- McClarren:
+  - `mcclarren-machine-learning-for-engineers/markdown/01-the-landscape-of-machine-learning-supervised-and-unsupervised-learning-optimization-and-other-topics.qmd`
+- Bishop:
+  - `bishop-pattern-recognition-and-machine-learning-2006/markdown/05-introduction.qmd`
+- Murphy:
+  - `murphy-machine-learning-a-probabilistic-perspective-2012/markdown/08-introduction.qmd`
+
+## Cross-book summary target
+- Start from Neuer's distinction between white-box, grey-box, and black-box models and explain why materials genomics cannot treat screening as a pure black-box exercise.
+- Use Sandfeld to define materials data science as a domain-knowledge-guided workflow from data to information to knowledge.
+- Use McClarren, Bishop, and Murphy only to stabilize the ML vocabulary: task definition, model selection, validation, and scientific interpretation.
+- Keep the focus on databases, discovery loops, and validity criteria rather than on algorithm derivations.
+- Exclude detailed probability theory and optimization proofs; they belong to MFML.
+
+## 50-slide strategy
+- Slides 1-8: course position, genomics analogy, learning objectives, discovery bottleneck.
+- Slides 9-18: design-space framing, PSPP graph, targets, surrogate-model logic.
+- Slides 19-30: database landscape, data objects, provenance, bias, and leakage.
+- Slides 31-41: regression/classification/ranking tasks, grouped validation, uncertainty-aware decisions.
+- Slides 42-50: exercise handoff, reporting checklist, exam-relevant summary statements.
+
+## Website summary update
+- Heading: `#### Week 1 – What is Materials Genomics? (14.04.2026)`
+- Add a short summary emphasizing:
+  - materials genomes as searchable composition-structure spaces,
+  - databases plus simulations as the data substrate,
+  - the need for validation, uncertainty, and domain knowledge in discovery claims.
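diff --git a/materials_genomics/02_crystal_structure_fundamentals/unit2_content_50slides.md b/materials_genomics/02_crystal_structure_fundamentals/unit2_content_50slides.md
@@ -1,6 +1,20 @@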
-# Materials Genomics Unit 2 — 50-Slide Scaffold Pack
+# Materials Genomics Unit 2 — 50-Slide Teaching Scaffold (book-backed)
 
-## Slide-by-slide scaffold
+## Book-backed content summary (for this unit)
+- Crystal structures become ML-ready only after careful choices about lattice, basis, coordinate systems, and periodic representation.
+- Symmetry reduces redundant degrees of freedom but also creates canonicalization and leakage challenges when equivalent structures appear multiple times.
+- CIF-like containers mix geometry, chemistry, and metadata; these fields must be parsed into structured representations without discarding provenance.
+- Low-dimensional organization of structural data helps students see why crystal families and prototypes cluster, but these projections are not substitutes for crystallographic reasoning.
+- The unit prepares students for descriptor and graph models by clarifying what information a crystal representation must preserve.
+
+## Source anchors used
+- Neuer 1.2.3, 1.2.7, 5.2
+- Sandfeld 3.3
+- McClarren Ch4
+- Bishop Ch12
+- Murphy Ch19
+
+## 50-slide scaffold
 
 1. **Title: Crystal Structure Fundamentals**
 - Unit scope and role.
 
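diff --git a/materials_genomics/02_crystal_structure_fundamentals/unit2_plan.md b/materials_genomics/02_crystal_structure_fundamentals/unit2_plan.md
@@ -35,3 +35,37 @@ By the end of Unit 2 students can: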
 2. Sandfeld: materials data science + domain integration
 3. McClarren: practical ML task structure
 4. Murphy/Bishop: validation and probabilistic interpretation
+
+## Required chapter files
+- Neuer:
+  - `neuer-machine-learning-for-engineers/markdown/01-data-as-the-basis-of-models.qmd` (1.2.3, 1.2.7)
+  - `neuer-machine-learning-for-engineers/markdown/05-unsupervised-learning.qmd` (5.2)
+- Sandfeld:
+  - `sandfeld-materials-data-science/markdown/04-part-i-introduction-and-foundations.qmd` (3.3, structured/tabular data)
+- McClarren:
+  - `mcclarren-machine-learning-for-engineers/markdown/04-finding-structure-within-a-data-set-data-reduction-and-clustering.qmd`
+- Bishop:
+  - `bishop-pattern-recognition-and-machine-learning-2006/markdown/16-continuous-latent-variables.qmd`
+- Murphy:
+  - `murphy-machine-learning-a-probabilistic-perspective-2012/markdown/19-latent-linear-models.qmd`
+
+## Cross-book summary target
+- Use Sandfeld and Neuer to explain how crystal structures must be represented as structured data objects before any ML method can act on them.
+- Use McClarren, Murphy, and Bishop only to motivate low-dimensional structure, covariance views, and representation choices, not to teach crystallography itself.
+- Keep the domain core on lattice, basis, periodicity, symmetry, and invariance requirements for ML-ready crystal data.
+- Explain why representation choices change both model accuracy and leakage risk.
+- Exclude formal derivations of PCA and latent-variable models; students already encounter the mathematics in MFML.
+
+## 50-slide strategy
+- Slides 1-10: structural vocabulary, lattices, bases, unit cells, coordinate systems.
+- Slides 11-22: primitive vs conventional cells, symmetry, periodicity, invariances.
+- Slides 23-34: CIF/POSCAR-style encodings, tabularization, low-dimensional structure intuition.
+- Slides 35-44: data-quality risks, polymorph/prototype leakage, grouped splits.
+- Slides 45-50: CIF-to-feature exercise setup and exam checklist.
+
+## Website summary update
+- Heading: `#### Week 2 – Simulation methods as data generators (21.04.2026)`
+- Add or revise the summary so Week 2 bridges simulation outputs to ML-ready crystal representations:
+  - structures as data objects with periodic constraints,
+  - symmetry and coordinate choices,
+  - low-dimensional organization of crystal data.
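diff --git a/materials_genomics/03_materials_databases/unit3_plan.md b/materials_genomics/03_materials_databases/unit3_plan.md
@@ -28,3 +28,37 @@ By the end of Unit 3, students can: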
 - compare two methodological choices under identical split protocol
 - perform one structured failure analysis and mitigation proposal
 - produce a short report with claims, evidence, and limitations
+
+## Required chapter files
+- Neuer:
+  - `neuer-machine-learning-for-engineers/markdown/01-data-as-the-basis-of-models.qmd` (1.3)
+  - `neuer-machine-learning-for-engineers/markdown/04-supervised-learning.qmd` (4.2.2, 4.2.3, 4.4.1)
+- Sandfeld:
+  - `sandfeld-materials-data-science/markdown/04-part-i-introduction-and-foundations.qmd` (2.2, 4.5)
+- McClarren:
+  - `mcclarren-machine-learning-for-engineers/markdown/01-the-landscape-of-machine-learning-supervised-and-unsupervised-learning-optimization-and-other-topics.qmd`
+- Bishop:
+  - `bishop-pattern-recognition-and-machine-learning-2006/markdown/07-linear-models-for-regression.qmd`
+- Murphy:
+  - `murphy-machine-learning-a-probabilistic-perspective-2012/markdown/14-linear-regression.qmd`
+
+## Cross-book summary target
+- Use Neuer to frame database targets as supervised-learning objects with explicit train/test logic and careful normalization.
+- Use Sandfeld to connect materials datasets to provenance, domain context, and structured records rather than isolated numbers.
+- Use Bishop and Murphy selectively for regression language around targets, residuals, and fair comparison.
+- Keep the unit centered on real materials records: composition, structure, calculation metadata, formation energy, and energy above hull.
+- Exclude extensive regression derivations; the priority is scientific meaning of database fields, convex-hull logic, and reproducible queries.
+
+## 50-slide strategy
+- Slides 1-8: database ecosystem and why database choice changes conclusions.
+- Slides 9-20: schema elements, identifiers, composition/structure/provenance fields.
+- Slides 21-32: formation energy, energy above hull, convex hulls, metastability, synthesis caveats.
+- Slides 33-42: confounders from functionals, cutoffs, duplicate structures, normalization choices.
+- Slides 43-50: API query workflow, reproducible snapshotting, exercise briefing.
+
+## Website summary update
+- Heading: `#### Week 3 – Atomistic and electronic simulations (DFT, MD, MC) (28.04.2026)`
+- Add a summary emphasizing:
+  - which atomistic simulations populate materials databases,
+  - how thermodynamic targets are constructed from those calculations,
+  - why method metadata and reference states matter for ML.
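diff --git a/materials_genomics/04_classical_descriptors/unit4_plan.md b/materials_genomics/04_classical_descriptors/unit4_plan.md
@@ -28,3 +28,38 @@ By the end of Unit 4, students can: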
 - compare two methodological choices under identical split protocol
 - perform one structured failure analysis and mitigation proposal
 - produce a short report with claims, evidence, and limitations
+
+## Required chapter files
+- Neuer:
+  - `neuer-machine-learning-for-engineers/markdown/05-unsupervised-learning.qmd` (5.5)
+- Sandfeld:
+  - `sandfeld-materials-data-science/markdown/04-part-i-introduction-and-foundations.qmd` (2.2, 2.4)
+  - `sandfeld-materials-data-science/markdown/06-part-iii-classical-machine-learning.qmd` (feature matrices, regression context)
+- McClarren:
+  - `mcclarren-machine-learning-for-engineers/markdown/04-finding-structure-within-a-data-set-data-reduction-and-clustering.qmd`
+  - `mcclarren-machine-learning-for-engineers/markdown/08-unsupervised-learning-with-neural-networks-autoencoders.qmd`
+- Bishop:
+  - `bishop-pattern-recognition-and-machine-learning-2006/markdown/16-continuous-latent-variables.qmd`
+- Murphy:
+  - `murphy-machine-learning-a-probabilistic-perspective-2012/markdown/19-latent-linear-models.qmd`
+
+## Cross-book summary target
+- Use Sandfeld to motivate descriptor engineering through domain knowledge, feature matrices, and curse-of-dimensionality effects.
+- Use Neuer plus McClarren to explain why autoencoder-style learned representations become attractive when hand-crafted features saturate.
+- Use Bishop and Murphy only for latent-variable intuition, not for deep theory.
+- Keep the materials focus on descriptor families such as Magpie and matminer, invariance requirements, and failure modes from multicollinearity or missing nonlocal physics.
+- Exclude architecture-specific training detail; that belongs in later neural-network units.
+
+## 50-slide strategy
+- Slides 1-10: descriptor purpose, chemistry and structure feature families, invariance requirements.
+- Slides 11-22: Magpie/matminer examples, scaling, normalization, correlation, and sparsity.
+- Slides 23-34: where classical descriptors succeed, where they fail, and why.
+- Slides 35-44: transition to learned representations, latent-variable intuition, transferability.
+- Slides 45-50: descriptor-vs-learned-representation exercise and summary.
+
+## Website summary update
+- Heading: `#### Week 4 – Continuum simulations, thermodynamics, and stability (05.05.2026)`
+- Add a summary that links stability concepts to representation choices:
+  - descriptors as compressed carriers of chemistry and structure,
+  - interpretable but limited hand-crafted features,
+  - motivation for moving toward learned representations in later weeks.
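diff --git a/materials_genomics/05_graph_based_rep/unit5_plan.md b/materials_genomics/05_graph_based_rep/unit5_plan.md
@@ -28,3 +28,36 @@ By the end of Unit 5, students can: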
 - compare two methodological choices under identical split protocol
 - perform one structured failure analysis and mitigation proposal
 - produce a short report with claims, evidence, and limitations
+
+## Required chapter files
+- Neuer:
+  - `neuer-machine-learning-for-engineers/markdown/04-supervised-learning.qmd` (4.5.1-4.5.5)
+- Sandfeld:
+  - `sandfeld-materials-data-science/markdown/04-part-i-introduction-and-foundations.qmd` (2.2, 3.3)
+- McClarren:
+  - `mcclarren-machine-learning-for-engineers/markdown/05-feed-forward-neural-networks.qmd`
+- Bishop:
+  - `bishop-pattern-recognition-and-machine-learning-2006/markdown/09-neural-networks.qmd`
+- Murphy:
+  - `murphy-machine-learning-a-probabilistic-perspective-2012/markdown/35-deep-learning.qmd`
+
+## Cross-book summary target
+- Use Neuer, McClarren, Bishop, and Murphy to supply the neural-network language needed to explain message passing as a learned nonlinear update rule.
+- Use Sandfeld to keep the materials focus on structure encoding, domain constraints, and why graph objects are natural for crystals.
+- Emphasize node, edge, and global attributes; periodic boundary conditions; and readout functions for property prediction.
+- Compare graph models to descriptor baselines without getting lost in architecture trivia.
+- Exclude advanced GNN math and equivariant formalism beyond intuition.
+
+## 50-slide strategy
+- Slides 1-10: why crystals become graphs, atoms/bonds/global state, periodicity.
+- Slides 11-22: cutoff construction, edge features, invariance and equivariance intuition.
+- Slides 23-34: message passing, CGCNN, MEGNet, SchNet-style intuition.
+- Slides 35-44: over-smoothing, readout choices, transferability, shortcut-learning failures.
+- Slides 45-50: graph-construction exercise and recap.
+
+## Website summary update
+- Heading: `#### Week 6 – Graph-based crystal representations (19.05.2026)`
+- Add a summary covering:
+  - crystals as periodic graphs,
+  - message passing as a structure-property learning mechanism,
+  - the role of cutoffs, invariances, and grouped evaluation.
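diff --git a/materials_genomics/06_local_atomic_envs/unit6_plan.md b/materials_genomics/06_local_atomic_envs/unit6_plan.md
@@ -28,3 +28,36 @@ By the end of Unit 6, students can: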
 - compare two methodological choices under identical split protocol
 - perform one structured failure analysis and mitigation proposal
 - produce a short report with claims, evidence, and limitations
+
+## Required chapter files
+- Neuer:
+  - `neuer-machine-learning-for-engineers/markdown/06-physics-informed-learning.qmd` (6.2, 6.3)
+- Sandfeld:
+  - `sandfeld-materials-data-science/markdown/04-part-i-introduction-and-foundations.qmd` (2.2, 3.3)
+- McClarren:
+  - `mcclarren-machine-learning-for-engineers/markdown/02-linear-models-for-regression-and-classification.qmd`
+- Bishop:
+  - `bishop-pattern-recognition-and-machine-learning-2006/markdown/10-kernel-methods.qmd`
+- Murphy:
+  - `murphy-machine-learning-a-probabilistic-perspective-2012/markdown/21-kernels.qmd`
+
+## Cross-book summary target
+- Use Neuer and Sandfeld to stress that local-environment features are a form of domain-guided data enrichment rather than arbitrary preprocessing.
+- Use Bishop and Murphy to motivate kernel and similarity language for SOAP-like descriptors.
+- Keep the materials content on coordination numbers, Voronoi views, atom-centered descriptors, and aggregation from local to material-level features.
+- Show local descriptors as the bridge between interpretable classical features and learned graph representations.
+- Exclude full kernel derivations and spherical-harmonic details.
+
+## 50-slide strategy
+- Slides 1-10: local vs global structure, neighbor shells, coordination environments.
+- Slides 11-22: bond-length/bond-angle views, Voronoi tessellations, atom-centered features.
+- Slides 23-34: SOAP/ACSF intuition, kernel similarity, aggregation to material-level vectors.
+- Slides 35-44: defects, noise sensitivity, local-environment failure modes, transfer limits.
+- Slides 45-50: descriptor computation exercise and recap.
+
+## Website summary update
+- Heading: `#### Week 7 – Local atomic environments (26.05.2026)`
+- Add a summary covering:
+  - local descriptors as ML-ready fingerprints,
+  - Voronoi/SOAP intuition,
+  - the bridge from interpretable environments to richer learned representations.
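diff --git a/materials_genomics/07_regression_and_generalization_in_materials_data/unit7_plan.md b/materials_genomics/07_regression_and_generalization_in_materials_data/unit7_plan.md
@@ -28,3 +28,37 @@ By the end of Unit 7, students can: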
 - compare two methodological choices under identical split protocol
 - perform one structured failure analysis and mitigation proposal
 - produce a short report with claims, evidence, and limitations
+
+## Required chapter files
+- Neuer:
+  - `neuer-machine-learning-for-engineers/markdown/04-supervised-learning.qmd` (4.2.2, 4.2.3, 4.5.9)
+- Sandfeld:
+  - `sandfeld-materials-data-science/markdown/06-part-iii-classical-machine-learning.qmd` (12-13)
+- McClarren:
+  - `mcclarren-machine-learning-for-engineers/markdown/02-linear-models-for-regression-and-classification.qmd`
+  - `mcclarren-machine-learning-for-engineers/markdown/03-decision-trees-and-random-forests-for-regression-and-classification.qmd`
+- Bishop:
+  - `bishop-pattern-recognition-and-machine-learning-2006/markdown/07-linear-models-for-regression.qmd`
+- Murphy:
+  - `murphy-machine-learning-a-probabilistic-perspective-2012/markdown/14-linear-regression.qmd`
+
+## Cross-book summary target
+- Use Neuer to define regression tasks, train/test logic, and overfitting control.
+- Use Sandfeld, Bishop, and Murphy for regression geometry, feature matrices, basis functions, and the bias-variance trade-off.
+- Use McClarren for baseline model families that work well in engineering settings.
+- Keep the unit centered on fair comparison of materials-property predictors under grouped chemistry-aware validation.
+- Exclude extended statistical proofs and Bayesian derivations; keep the discussion operational.
+
+## 50-slide strategy
+- Slides 1-10: target selection, regression framing, baseline metrics.
+- Slides 11-22: linear and regularized models, basis expansions, interpretability.
+- Slides 23-34: tree/ensemble baselines, grouped splits, cross-validation, residual analysis.
+- Slides 35-44: bias-variance, overfitting signatures, OOD behavior in chemical space.
+- Slides 45-50: baseline-comparison exercise and summary.
+
+## Website summary update
+- Heading: `#### Week 8 – Regression and generalization in materials data (02.06.2026)`
+- Add a summary covering:
+  - regression as empirical-risk minimization for materials targets,
+  - grouped validation and generalization gaps,
+  - why chemistry-aware splits matter more than raw accuracy.