ECLIPSE-Lab
diff --git a/materials_genomics/01_intro/01_intro.qmd b/materials_genomics/01_intro/01_intro.qmd
Lines changed: 98 additions & 0 deletions
diff --git a/materials_genomics/01_intro/ref.bib b/materials_genomics/01_intro/ref.bib
Lines changed: 35 additions & 0 deletions
diff --git a/materials_genomics/01_intro/unit1_plan.md b/materials_genomics/01_intro/unit1_plan.md
Lines changed: 41 additions & 0 deletions
diff --git a/mathematical_foundations_of_ai_and_ml/01_intro/01_intro.qmd b/mathematical_foundations_of_ai_and_ml/01_intro/01_intro.qmd
Lines changed: 117 additions & 0 deletions
diff --git a/mathematical_foundations_of_ai_and_ml/01_intro/_metadata.yml b/mathematical_foundations_of_ai_and_ml/01_intro/_metadata.yml
Lines changed: 5 additions & 0 deletions
diff --git a/mathematical_foundations_of_ai_and_ml/01_intro/custom.csl b/mathematical_foundations_of_ai_and_ml/01_intro/custom.csl
Lines changed: 59 additions & 0 deletions
@@ -0,0 +1,98 @@
+---
+title: |
+  Materials Genomics<br>
+  Unit 1: Materials Data as a Design Space
+bibliography: ref.bib
+author:
+  - name: Prof. Dr. Philipp Pelz
+    affiliation:
+      - FAU Erlangen-Nürnberg
+execute:
+  eval: false
+format:
+  revealjs:
+    width: 1920
+    height: 1080
+    template-partials:
+      - title-slide.html
+    css: custom.css
+    theme: custom.scss
+    slide-number: c/t
+    logo: "eclipse_logo_small.png"
+    footer: "© Philipp Pelz - Materials Genomics"
+---
+
+## Unit 1 goals
+
+By the end of today, you can:
+
+1. Explain what "materials genomics" means and where the genomics analogy breaks down.
+2. Navigate core data sources (Materials Project, OQMD, AFLOW, NOMAD).
+3. Identify bottlenecks in learning structure-property relations.
+4. Recognize bias and leakage risks in materials datasets.
+
+## From periodic table to design space
+
+- Classical view: materials as isolated compounds
+- Genomics view: materials as points in a high-dimensional searchable space
+- Objective: accelerate discovery via data-driven surrogate models
+
+::: aside
+@sandfeld_materials_data_science; @neuer2024machine
+:::
+
+## The core loop
+
+1. Generate/collect data (computation + experiments)
+2. Learn structure-property relations
+3. Quantify uncertainty
+4. Prioritize new candidates
+5. Feed results back into data generation
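The loop above can be sketched in NumPy. This is a minimal illustration under assumptions, not a prescribed workflow: `measure` is a hypothetical stand-in for an experiment or simulation on a synthetic 1-D descriptor, the surrogate is a bootstrap ensemble of quadratic least-squares fits, and candidates are ranked by an upper confidence bound (ensemble mean plus `kappa` times ensemble spread), which is only one of several common acquisition rules.

```python
import numpy as np

rng = np.random.default_rng(0)

def measure(x):
    """Hypothetical experiment/simulation: hidden property to maximize (synthetic)."""
    return np.sin(3 * x) + 0.5 * x

def design(x):
    """Quadratic features [1, x, x^2] for a 1-D descriptor."""
    return np.column_stack([np.ones_like(x), x, x ** 2])

def fit_ensemble(x, y, n_models=20):
    """Step 2: learn a surrogate as a bootstrap ensemble of least-squares fits."""
    coefs = []
    for _ in range(n_models):
        idx = rng.integers(0, len(x), size=len(x))  # resample with replacement
        c, *_ = np.linalg.lstsq(design(x[idx]), y[idx], rcond=None)
        coefs.append(c)
    return np.array(coefs)

def predict(coefs, x):
    """Step 3: ensemble mean and spread as a crude uncertainty estimate."""
    preds = design(x) @ coefs.T  # shape (n_points, n_models)
    return preds.mean(axis=1), preds.std(axis=1)

# Step 1: small initial dataset and a pool of not-yet-evaluated candidates
x_data = rng.uniform(0, 2, size=5)
y_data = measure(x_data)
pool = np.linspace(0, 2, 200)

kappa = 1.0  # exploration weight (assumed value)
for _ in range(10):
    coefs = fit_ensemble(x_data, y_data)
    mean, std = predict(coefs, pool)
    best = np.argmax(mean + kappa * std)        # Step 4: prioritize one candidate
    x_new = pool[best]
    x_data = np.append(x_data, x_new)           # Step 5: feed the result back
    y_data = np.append(y_data, measure(x_new))
    pool = np.delete(pool, best)

print(f"best measured value: {y_data.max():.3f} after {len(y_data)} evaluations")
```

Setting `kappa = 0` turns this into pure exploitation of the surrogate mean; larger values spend more evaluations where the ensemble members disagree.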
+
+## What data do we actually have?
+
+- Composition-level descriptors
+- Crystal structures (CIF, symmetry, local environments)
+- Energetics (formation energy, energy above the convex hull)
+- Electronic/mechanical targets (bandgap, moduli, etc.)
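Composition-level descriptors are often built by weighting elemental properties with atomic fractions. A minimal sketch, assuming simple formulas without parentheses or hydrate notation and a small hand-typed table of Pauling electronegativities; dedicated featurization libraries cover many more elements and properties.

```python
import re

# Pauling electronegativities for a handful of elements (illustrative subset)
ELECTRONEGATIVITY = {"Li": 0.98, "Na": 0.93, "O": 3.44, "Cl": 3.16,
                     "Al": 1.61, "Si": 1.90, "Ti": 1.54, "Fe": 1.83}

def parse_formula(formula):
    """Return {element: atomic fraction} for formulas like 'Fe2O3' (no parentheses)."""
    counts = {}
    for element, amount in re.findall(r"([A-Z][a-z]?)(\d*\.?\d*)", formula):
        counts[element] = counts.get(element, 0.0) + (float(amount) if amount else 1.0)
    total = sum(counts.values())
    return {el: n / total for el, n in counts.items()}

def composition_features(formula):
    """Simple descriptors: element count, fraction-weighted mean and range of electronegativity."""
    fractions = parse_formula(formula)
    values = [ELECTRONEGATIVITY[el] for el in fractions]
    mean_en = sum(frac * ELECTRONEGATIVITY[el] for el, frac in fractions.items())
    return {"n_elements": len(fractions),
            "mean_electronegativity": mean_en,
            "electronegativity_range": max(values) - min(values)}

# One row of a feature table per formula
for f in ["Fe2O3", "NaCl", "Li2O", "Al2SiO5"]:
    print(f, composition_features(f))
```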
+
+## What data do we *not* have reliably?
+
+- Uniform coverage of chemical space
+- Consistent measurement/protocol metadata
+- Balanced labels across task difficulty
+
+These gaps are why models that look confident on benchmarks often fail in deployment.
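A quick way to expose non-uniform coverage is to count how often each element appears across a dataset. A minimal sketch on a made-up list of formulas (hypothetical, not drawn from any database):

```python
import re
from collections import Counter

# Hypothetical mini-dataset; in practice this would be a database query result
formulas = ["Fe2O3", "FeO", "Fe3O4", "TiO2", "SiO2", "Al2O3",
            "LiFePO4", "NaCl", "LiCoO2", "Fe2SiO4"]

def elements_in(formula):
    """Set of element symbols in a formula (stoichiometric counts ignored)."""
    return set(re.findall(r"[A-Z][a-z]?", formula))

# Number of entries containing each element
coverage = Counter(el for f in formulas for el in elements_in(f))
for element, n in coverage.most_common():
    print(f"{element:>2}: {n:2d} entries ({n / len(formulas):.0%})")
```

In this toy list, oxygen appears in 9 of 10 entries and iron in 5, so a model trained on it has seen little else of the periodic table.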
+
+## Modeling choices in the Unit 1 context
+
+- Task type: regression, classification, or ranking
+- Establish a simple baseline before adding complexity
+- The split strategy must reflect the scientific deployment scenario
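"Baseline before complexity" in practice means reporting the error of a trivial predictor next to any model. A minimal sketch on synthetic data (features and target are invented for illustration): a model that cannot beat the mean-value baseline on held-out data has not learned anything useful.

```python
import numpy as np

rng = np.random.default_rng(42)

# Synthetic stand-in for a feature table: 200 samples, 3 descriptors
X = rng.normal(size=(200, 3))
y = 1.5 * X[:, 0] - 0.7 * X[:, 1] + rng.normal(scale=0.3, size=200)

# Random split for brevity; materials data often needs grouped splits instead
perm = rng.permutation(len(y))
train, test = perm[:150], perm[150:]

def mae(y_true, y_pred):
    """Mean absolute error."""
    return np.mean(np.abs(y_true - y_pred))

# Baseline: predict the training-set mean for every test sample
baseline_pred = np.full(len(test), y[train].mean())

# First real model: ordinary least squares with an intercept column
A_train = np.column_stack([np.ones(len(train)), X[train]])
A_test = np.column_stack([np.ones(len(test)), X[test]])
coef, *_ = np.linalg.lstsq(A_train, y[train], rcond=None)
linear_pred = A_test @ coef

print(f"baseline MAE: {mae(y[test], baseline_pred):.3f}")
print(f"linear MAE:   {mae(y[test], linear_pred):.3f}")
```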
+
+## Failure modes you should spot early
+
+- Leakage from random splits: members of the same composition family land in both train and test
+- Benchmark overfitting: repeated tuning against one fixed test set
+- Hidden confounders (dataset origin, synthesis route)
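The split-leakage failure mode can be made visible with synthetic data. A minimal sketch, assuming hypothetical "composition families" whose members sit close together in descriptor space and share a family-specific property offset: a 1-nearest-neighbour regressor looks excellent under a random split, because each test sample keeps siblings in the training set, and degrades when whole families are held out.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical data: 30 composition families with 20 members each
n_families, per_family = 30, 20
centers = rng.normal(scale=10.0, size=(n_families, 2))  # family positions in descriptor space
offsets = rng.normal(size=n_families)                   # family-specific property level
groups = np.repeat(np.arange(n_families), per_family)
X = centers[groups] + rng.normal(scale=0.1, size=(groups.size, 2))
y = offsets[groups] + rng.normal(scale=0.05, size=groups.size)

def one_nn_mae(train, test):
    """MAE of a 1-nearest-neighbour regressor fit on `train`, scored on `test`."""
    d = np.linalg.norm(X[test][:, None, :] - X[train][None, :, :], axis=-1)
    pred = y[train][d.argmin(axis=1)]
    return np.mean(np.abs(y[test] - pred))

# Random split: most test samples keep close siblings in the training set
perm = rng.permutation(groups.size)
rand_train, rand_test = perm[:480], perm[480:]

# Grouped split: whole families are held out, as when screening new chemistries
test_families = rng.choice(n_families, size=6, replace=False)
is_test = np.isin(groups, test_families)
grp_train, grp_test = np.flatnonzero(~is_test), np.flatnonzero(is_test)

print(f"random split MAE:  {one_nn_mae(rand_train, rand_test):.3f}")
print(f"grouped split MAE: {one_nn_mae(grp_train, grp_test):.3f}")
```

The random-split number mostly reflects within-family noise, so it overstates how well the model would do on genuinely new families.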
+
+## Lecture vs exercise split
+
+**Lecture:** discovery logic + validity criteria
+
+**Exercise:**
+
+- query one database subset
+- build a first feature table
+- run a baseline prediction
+- diagnose one bias artifact
+
+## Unit 1 takeaways
+
+- Materials genomics is a systems workflow, not just a model choice.
+- Data quality and split logic can dominate architecture choice.
+- Uncertainty-aware screening is mandatory for credible discovery.
+
+## References
+
+::: {#refs}
+:::
@@ -8,6 +8,41 @@ @article{Ye_2020
   volume={142}, ISSN={0002-7863}, DOI={10.1021/jacs.0c05175}, 
   abstractNote={Helical structures are ubiquitous in natural and synthetic materials across multiple length scales. Excellent and sometimes unusual chiral optical, mechanical, and sensing properties have been previously demonstrated in such symmetry-breaking shape, yet a general principle to realize helical structures at the sub-100 nm scale via colloidal synthesis remains underexplored. In this work, we describe the wet-chemical synthesis of monodisperse nanohelices based on gadolinium oxide (Gd2O3). Aberration-corrected electron microscopy revealed that individual nanohelices consist of a bilayer structure with the outer and inner layers derived from the {111} and {100} planes of bulk Gd2O3, respectively. Distinct from existing inorganic nanocoils with flexible bending geometries, the built-in lattice misfit between two adjacent crystal planes induces continuous helical growth yielding three-dimensional rigid nanohelices. Furthermore, the presence of water in the reaction was found to suppress the formation of nanohelices, producing nanoplates expressing predominantly {111} planes. Our study not only provides a bottom-up synthetic route and mechanistic understanding of nanohelices formation but may also open up new possibilities for creating chiral plasmonic nanostructures, luminescent biological labels, and nanoscale transducers.}, number={29}, journal={Journal of the American Chemical Society}, publisher={American Chemical Society}, author={Liu, Yang and Li, Yuda and Jeong, Soojin and Wang, Yi and Chen, Jun and Ye, Xingchen}, year={2020}, month=jul, pages={12777–12783} }
  @article{Peng_2022, title={Observation of formation and local structures of metal-organic layers via complementary electron microscopy techniques}, volume={13}, rights={2022 The Author(s)}, ISSN={2041-1723}, DOI={10.1038/s41467-022-32330-z}, abstractNote={Metal-organic layers (MOLs) are highly attractive for application in catalysis, separation, sensing and biomedicine, owing to their tunable framework structure. However, it is challenging to obtain comprehensive information about the formation and local structures of MOLs using standard electron microscopy methods due to serious damage under electron beam irradiation. Here, we investigate the growth processes and local structures of MOLs utilizing a combination of liquid-phase transmission electron microscopy, cryogenic electron microscopy and electron ptychography. Our results show a multistep formation process, where precursor clusters first form in solution, then they are complexed with ligands to form non-crystalline solids, followed by the arrangement of the cluster-ligand complex into crystalline sheets, with additional possible growth by the addition of clusters to surface edges. Moreover, high-resolution imaging allows us to identify missing clusters, dislocations, loop and flat surface terminations and ligand connectors in the MOLs. Our observations provide insights into controllable MOL crystal morphology, defect engineering, and surface modification, thus assisting novel MOL design and synthesis.}, number={1}, journal={Nature Communications}, publisher={Nature Publishing Group}, author={Peng, Xinxing and Pelz, Philipp M. and Zhang, Qiubo and Chen, Peican and Cao, Lingyun and Zhang, Yaqian and Liao, Hong-Gang and Zheng, Haimei and Wang, Cheng and Sun, Shi-Gang and Scott, Mary C.}, year={2022}, month=sept, pages={5197}, language={en} }
+@book{neuer2024machine,
+  title={Machine Learning for Engineers: Introduction to Physics-Informed, Explainable Learning Methods for AI in Engineering Applications},
+  author={Neuer, Marcus J.},
+  year={2024},
+  publisher={Springer Nature}
+}
+
+@book{ryan2021machine,
+  title={Machine Learning for Engineers: Using Data to Solve Problems for Physical Systems},
+  author={McClarren, Ryan G.},
+  year={2021},
+  publisher={Springer}
+}
+
+@book{murphy2012machine,
+  title={Machine Learning: A Probabilistic Perspective},
+  author={Murphy, Kevin P.},
+  year={2012},
+  publisher={MIT Press}
+}
+
+@book{bishop2006pattern,
+  title={Pattern Recognition and Machine Learning},
+  author={Bishop, Christopher M.},
+  year={2006},
+  publisher={Springer}
+}
+
+@book{sandfeld_materials_data_science,
+  title={Materials Data Science},
+  author={Sandfeld, Stefan and others},
+  year={2024},
+  publisher={Springer}
+}
+
 @article{sanchez2025phase,
   title={Phase Imaging Methods in the Scanning Transmission Electron Microscope},
   author={Sanchez-Santolino, Gabriel and Clark, Laura and Toyama, Satoko and Seki, Takehito and Shibata, Naoya},
 
@@ -0,0 +1,41 @@
+# Unit 1 Plan — Materials Genomics
+
+## Audience + constraints
+- BSc AI-Material Technology, 5th semester
+- Prior knowledge: two semesters of mathematics, one of physics, some chemistry
+- Assumed: basic undergrad math, SVD familiarity, very basic Python
+- Lecture: 90 minutes + 90-minute exercise
+- Language: English (German translation later)
+
+## Learning objectives (Unit 1)
+By the end of this unit, students can:
+1. Explain the “genomics” analogy for materials discovery without oversimplification.
+2. Describe the structure–processing–property–performance graph as a data system.
+3. Identify core materials databases and the types of targets they contain.
+4. Recognize key sources of bias/incompleteness in materials datasets.
+5. Formulate a first ML-ready materials query pipeline.
+
+## Book-aligned content mapping
+1. Neuer (2024): model framing, data-based modeling, uncertainty and limits.
+2. Sandfeld: materials data science pipeline and domain-specific examples.
+3. McClarren (2021): model realism in physical systems.
+4. Murphy/Bishop: supervised/unsupervised framing language for later units.
+
+## 90-minute structure
+- 0–10 min: Course position in SS26 triad (MFML + ML-PC + MG)
+- 10–25 min: What is materials genomics? Promises vs. reality
+- 25–45 min: Data assets: MP/OQMD/AFLOW/NOMAD and core target quantities
+- 45–60 min: Representation question: composition/structure/process as ML input
+- 60–75 min: Biases, leakage, domain shift, and scientific validity
+- 75–85 min: Discovery loop: screening → uncertainty → experiment feedback
+- 85–90 min: Unit summary + exercise briefing
+
+## Exercise (90 min)
+- Query a materials database (or prepared subset)
+- Build first feature table (composition + simple metadata)
+- Explore one target (bandgap or formation energy)
+- Diagnose one bias artifact and document mitigation
+
+## Assessment alignment
+- Written exam: conceptual precision (not coding trivia)
+- Students should be able to defend model/data choices scientifically.
@@ -0,0 +1,117 @@
+---
+title: |
+  Mathematical Foundations of AI & ML<br>
+  Unit 1: What Learning Means in Engineering and Materials
+bibliography: ref.bib
+author:
+  - name: Prof. Dr. Philipp Pelz
+    affiliation:
+      - FAU Erlangen-Nürnberg
+execute:
+  eval: false
+format:
+  revealjs:
+    width: 1920
+    height: 1080
+    template-partials:
+      - title-slide.html
+    css: custom.css
+    theme: custom.scss
+    slide-number: c/t
+    logo: "eclipse_logo_small.png"
+    footer: "© Philipp Pelz - Mathematical Foundations of AI & ML"
+---
+
+## Unit 1 goals
+
+By the end of today, you should be able to:
+
+1. Define ML as an optimization problem under uncertainty.
+2. Explain how model, loss, risk, regularization, and generalization fit into one pipeline.
+3. Distinguish **fit quality** from **scientific validity**.
+4. Map materials tasks to regression/classification/representation learning.
+
+## Why this lecture matters for KI-Materialtechnologie
+
+- You already know math tools (linear algebra, SVD, calculus).
+- Now we reframe them as **learning machinery**.
+- This course is the foundation layer for:
+  - Materials Genomics
+  - ML for Characterization and Processing
+
+## Data analysis vs machine learning
+
+- **Data analysis:** summarize/explain observed data.
+- **Machine learning:** learn a function that generalizes to unseen data.
+- **Scientific ML:** learning must stay consistent with physics and uncertainty.
+
+::: aside
+@neuer2024machine; @murphy2012machine; @bishop2006pattern
+:::
+
+## Supervised learning formalized
+
+Given data $\mathcal{D}=\{(x_i,y_i)\}_{i=1}^N$, choose the parameters $\theta$ of a model $f_\theta$ by minimizing the regularized empirical risk:
+
+$$
+\hat{\theta}=\arg\min_\theta \frac{1}{N}\sum_{i=1}^N \ell\big(f_\theta(x_i),y_i\big) + \lambda\Omega(\theta)
+$$
+
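+- $\ell$: loss function (e.g., squared error)
+- $\Omega$: regularization term (penalizes model complexity)
+- $\lambda \ge 0$: regularization strength (trades data fit against complexity)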
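For a linear model $f_\theta(x)=x^\top\theta$, squared loss, and $\Omega(\theta)=\lVert\theta\rVert_2^2$ (choices made here for illustration), the objective above has a closed-form minimizer. With the inputs stacked as rows of $X\in\mathbb{R}^{N\times d}$ and the targets in $y\in\mathbb{R}^N$:

$$
\begin{aligned}
J(\theta) &= \frac{1}{N}\lVert X\theta - y\rVert_2^2 + \lambda\lVert\theta\rVert_2^2,\\
\nabla_\theta J(\theta) &= \frac{2}{N}X^\top(X\theta - y) + 2\lambda\theta = 0,\\
\hat{\theta} &= \left(X^\top X + N\lambda I\right)^{-1}X^\top y .
\end{aligned}
$$

For $\lambda>0$ the matrix $X^\top X + N\lambda I$ is positive definite, so $\hat{\theta}$ exists and is unique even when $X^\top X$ is singular (e.g., more features than samples).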
+
+## The minimum viable ML pipeline
+
+1. Problem definition + target variable
+2. Data curation + split strategy
+3. Baseline model
+4. Validation + error analysis
+5. Model revision with scientific constraints
+
+## Generalization: the central question
+
+- Good training error is not enough.
+- We care about expected future error on unseen data.
+- Key dangers:
+  - data leakage
+  - overfitting
+  - distribution shift
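The gap between training error and error on unseen data can be shown with polynomial least squares. A minimal NumPy sketch on synthetic data (target function, noise level, and sample sizes are arbitrary choices): training error cannot increase with the degree, because each higher-degree model contains the lower-degree ones, while validation error typically stops improving and then rises again.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_data(n):
    """Noisy samples of sin(pi x) on [-1, 1] (synthetic)."""
    x = rng.uniform(-1, 1, size=n)
    return x, np.sin(np.pi * x) + rng.normal(scale=0.2, size=n)

x_train, y_train = make_data(20)
x_val, y_val = make_data(200)

def fit_poly(x, y, degree):
    """Least-squares polynomial fit; columns are x^degree, ..., x, 1."""
    coef, *_ = np.linalg.lstsq(np.vander(x, degree + 1), y, rcond=None)
    return coef

def mse(coef, x, y):
    return np.mean((np.vander(x, len(coef)) @ coef - y) ** 2)

degrees = list(range(0, 10))
train_err, val_err = [], []
for d in degrees:
    coef = fit_poly(x_train, y_train, d)
    train_err.append(mse(coef, x_train, y_train))
    val_err.append(mse(coef, x_val, y_val))
    print(f"degree {d}: train MSE {train_err[-1]:.4f}  val MSE {val_err[-1]:.4f}")
```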
+
+## Materials-flavored examples
+
+- Regression: predict hardness from process parameters
+- Classification: defect class from microscopy images
+- Structured prediction: spectra-to-composition mapping
+
+**Same math core, different data geometry + uncertainty.**
+
+## What goes into lecture vs exercise
+
+**Lecture (essential):**
+
+- conceptual framework and notation
+- why methods succeed/fail
+- interpretation and scientific trust
+
+**Exercise (practice):**
+
+- coding gradient descent
+- split strategy experiments
+- regularization tradeoffs
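A sketch of the "coding gradient descent" exercise, assuming a linear model with squared loss and no regularization: batch gradient descent on the empirical risk, checked against NumPy's least-squares solver. The step size $1/L$, with $L$ the largest eigenvalue of the constant Hessian $\tfrac{2}{N}X^\top X$, is one standard safe choice for this convex quadratic.

```python
import numpy as np

rng = np.random.default_rng(0)
N, d = 100, 3
X = np.column_stack([np.ones(N), rng.normal(size=(N, d))])  # intercept + features
theta_true = np.array([0.5, 2.0, -1.0, 0.3])
y = X @ theta_true + rng.normal(scale=0.1, size=N)

def risk(theta):
    """Empirical risk with squared loss: (1/N) * ||X theta - y||^2."""
    r = X @ theta - y
    return r @ r / N

def grad(theta):
    """Gradient of the empirical risk: (2/N) X^T (X theta - y)."""
    return 2.0 / N * X.T @ (X @ theta - y)

# Step size 1/L from the largest eigenvalue of the Hessian (2/N) X^T X
L = np.linalg.eigvalsh(2.0 / N * X.T @ X).max()
theta = np.zeros(X.shape[1])
for _ in range(2000):
    theta -= grad(theta) / L

theta_ls, *_ = np.linalg.lstsq(X, y, rcond=None)
print("gradient descent:", np.round(theta, 4))
print("least squares:   ", np.round(theta_ls, 4))
print(f"empirical risk: {risk(theta):.5f}")
```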
+
+## Exercise handoff (90 min)
+
+- Implement linear regression in NumPy
+- Compare underfit vs overfit using polynomial features
+- Evaluate train/validation/test behavior
+- Add L2 regularization and discuss model selection
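One way to approach "add L2 regularization and discuss model selection": fit ridge regression on polynomial features for several values of $\lambda$, pick the value with the lowest validation error, and touch the test set only once at the end. A minimal sketch; the data, the polynomial degree, and the $\lambda$ grid are arbitrary illustrative choices. The closed form $(X^\top X + N\lambda I)^{-1}X^\top y$ matches the $\frac{1}{N}$-scaled objective on the supervised-learning slide (the constant column is penalized too, for simplicity).

```python
import numpy as np

rng = np.random.default_rng(3)

def make_data(n):
    """Noisy samples of sin(pi x) on [-1, 1] (synthetic)."""
    x = rng.uniform(-1, 1, size=n)
    return x, np.sin(np.pi * x) + rng.normal(scale=0.2, size=n)

def features(x, degree=10):
    return np.vander(x, degree + 1)

def ridge_fit(X, y, lam):
    """Minimizer of (1/N)||X theta - y||^2 + lam * ||theta||^2."""
    N, d = X.shape
    return np.linalg.solve(X.T @ X + N * lam * np.eye(d), X.T @ y)

def mse(theta, X, y):
    return np.mean((X @ theta - y) ** 2)

x_tr, y_tr = make_data(30)
x_va, y_va = make_data(100)
x_te, y_te = make_data(100)
X_tr, X_va, X_te = features(x_tr), features(x_va), features(x_te)

# Model selection uses the validation set only
lambdas = [1e-8, 1e-6, 1e-4, 1e-2, 1e-1, 1.0]
val_errors = [mse(ridge_fit(X_tr, y_tr, lam), X_va, y_va) for lam in lambdas]
best_lam = lambdas[int(np.argmin(val_errors))]

for lam, err in zip(lambdas, val_errors):
    print(f"lambda={lam:.0e}: validation MSE {err:.4f}")

# Report test error once, after lambda has been selected
theta = ridge_fit(X_tr, y_tr, best_lam)
print(f"selected lambda={best_lam:.0e}, test MSE {mse(theta, X_te, y_te):.4f}")
```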
+
+## Key takeaways
+
+- ML in engineering is **modeling + optimization + uncertainty reasoning**.
+- Generalization and validity matter more than leaderboard scores.
+- Unit 1 defines language and rigor for all following units.
+
+## References
+
+::: {#refs}
+:::
@@ -0,0 +1,5 @@
+format:
+  revealjs: 
+    menu: false
+    progress: false
+search: false
@@ -0,0 +1,59 @@
+<?xml version="1.0" encoding="utf-8"?>
+<style xmlns="http://purl.org/net/xbiblio/csl" class="in-text" version="1.0">
+  <info>
+    <title>Custom In-text Author-Date with DOI</title>
+    <id>http://www.zotero.org/styles/custom-intext-doi</id>
+    <link href="http://www.zotero.org/styles/nature" rel="template"/>
+    <author>
+      <name>Your Name</name>
+    </author>
+    <category citation-format="author-date"/>
+    <updated>2025-03-31T00:00:00+00:00</updated>
+  </info>
+
+  <macro name="author-in-text">
+    <names variable="author">
+      <name form="long" name-as-sort-order="first" et-al-min="2" et-al-use-first="1" delimiter=", "/>
+    </names>
+  </macro>
+
+  <macro name="year">
+    <date variable="issued">
+      <date-part name="year"/>
+    </date>
+  </macro>
+
+  <macro name="in-text-citation">
+    <group delimiter=", ">
+      <text macro="author-in-text"/>
+      <group prefix="(" suffix=")">
+        <text macro="year"/>
+      </group>
+      <text prefix="doi:" variable="DOI"/>
+    </group>
+  </macro>
+
+  <citation>
+    <layout delimiter="; ">
+      <text macro="in-text-citation"/>
+    </layout>
+  </citation>
+
+  <bibliography et-al-min="6" et-al-use-first="1" et-al-subsequent-min="3" et-al-subsequent-use-first="1">
+    <layout suffix="." delimiter="; ">
+      <text variable="title" suffix=", "/>
+      <text variable="container-title" suffix=", "/>
+      <names variable="author">
+        <name and="symbol" delimiter=", "/>
+      </names>
+      <text macro="year" prefix=" (" suffix=")"/>
+      <text variable="DOI" prefix=" https://doi.org/"/>
+    </layout>
+  </bibliography>
+</style>