Commit 02c5d2c

Add Unit 1 plans and English slide drafts for SS26 lecture trio
1 parent 9d3beee commit 02c5d2c

16 files changed: +1021 -0 lines changed

Lines changed: 98 additions & 0 deletions
@@ -0,0 +1,98 @@
---
title: |
  Materials Genomics<br>
  Unit 1: Materials Data as a Design Space
bibliography: ref.bib
author:
  - name: Prof. Dr. Philipp Pelz
    affiliation:
      - FAU Erlangen-Nürnberg
execute:
  eval: false
format:
  revealjs:
    width: 1920
    height: 1080
    template-partials:
      - title-slide.html
    css: custom.css
    theme: custom.scss
    slide-number: c/t
    logo: "eclipse_logo_small.png"
    footer: "© Philipp Pelz - Materials Genomics"
---

## Unit 1 goals

By the end of today, you can:

1. Explain what "materials genomics" really means.
2. Navigate core data sources (MP, OQMD, AFLOW, NOMAD).
3. Identify structure-property-learning bottlenecks.
4. Recognize bias and leakage risks in materials datasets.

## From periodic table to design space

- Classical view: materials as isolated compounds
- Genomics view: materials as points in a high-dimensional searchable space
- Objective: accelerate discovery via data-driven surrogate models

::: aside
@sandfeld_materials_data_science; @neuer2024machine
:::

## The core loop

1. Generate/collect data (computation + experiments)
2. Learn structure-property relations
3. Quantify uncertainty
4. Prioritize new candidates
5. Feed results back into data generation
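
The five steps above can be sketched as a small closed loop. This is only a toy illustration under stated assumptions: `measure` is a hypothetical stand-in for an expensive calculation or experiment, the surrogate is a bootstrap ensemble of 1-nearest-neighbour lookups, and candidates are ranked by an optimistic score (mean plus `kappa` times the ensemble spread). None of these choices come from the lecture material itself.

```python
import random
import statistics


def measure(x):
    # Hypothetical stand-in for a DFT calculation or an experiment.
    return -(x - 0.7) ** 2


def ensemble_predict(labeled, x, rng, n_models=20):
    # Each "model" is a 1-nearest-neighbour lookup on a bootstrap resample.
    preds = []
    for _ in range(n_models):
        sample = [rng.choice(labeled) for _ in labeled]
        nearest = min(sample, key=lambda p: abs(p[0] - x))
        preds.append(nearest[1])
    # The ensemble mean is the prediction, the spread a crude uncertainty.
    return statistics.mean(preds), statistics.pstdev(preds)


def discovery_loop(candidates, n_init=3, n_rounds=5, kappa=1.0, seed=0):
    rng = random.Random(seed)
    pool = list(candidates)
    rng.shuffle(pool)
    # Step 1: collect initial data.
    labeled = [(x, measure(x)) for x in pool[:n_init]]
    pool = pool[n_init:]
    for _ in range(n_rounds):
        # Steps 2 and 3: learn a surrogate and quantify its uncertainty.
        scored = []
        for x in pool:
            mean, std = ensemble_predict(labeled, x, rng)
            scored.append((mean + kappa * std, x))
        # Step 4: prioritize the candidate with the best optimistic score.
        _, best_x = max(scored)
        # Step 5: feed the new result back into the data set.
        pool.remove(best_x)
        labeled.append((best_x, measure(best_x)))
    return labeled


candidates = [i / 20 for i in range(21)]
history = discovery_loop(candidates)
best_x, best_y = max(history, key=lambda p: p[1])
print(f"best candidate so far: x={best_x:.2f}, property={best_y:.4f}")
```

Setting `kappa=0` turns this into purely greedy screening, which is a quick way to see what the uncertainty term contributes.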

## What data do we actually have?

- Composition-level descriptors
- Crystal structures (CIF, symmetry, local environments)
- Energetics (formation energy, hull distance)
- Electronic/mechanical targets (bandgap, moduli, etc.)
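
For reference, the two energetic targets in the list above are commonly defined as follows. The notation is generic and introduced here for illustration: $E_\mathrm{tot}$ is the total energy of the cell, $n_i$ the number of atoms of element $i$ in it, $\mu_i$ the reference energy per atom of the elemental phase, and $E_\mathrm{hull}(x)$ the convex-hull energy at composition $x$. Individual databases may use different reference states or corrections.

$$
E_f = \frac{E_\mathrm{tot} - \sum_i n_i \mu_i}{\sum_i n_i},
\qquad
\Delta E_\mathrm{hull} = E_f - E_\mathrm{hull}(x) \ge 0
$$

A value of $\Delta E_\mathrm{hull} = 0$ means the compound lies on the hull, i.e. it is stable against decomposition into the other phases contained in the dataset.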

## What data do we *not* have reliably?

- Uniform coverage of chemical space
- Consistent measurement/protocol metadata
- Balanced labels across task difficulty

These gaps are why a model's reported confidence often fails to hold in deployment.

## Modeling choices in Unit 1 context

- Task: regression/classification/ranking
- Baseline before complexity
- Split strategy must reflect scientific deployment scenario

## Failure modes you should spot early

- random split leakage by composition family
- benchmark overfitting
- hidden confounders (dataset origin, synthesis route)
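
The first failure mode can be avoided by splitting on groups instead of on individual entries. The sketch below is a minimal illustration, assuming that "composition family" means the chemical system (the set of elements in the formula); the helper names `chemical_system` and `group_split` are hypothetical, and the formula parser only handles simple formulas without parentheses.

```python
import random
import re


def chemical_system(formula):
    # Sorted set of elements, e.g. "Fe2O3" -> "Fe-O".
    elements = re.findall(r"[A-Z][a-z]?", formula)
    return "-".join(sorted(set(elements)))


def group_split(formulas, test_fraction=0.25, seed=0):
    # Collect entries per chemical system so a system never straddles the split.
    groups = {}
    for formula in formulas:
        groups.setdefault(chemical_system(formula), []).append(formula)
    keys = sorted(groups)
    random.Random(seed).shuffle(keys)
    n_test = max(1, round(test_fraction * len(keys)))
    test_keys = set(keys[:n_test])
    train = [f for k in keys if k not in test_keys for f in groups[k]]
    test = [f for k in keys if k in test_keys for f in groups[k]]
    return train, test


formulas = ["Fe2O3", "Fe3O4", "FeO", "TiO2", "Ti2O3", "NaCl", "LiCoO2", "LiFePO4"]
train, test = group_split(formulas)
print("train:", train)
print("test: ", test)
```

With a plain random split, `Fe2O3` could land in training and `Fe3O4` in test, so the test score would partly measure memorization of the Fe-O system instead of generalization to unseen chemistry.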

## Lecture vs exercise split

**Lecture:** discovery logic + validity criteria

**Exercise:**

- query one database subset
- build first feature table
- run baseline prediction
- diagnose one bias artifact
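
As a rough idea of what a "first feature table" could look like, the sketch below turns chemical formulas into atomic-fraction columns. It is an assumption-laden illustration: the helpers `parse_formula` and `feature_table` are hypothetical names, the parser handles only simple formulas (no parentheses, charges, or hydrates), and the example formulas are placeholders for whatever the database query returns.

```python
import re
from collections import Counter


def parse_formula(formula):
    """Return element -> atom count for a simple formula such as 'Fe2O3'."""
    counts = Counter()
    pattern = r"([A-Z][a-z]?)(\d+(?:\.\d+)?)?"
    for element, amount in re.findall(pattern, formula):
        # A missing number means one atom of that element.
        counts[element] += float(amount) if amount else 1.0
    return dict(counts)


def feature_table(formulas):
    """Build one row per formula with an atomic-fraction column per element."""
    parsed = [parse_formula(f) for f in formulas]
    elements = sorted({el for counts in parsed for el in counts})
    rows = []
    for formula, counts in zip(formulas, parsed):
        total = sum(counts.values())
        row = {"formula": formula, "n_elements": len(counts)}
        for el in elements:
            row[f"frac_{el}"] = counts.get(el, 0.0) / total
        rows.append(row)
    return elements, rows


elements, rows = feature_table(["Fe2O3", "TiO2", "NaCl", "LiFePO4"])
for row in rows:
    print(row)
```

Rows of this form can be passed to a baseline regressor once a target column (for example bandgap or formation energy) has been joined in from the query result.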

## Unit 1 takeaways

- Materials genomics is a systems workflow, not just a model choice.
- Data quality and split logic can dominate architecture choice.
- Uncertainty-aware screening is mandatory for credible discovery.

## References

::: {#refs}
:::

materials_genomics/01_intro/ref.bib

Lines changed: 35 additions & 0 deletions
@@ -8,6 +8,41 @@ @article{Ye_2020
volume={142}, ISSN={0002-7863}, DOI={10.1021/jacs.0c05175},
abstractNote={Helical structures are ubiquitous in natural and synthetic materials across multiple length scales. Excellent and sometimes unusual chiral optical, mechanical, and sensing properties have been previously demonstrated in such symmetry-breaking shape, yet a general principle to realize helical structures at the sub-100 nm scale via colloidal synthesis remains underexplored. In this work, we describe the wet-chemical synthesis of monodisperse nanohelices based on gadolinium oxide (Gd2O3). Aberration-corrected electron microscopy revealed that individual nanohelices consist of a bilayer structure with the outer and inner layers derived from the {111} and {100} planes of bulk Gd2O3, respectively. Distinct from existing inorganic nanocoils with flexible bending geometries, the built-in lattice misfit between two adjacent crystal planes induces continuous helical growth yielding three-dimensional rigid nanohelices. Furthermore, the presence of water in the reaction was found to suppress the formation of nanohelices, producing nanoplates expressing predominantly {111} planes. Our study not only provides a bottom-up synthetic route and mechanistic understanding of nanohelices formation but may also open up new possibilities for creating chiral plasmonic nanostructures, luminescent biological labels, and nanoscale transducers.}, number={29}, journal={Journal of the American Chemical Society}, publisher={American Chemical Society}, author={Liu, Yang and Li, Yuda and Jeong, Soojin and Wang, Yi and Chen, Jun and Ye, Xingchen}, year={2020}, month=jul, pages={12777–12783} }
@article{Peng_2022, title={Observation of formation and local structures of metal-organic layers via complementary electron microscopy techniques}, volume={13}, rights={2022 The Author(s)}, ISSN={2041-1723}, DOI={10.1038/s41467-022-32330-z}, abstractNote={Metal-organic layers (MOLs) are highly attractive for application in catalysis, separation, sensing and biomedicine, owing to their tunable framework structure. However, it is challenging to obtain comprehensive information about the formation and local structures of MOLs using standard electron microscopy methods due to serious damage under electron beam irradiation. Here, we investigate the growth processes and local structures of MOLs utilizing a combination of liquid-phase transmission electron microscopy, cryogenic electron microscopy and electron ptychography. Our results show a multistep formation process, where precursor clusters first form in solution, then they are complexed with ligands to form non-crystalline solids, followed by the arrangement of the cluster-ligand complex into crystalline sheets, with additional possible growth by the addition of clusters to surface edges. Moreover, high-resolution imaging allows us to identify missing clusters, dislocations, loop and flat surface terminations and ligand connectors in the MOLs. Our observations provide insights into controllable MOL crystal morphology, defect engineering, and surface modification, thus assisting novel MOL design and synthesis.}, number={1}, journal={Nature Communications}, publisher={Nature Publishing Group}, author={Peng, Xinxing and Pelz, Philipp M. and Zhang, Qiubo and Chen, Peican and Cao, Lingyun and Zhang, Yaqian and Liao, Hong-Gang and Zheng, Haimei and Wang, Cheng and Sun, Shi-Gang and Scott, Mary C.}, year={2022}, month=sept, pages={5197}, language={en} }
@book{neuer2024machine,
  title={Machine Learning for Engineers: Introduction to Physics-Informed, Explainable Learning Methods for AI in Engineering Applications},
  author={Neuer, Michael and others},
  year={2024},
  publisher={Springer Nature}
}

@book{ryan2021machine,
  title={Machine Learning for Engineers: Using Data to Solve Problems for Physical Systems},
  author={McClarren, Ryan G.},
  year={2021},
  publisher={Springer}
}

@book{murphy2012machine,
  title={Machine Learning: A Probabilistic Perspective},
  author={Murphy, Kevin P.},
  year={2012},
  publisher={MIT Press}
}

@book{bishop2006pattern,
  title={Pattern Recognition and Machine Learning},
  author={Bishop, Christopher M.},
  year={2006},
  publisher={Springer}
}

@book{sandfeld_materials_data_science,
  title={Materials Data Science},
  author={Sandfeld, Stefan and others},
  year={2024},
  publisher={Springer}
}
@article{sanchez2025phase,
title={Phase Imaging Methods in the Scanning Transmission Electron Microscope},
author={Sanchez-Santolino, Gabriel and Clark, Laura and Toyama, Satoko and Seki, Takehito and Shibata, Naoya},
Lines changed: 41 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,41 @@
# Unit 1 Plan — Materials Genomics

## Audience + constraints
- BSc AI-Material Technology, 5th semester
- Prior knowledge: 2 semesters math, 1 physics, some chemistry
- Assumed: basic undergrad math, SVD familiarity, very basic Python
- Lecture: 90 minutes + 90-minute exercise
- Language: English (German translation later)

## Learning objectives (Unit 1)
By the end of this unit, students can:
1. Explain the “genomics” analogy for materials discovery without oversimplification.
2. Describe the structure–processing–property–performance graph as a data system.
3. Identify core materials databases and the types of targets they contain.
4. Recognize key sources of bias/incompleteness in materials datasets.
5. Formulate a first ML-ready materials query pipeline.

## Book-aligned content mapping
1. Neuer (2024): model framing, data-based modeling, uncertainty and limits.
2. Sandfeld: materials data science pipeline and domain-specific examples.
3. McClarren (2021): model realism in physical systems.
4. Murphy/Bishop: supervised/unsupervised framing language for later units.

## 90-minute structure
- 0–10 min: Course position in SS26 triad (MFML + ML-PC + MG)
- 10–25 min: What is materials genomics? promises vs reality
- 25–45 min: Data assets: MP/OQMD/AFLOW/NOMAD and core target quantities
- 45–60 min: Representation question: composition/structure/process as ML input
- 60–75 min: Biases, leakage, domain shift, and scientific validity
- 75–85 min: Discovery loop: screening → uncertainty → experiment feedback
- 85–90 min: Unit summary + exercise briefing

## Exercise (90 min)
- Query a materials database (or prepared subset)
- Build first feature table (composition + simple metadata)
- Explore one target (bandgap or formation energy)
- Diagnose one bias artifact and document mitigation

## Assessment alignment
- Written exam: conceptual precision (not coding trivia)
- Students should be able to defend model/data choices scientifically.
Lines changed: 117 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,117 @@
---
title: |
  Mathematical Foundations of AI & ML<br>
  Unit 1: What Learning Means in Engineering and Materials
bibliography: ref.bib
author:
  - name: Prof. Dr. Philipp Pelz
    affiliation:
      - FAU Erlangen-Nürnberg
execute:
  eval: false
format:
  revealjs:
    width: 1920
    height: 1080
    template-partials:
      - title-slide.html
    css: custom.css
    theme: custom.scss
    slide-number: c/t
    logo: "eclipse_logo_small.png"
    footer: "© Philipp Pelz - Mathematical Foundations of AI & ML"
---

## Unit 1 goals

By the end of today, you should be able to:

1. Define ML as an optimization problem under uncertainty.
2. Explain model, loss, risk, regularization, and generalization in one pipeline.
3. Distinguish **fit quality** from **scientific validity**.
4. Map materials tasks to regression/classification/representation learning.

## Why this lecture matters for KI-Materialtechnologie

- You already know math tools (linear algebra, SVD, calculus).
- Now we reframe them as **learning machinery**.
- This course is the foundation layer for:
  - Materials Genomics
  - ML for Characterization and Processing

## Data analysis vs machine learning

- **Data analysis:** summarize/explain observed data.
- **Machine learning:** learn a function that generalizes to unseen data.
- **Scientific ML:** learning must stay consistent with physics and uncertainty.

::: aside
@neuer2024machine; @murphy2012machine; @bishop2006pattern
:::

## Supervised learning formalized

Given data $\mathcal{D}=\{(x_i,y_i)\}_{i=1}^N$, choose the model $f_\theta$ by minimizing the regularized empirical risk:

$$
\hat{\theta}=\arg\min_\theta \frac{1}{N}\sum_{i=1}^N \ell\big(f_\theta(x_i),y_i\big) + \lambda\Omega(\theta)
$$

- $\ell$: loss function
- $\Omega$: regularization
- $\lambda \ge 0$: regularization strength (complexity control)
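
As a worked instance, assume a linear model $f_\theta(x)=\theta^\top x$, squared loss, and $\Omega(\theta)=\lVert\theta\rVert_2^2$; the slide itself does not fix any of these choices. With design matrix $X\in\mathbb{R}^{N\times d}$ (rows $x_i^\top$) and target vector $y\in\mathbb{R}^N$, the objective above becomes ridge regression, and setting its gradient $\tfrac{2}{N}X^\top(X\theta-y)+2\lambda\theta$ to zero gives a closed-form minimizer:

$$
\hat{\theta}=\arg\min_\theta \frac{1}{N}\lVert X\theta-y\rVert_2^2+\lambda\lVert\theta\rVert_2^2
\quad\Longrightarrow\quad
\hat{\theta}=\left(X^\top X+N\lambda I\right)^{-1}X^\top y
$$

For $\lambda>0$ the matrix $X^\top X+N\lambda I$ is always invertible, which is one concrete sense in which regularization controls complexity.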

## The minimum viable ML pipeline

1. Problem definition + target variable
2. Data curation + split strategy
3. Baseline model
4. Validation + error analysis
5. Model revision with scientific constraints

## Generalization: the central question

- Good training error is not enough.
- We care about expected future error on unseen data.
- Key dangers:
  - data leakage
  - overfitting
  - distribution shift

## Materials-flavored examples

- Regression: predict hardness from process parameters
- Classification: defect class from microscopy images
- Structured prediction: spectra-to-composition mapping

**Same math core, different data geometry + uncertainty.**

## What goes into lecture vs exercise

**Lecture (essential):**

- conceptual framework and notation
- why methods succeed/fail
- interpretation and scientific trust

**Exercise (practice):**

- coding gradient descent
- split strategy experiments
- regularization tradeoffs

## Exercise handoff (90 min)

- Implement linear regression in NumPy
- Compare underfit vs overfit using polynomial features
- Evaluate train/validation/test behavior
- Add L2 regularization and discuss model selection
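
A possible starting point for the first and last items is sketched below. It is written in plain Python so that it runs without dependencies, whereas the exercise itself targets NumPy. The function name `fit_ridge_gd`, the learning rate, the step count, and the noise-free toy data are all assumptions made for illustration. Only the slope `w` is penalized; the intercept `b` is left unregularized.

```python
def fit_ridge_gd(xs, ys, lam=0.0, lr=0.1, steps=2000):
    """Minimize (1/N) * sum((w*x + b - y)^2) + lam * w^2 by gradient descent."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        # Residuals of the current linear model on the training data.
        errors = [w * x + b - y for x, y in zip(xs, ys)]
        # Gradient of the data term plus the gradient of the L2 penalty.
        grad_w = 2.0 / n * sum(e * x for e, x in zip(errors, xs)) + 2.0 * lam * w
        grad_b = 2.0 / n * sum(errors)
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


# Noise-free toy data on a line with slope 2 and intercept 1 (not a measurement).
xs = [i / 10 for i in range(11)]
ys = [2.0 * x + 1.0 for x in xs]

w_plain, b_plain = fit_ridge_gd(xs, ys, lam=0.0)
w_ridge, b_ridge = fit_ridge_gd(xs, ys, lam=1.0)
print(f"lam=0: w={w_plain:.3f}, b={b_plain:.3f}")
print(f"lam=1: w={w_ridge:.3f}, b={b_ridge:.3f}")
```

Comparing the two fits shows the penalty shrinking the slope toward zero while the intercept compensates, which is a useful first observation before moving on to polynomial features.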

## Key takeaways

- ML in engineering is **modeling + optimization + uncertainty reasoning**.
- Generalization and validity matter more than leaderboard scores.
- Unit 1 defines language and rigor for all following units.

## References

::: {#refs}
:::
Lines changed: 5 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,5 @@
format:
  revealjs:
    menu: false
    progress: false
    search: false
Lines changed: 59 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,59 @@
<?xml version="1.0" encoding="utf-8"?>
<style xmlns="http://purl.org/net/xbiblio/csl" class="in-text" version="1.0">
  <info>
    <title>Custom In-text Author-Date with DOI</title>
    <id>http://www.zotero.org/styles/custom-intext-doi</id>
    <link href="http://www.zotero.org/styles/nature" rel="template"/>
    <author>
      <name>Your Name</name>
    </author>
    <category citation-format="author-date"/>
    <updated>2025-03-31T00:00:00+00:00</updated>
  </info>

  <macro name="author-in-text">
    <names variable="author">
      <name form="long" name-as-sort-order="first" et-al-min="2" et-al-use-first="1" delimiter=", "/>
    </names>
  </macro>

  <macro name="year">
    <date variable="issued">
      <date-part name="year"/>
    </date>
  </macro>

  <macro name="in-text-citation">
    <group delimiter=", ">
      <text macro="author-in-text"/>
      <group prefix="(" suffix=")">
        <text macro="year"/>
      </group>
      <text prefix="doi:" variable="DOI"/>
    </group>
  </macro>

  <citation>
    <layout delimiter="; ">
      <text macro="in-text-citation"/>
    </layout>
  </citation>

  <bibliography et-al-min="6" et-al-use-first="1" et-al-subsequent-min="3" et-al-subsequent-use-first="1">
    <layout suffix="." delimiter="; ">
      <text variable="title" suffix=", "/>
      <text variable="container-title" suffix=", "/>
      <names variable="author">
        <name and="symbol" delimiter=", "/>
      </names>
      <date variable="issued" prefix=" (" suffix=")">
        <date-part name="year"/>
      </date>
      <text variable="DOI" prefix=" https://doi.org/"/>
    </layout>
  </bibliography>
</style>
