diff --git a/paper/chemiscope-v1.0-overview.svg b/paper/chemiscope-v1.0-overview.svg new file mode 100644 index 000000000..838cb85bd --- /dev/null +++ b/paper/chemiscope-v1.0-overview.svg @@ -0,0 +1,4 @@ + + + +
a
b
c
d
\ No newline at end of file diff --git a/paper/chemiscope-v1.0.svg b/paper/chemiscope-v1.0.svg new file mode 100644 index 000000000..9306ae5f7 --- /dev/null +++ b/paper/chemiscope-v1.0.svg @@ -0,0 +1,4 @@ + + + +
chemiscope v1.0
web app
streamlit component
sphinx-gallery docs
jupyter widget
chemfiles
\ No newline at end of file diff --git a/paper/paper-2020/paper-2020.md b/paper/paper-2020/paper-2020.md new file mode 100644 index 000000000..911a552c6 --- /dev/null +++ b/paper/paper-2020/paper-2020.md @@ -0,0 +1,120 @@ +--- +title: 'Chemiscope: interactive structure-property explorer for materials and molecules' +tags: + - TypeScript + - JavaScript + - chemistry + - material science + - machine learning +authors: + - name: Guillaume Fraux + orcid: 0000-0003-4824-6512 + affiliation: 1 + - name: Rose K. Cersonsky + orcid: 0000-0003-4515-3441 + affiliation: 1 + - name: Michele Ceriotti + orcid: 0000-0003-2571-2832 + affiliation: 1 +affiliations: + - name: Laboratory of Computational Science and Modeling, IMX, École Polytechnique Fédérale de Lausanne, 1015 Lausanne, Switzerland + index: 1 +date: 30 January 2020 +bibliography: paper.bib +--- + +# Summary + +The number of materials or molecules that can be created by combining different +chemical elements in various proportions and spatial arrangements is enormous. +Computational chemistry can be used to generate databases containing billions of +potential structures [@Ruddigkeit2012], and predict some of the associated +properties [@Montavon2013; @Ramakrishnan2014]. Unfortunately, the very large +number of structures makes exploring such databases, whether to understand +structure-property relations or to find the _best_ structure for a given +application, a daunting task. In recent years, multiple molecular +_representations_ [@Behler2007; @Bartok2013; @Willatt2019] have been developed +to compute structural similarities between materials or molecules, incorporating +physically-relevant information and symmetries. The features associated with +these representations can be used for unsupervised machine learning +applications, such as clustering or classification of the different structures, +and high-throughput screening of databases for specific properties [@Maier2007; +@De2017; @Hautier2019]. 
Unfortunately, the dimensionality of these features (as +well as most other descriptors used in chemical and materials informatics) is +very high, which makes the resulting classifications, clustering or mapping very +hard to visualize. Dimensionality reduction algorithms [@Schlkopf1998; +@Ceriotti2011; @McInnes2018] can reduce the number of relevant dimensions to a +handful, creating 2D or 3D maps of the full database. + +![The Qm7b database [@Montavon2013] visualized with chemiscope](screenshot.png) + +Chemiscope is a graphical tool for the interactive exploration of materials and +molecular databases, correlating local and global structural descriptors with +the physical properties of the different systems. The interface consists of +two panels. The left panel displays a 2D or 3D scatter plot, in which each +point corresponds to a chemical entity. The axes, color, and style of each point +can be set to represent a property or a structural descriptor to visualize +structure-property relations directly. Structural descriptors are not computed +directly by chemiscope, but must be obtained from one of the many codes +implementing general-purpose atomic representations [@librascal; @QUIP] or more specialized descriptors. Since the most common +descriptors can be very high dimensional, it can be convenient to apply a +dimensionality reduction algorithm that maps them to a lower-dimensional space +for easier visualization. For example, the sketch-map algorithm [@Ceriotti2011] +was used with the Smooth Overlap of Atomic Positions representation [@Bartok2013] to +generate the visualization in Figure 1. The right panel displays the +three-dimensional structure of the chemical entities, possibly including +periodic repetition for crystals. Visualizing the chemical structure can help +in finding an intuitive rationalization of the layout of the dataset and the +structure-property relations. 
+ +Whereas similar tools [@Gong2013; @Gutlein2014; @Probst2017; @ISV] only allow +visualizing maps and structures in which each data point corresponds to a +molecule or a crystal structure, a distinctive feature of chemiscope is the +possibility of visualizing maps in which points correspond to atom-centred +environments. This is useful, for instance, to rationalize the relationship +between structure and atomic properties such as nuclear chemical shieldings +(Figure 2). This is also useful as a diagnostic tool for the many +machine-learning schemes that decompose properties into atom-centred +contributions [@Behler2007; @Bartok2010]. + +![Database of chemical shieldings [@Paruzzo2018] in chemiscope demonstrating the use of a 3D plot and highlighting of atomic environments](./screenshot-3d.png) + +Chemiscope took strong inspiration from an earlier, similar graphical tool, +the interactive sketch-map visualizer [@ISV]. This earlier software was used in +multiple research publications related to the exploration of large-scale +databases and the mapping of structure-property relationships [@De2016; +@De2017; @Musil2018]. + +# Implementation + +Chemiscope is implemented using the web platform: HTML5, CSS and WebGL to +display graphical elements, and TypeScript (compiled to JavaScript) for +interactivity. It uses [Plotly.js](https://plot.ly/javascript/) to render and +animate 2D and 3D plots, and the JavaScript version of [Jmol](http://jmol.org/) +to display atomic structures. The visualization is fast enough to be used with +datasets containing up to a million points, reacting to user input within a few +hundred milliseconds in the default 2D mode. More elaborate visualizations are +slower, while still handling 100k points easily. + +The use of web technologies makes chemiscope usable from different operating +systems without the need to develop, maintain and package the code for each +operating system. 
It also means that we can provide an online service at +http://chemiscope.org that allows users to visualize their own datasets without any +local installation. Chemiscope is implemented as a library of re-usable +components linked together via callbacks. This makes it easy to modify the +default interface to generate more elaborate visualizations, for example, +displaying multiple maps generated with different parameters of a dimensionality +reduction algorithm. Chemiscope can also be distributed in a standalone mode, +where the code and a predefined dataset are merged together as a single HTML +file. This standalone mode is useful for archival purposes, for example as +supplementary information for a published article, and for use in corporate +environments with sensitive datasets. + +# Acknowledgements + +The development of chemiscope has been funded by the [NCCR +MARVEL](http://nccr-marvel.ch/), the [MAX](http://max-centre.eu/) European +centre of excellence, and the European Research Council (Horizon 2020 grant +agreement no. 677013-HBMAP). + +# References diff --git a/paper/paper-2020/paper.bib b/paper/paper-2020/paper.bib new file mode 100644 index 000000000..c2635cec3 --- /dev/null +++ b/paper/paper-2020/paper.bib @@ -0,0 +1,273 @@ +@article{Ceriotti2011, + doi = {10.1073/pnas.1108486108}, + url = {https://doi.org/10.1073/pnas.1108486108}, + year = {2011}, + month = {07}, + publisher = {Proceedings of the National Academy of Sciences}, + volume = {108}, + number = {32}, + pages = {13023--13028}, + author = {Michele Ceriotti and Gareth A. 
Tribello and Michele Parrinello}, + title = {Simplifying the representation of complex free-energy landscapes using sketch-map}, + journal = {Proceedings of the National Academy of Sciences} +} + +@article{Bartok2013, + doi = {10.1103/physrevb.87.184115}, + url = {https://doi.org/10.1103/physrevb.87.184115}, + year = {2013}, + month = {05}, + publisher = {American Physical Society ({APS})}, + volume = {87}, + number = {18}, + author = {Albert P. Bart{\'{o}}k and Risi Kondor and G{\'{a}}bor Cs{\'{a}}nyi}, + title = {On representing chemical environments}, + journal = {Physical Review B} +} + +@article{Montavon2013, + doi = {10.1088/1367-2630/15/9/095003}, + url = {https://doi.org/10.1088/1367-2630/15/9/095003}, + year = {2013}, + month = {09}, + publisher = {{IOP} Publishing}, + volume = {15}, + number = {9}, + pages = {095003}, + author = {Grégoire Montavon and Matthias Rupp and Vivekanand Gobre and Alvaro Vazquez-Mayagoitia and Katja Hansen and Alexandre Tkatchenko and Klaus-Robert M\"{u}ller and O Anatole von Lilienfeld}, + title = {Machine learning of molecular electronic properties in chemical compound space}, + journal = {New Journal of Physics} +} + +@article{Gutlein2014, + doi = {10.1186/s13321-014-0041-7}, + url = {https://doi.org/10.1186/s13321-014-0041-7}, + year = {2014}, + month = sep, + publisher = {Springer Science and Business Media {LLC}}, + volume = {6}, + number = {1}, + author = {Martin G\"{u}tlein and Andreas Karwath and Stefan Kramer}, + title = {{CheS}-Mapper 2.0 for visual validation of (Q){SAR} models}, + journal = {Journal of Cheminformatics} +} + +@article{Probst2017, + doi = {10.1093/bioinformatics/btx760}, + url = {https://doi.org/10.1093/bioinformatics/btx760}, + year = {2017}, + month = {10}, + publisher = {Oxford University Press ({OUP})}, + volume = {34}, + number = {8}, + pages = {1433--1435}, + author = {Daniel Probst and Jean-Louis Reymond}, + editor = {Jonathan Wren}, + title = {{FUn}: a framework for interactive visualizations 
of large, high-dimensional datasets on the web}, + journal = {Bioinformatics} +} + +@article{Gong2013, + doi = {10.1093/bioinformatics/btt270}, + url = {https://doi.org/10.1093/bioinformatics/btt270}, + year = {2013}, + month = {05}, + publisher = {Oxford University Press ({OUP})}, + volume = {29}, + number = {14}, + pages = {1827--1829}, + author = {Jiayu Gong and Chaoqian Cai and Xiaofeng Liu and Xin Ku and Hualiang Jiang and Daqi Gao and Honglin Li}, + title = {{ChemMapper}: a versatile web server for exploring pharmacology and chemical structure association based on molecular 3D similarity method}, + journal = {Bioinformatics} +} + +@article{Paruzzo2018, + doi = {10.1038/s41467-018-06972-x}, + url = {https://doi.org/10.1038/s41467-018-06972-x}, + year = {2018}, + month = oct, + publisher = {Springer Science and Business Media {LLC}}, + volume = {9}, + number = {1}, + author = {Federico M. Paruzzo and Albert Hofstetter and Félix Musil and Sandip De and Michele Ceriotti and Lyndon Emsley}, + title = {Chemical shifts in molecular solids by machine learning}, + journal = {Nature Communications} +} + +@software{ISV, + author = {De, Sandip and Ceriotti, Michele}, + title = {Interactive Sketchmap Visualizer}, + publisher = {Zenodo}, + year = {2019}, + version = {1.0.0}, + doi = {10.5281/zenodo.3541831}, + url = {https://doi.org/10.5281/zenodo.3541831} +} + +@article{De2016, + doi = {10.1039/c6cp00415f}, + url = {https://doi.org/10.1039/c6cp00415f}, + year = {2016}, + publisher = {Royal Society of Chemistry ({RSC})}, + volume = {18}, + number = {20}, + pages = {13754--13769}, + author = {Sandip De and Albert P. 
Bart{\'{o}}k and G{\'{a}}bor Cs{\'{a}}nyi and Michele Ceriotti}, + title = {Comparing molecules and solids across structural and alchemical space}, + journal = {Physical Chemistry Chemical Physics} +} + +@article{De2017, + doi = {10.1186/s13321-017-0192-4}, + url = {https://doi.org/10.1186/s13321-017-0192-4}, + year = {2017}, + month = {02}, + publisher = {Springer Science and Business Media {LLC}}, + volume = {9}, + number = {1}, + author = {Sandip De and Félix Musil and Teresa Ingram and Carsten Baldauf and Michele Ceriotti}, + title = {Mapping and classifying molecules from a high-throughput structural database}, + journal = {Journal of Cheminformatics} +} + +@article{Musil2018, + doi = {10.1039/c7sc04665k}, + url = {https://doi.org/10.1039/c7sc04665k}, + year = {2018}, + publisher = {Royal Society of Chemistry ({RSC})}, + volume = {9}, + number = {5}, + pages = {1289--1300}, + author = {Félix Musil and Sandip De and Jack Yang and Joshua E. Campbell and Graeme M. Day and Michele Ceriotti}, + title = {Machine learning for the structure-energy-property landscapes of molecular crystals}, + journal = {Chemical Science} +} + +@article{Hautier2019, + doi = {10.1016/j.commatsci.2019.02.040}, + url = {https://doi.org/10.1016/j.commatsci.2019.02.040}, + year = {2019}, + month = {06}, + publisher = {Elsevier {BV}}, + volume = {163}, + pages = {108--116}, + author = {Geoffroy Hautier}, + title = {Finding the needle in the haystack: Materials discovery and design through computational ab initio high-throughput screening}, + journal = {Computational Materials Science} +} + +@article{Willatt2019, + doi = {10.1063/1.5090481}, + url = {https://doi.org/10.1063/1.5090481}, + year = {2019}, + month = {04}, + publisher = {{AIP} Publishing}, + volume = {150}, + number = {15}, + pages = {154110}, + author = {Michael J. 
Willatt and F{\'{e}}lix Musil and Michele Ceriotti}, + title = {Atom-density representations for machine learning}, + journal = {The Journal of Chemical Physics} +} + +@article{Behler2007, + doi = {10.1103/physrevlett.98.146401}, + url = {https://doi.org/10.1103/physrevlett.98.146401}, + year = {2007}, + month = {04}, + publisher = {American Physical Society ({APS})}, + volume = {98}, + number = {14}, + author = {J\"{o}rg Behler and Michele Parrinello}, + title = {Generalized Neural-Network Representation of High-Dimensional Potential-Energy Surfaces}, + journal = {Physical Review Letters} +} + +@article{Ruddigkeit2012, + doi = {10.1021/ci300415d}, + url = {https://doi.org/10.1021/ci300415d}, + year = {2012}, + month = {11}, + publisher = {American Chemical Society ({ACS})}, + volume = {52}, + number = {11}, + pages = {2864--2875}, + author = {Lars Ruddigkeit and Ruud van Deursen and Lorenz C. Blum and Jean-Louis Reymond}, + title = {Enumeration of 166 Billion Organic Small Molecules in the Chemical Universe Database {GDB}-17}, + journal = {Journal of Chemical Information and Modeling} +} + +@article{Ramakrishnan2014, + doi = {10.1038/sdata.2014.22}, + url = {https://doi.org/10.1038/sdata.2014.22}, + year = {2014}, + month = {08}, + publisher = {Springer Science and Business Media {LLC}}, + volume = {1}, + number = {1}, + author = {Raghunathan Ramakrishnan and Pavlo O. Dral and Matthias Rupp and O. Anatole von Lilienfeld}, + title = {Quantum chemistry structures and properties of 134 kilo molecules}, + journal = {Scientific Data} +} + +@article{Bartok2010, + doi = {10.1103/physrevlett.104.136403}, + url = {https://doi.org/10.1103/physrevlett.104.136403}, + year = {2010}, + month = {04}, + publisher = {American Physical Society ({APS})}, + volume = {104}, + number = {13}, + author = {Albert P. Bart{\'{o}}k and Mike C. 
Payne and Risi Kondor and G{\'{a}}bor Cs{\'{a}}nyi}, + title = {Gaussian Approximation Potentials: The Accuracy of Quantum Mechanics, without the Electrons}, + journal = {Physical Review Letters} +} + +@article{Schlkopf1998, + doi = {10.1162/089976698300017467}, + url = {https://doi.org/10.1162/089976698300017467}, + year = {1998}, + month = {08}, + publisher = {{MIT} Press - Journals}, + volume = {10}, + number = {5}, + pages = {1299--1319}, + author = {Bernhard Sch\"{o}lkopf and Alexander Smola and Klaus-Robert M\"{u}ller}, + title = {Nonlinear Component Analysis as a Kernel Eigenvalue Problem}, + journal = {Neural Computation} +} + +@article{Maier2007, + doi = {10.1002/anie.200603675}, + url = {https://doi.org/10.1002/anie.200603675}, + year = {2007}, + month = aug, + publisher = {Wiley}, + volume = {46}, + number = {32}, + pages = {6016--6067}, + author = {Wilhelm{\hspace{0.25em}}F. Maier and Klaus St\"{o}we and Simone Sieg}, + title = {Combinatorial and High-Throughput Materials Science}, + journal = {Angewandte Chemie International Edition} +} + +@article{McInnes2018, + title={UMAP: Uniform Manifold Approximation and Projection for Dimension Reduction}, + author={Leland McInnes and John Healy and James Melville}, + year={2018}, + eprint={1802.03426}, +} + +@online{librascal, + author = {librascal}, + title = {}, + date = {}, + url = {https://github.com/lab-cosmo/librascal} +} + +@online{QUIP, + author = {QUIP}, + title = {}, + date = {}, + url = {http://libatoms.github.io/QUIP/} +} diff --git a/paper/screenshot-3d.png b/paper/paper-2020/screenshot-3d.png similarity index 100% rename from paper/screenshot-3d.png rename to paper/paper-2020/screenshot-3d.png diff --git a/paper/screenshot.png b/paper/paper-2020/screenshot.png similarity index 100% rename from paper/screenshot.png rename to paper/paper-2020/screenshot.png diff --git a/paper/paper.bib b/paper/paper.bib index c2635cec3..bc4fb07b9 100644 --- a/paper/paper.bib +++ b/paper/paper.bib @@ -1,107 +1,383 @@ 
-@article{Ceriotti2011, - doi = {10.1073/pnas.1108486108}, - url = {https://doi.org/10.1073/pnas.1108486108}, - year = {2011}, - month = {07}, - publisher = {Proceedings of the National Academy of Sciences}, - volume = {108}, - number = {32}, - pages = {13023--13028}, - author = {Michele Ceriotti and Gareth A. Tribello and Michele Parrinello}, - title = {Simplifying the representation of complex free-energy landscapes using sketch-map}, - journal = {Proceedings of the National Academy of Sciences} -} - -@article{Bartok2013, - doi = {10.1103/physrevb.87.184115}, - url = {https://doi.org/10.1103/physrevb.87.184115}, - year = {2013}, - month = {05}, - publisher = {American Physical Society ({APS})}, - volume = {87}, - number = {18}, - author = {Albert P. Bart{\'{o}}k and Risi Kondor and G{\'{a}}bor Cs{\'{a}}nyi}, - title = {On representing chemical environments}, - journal = {Physical Review B} -} - -@article{Montavon2013, - doi = {10.1088/1367-2630/15/9/095003}, - url = {https://doi.org/10.1088/1367-2630/15/9/095003}, - year = {2013}, - month = {09}, - publisher = {{IOP} Publishing}, - volume = {15}, - number = {9}, - pages = {095003}, - author = {Grégoire Montavon and Matthias Rupp and Vivekanand Gobre and Alvaro Vazquez-Mayagoitia and Katja Hansen and Alexandre Tkatchenko and Klaus-Robert M\"{u}ller and O Anatole von Lilienfeld}, - title = {Machine learning of molecular electronic properties in chemical compound space}, - journal = {New Journal of Physics} -} - -@article{Gutlein2014, - doi = {10.1186/s13321-014-0041-7}, - url = {https://doi.org/10.1186/s13321-014-0041-7}, - year = {2014}, - month = sep, - publisher = {Springer Science and Business Media {LLC}}, - volume = {6}, +@article{Fraux2020, + doi = {10.21105/joss.02117}, + url = {https://doi.org/10.21105/joss.02117}, + year = {2020}, + publisher = {The Open Journal}, + volume = {5}, + number = {51}, + pages = {2117}, + author = {Fraux, Guillaume and Cersonsky, Rose K. 
and Ceriotti, Michele}, + title = {{Chemiscope}: interactive structure-property explorer for materials and + molecules}, + journal = {Journal of Open Source Software}, +} + +@article{ase-paper, + author = {Ask Hjorth Larsen and Jens Jørgen Mortensen and Jakob Blomqvist and Ivano E + Castelli and Rune Christensen and Marcin Dułak and Jesper Friis and Michael N Groves + and Bjørk Hammer and Cory Hargus and Eric D Hermes and Paul C Jennings and Peter + Bjerre Jensen and James Kermode and John R Kitchin and Esben Leonhard Kolsbjerg and + Joseph Kubal and Kristen Kaasbjerg and Steen Lysgaard and Jón Bergmann Maronsson and + Tristan Maxson and Thomas Olsen and Lars Pastewka and Andrew Peterson and Carsten + Rostgaard and Jakob Schiøtz and Ole Schütt and Mikkel Strange and Kristian S Thygesen + and Tejs Vegge and Lasse Vilhelmsen and Michael Walter and Zhenhua Zeng and Karsten W + Jacobsen}, + title = {The atomic simulation environment—a {Python} library for working with atoms}, + journal = {Journal of Physics: Condensed Matter}, + volume = {29}, + number = {27}, + pages = {273002}, + doi = {10.1088/1361-648X/aa680e}, + url = {http://stacks.iop.org/0953-8984/29/i=27/a=273002}, + year = {2017}, +} + +@article{MDAnalysis, + author = {Gowers, Richard J. and Linke, Max and Barnoud, Jonathan and Reddy, Tyler J. + E. and Melo, Manuel N. and Seyler, Sean L. and Domański, Jan and Dotson, David L. and + Buchoux, Sébastien and Kenney, Ian M. and Beckstein, Oliver}, + title = {{MDAnalysis}: A {Python} Package for the Rapid Analysis of Molecular Dynamics + Simulations}, + journal = {SciPy 2016}, + year = {2016}, + doi = {10.25080/Majora-629e541a-00e}, + url = {https://doi.org/10.25080/Majora-629e541a-00e}, +} + +@article{STK, + title = {stk: An extendable {Python} framework for automated molecular and + supramolecular structure assembly and discovery}, + author = {Turcani, Lukas and Tarzia, Andrew and Szczypi{\'n}ski, Filip T. 
and Jelfs, + Kim E.}, + journal = {The Journal of Chemical Physics}, + year = {2021}, + volume = {154}, + number = {21}, + pages = {214102}, + doi = {10.1063/5.0049708}, +} + +@software{chemfiles, + author = {Guillaume Fraux and Len Kimms and Jonathan Fine and German P. Barletta and + Mykola Dimura and FX Coudert and pelsa and Maximilien Levesque and Shoubhik Maiti and + Simon Guionniere and jmintser}, + title = {{chemfiles}/{chemfiles}: Version 0.10.4}, + month = {may}, + year = {2023}, + publisher = {Zenodo}, + version = {0.10.4}, + doi = {10.5281/zenodo.7904565}, + url = {https://doi.org/10.5281/zenodo.7904565}, +} + +@article{Mazitov2025, + title = {{PET-MAD} as a lightweight universal interatomic potential for advanced + materials modeling}, + author = {Mazitov, Arslan and Bigi, Filippo and Kellner, Matthias and Pegolo, Paolo + and Tisi, Davide and Fraux, Guillaume and Pozdnyakov, Sergey and Loche, Philip and + Ceriotti, Michele}, + journal = {Nature Communications}, + volume = {16}, + number = {1}, + pages = {10653}, + year = {2025}, + doi = {10.1038/s41467-025-65662-7}, + url = {https://doi.org/10.1038/s41467-025-65662-7}, +} + +@article{MAD, + author = {Mazitov, Arslan and Chorna, Sofiia and Fraux, Guillaume and Bercx, Marnik + and Pizzi, Giovanni and De, Sandip and Ceriotti, Michele}, + title = {Massive Atomic Diversity: a compact universal dataset for atomistic machine + learning}, + journal = {Scientific Data}, + volume = {12}, number = {1}, - author = {Martin G\"{u}tlein and Andreas Karwath and Stefan Kramer}, - title = {{CheS}-Mapper 2.0 for visual validation of (Q){SAR} models}, - journal = {Journal of Cheminformatics} + pages = {1857}, + year = {2025}, + doi = {10.1038/s41597-025-06109-y}, + url = {https://doi.org/10.1038/s41597-025-06109-y}, } -@article{Probst2017, - doi = {10.1093/bioinformatics/btx760}, - url = {https://doi.org/10.1093/bioinformatics/btx760}, +@article{metatensor, + title = {{metatensor} and {metatomic}: Foundational libraries for 
interoperable + atomistic machine learning}, + author = {Filippo Bigi and Joseph W. Abbott and Philip Loche and Arslan Mazitov and + Davide Tisi and Marcel F. Langer and Alexander Goscinski and Paolo Pegolo and Sanggyu + Chong and Rohit Goswami and Pol Febrer and Sofiia Chorna and Matthias Kellner and + Michele Ceriotti and Guillaume Fraux}, + journal = {Journal of Chemical Physics}, + volume = {164}, + number = {6}, + pages = {064113}, + year = {2026}, + month = {Feb}, + doi = {10.1063/5.0304911}, + url = {https://doi.org/10.1063/5.0304911} +} + +@article{Jupyter, + author = {Granger, Brian E. and Pérez, Fernando}, + journal = {Computing in Science & Engineering}, + title = {{Jupyter}: Thinking and Storytelling With Code and Data}, + year = {2021}, + volume = {23}, + number = {2}, + pages = {7-14}, + doi = {10.1109/MCSE.2021.3059263}, +} + +@incollection{JupyterNotebook, + author = {{Kluyver}, Thomas and {Ragan-Kelley}, Benjamin and {P{\'e}rez}, Fernando and + {Granger}, Brian and {Bussonnier}, Matthias and {Frederic}, Jonathan and {Kelley}, + Kyle and {Hamrick}, Jessica and {Grout}, Jason and {Corlay}, Sylvain and {Ivanov}, + Paul and {Avila}, Dami{\'a}n and {Abdalla}, Safia and {Willing}, Carol and {Jupyter + Development Team}}, + title = {{Jupyter} Notebooks--a publishing format for reproducible computational + workflows}, + booktitle = {Positioning and Power in Academic Publishing: Players, Agents and + Agendas}, + year = {2016}, + pages = {87--90}, + publisher = {IOS Press}, + doi = {10.3233/978-1-61499-649-1-87}, +} + +@article{Du2024, + title = {{Jupyter} widgets and extensions for education and research in computational + physics and chemistry}, + journal = {Computer Physics Communications}, + volume = {305}, + pages = {109353}, + year = {2024}, + issn = {0010-4655}, + doi = {10.1016/j.cpc.2024.109353}, + url = {https://www.sciencedirect.com/science/article/pii/S0010465524002765}, + author = {Dou Du and Taylor J. 
Baird and Kristjan Eimre and Sara Bonella and Giovanni + Pizzi}, +} + +@article{Humphrey1996, + author = {William Humphrey and Andrew Dalke and Klaus Schulten}, + title = {{VMD} -- {V}isual {M}olecular {D}ynamics}, + journal = {Journal of Molecular Graphics}, + year = {1996}, + volume = {14}, + pages = {33--38}, + doi = {10.1016/0263-7855(96)00018-5}, +} + +@article{Stukowski2010, + author = {Stukowski, Alexander}, + title = {Visualization and analysis of atomistic simulation data with {OVITO}--the + {Open Visualization Tool}}, + journal = {Modelling and Simulation in Materials Science and Engineering}, + year = {2010}, + volume = {18}, + number = {1}, + pages = {015012}, + doi = {10.1088/0965-0393/18/1/015012}, +} + +@article{Rego2015, + author = {Rego, Nicholas and Koes, David}, + title = {{3Dmol.js}: molecular visualization with {WebGL}}, + journal = {Bioinformatics}, + year = {2015}, + volume = {31}, + number = {8}, + pages = {1322--1324}, + issn = {1367-4803}, + doi = {10.1093/bioinformatics/btu829}, + url = {https://doi.org/10.1093/bioinformatics/btu829}, + note = {Available at: \url{http://3Dmol.csb.pitt.edu}}, + publisher = {Oxford University Press}, +} + +@article{Nguyen2017, + author = {Nguyen, Hai and Case, David A. 
and Rose, Alexander S.}, + title = {NGLview-interactive molecular graphics for {Jupyter} notebooks}, + journal = {Bioinformatics}, year = {2017}, - month = {10}, - publisher = {Oxford University Press ({OUP})}, volume = {34}, - number = {8}, - pages = {1433--1435}, - author = {Daniel Probst and Jean-Louis Reymond}, - editor = {Jonathan Wren}, - title = {{FUn}: a framework for interactive visualizations of large, high-dimensional datasets on the web}, - journal = {Bioinformatics} -} - -@article{Gong2013, - doi = {10.1093/bioinformatics/btt270}, - url = {https://doi.org/10.1093/bioinformatics/btt270}, - year = {2013}, - month = {05}, - publisher = {Oxford University Press ({OUP})}, - volume = {29}, - number = {14}, - pages = {1827--1829}, - author = {Jiayu Gong and Chaoqian Cai and Xiaofeng Liu and Xin Ku and Hualiang Jiang and Daqi Gao and Honglin Li}, - title = {{ChemMapper}: a versatile web server for exploring pharmacology and chemical structure association based on molecular 3D similarity method}, - journal = {Bioinformatics} + number = {7}, + pages = {1241--1242}, + issn = {1367-4803}, + doi = {10.1093/bioinformatics/btx789}, + url = {https://doi.org/10.1093/bioinformatics/btx789}, + note = {Source code available at: \url{https://github.com/arose/nglview}}, + publisher = {Oxford University Press}, } -@article{Paruzzo2018, - doi = {10.1038/s41467-018-06972-x}, - url = {https://doi.org/10.1038/s41467-018-06972-x}, - year = {2018}, - month = oct, - publisher = {Springer Science and Business Media {LLC}}, +@article{IPython, + author = {Perez, Fernando and Granger, Brian E.}, + journal = {Computing in Science & Engineering}, + title = {{IPython}: A System for Interactive Scientific Computing}, + year = {2007}, volume = {9}, - number = {1}, - author = {Federico M. 
Paruzzo and Albert Hofstetter and Félix Musil and Sandip De and Michele Ceriotti and Lyndon Emsley}, - title = {Chemical shifts in molecular solids by machine learning}, - journal = {Nature Communications} + number = {3}, + pages = {21-29}, + doi = {10.1109/MCSE.2007.53}, } -@software{ISV, - author = {De, Sandip and Ceriotti, Michele}, - title = {Interactive Sketchmap Visualizer}, +@software{sphinx, + author = {Óscar Nájera and Eric Larson and Lucy Liu and Loïc Estève and Gael Varoquaux + and Jaques Grobler and Elliott Sales de Andrade and Chris Holdgraf and Alexandre + Gramfort and Mainak Jas and Joel Nothman and Steffen Rehberg and Olivier Grisel and + Nelle Varoquaux and Steven Hiscocks and alexis and Emmanuelle Gouillart and Tim + Hoffmann and Antony Lee and Gavin Uberti and Martin Luessi and Albert Y. Shih and Jake + Vanderplas and Jody Klymak and Alex Rockhill and John Muradeli and Thomas A Caswell + and Bane Sullivan and Alyssa Batula and Patrick Kunzmann}, + title = {sphinx-gallery/sphinx-gallery: v0.12.2}, + month = {mar}, + year = {2023}, publisher = {Zenodo}, - year = {2019}, - version = {1.0.0}, - doi = {10.5281/zenodo.3541831}, - url = {https://doi.org/10.5281/zenodo.3541831} + version = {v0.12.2}, + doi = {10.5281/zenodo.7716999}, + url = {https://doi.org/10.5281/zenodo.7716999}, +} + +@article{Talirz_2020, + title = {{Materials Cloud}, a platform for open computational science}, + volume = {7}, + issn = {2052-4463}, + url = {http://dx.doi.org/10.1038/s41597-020-00637-5}, + doi = {10.1038/s41597-020-00637-5}, + number = {1}, + journal = {Scientific Data}, + publisher = {Springer Science and Business Media LLC}, + author = {Talirz, Leopold and Kumbhar, Snehal and Passaro, Elsa and Yakutovich, + Aliaksandr V. and Granata, Valeria and Gargiulo, Fernando and Borelli, Marco and + Uhrin, Martin and Huber, Sebastiaan P. and Zoupanos, Spyros and Adorf, Carl S. and + Andersen, Casper Welzel and Schütt, Ole and Pignedoli, Carlo A. 
and Passerone, Daniele + and VandeVondele, Joost and Schulthess, Thomas C. and Smit, Berend and Pizzi, Giovanni + and Marzari, Nicola}, + year = {2020}, + month = {sep}, +} + +@article{Goscinski2025scicodewidgets, + title = {scicode-widgets: Bringing Computational Experiments to the Classroom with + {Jupyter} Widgets}, + author = {Goscinski, Alexander and Baird, Taylor J. and Du, Dou and Prado, Jo{\~a}o + and Suman, Divya and Sodjargal, Tulga-Erdene and Bonella, Sara and Pizzi, Giovanni and + Ceriotti, Michele}, + year = {2025}, + eprint = {2507.05734}, + archivePrefix = {arXiv}, + primaryClass = {physics.ed-ph}, + doi = {10.48550/arXiv.2507.05734}, + url = {https://arxiv.org/abs/2507.05734}, +} + +@article{orlov2025, + author = {Orlov, Alexey A. and Sosnin, Sergey and Fedorov, Maxim V.}, + title = {From High Dimensions to Human Insight: Exploring Dimensionality Reduction for + Chemical Space Visualization}, + journal = {Molecular Informatics}, + year = {2025}, + volume = {44}, + number = {1}, + pages = {e202400265}, + doi = {10.1002/minf.202400265}, + issn = {1868-1743}, + url = {https://doi.org/10.1002/minf.202400265}, + publisher = {Wiley}, +} + +@article{Walsh2025mapping, + author = {Park, Hyunsoo and Onwuli, Anthony and Butler, Keith T. and Walsh, Aron}, + title = {Mapping inorganic crystal chemical space}, + journal = {Faraday Discussions}, + year = {2025}, + volume = {256}, + pages = {601--613}, + doi = {10.1039/D4FD00063C}, + url = {https://doi.org/10.1039/D4FD00063C}, +} + +@article{Cheng2020, + author = {Cheng, Bingqing and Griffiths, Ryan-Rhys and Wengert, Simon and Kunkel, + Christian and Stenczel, Tamas and Zhu, Bonan and Deringer, Volker L. and Bernstein, + Noam and Margraf, Johannes T. 
and Reuter, Karsten and Csanyi, Gabor}, + title = {Mapping Materials and Molecules}, + journal = {Accounts of Chemical Research}, + volume = {53}, + number = {9}, + pages = {1981--1991}, + year = {2020}, + doi = {10.1021/acs.accounts.0c00403}, + url = {https://doi.org/10.1021/acs.accounts.0c00403}, +} + +@article{Tamura2022, + author = {Tamura, Ryo and Matsuda, Momo and Lin, Jianbo and Futamura, Yasunori and + Sakurai, Tetsuya and Miyazaki, Tsuyoshi}, + title = {Structural analysis based on unsupervised learning: Search for a + characteristic low-dimensional space by local structures in atomistic simulations}, + journal = {Physical Review B}, + volume = {105}, + number = {7}, + pages = {075107}, + year = {2022}, + doi = {10.1103/PhysRevB.105.075107}, + url = {https://doi.org/10.1103/PhysRevB.105.075107}, +} + +@article{Chapman2022, + author = {Chapman, James and Goldman, Nir and Wood, Brandon C.}, + title = {Efficient and universal characterization of atomic structures through a + topological graph order parameter}, + journal = {npj Computational Materials}, + volume = {8}, + number = {1}, + pages = {37}, + year = {2022}, + doi = {10.1038/s41524-022-00717-7}, + url = {https://doi.org/10.1038/s41524-022-00717-7}, +} + +@article{Huang2020, + author = {Huang, Yue and Zhang, Jingtian and Jiang, Edwin S. and Oya, Yutaka and + Saeki, Akinori and Kikugawa, Gota and Okabe, Tomonaga and Ohuchi, Fumio S.}, + title = {Structure--Property Correlation Study for Organic Photovoltaic Polymer + Materials Using Data Science Approach}, + journal = {The Journal of Physical Chemistry C}, + volume = {124}, + number = {24}, + pages = {12871--12882}, + year = {2020}, + doi = {10.1021/acs.jpcc.0c00517}, + url = {https://doi.org/10.1021/acs.jpcc.0c00517}, +} + +@article{xie2018, + author = {Tian Xie and Jeffrey C. 
Grossman}, + title = {Hierarchical visualization of materials space with graph convolutional neural + networks}, + journal = {The Journal of Chemical Physics}, + year = {2018}, + volume = {149}, + number = {17}, + pages = {174111}, + month = {nov}, + doi = {10.1063/1.5047803}, + issn = {0021-9606}, + url = {https://doi.org/10.1063/1.5047803}, + publisher = {AIP Publishing}, +} + +@article{Nicholas2020, + author = {Nicholas, Thomas C. and Goodwin, Andrew L. and Deringer, Volker L.}, + title = {Understanding the geometric diversity of inorganic and hybrid frameworks + through structural coarse-graining}, + journal = {Chemical Science}, + volume = {11}, + number = {46}, + pages = {12580--12587}, + year = {2020}, + doi = {10.1039/D0SC03287E}, + url = {https://doi.org/10.1039/D0SC03287E}, } @article{De2016, @@ -112,162 +388,192 @@ @article{De2016 volume = {18}, number = {20}, pages = {13754--13769}, - author = {Sandip De and Albert P. Bart{\'{o}}k and G{\'{a}}bor Cs{\'{a}}nyi and Michele Ceriotti}, + author = {Sandip De and Albert P. 
Bart{\'{o}}k and G{\'{a}}bor Cs{\'{a}}nyi and + Michele Ceriotti}, title = {Comparing molecules and solids across structural and alchemical space}, - journal = {Physical Chemistry Chemical Physics} + journal = {Physical Chemistry Chemical Physics}, } -@article{De2017, - doi = {10.1186/s13321-017-0192-4}, - url = {https://doi.org/10.1186/s13321-017-0192-4}, - year = {2017}, - month = {02}, - publisher = {Springer Science and Business Media {LLC}}, - volume = {9}, +@article{HernandezLeon2024, + doi = {10.1088/1402-4896/ad432e}, + url = {https://doi.org/10.1088/1402-4896/ad432e}, + year = {2024}, + month = {may}, + publisher = {IOP Publishing}, + volume = {99}, + number = {6}, + pages = {066004}, + author = {Hernández-León, Patricia and Caro, Miguel A}, + title = {Cluster-based multidimensional scaling embedding tool for data + visualization}, + journal = {Physica Scripta}, +} + +@article{Wurger2021, + author = {W{\"u}rger, Tim and Mei, Di and Vaghefinazari, Bahram and Winkler, David A. + and Lamaka, Sviatlana V. and Zheludkevich, Mikhail L. and Mei{\ss}ner, Robert H. 
and + Feiler, Christian}, + title = {Exploring structure-property relationships in magnesium dissolution + modulators}, + journal = {npj Materials Degradation}, + volume = {5}, number = {1}, - author = {Sandip De and Félix Musil and Teresa Ingram and Carsten Baldauf and Michele Ceriotti}, - title = {Mapping and classifying molecules from a high-throughput structural database}, - journal = {Journal of Cheminformatics} + pages = {2}, + year = {2021}, + doi = {10.1038/s41529-020-00148-z}, + url = {https://doi.org/10.1038/s41529-020-00148-z}, } -@article{Musil2018, - doi = {10.1039/c7sc04665k}, - url = {https://doi.org/10.1039/c7sc04665k}, - year = {2018}, - publisher = {Royal Society of Chemistry ({RSC})}, +@article{Helfrecht2020, + doi = {10.1088/2632-2153/aba9ef}, + url = {https://doi.org/10.1088/2632-2153/aba9ef}, + year = {2020}, + publisher = {IOP}, + volume = {1}, + pages = {045021}, + author = {Helfrecht, Benjamin A. and Cersonsky, Rose K. and Fraux, Guillaume and Ceriotti, Michele}, + title = {Structure-property maps with Kernel principal covariates regression}, + journal = {Machine Learning: Science and Technology}, +} + +@article{Jorgensen2026, + title = {Interpretable Visualizations of Data Spaces for Classification Problems}, + author = {Jorgensen, Christian and Lin, Arthur Y. and Vasavada, Rhushil and Cersonsky, + Rose K.}, + journal = {Machine Learning: Science and Technology}, + volume = {7}, + number = {2}, + pages = {025008}, + year = {2026}, + doi = {10.1088/2632-2153/ae466e}, + url = {https://doi.org/10.1088/2632-2153/ae466e}, +} + +@article{MD22, + title = {Accurate global machine learning force fields for molecules with hundreds of + atoms}, + author = {Stefan Chmiela and Valentin Vassilev-Galindo and Oliver T. Unke and Adil + Kabylda and Huziel E. 
Sauceda and Alexandre Tkatchenko and Klaus-Robert Müller}, + year = {2023}, + journal = {Science Advances}, volume = {9}, - number = {5}, - pages = {1289--1300}, - author = {Félix Musil and Sandip De and Jack Yang and Joshua E. Campbell and Graeme M. Day and Michele Ceriotti}, - title = {Machine learning for the structure-energy-property landscapes of molecular crystals}, - journal = {Chemical Science} -} - -@article{Hautier2019, - doi = {10.1016/j.commatsci.2019.02.040}, - url = {https://doi.org/10.1016/j.commatsci.2019.02.040}, - year = {2019}, - month = {06}, - publisher = {Elsevier {BV}}, + number = {2}, + pages = {eadf0873}, + doi = {10.1126/sciadv.adf0873}, + url = {https://www.science.org/doi/abs/10.1126/sciadv.adf0873}, + eprint = {https://www.science.org/doi/pdf/10.1126/sciadv.adf0873}, +} + +@misc{plotlyjs, + author = {{Plotly Technologies Inc.}}, + title = {Collaborative data science}, + year = {2015}, + url = {https://plot.ly}, +} + +@misc{MaterialsCloudChemiscopeSearch, + title = {Materials Cloud Archive}, + url = {https://archive.materialscloud.org/search?q=&f=ext_apps%3Achemiscope&l=list&p=1&s=10&sort=newest}, + note = {Accessed 2026-05-14}, + year = {2026}, +} + +@misc{PyPIStatsChemiscope, + title = {{PyPI} Stats for chemiscope}, + url = {https://pypistats.org/packages/chemiscope}, + note = {Accessed 2026-05-14}, + year = {2026}, +} + +@misc{AtomisticCookbook, + title = {The Atomistic Cookbook}, + url = {https://atomistic-cookbook.org/software/chemiscope.html}, + note = {Accessed 2026-05-14}, + year = {2026}, +} + +@article{MACE, + author = {Ilyes Batatia and Philipp Benner and Yuan Chiang and Alin M. Elena and Dávid + P. Kovács and Janosh Riebesell and Xavier R. Advincula and Mark Asta and Matthew + Avaylon and William J. Baldwin and Fabian Berger and Noam Bernstein and Arghya Bhowmik + and Filippo Bigi and Samuel M. Blau and Vlad Cărare and Michele Ceriotti and Sanggyu + Chong and James P. Darby and Sandip De and Flaviano Della Pia and Volker L. 
Deringer + and Rokas Elijošius and Zakariya El-Machachi and Edvin Fako and Fabio Falcioni and + Andrea C. Ferrari and John L. A. Gardner and Mikołaj J. Gawkowski and Annalena + Genreith-Schriever and Janine George and Rhys E. A. Goodall and Jonas Grandel and + Clare P. Grey and Petr Grigorev and Shuang Han and Will Handley and Hendrik H. Heenen + and Kersti Hermansson and Cheuk Hin Ho and Stephan Hofmann and Christian Holm and Jad + Jaafar and Konstantin S. Jakob and Hyunwook Jung and Venkat Kapil and Aaron D. Kaplan + and Nima Karimitari and James R. Kermode and Panagiotis Kourtis and Namu Kroupa and + Jolla Kullgren and Matthew C. Kuner and Domantas Kuryla and Guoda Liepuoniute and Chen + Lin and Johannes T. Margraf and Ioan-Bogdan Magdău and Angelos Michaelides and J. + Harry Moore and Aakash A. Naik and Samuel P. Niblett and Sam Walton Norwood and Niamh + O’Neill and Christoph Ortner and Kristin A. Persson and Karsten Reuter and Andrew S. + Rosen and Louise A. M. Rosset and Lars L. Schaaf and Christoph Schran and Benjamin X. + Shi and Eric Sivonxay and Tamás K. Stenczel and Christopher Sutton and Viktor Svahn + and Thomas D. Swinburne and Jules Tilly and Cas van der Oord and Santiago Vargas and + Eszter Varga-Umbrich and Tejs Vegge and Martin Vondrák and Yangshuai Wang and William + C. Witt and Thomas Wolf and Fabian Zills and Gábor Csányi}, + title = {A foundation model for atomistic materials chemistry}, + journal = {The Journal of Chemical Physics}, volume = {163}, - pages = {108--116}, - author = {Geoffroy Hautier}, - title = {Finding the needle in the haystack: Materials discovery and design through computational ab initio high-throughput screening}, - journal = {Computational Materials Science} -} - -@article{Willatt2019, - doi = {10.1063/1.5090481}, - url = {https://doi.org/10.1063/1.5090481}, - year = {2019}, - month = {04}, - publisher = {{AIP} Publishing}, - volume = {150}, - number = {15}, - pages = {154110}, - author = {Michael J. 
Willatt and F{\'{e}}lix Musil and Michele Ceriotti}, - title = {Atom-density representations for machine learning}, - journal = {The Journal of Chemical Physics} -} - -@article{Behler2007, - doi = {10.1103/physrevlett.98.146401}, - url = {https://doi.org/10.1103/physrevlett.98.146401}, - year = {2007}, - month = {04}, - publisher = {American Physical Society ({APS})}, - volume = {98}, - number = {14}, - author = {J\"{o}rg Behler and Michele Parrinello}, - title = {Generalized Neural-Network Representation of High-Dimensional Potential-Energy Surfaces}, - journal = {Physical Review Letters} -} - -@article{Ruddigkeit2012, - doi = {10.1021/ci300415d}, - url = {https://doi.org/10.1021/ci300415d}, - year = {2012}, - month = {11}, - publisher = {American Chemical Society ({ACS})}, - volume = {52}, - number = {11}, - pages = {2864--2875}, - author = {Lars Ruddigkeit and Ruud van Deursen and Lorenz C. Blum and Jean-Louis Reymond}, - title = {Enumeration of 166 Billion Organic Small Molecules in the Chemical Universe Database {GDB}-17}, - journal = {Journal of Chemical Information and Modeling} -} - -@article{Ramakrishnan2014, - doi = {10.1038/sdata.2014.22}, - url = {https://doi.org/10.1038/sdata.2014.22}, - year = {2014}, - month = {08}, - publisher = {Springer Science and Business Media {LLC}}, - volume = {1}, - number = {1}, - author = {Raghunathan Ramakrishnan and Pavlo O. Dral and Matthias Rupp and O. Anatole von Lilienfeld}, - title = {Quantum chemistry structures and properties of 134 kilo molecules}, - journal = {Scientific Data} + number = {18}, + pages = {184110}, + year = {2025}, + doi = {10.1063/5.0297006} } -@article{Bartok2010, - doi = {10.1103/physrevlett.104.136403}, - url = {https://doi.org/10.1103/physrevlett.104.136403}, - year = {2010}, - month = {04}, - publisher = {American Physical Society ({APS})}, - volume = {104}, - number = {13}, - author = {Albert P. Bart{\'{o}}k and Mike C. 
Payne and Risi Kondor and G{\'{a}}bor Cs{\'{a}}nyi}, - title = {Gaussian Approximation Potentials: The Accuracy of Quantum Mechanics, without the Electrons}, - journal = {Physical Review Letters} -} - -@article{Schlkopf1998, - doi = {10.1162/089976698300017467}, - url = {https://doi.org/10.1162/089976698300017467}, - year = {1998}, - month = {08}, - publisher = {{MIT} Press - Journals}, - volume = {10}, - number = {5}, - pages = {1299--1319}, - author = {Bernhard Sch\"{o}lkopf and Alexander Smola and Klaus-Robert M\"{u}ller}, - title = {Nonlinear Component Analysis as a Kernel Eigenvalue Problem}, - journal = {Neural Computation} -} - -@article{Maier2007, - doi = {10.1002/anie.200603675}, - url = {https://doi.org/10.1002/anie.200603675}, - year = {2007}, - month = aug, - publisher = {Wiley}, - volume = {46}, - number = {32}, - pages = {6016--6067}, - author = {Wilhelm{\hspace{0.25em}}F. Maier and Klaus St\"{o}we and Simone Sieg}, - title = {Combinatorial and High-Throughput Materials Science}, - journal = {Angewandte Chemie International Edition} +@article{He2025, + author = {He, Yuqing and De Breuck, Pierre-Paul and Weng, Hongming and Giantomassi, + Matteo and Rignanese, Gian-Marco}, + title = {Machine learning on multiple topological materials datasets}, + journal = {npj Computational Materials}, + volume = {11}, + number = {1}, + pages = {181}, + year = {2025}, + doi = {10.1038/s41524-025-01687-2}, + url = {https://doi.org/10.1038/s41524-025-01687-2}, } -@article{McInnes2018, - title={UMAP: Uniform Manifold Approximation and Projection for Dimension Reduction}, - author={Leland McInnes and John Healy and James Melville}, - year={2018}, - eprint={1802.03426}, +@article{Keith2021, + author = {Keith, John A. 
and Vassilev-Galindo, Valentin and Cheng, Bingqing and + Chmiela, Stefan and Gastegger, Michael and M{\"u}ller, Klaus-Robert and Tkatchenko, + Alexandre}, + title = {Combining Machine Learning and Computational Chemistry for Predictive + Insights Into Chemical Systems}, + journal = {Chemical Reviews}, + volume = {121}, + number = {16}, + pages = {9816--9872}, + year = {2021}, + doi = {10.1021/acs.chemrev.1c00107}, + url = {https://doi.org/10.1021/acs.chemrev.1c00107}, } -@online{librascal, - author = {librascal}, - title = {}, - date = {}, - url = {https://github.com/lab-cosmo/librascal} +@article{Gallarati2022, + author = {Simone Gallarati and Puck van Gerwen and Ruben Laplaza and Sergi Vela and + Alberto Fabrizio and Clémence Corminboeuf}, + title = {{OSCAR}: An extensive repository of chemically and functionally diverse + organocatalysts}, + journal = {Chemical Science}, + volume = {13}, + number = {46}, + pages = {13782--13794}, + year = {2022}, + doi = {10.1039/D2SC04251G}, } -@online{QUIP, - author = {QUIP}, - title = {}, - date = {}, - url = {http://libatoms.github.io/QUIP/} +@article{Blaskovits2024, + author = {Blaskovits, J.
Terence and Laplaza, Ruben and Vela, Sergi and Corminboeuf, + Cl{\'e}mence}, + title = {Data-Driven Discovery of Organic Electronic Materials Enabled by Hybrid + Top-Down/Bottom-Up Design}, + journal = {Advanced Materials}, + volume = {36}, + number = {2}, + pages = {2305602}, + year = {2024}, + doi = {10.1002/adma.202305602}, + url = {https://doi.org/10.1002/adma.202305602}, } diff --git a/paper/paper.md b/paper/paper.md index 911a552c6..cc20a27bf 100644 --- a/paper/paper.md +++ b/paper/paper.md @@ -1,17 +1,28 @@ --- -title: 'Chemiscope: interactive structure-property explorer for materials and molecules' +title: 'Chemiscope 1.0: interactive exploration of atomistic data from analysis to dissemination' tags: - TypeScript - JavaScript + - Python - chemistry - - material science + - materials science - machine learning + - visualization authors: - - name: Guillaume Fraux - orcid: 0000-0003-4824-6512 + - name: Sofiia Chorna + orcid: 0009-0008-7426-0856 + affiliation: 1 + - name: Jakub Lála + orcid: 0000-0002-5424-5260 + affiliation: "1, 2" + - name: Qianjun Xu + orcid: 0000-0003-0778-7208 affiliation: 1 - name: Rose K. Cersonsky orcid: 0000-0003-4515-3441 + affiliation: 3 + - name: Guillaume Fraux + orcid: 0000-0003-4824-6512 affiliation: 1 - name: Michele Ceriotti orcid: 0000-0003-2571-2832 @@ -19,102 +30,176 @@ authors: affiliations: - name: Laboratory of Computational Science and Modeling, IMX, École Polytechnique Fédérale de Lausanne, 1015 Lausanne, Switzerland index: 1 -date: 30 January 2020 + - name: Department of Materials, Imperial College London, London SW7 2AZ, United Kingdom + index: 2 + - name: Department of Chemical and Biological Engineering, University of Wisconsin-Madison, Madison, WI 53705, United States + index: 3 +date: 31 January 2026 bibliography: paper.bib --- # Summary +Chemiscope is an interactive visualization tool for exploring structure-property +relationships in molecular and materials datasets [@Fraux2020]. It links a map view, +e.g. 
a low-dimensional embedding or property-property scatter plot, to an interactive 3D +structure viewer, which streamlines inspection of clusters and outliers by moving +between points in feature and property space and the corresponding atomic +configurations. + +Chemiscope 1.0 turns the original browser-based visualizer into a versatile, +multi-purpose tool that fits into Python-centric workflows. The same visualization can +be rendered as a standalone web viewer, embedded as a Jupyter widget [@Jupyter; +@IPython], included in Streamlit web applications, or integrated into Sphinx-built +documentation and sphinx-gallery examples for reproducible software manuals [@sphinx]. +Chemiscope 1.0 also provides support for visualizing datasets directly from widely used +atomistic Python toolkits, including ASE [@ase-paper], MDAnalysis [@MDAnalysis], stk +[@STK], and Chemfiles [@chemfiles]. + +![Overview of Chemiscope 1.0 multi-environment support. The Python API accepts +structures from ASE, MDAnalysis, stk, and Chemfiles, along with user-defined properties +and visualization settings. These inputs can be rendered as an interactive Jupyter +widget, embedded in Streamlit applications, integrated into Sphinx documentation, or +exported for the standalone web application at +chemiscope.org.](chemiscope-v1.0.svg){width=100%} + +# Statement of need + +Atomistic modeling workflows produce collections of molecular and materials structures +together with associated quantities, including energies, forces, charges, and other +scalar or tensorial properties. These datasets are commonly explored using +post-processing analysis, including property-property correlations [@Huang2020; +@Wurger2021] and low-dimensional projections [@Helfrecht2020; @Jorgensen2026; +@orlov2025; @Tamura2022; @HernandezLeon2024], to relate abstract representations to the +underlying atomic configurations [@Chapman2022; @Nicholas2020]. 
Interactive +visualization provides a practical means to interpret structure-property relationships +[@Wurger2021], verify computational results, identify unexpected patterns [@xie2018], +and explore learned representations [@Walsh2025mapping; @Cheng2020; @De2016]. + +# State of the field + +Chemiscope has been adopted across multiple atomistic modeling and coarse-grained +studies, with interactive viewers shared alongside publications and archived datasets on +platforms such as Materials Cloud [@Talirz_2020]. While complementary visualization +tools exist, from desktop applications such as VMD and OVITO [@Humphrey1996; +@Stukowski2010] to WebGL-based molecular viewers such as 3Dmol.js and NGLview +[@Rego2015; @Nguyen2017], Chemiscope distinguishes itself by providing a single dataset +representation and rendering stack that can be reused across multiple contexts. This is +especially important in Python-based workflows, where the same visualization is often +needed in a Jupyter notebook for analysis, a web view for sharing, and documentation for +reproducibility and teaching [@JupyterNotebook; @Goscinski2025scicodewidgets; @Du2024]. + +# Software design + +Chemiscope 1.0 is implemented as a TypeScript visualization library with the Python +package providing platform-specific integrations. The Python API can be used to build a +Chemiscope dataset from atomic structures, associated properties, and visualization +settings, and export it in the JSON schema consumed by the JavaScript renderer. The +interface is organized into linked map, structure, and information panels. The map panel +uses Plotly.js to render 2D and 3D scatter plots [@plotlyjs], while the structure panel uses +3Dmol.js for molecular rendering. + +The map rendering is a primary performance bottleneck for large datasets.
Chemiscope 1.0 +introduces adaptive Level of Detail (LOD) rendering for scatter views, which downsamples +large datasets based on screen-space density, i.e., how many points would overlap in the +current view. As users zoom or change view parameters, the displayed subset is updated +to preserve both responsiveness and visual structure. In practice, this handles maps +with more than 500,000 points on commodity hardware, without requiring users to +pre-filter or manually decimate their data. Structure data can also be off-loaded to +external files, reducing memory footprint and initial loading time. + +Chemiscope 1.0 introduces the possibility of rendering atom-centered shapes to represent +vectorial and tensorial properties, including arrows (e.g. dipoles or forces), +ellipsoids (e.g. polarizabilities), and user-defined triangular meshes. For biomolecular +systems, it supports cartoon representations based on residue and chain information. The +structure viewer supports a grid layout for side-by-side comparison of multiple +structures or local environments. + +In Jupyter notebooks, the viewer is exposed as a widget with bidirectional communication +between Python and the JavaScript runtime, implemented via traitlets [@Jupyter; +@IPython]. The widget supports programmatic control of the visualization, including +selection synchronization, settings modification, and export of map snapshots. Users can +create a visualization by preparing structures and associated properties and calling +`chemiscope.show`: + +```python +import ase.io +import chemiscope + +structures = ase.io.read("trajectory.xyz", ":") + +# Extract properties present in the trajectory (e.g.
energy, forces) +properties = chemiscope.extract_properties(structures) + +# Set default settings for multi-frame trajectories +settings = chemiscope.quick_settings(trajectory=True) + +# Display the viewer +chemiscope.show(structures=structures, properties=properties, settings=settings) +``` + +For web applications built with Streamlit, the Chemiscope component renders a viewer +from an in-memory dataset and propagates user interactions (e.g. selection and settings +changes) back to Python, coupling to other Streamlit widgets. For reproducible +documentation, Chemiscope includes a Sphinx extension that embeds interactive viewers +alongside narrative text and executable examples [@sphinx]. + +![50k random structures from the MD22 dataset [@MD22] visualized with Chemiscope by +projecting them into the PET-MAD reduced latent space using `chemiscope.explore`. Panel +a) shows the Chemiscope widget overall, panel b) a zoom-in of the map demonstrating +adaptive level-of-detail rendering, panel c) the 3D view with selective coloring by +cohesive energy, and panel d) the shape functionality displaying forces as +arrows.](chemiscope-v1.0-overview.svg){width=100%} + +Finally, the package includes an `explore` function that generates interactive +visualizations starting from structures alone. It integrates metatomic models +[@metatensor], particularly the PET-MAD model [@Mazitov2025], which is used by default, +to derive informative representations and produce map coordinates without requiring +manual descriptor engineering or an explicit dimensionality reduction step [@MAD]: + +```python +chemiscope.explore(structures, featurizer="pet-mad-1.0") +``` + +Chemiscope is distributed as an open-source package that can be installed from PyPI, and +the default standalone viewer is available online at https://chemiscope.org for quick +inspection of datasets without local installation. 
Optional features can be installed +via extras: `pip install 'chemiscope[streamlit]'` and `pip install +'chemiscope[explore]'`. + +# Research impact statement + +Chemiscope has been adopted by the atomistic modeling community as a tool for +interactive exploration of structure-property relationships. Interactive visualizations +built with Chemiscope accompany publications and archived datasets on Materials Cloud +[@Talirz_2020] and provide citable links that readers can use to explore data beyond +static images. As of May 2026, the Materials Cloud Archive lists 19 publications using +Chemiscope as an external app [@MaterialsCloudChemiscopeSearch]. + +The tool has been cited in studies spanning machine-learned interatomic potentials and +datasets [@MACE; @He2025; @MAD; @Keith2021; @Cheng2020], coarse-grained molecular +representations [@Helfrecht2020; @Nicholas2020], and high-throughput screening of +materials [@Jorgensen2026; @Blaskovits2024; @Gallarati2022]. Chemiscope is integrated +with the metatensor ecosystem [@metatensor] through the `chemiscope.explore` function, +which uses foundation models like PET-MAD [@Mazitov2025] to generate map coordinates +without manual descriptor engineering. + +Beyond research papers, Chemiscope serves educational and reproducibility purposes: +tutorials and course materials use live widgets to demonstrate dimensionality reduction +and structure-property correlations [@Goscinski2025scicodewidgets], and Chemiscope +viewers can be embedded directly into manuals, as is done in the Atomistic Cookbook +recipes [@AtomisticCookbook]. The Python package reached 5,726 downloads in the last +month on PyPI Stats, accessed May 2026 [@PyPIStatsChemiscope]. + +# AI usage disclosure -The number of materials or molecules that can be created by combining different -chemical elements in various proportions and spatial arrangements is enormous. 
-Computational chemistry can be used to generate databases containing billions of -potential structures [@Ruddigkeit2012], and predict some of the associated -properties [@Montavon2013; @Ramakrishnan2014]. Unfortunately, the very large -number of structures makes exploring such database — to understand -structure-property relations or find the _best_ structure for a given -application — a daunting task. In recent years, multiple molecular -_representations_ [@Behler2007; @Bartok2013; @Willatt2019] have been developed -to compute structural similarities between materials or molecules, incorporating -physically-relevant information and symmetries. The features associated with -these representations can be used for unsupervised machine learning -applications, such as clustering or classification of the different structures, -and high-throughput screening of database for specific properties [@Maier2007; -@De2017; @Hautier2019]. Unfortunately, the dimensionality of these features (as -well as most of other descriptors used in chemical and materials informatics) is -very high, which makes the resulting classifications, clustering or mapping very -hard to visualize. Dimensionality reduction algorithms [@Schlkopf1998; -@Ceriotti2011; @McInnes2018] can reduce the number of relevant dimensions to a -handful, creating 2D or 3D maps of the full database. - -![The Qm7b database [@Montavon2013] visualized with chemiscope](screenshot.png) - -Chemiscope is a graphical tool for the interactive exploration of materials and -molecular databases, correlating local and global structural descriptors with -the physical properties of the different systems. The interface consists of -two panels. The left panel displays a 2D or 3D scatter plot, in which each -point corresponds to a chemical entity. The axes, color, and style of each point -can be set to represent a property or a structural descriptor to visualize -structure-property relations directly. 
Structural descriptors are not computed -directly by chemiscope, but must be obtained from one of the many codes -implementing general-purpose atomic representation [@librascal; @QUIP] or more specialized descriptors. Since the most common -descriptors can be very high dimensional, it can be convenient to apply a -dimensionality reduction algorithm that maps them to a lower-dimensional space -for easier visualization. For example the sketch-map algorithm [@Ceriotti2011] -was used with the Smooth Overlap of Atomic Positions representation [@Bartok2013] to -generate the visualization in Figure 1. The right panel displays the -three-dimensional structure of the chemical entities, possibly including -periodic repetition for crystals. Visualizing the chemical structure can help -in finding an intuitive rationalization of the layout of the dataset and the -structure-property relations. - -Whereas similar tools [@Gong2013; @Gutlein2014; @Probst2017; @ISV] only allow -visualizing maps and structures in which each data point corresponds to a -molecule, or a crystal structure, a distinctive feature of chemiscope is the -possibility of visualizing maps in which points correspond to atom-centred -environments. This is useful, for instance, to rationalize the relationship -between structure and atomic properties such as nuclear chemical shieldings -(Figure 2). This is also useful as a diagnostic tool for the many -machine-learning schemes that decompose properties into atom-centred -contributions [@Behler2007; @Bartok2010]. - -![Database of chemical shieldings [@Paruzzo2018] in chemiscope demonstrating the use of a 3D plot and highlighting of atomic environments](./screenshot-3d.png) - -Chemiscope took strong inspiration from a previous similar graphical software, -the interactive sketch-map visualizer [@ISV]. 
This previous software was used in -multiple research publication, related to the exploration of large-scale -databases, and the mapping of structure-property relationships [@De2016; -@De2017; @Musil2018]. - -# Implementation - -Chemiscope is implemented using the web platform: HTML5, CSS and WebGL to -display graphical elements, and TypeScript (compiled to JavaScript) for -interactivity. It uses [Plotly.js](https://plot.ly/javascript/) to render and -animate 2D and 3D plots; and the JavaScript version of [Jmol](http://jmol.org/) -to display atomic structures. The visualization is fast enough to be used with -datasets containing up to a million points, reacting to user input within a few -hundred milliseconds in the default 2D mode. More elaborate visualizations are -slower, while still handling 100k points easily. - -The use of web technologies makes chemiscope usable from different operating -systems without the need to develop, maintain and package the code for each -operating system. It also means that we can provide an online service at -http://chemiscope.org that allows users to visualize their own dataset without any -local installation. Chemiscope is implemented as a library of re-usable -components linked together via callbacks. This makes it easy to modify the -default interface to generate more elaborate visualizations, for example, -displaying multiple maps generated with different parameters of a dimensionality -reduction algorithm. Chemiscope can also be distributed in a standalone mode, -where the code and a predefined dataset are merged together as a single HTML -file. This standalone mode is useful for archival purposes, for example as -supplementary information for a published article and for use in corporate -environments with sensitive datasets. +Generative AI tools were used occasionally during software development (e.g. to obtain +code suggestions). 
All AI-generated suggestions were reviewed, modified, and verified by +the authors before inclusion, and the authors take full responsibility for the final +content of the software. # Acknowledgements -The development of chemiscope have been funded by the [NCCR -MARVEL](http://nccr-marvel.ch/), the [MAX](http://max-centre.eu/) European -centre of excellence, and the European Research Council (Horizon 2020 grant -agreement no. 677013-HBMAP). +The development of Chemiscope 1.0 has been funded primarily by the [NCCR +MARVEL](https://nccr-marvel.ch/). # References