Issues 74 and 149 msa spc spd by mkuehbach · Pull Request #167 · FAIRmat-NFDI/pynxtools-em

mkuehbach · 2026-05-28T14:43:34Z

Add parsing for EMSA/MSA #74
Applies the greyscale conversion patch not only in tiff_zeiss but also in the other TIFF parsers
Further utility code for the FAIRmat 1 final project meeting (PM) run-through: sampling datasets from sections, including writing ODS files programmatically
Refined the dependency configuration for rosettasciio to support EDS, e.g. for the rsciio parsers
Set the default deflate compression level to 1, as level 9 turned out to be impractically slow for run-throughs
Added parsing of an experiment description that combines title, authors, and DOI from citations
Initial, minor tests on hardening typing
Added code for running batch processing via the command line as an alternative. This was necessary because running the queue interactively in ipynbs accumulated too much intermediate memory that was not released to the operating system frequently enough. Heavy computations are therefore also spawned off using Python's multiprocessing, which improved the situation on the ipynb side, but the largest jobs could only be processed using the command-line script 04_batch_process.py
Add parsing for SPC, SPD #149
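
The memory argument in the description above (a long-lived notebook kernel keeps intermediate objects, whereas a short-lived process hands everything back to the operating system when it exits) can be illustrated with a minimal sketch. It is not the code in 04_batch_process.py, which is not shown on this page. It uses the standard library `subprocess` module instead of `multiprocessing` to keep the example self-contained, and `JOB_SOURCE`, `run_job_isolated`, and `run_queue` are hypothetical names introduced only for illustration.

```python
import subprocess
import sys

# Hypothetical stand-in for one heavy conversion job; not the real parser code.
JOB_SOURCE = """
import sys
scratch = bytearray(10 * 1024 * 1024)  # mimic large intermediate data
print(f"done {sys.argv[1]} {len(scratch)}")
"""


def run_job_isolated(job_name: str) -> str:
    """Run one job in its own interpreter so its memory is freed when it exits."""
    result = subprocess.run(
        [sys.executable, "-c", JOB_SOURCE, job_name],
        capture_output=True,
        text=True,
        check=True,  # raise if the child process fails
    )
    return result.stdout.strip()


def run_queue(job_names: list[str]) -> list[str]:
    """Process the queue sequentially, one short-lived child process per job."""
    return [run_job_isolated(name) for name in job_names]
```

Calling `run_queue(["a.msa", "b.spc"])` starts two child interpreters one after the other. The parent process only ever holds the short status strings, so its memory footprint should stay roughly flat regardless of how much each job allocates.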


…ompress these to compose a representative set for the image use case, i.e. tif and bmp, with and without sidecar files. The sampling automatically generates a configuration file (an ODS spreadsheet), which is also useful for showing how to ingest extra per-project metadata. Sub-sampling is required because the EM dataset collection contains more than 300k images, which we currently cannot sieve through manually. Adding atom_types is hampered by file names that include neither the chemical symbols of the elements in the material nor any notes on the sample material. Regex-based parsing was tested but produced too many false atom_types, specifically single-character ones like C, S, P, H, so it was not useful. Using AI technology to analyze the file names was also tested and was similarly challenged; the resources I had available locally do not allow me to process as many tokens as the names of the >300k files generate.
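
The false-positive problem with single-character symbols can be reproduced with a small sketch. This is an assumed illustration, not the expression that was actually tested: `SYMBOLS` is a deliberately tiny subset of the periodic table, and `PATTERN` and `guess_atom_types` are hypothetical names.

```python
import re

# Small illustrative subset of element symbols; a real run would need all of them.
SYMBOLS = ["Fe", "Ni", "Al", "Cu", "C", "S", "P", "H"]

# Accept a symbol only when it is not glued to neighbouring letters.
PATTERN = re.compile(r"(?<![A-Za-z])(" + "|".join(SYMBOLS) + r")(?![A-Za-z])")


def guess_atom_types(file_name: str) -> list[str]:
    """Return the sorted set of element symbols found in a file name."""
    return sorted(set(PATTERN.findall(file_name)))


# Plausible hit: both symbols are delimited by punctuation.
print(guess_atom_types("Fe-Ni_001.tif"))
# Likely false positives: "C" and "S" are probably region labels, not elements.
print(guess_atom_types("Area_C_spot_S.tif"))
```

The second call reports carbon and sulfur although the letters most likely name an area and a spot. A regex cannot tell these readings apart without context beyond the file name.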


atomprobe-tc added 19 commits April 8, 2026 16:07

bump versions precommit and pyproject toml

b15116e

constrain rosettasciio always to use the latest

72fe06c

skeleton rsciio_msa

2a2e11e

Merge branch 'main' into issues_74_and_149_msa_spc_spd

a1cb2df

Merge branch 'main' into issues_74_and_149_msa_spc_spd

c276d71

tweaking atom type handling and bibliography handling for oasisb example

8bf48bf

iso3_city_firstauthor naming information collected, starting mtex.h5 conversion run-through

8baaad9

remove NeXus/MTex parser as this one has been updated and will be maintained via the pynxtools-microstructure plugin

1a9fb4b

proper return values when file already exists and using logger

8998926

reorganize location of the code for examples related to data ingestion studies, skeleton code for parsing, to be modified still from pynxtools-apm and pynxtools-microstructure, spellchecking

b3bea29

spellchecking minor

cef3603

fixing batch_process queue, next steps i) test and run with msa, emd, dm2, dm3, dm4, iii) go to images

5c83b3f

explicit chunking for all parsers were applicable

2504460

explicit chunking for all parsers were applicable

35b513e

Merge branch 'fairmat1_final_pm' into issues_74_and_149_msa_spc_spd

0e03188

Merge branch 'main' into issues_74_and_149_msa_spc_spd

752b6fd

working version of the msa parser

a0dc5b1

spellchecking

3bfeb18

fix the issue that the running long ipynb session drain the main memory as intermediate objects and jupyter objects are not deallocated and thus memory safely returned to the operating system

0ed6ebf

mkuehbach mentioned this pull request Jun 9, 2026

Add parsing for EMSA/MSA #74

Closed

atomprobe-tc added 7 commits June 10, 2026 11:28

configure rosettasciio so that eds-streams are included

501b942

adding cli executable script to overcome the problem of the significant memory consumption overhead of jupyter notebooks

56c4704

adding parsing of NXuser

985a748

apply the patch that so far was used only in the tiff_zeiss parser also to all other tiff parsers, namely convert to grayscale and checking image mode

bd5fc36

apply the patch that so far was used only in the tiff_zeiss parser also to all other tiff parsers, namely convert to grayscale and checking image mode

0bc1aeb

minor fix spellchecking to prepare merging this feature branch, further development tackles topics other than the here mentioned issues74 and 149

66395e6

mkuehbach merged commit 3a0f0e5 into main Jun 11, 2026
14 checks passed

mkuehbach deleted the issues_74_and_149_msa_spc_spd branch June 11, 2026 11:49

mkuehbach mentioned this pull request Jun 16, 2026

Issue149 spc spd parser #173

Open

9 tasks

mkuehbach merged 26 commits into main from issues_74_and_149_msa_spc_spd


2 participants
