Issues 74 and 149 msa spc spd #167

Merged: mkuehbach merged 26 commits into main from issues_74_and_149_msa_spc_spd on Jun 11, 2026

@mkuehbach (Collaborator) commented May 28, 2026

  • Add parsing for EMSA/MSA #74
  • Apply the greyscale conversion patch not only to tiff_zeiss but also to the other tiff parsers
  • Add further utility code for the FAIRmat 1 final project meeting (PM) run-through, for sampling a dataset from sections, including writing ODS files programmatically
  • Refine the dependency configuration for rosettasciio to support EDS, e.g. for the rsciio parsers
  • Set the default deflate compression level to 1, as level 9 turned out to be impractically slow for run-throughs
  • Parsing of an experiment description that combines title, authors, and DOI from citations is now present
  • Initial, so far minor, tests on hardening the typing
  • Add code for running batch processing via the command line as an alternative. This was necessary because running the queue interactively in ipynb notebooks accumulated too much intermediate memory that was not released to the operating system frequently enough. Also added is a way to spawn the heavy computing off using Python's multiprocessing, which improved the situation on the ipynb side; the largest jobs, however, could only be processed using the command-line script 04_batch_process.py
  • Add parsing for SPC, SPD #149
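
The trade-off behind the deflate compression level item above can be illustrated with a small sketch. It uses the standard-library `zlib` module as a stand-in for whatever deflate backend the plugin actually writes with, and the synthetic payload and the helper name `deflate_benchmark` are assumptions made for illustration only.

```python
import time
import zlib


def deflate_benchmark(payload: bytes, levels=(1, 9)):
    """Compress payload at each deflate level; return {level: (size, seconds)}."""
    results = {}
    for level in levels:
        start = time.perf_counter()
        compressed = zlib.compress(payload, level)
        elapsed = time.perf_counter() - start
        # Deflate is lossless at every level, so the round trip must match.
        assert zlib.decompress(compressed) == payload
        results[level] = (len(compressed), elapsed)
    return results


# Synthetic stand-in for image-like data (an assumption, not project data):
# a repeating 8-bit ramp of 1 MiB.
payload = bytes(range(256)) * 4096
results = deflate_benchmark(payload)
for level, (size, seconds) in results.items():
    print(f"level {level}: {size} bytes in {seconds:.4f} s")
```

On real detector images the size and timing gap between levels will differ from this synthetic case, so a measurement on representative files is the more reliable guide.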
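
The idea of spawning heavy computing off from a notebook, as described in the batch processing item above, might look roughly like the following sketch. It is not the code from this pull request: `run_isolated`, `_worker`, and `heavy_job` are hypothetical names. The point is that each job runs in its own child process, so the memory it allocated goes back to the operating system when that process exits.

```python
import multiprocessing as mp


def _worker(func, args, queue):
    """Run func(*args) in the child and send the outcome back to the parent."""
    try:
        queue.put(("ok", func(*args)))
    except Exception as exc:  # report failures instead of leaving the parent waiting
        queue.put(("error", repr(exc)))


def run_isolated(func, *args, start_method=None):
    """Run func(*args) in a separate process and return its result.

    All memory allocated by func is freed when the child process exits.
    """
    ctx = mp.get_context(start_method)
    queue = ctx.Queue()
    proc = ctx.Process(target=_worker, args=(func, args, queue))
    proc.start()
    status, payload = queue.get()  # read before join so the child can flush the queue
    proc.join()
    if status == "error":
        raise RuntimeError(f"job failed in child process: {payload}")
    return payload


def heavy_job(n):
    """Hypothetical stand-in for a memory-hungry parsing job."""
    data = [i * i for i in range(n)]  # large intermediate object
    return sum(data)


if __name__ == "__main__":
    # "fork" avoids re-importing the main module; fall back to "spawn" elsewhere.
    method = "fork" if "fork" in mp.get_all_start_methods() else "spawn"
    print(run_isolated(heavy_job, 100_000, start_method=method))
```

With the "spawn" start method the job function has to be importable from a module, which is one reason a standalone command-line script can be more robust than a notebook cell for the largest jobs.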

…ntained via the pynxtools-microstructure plugin
…n studies, skeleton code for parsing, to be modified still from pynxtools-apm and pynxtools-microstructure, spellchecking
…ry as intermediate objects and jupyter objects are not deallocated and their memory is thus not safely returned to the operating system
@mkuehbach mkuehbach mentioned this pull request Jun 9, 2026
…nt memory consumption overhead of jupyter notebooks
…ompress these to compose a representative set for the use case images, i.e. tif, bmp, with and without sidecar files. The sampling automatically generates a configuration file (an ODS spreadsheet), which is also useful for showing how to ingest extra per-project metadata. Sub-sampling is required because the EM dataset collection contains more than 300k images, which we currently cannot sieve through manually and for which adding atom_types is plagued by file names that include neither chemical symbols of the elements in the material nor even notes on the sample material. Tested regex-based parsing, but that gave too many false atom_types, specifically single-character ones like C, S, P, H, so it was not useful. Also tested using AI technology to analyze the file names; it was similarly challenged, and the resources I had available locally do not allow me to process as many tokens as those generated by the >300k file names.
…so to all other tiff parsers, namely convert to grayscale and checking image mode
…so to all other tiff parsers, namely convert to grayscale and checking image mode
…er development tackles topics other than issues 74 and 149 mentioned here
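
The commit note above about regex-based guessing of atom_types reports too many false single-character matches. A minimal sketch of why that happens follows; the symbol list, the helper `guess_atom_types`, and the file names are all hypothetical and not taken from the dataset or from the parser code in this pull request.

```python
import re

# Small, hypothetical subset of element symbols; a real list would cover
# the whole periodic table.
SYMBOLS = ["Fe", "Cu", "Al", "Si", "C", "S", "P", "H"]

# Longer symbols first so that e.g. "Si" is tried before "S".
PATTERN = re.compile("|".join(sorted(SYMBOLS, key=len, reverse=True)))


def guess_atom_types(file_name):
    """Naively collect every element symbol that occurs in a file name."""
    return sorted(set(PATTERN.findall(file_name)))


# A name that does encode the material yields plausible symbols ...
print(guess_atom_types("FeCu_alloy_001.tif"))
# ... but ordinary capitalized words yield spurious single-character ones.
print(guess_atom_types("Sample_Cross_Section_HighRes.tif"))
```

The second call reports C, H, and S although the name says nothing about the material, which is the kind of false positive the commit note describes.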
@mkuehbach mkuehbach merged commit 3a0f0e5 into main Jun 11, 2026
14 checks passed
@mkuehbach mkuehbach deleted the issues_74_and_149_msa_spc_spd branch June 11, 2026 11:49
@mkuehbach mkuehbach mentioned this pull request Jun 16, 2026
9 tasks
2 participants