Skip to content

Fix ADF15 parser and refine regex patterns#497

Open
munechika-koyo wants to merge 4 commits intocherab:developmentfrom
munechika-koyo:fix/parse-adf15
Open

Fix ADF15 parser and refine regex patterns#497
munechika-koyo wants to merge 4 commits intocherab:developmentfrom
munechika-koyo:fix/parse-adf15

Conversation

@munechika-koyo
Copy link
Member

@munechika-koyo munechika-koyo commented Mar 22, 2026

Summary

Because ADF15 raw format for tungsten, like pec40#w_ic#w12.dat, is different from what cherab has handled like:

C  energy levels
C  -------------
C
C   lv      configuration    (2S+1)L(w-1/2)      energy (cm^-1)
C  ---   ------------------- --------------      --------------
c    1   60964A52B             (1) 0(  0.0)             0.0
c    2   60963A52B51C          (3) 2(  3.0)         16720.4
				:
				:
				:

C  photon emissivity atomic transitions
C  ------------------------------------
C
C   isel  wvlen(A)            transition                type  ispb nspb
C                                                             ispp nspp sz    tg pr wr
C  ----- ---------- ----------------------------------- ----- ---- ---- --    -- -- --
C      1   56.5300  1074(3)1(  2.0)-   2(3)2(  3.0)     excit   1    1  12   569 12  1
C      2   56.5520  1075(1)3(  3.0)-   3(1)4(  4.0)     excit   1    1  12  1068  7  2
C      3   70.1877   414(1)5(  5.0)-   3(1)4(  4.0)     excit   1    1  12   778 31  3
C      4   70.5640   365(3)1(  2.0)-   2(3)2(  3.0)     excit   1    1  12   275 36  4
C      5   70.6699   371(3)3(  4.0)-   3(1)4(  4.0)     excit   1    1  12   756 43  5
C      6   70.7048   367(3)3(  3.0)-   3(1)4(  4.0)     excit   1    1  12   753 44  6

I tried to modify the ADF15 parser to match these lines.

Key changes include:

  • Fixed the ADF15 parser to handle additional raw file (tungsten) formats.

  • Moved regex pattern definitions outside of loops for efficiency.

@munechika-koyo munechika-koyo marked this pull request as ready for review March 22, 2026 00:53
@jacklovell
Copy link
Member

Looks OK to me. I don't think lifting the declaration of a string literal out of the loop will have any impact on performance, as the assignment operation is trivial. If you really wanted to optimize, you could instead create compiled regex objects using r = re.compile(expression) and then use r.match instead of re.match(expression), but Python caches the compilation of regexes anyway so unless there's a large number of regular expressions all being used this wouldn't make a difference either. It is more correct to do so if you're refactoring anyway though, so I'd recommend switching to re.compile.

The other thing it would be good to have in place are some unit/regression tests for this. We can't use data from real ADF15 files in unit tests for licensing reasons, but I don't see a problem with creating some mock files using the same format but using dummy numerical data to use in unit tests: try dummy variants of Hydrogen, Carbon and Tungsten files.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants