Commit 50f927e

MAD-NG Features and Compatibility Modes (#135)
1 parent aa8f961 commit 50f927e

42 files changed

+2056
-698
lines changed

.github/workflows/coverage.yml

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -12,5 +12,5 @@ jobs:
1212
uses: pylhc/.github/.github/workflows/coverage.yml@master
1313
with:
1414
src-dir: tfs
15-
pytest-options: -m "not cern_network" --cov-report term-missing
15+
pytest-options: --cov-report term-missing
1616
secrets: inherit

.github/workflows/cron.yml

Lines changed: 1 addition & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -1,8 +1,7 @@
11
# Runs all tests on master on Mondays at 3 am (UTC time)
22
name: Cron Testing
33

4-
5-
on:
4+
on:
65
schedule:
76
- cron: '* 3 * * mon'
87

.zenodo.json

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -16,7 +16,7 @@
1616
},
1717
{
1818
"name": "Felix Soubelet",
19-
"affiliation": "University of Liverpool & CERN",
19+
"affiliation": "CERN",
2020
"orcid": "0000-0001-8012-1440"
2121
},
2222
{

CHANGELOG.md

Lines changed: 33 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -1,5 +1,37 @@
11
# TFS-Pandas Changelog
22

3+
## Version 4.0.0
4+
5+
Version `4.0` is a major release bringing compatibility with `MAD-NG` features in **TFS** files and tables, apart from the more exotic ones.
6+
We also bring new documentation pages regarding the **TFS** format, code compatibilities, and the new features.
7+
Please have a look at the documentation should you intend to use `tfs-pandas` 4.0, as there are a few (now documented) caveats.
8+
9+
- Important:
10+
- Support for `Python 3.9` has been dropped. The minimum required Python version is now `3.10`.
11+
- DataFrame validation is now OFF by default, both when reading from and writing to file. It is up to the user to ask for a given validation mode.
12+
- Minimum supported `MAD-NG` version is `1.0.0`, due to synchronized development of some feature details. The corresponding `pymadng` version is `0.6.0`.
13+
14+
- Added:
15+
- Handling of boolean-type and complex-type header values (`MAD-NG` feature).
16+
- Handling of boolean-type and complex-type columns (`MAD-NG` feature).
17+
- Handling of nullable-type `nil` values in headers and columns (`MAD-NG` feature).
18+
- Compatibility modes for dataframe validation. The `tfs.frame.validate` function now takes a `compatibility` argument, whose valid choices are `MADX`, `MAD-X`, `MADNG` and `MAD-NG` (case-insensitive, see API documentation).
19+
- Many new, more specific exceptions have been created for the errors raised. They all inherit from the previously raised `TfsFormatError`, so user code that was catching it will still work.
20+
21+
- Changed:
22+
- By default, `TfsDataFrame` validation is now skipped on reading.
23+
- By default, `TfsDataFrame` validation is now skipped on writing.
24+
- By default, `TfsDataFrame` validation in `MAD-X` requires headers to be present in the dataframe.
25+
26+
- Fixed:
27+
- Writing a dataframe with no headers (as opposed to empty headers), e.g. a `pandas.DataFrame`, now works correctly.
28+
29+
- Documentation:
30+
- The documentation has been updated to reflect the new features and changes.
31+
- The documentation now includes a new page on the `TFS` format itself.
32+
- The documentation now includes a new page on compatibility modes for `TfsDataFrame` validation.
33+
- A great deal of internal documentation has been added to the codebase.
34+
335
## Version 3.9.0
436

537
- Added:
@@ -93,7 +125,7 @@ Minor API changes to the `TFSCollections`:
93125
## Version 3.5.2
94126

95127
- Changed:
96-
- The dependency on `pandas` has been pinned to `<2.0` to guarantee the proper functionning of the compability `append` and `join` methods in `TfsDataFrames`. These will be removed with the next release of `tfs-pandas` and users should use the `tfs.frame.concat` compatibility function instead.
128+
- The dependency on `pandas` has been pinned to `<2.0` to guarantee the proper functioning of the compatibility `append` and `join` methods in `TfsDataFrames`. These will be removed with the next release of `tfs-pandas` and users should use the `tfs.frame.concat` compatibility function instead.
97129

98130
## Version 3.5.1
99131

README.md

Lines changed: 10 additions & 5 deletions
Original file line numberDiff line numberDiff line change
@@ -9,8 +9,8 @@
99
[![Conda-forge Version](https://img.shields.io/conda/vn/conda-forge/tfs-pandas?color=orange&logo=anaconda)](https://anaconda.org/conda-forge/tfs-pandas)
1010
[![DOI](https://zenodo.org/badge/DOI/10.5281/zenodo.5070986.svg)](https://doi.org/10.5281/zenodo.5070986)
1111

12-
This package provides reading and writing functionality for [**Table Format System (TFS)** files](http://mad.web.cern.ch/mad/madx.old/Introduction/tfs.html).
13-
Files are read into a `TfsDataFrame`, a class built on top of the famous `pandas.DataFrame`, which in addition to the normal behavior attaches a dictionary of headers to the `DataFrame`.
12+
This package provides reading and writing functionality for [**Table Format System (TFS)**](https://pylhc.github.io/tfs/tfsformat.html) files.
13+
Files are read into a `TfsDataFrame`, a class built on top of the `pandas.DataFrame`, which in addition to the normal behavior attaches a dictionary of headers to the `DataFrame`.
1414

1515
See the [API documentation](https://pylhc.github.io/tfs/) for details.
1616

@@ -45,14 +45,19 @@ data_frame.headers["NEW_KEY"] = some_variable
4545
# Manipulate data as you do with pandas DataFrames
4646
data_frame["NEWCOL"] = data_frame.COL_A * data_frame.COL_B
4747

48-
# You can check the validity of a TfsDataFrame, and choose the behavior in case of errors
49-
tfs.frame.validate(data_frame, non_unique_behavior="raise") # or choose "warn"
48+
# You can check the validity of a TfsDataFrame, specifying the
49+
# compatibility mode as well as the behavior in case of errors
50+
tfs.frame.validate(
51+
data_frame,
52+
non_unique_behavior="raise", # or choose "warn"
53+
compatibility="mad-x", # or choose "mad-ng"
54+
)
5055

5156
# Writing out to disk is simple too
5257
tfs.write("path_to_output.tfs", data_frame, save_index="index_column")
5358
```
5459

55-
Reading and writing compressed files is also supported, and done automatically based on the provided file extension:
60+
Compression is automatically supported, based on the provided file extension (for supported formats):
5661

5762
```python
5863
import tfs

doc/compatibility.rst

Lines changed: 101 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,101 @@
1+
MAD-X and MAD-NG Compatibility
2+
==============================
3+
4+
As `tfs-pandas` allows one to write `TfsDataFrames` as files in the **TFS** format, which are typically output by simulation codes, compatibility of these files with said codes is crucial.
5+
Specifically, `tfs-pandas` aims to ensure the files it writes to disk are accepted as input for `MAD-X <https://madx.web.cern.ch/>`_ and `MAD-NG <https://madx.web.cern.ch/releases/madng/html/>`_.
6+
7+
However, as ``MAD-NG`` is the successor to ``MAD-X``, it includes new features regarding the **TFS** format, and files including these features will not be accepted by ``MAD-X``.
8+
To circumvent this issue, `tfs-pandas` offers functionality - named validation - to ensure compatibility with either code.
9+
10+
TfsDataFrame Validation
11+
-----------------------
12+
13+
It is possible to perform automatic validation of a `TfsDataFrame` both when reading and writing, or to validate them at any time using the `tfs.frame.validate` function.
14+
Validation enforces the rules described in the :ref:`caveats section <tfs-pandas caveats>`, both to guarantee the integrity of the dataframe and compatibility of written files with simulation codes.
15+
16+
.. admonition:: When Does Validation Happen?
17+
18+
Validation is **optional**, and is by default turned off at both read-time and write-time.
19+
20+
Validation is done by providing a `TfsDataFrame` and a compatibility mode to `tfs.frame.validate` (see the :ref:`API reference <modules/index:frame>`).
21+
It goes as:
22+
23+
.. code-block:: python
24+
25+
import tfs
26+
from tfs.frame import validate
27+
28+
df = tfs.read("path/to/file.tfs")
29+
30+
# To validate with MAD-X compatibility
31+
validate(df, compatibility="mad-x") # or use "madx"
32+
33+
# To validate with MAD-NG compatibility
34+
validate(df, compatibility="mad-ng") # or use "madng"
35+
36+
In case of a compatibility issue, an exception is raised that points to the specific incompatible element.
37+
All exceptions inherit from the `TfsFormatError`, which one can `except` as a catch-all for this package.
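
The catch-all behaviour follows from ordinary Python exception inheritance. The sketch below illustrates it with stand-in classes: `BaseFormatError` plays the role of `TfsFormatError`, while `ComplexValueError` and `reject_complex` are hypothetical names invented for this example and are not part of the `tfs-pandas` API.

```python
class BaseFormatError(Exception):
    """Stand-in for the package-wide base exception (TfsFormatError in the text)."""


class ComplexValueError(BaseFormatError):
    """Hypothetical specific error: a complex value where the mode forbids it."""


def reject_complex(values: dict) -> None:
    """Raise a specific error naming the first complex-valued entry."""
    for name, value in values.items():
        if isinstance(value, complex):
            raise ComplexValueError(f"entry {name!r} holds a complex value: {value}")


def is_valid(values: dict) -> bool:
    """Catch only the base class; any subclass raised is caught as well."""
    try:
        reject_complex(values)
    except BaseFormatError:
        return False
    return True
```

Code written against the base class keeps working when more specific subclasses are introduced later, which is the property the paragraph above relies on.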
38+
39+
.. _common rules:
40+
41+
Common Validation Rules
42+
-----------------------
43+
44+
In either compatibility mode, some common rules are enforced.
45+
These rules are listed and described in the :ref:`API reference <modules/index:frame>` for the `tfs.frame.validate` function.
46+
47+
When validating a `TfsDataFrame`, the behavior in case one of these rules is violated depends on the value of the `non_unique_behavior` parameter.
48+
These rules are *always* checked against when validating a `TfsDataFrame`.
49+
Additional checks can be performed by setting the `compatibility` parameter, as described in the :ref:`MAD-NG <madng mode>` and :ref:`MAD-X <madx mode>` sections below.
50+
51+
.. _madng mode:
52+
53+
MAD-NG Compatibility
54+
--------------------
55+
56+
.. admonition:: Supported Versions
57+
58+
The compatibility with ``MAD-NG`` is guaranteed starting with version `1.0.0`.
59+
See below for details and caveats.
60+
61+
Since ``MAD-NG`` implements and accepts more features into its **TFS** files, its compatibility mode is naturally less restrictive.
62+
Namely, the following are accepted by ``MAD-NG`` and ``MAD-NG`` only:
63+
64+
- Boolean dtype for header parameters and table columns.
65+
- Complex dtype for header parameters and table columns.
66+
- Nullable dtype for header parameters and table columns (value `nil`).
67+
68+
.. admonition:: Complex Number Representation
69+
70+
In Python, the imaginary part of a complex number is represented by the letter ``j``, as in `1.4 + 2.6j`.
71+
When writing complex values to file, `tfs-pandas` instead uses the ``MAD-NG`` (read `Lua`) representation with the letter ``i``, as in `1.4 + 2.6i`, so that ``MAD-NG`` can properly read such a file.
72+
Both of these representations will be correctly read by `tfs-pandas` (including when ``MAD-NG`` uses ``I`` for special complex numbers).
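
As an illustration only, the conversion between the two notations can be sketched with the standard library. The helper names `to_lua_complex` and `from_lua_complex` are hypothetical and this is not the parser used by `tfs-pandas`; it assumes the imaginary unit is a trailing `i`, `I` or `j`.

```python
def to_lua_complex(value: complex) -> str:
    """Format a Python complex number with a trailing 'i' instead of 'j'."""
    # The '+' format flag forces an explicit sign on the imaginary part.
    return f"{value.real}{value.imag:+}i"


def from_lua_complex(text: str) -> complex:
    """Parse a complex number written with 'i', 'I' or 'j' as imaginary unit."""
    # Python's complex() rejects spaces around the sign, so remove whitespace.
    compact = "".join(text.split())
    if compact[-1:] in ("i", "I"):
        compact = compact[:-1] + "j"
    return complex(compact)
```

Converting a value to the `i` notation and parsing it back returns the original number, for instance `from_lua_complex(to_lua_complex(3 - 4j)) == 3 - 4j`.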
73+
74+
.. admonition:: Handling of Nullable Types
75+
76+
``MAD-NG`` uses the nullable `nil`, which is accepted by `tfs-pandas` with the following caveats:
77+
78+
- When reading from file, `nil` values in the headers are converted to `None` while those in the dataframe are cast to `NaN`, except in string-dtyped columns where they are converted to `None`.
79+
- When writing to file, `None` values anywhere are written as `nil`, and `NaN` values in the dataframe are written as `NaN` (remember that setting a `None` in a numeric `pandas.DataFrame` column automatically casts it as `NaN`).
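
The two rules above can be summarised by the following sketch. The functions `read_nil` and `write_value` are hypothetical helpers written for this illustration, working on single values rather than on a whole dataframe, and do not reflect the actual `tfs-pandas` internals.

```python
import math


def read_nil(token: str, *, in_header: bool, string_column: bool = False):
    """Map a raw token to a Python value following the reading rule above."""
    if token != "nil":
        return token  # non-nil tokens are left to the regular parsing
    if in_header or string_column:
        return None  # headers and string-dtyped columns keep a None
    return math.nan  # other columns get NaN


def write_value(value) -> str:
    """Map a Python value to its on-file text following the writing rule above."""
    if value is None:
        return "nil"
    if isinstance(value, float) and math.isnan(value):
        return "NaN"
    return str(value)
```

Note the asymmetry: a `nil` read into a numeric column becomes `NaN` and is therefore written back as `NaN`, not as `nil`.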
80+
81+
.. attention::
82+
83+
The exotic "features" of ``MAD-NG`` such as the ``Lua`` operator overloading for ranges and tables, and their inclusion in **TFS** files are not supported by `tfs-pandas`.
84+
Should one need to use these features, it is recommended to go through the `pymadng <https://pymadng.readthedocs.io/en/latest/>`_ package to handle them in-memory.
85+
86+
.. _madx mode:
87+
88+
MAD-X Compatibility
89+
-------------------
90+
91+
The ``MAD-X`` compatibility mode is more restrictive, and enforces that none of the features listed in the :ref:`MAD-NG section <madng mode>` appear in the `TfsDataFrame`.
92+
93+
Additionally, ``MAD-X`` will refuse to read into a table any **TFS** file that does not include a `TYPE` entry in the headers (which should be a string).
94+
As such, when checking for compatibility with ``MAD-X``:
95+
96+
- If the dataframe has no headers, an error will be raised indicating the dataframe should have headers.
97+
- If the dataframe has headers but no `TYPE` entry is found, `tfs-pandas` will log a warning and add one with the value `"Added by tfs-pandas for MAD-X compatibility"`.
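
A minimal sketch of this header rule, operating on a plain dictionary, could look as follows. The function name `ensure_madx_headers`, the use of `None` to represent missing headers and the choice of `ValueError` are all assumptions made for the example; the package raises its own exception types.

```python
import logging
from typing import Optional

LOGGER = logging.getLogger(__name__)
TYPE_PLACEHOLDER = "Added by tfs-pandas for MAD-X compatibility"


def ensure_madx_headers(headers: Optional[dict]) -> dict:
    """Return headers that satisfy the MAD-X requirement of a TYPE entry."""
    # Assumption: None means "no headers at all", unlike an empty dictionary.
    if headers is None:
        raise ValueError("MAD-X compatibility requires the dataframe to have headers")
    if "TYPE" not in headers:
        LOGGER.warning("No TYPE entry found in headers, adding one for MAD-X compatibility")
        # Build a new dictionary rather than modifying the caller's object.
        headers = {**headers, "TYPE": TYPE_PLACEHOLDER}
    return headers
```

Headers that already contain a `TYPE` entry pass through unchanged, while an empty dictionary gains the placeholder entry.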
98+
99+
.. admonition:: Default mode
100+
101+
The default compatibility mode enforced by the `tfs.frame.validate` function is ``MAD-X``.

doc/conf.py

Lines changed: 3 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -50,6 +50,7 @@ def about_package(init_posixpath: pathlib.Path) -> dict:
5050
# ones.
5151
extensions = [
5252
"sphinx.ext.autodoc", # Include documentation from docstrings
53+
"sphinx.ext.autosectionlabel", # Create explicit doc targets for each section
5354
"sphinx.ext.coverage", # Collect doc coverage stats
5455
"sphinx.ext.doctest", # Test snippets in the documentation
5556
"sphinx.ext.githubpages", # Publish HTML docs in GitHub Pages
@@ -64,7 +65,7 @@ def about_package(init_posixpath: pathlib.Path) -> dict:
6465
]
6566

6667
# Config for autosectionlabel extension
67-
autosectionlabel_prefix_document = True
68+
autosectionlabel_prefix_document = True # Make sure the target is unique
6869
autosectionlabel_maxdepth = 2
6970

7071
# Config for the napoleon extension
@@ -138,7 +139,7 @@ def about_package(init_posixpath: pathlib.Path) -> dict:
138139

139140
html_theme_options = {
140141
"collapse_navigation": False,
141-
"display_version": True,
142+
"version_selector": True, # replaces 'display_version' since sphinx-rtd-theme 3.0 but only works on ReadTheDocs
142143
"logo_only": True,
143144
"navigation_depth": 3,
144145
}

doc/index.rst

Lines changed: 8 additions & 5 deletions
Original file line numberDiff line numberDiff line change
@@ -1,29 +1,30 @@
11
Welcome to tfs-pandas' documentation!
22
=====================================
33

4-
``tfs-pandas`` is a library for reading and writing capabilities for `TFS files <http://mad.web.cern.ch/mad/madx.old/Introduction/tfs.html>`_ used at `CERN <https://home.cern/>`_.
4+
``tfs-pandas`` is a library providing reading and writing capabilities for **TFS** files used at `CERN <https://home.cern/>`_, namely by codes such as `MAD-X <https://madx.web.cern.ch/>`_ and `MAD-NG <https://madx.web.cern.ch/releases/madng/html/>`_.
55

6-
It provides functionality through a ``TfsDataFrame`` object, an extension of the popular **pandas** ``DataFrame``, which in addition to the normal behaviour attaches a dictionary of headers to the ``DataFrame``.
6+
It provides functionality through a ``TfsDataFrame`` object, an extension of the popular **pandas** ``DataFrame``, which in addition to the normal behaviour attaches a dictionary of headers to the dataframe.
77
Functions are also exported that handle reading and writing of **TFS** files to and from ``TfsDataFrames`` as well as merging and validating for ``TfsDataFrames``.
88

99
.. admonition:: **Package Scope**
1010

1111
The package's only goal is to provide a simple and easy to use interface from **TFS** files to a familiar object built upon the `pandas.DataFrame`.
1212
It is not meant to implement various calculations on `TfsDataFrames`.
1313

14-
Tools relative to the **TFS** format are provided, such as validating a `~.TfsDataFrame` and its headers; or lazily manage a collection of **TFS** files.
14+
Tools relative to the **TFS** format are provided, such as validating a `TfsDataFrame` and its headers; or lazily managing a collection of **TFS** files.
1515

1616

1717
Installation
1818
============
1919

20-
Installation is easily done via `pip`:
20+
The package is published on `PyPI` and installation is easily done via `pip`:
2121

2222
.. code-block:: bash
2323
2424
python -m pip install tfs-pandas
2525
26-
One can also install in a `conda`/`mamba` environment via the `conda-forge` channel with:
26+
There is also a maintained version of the package on `conda-forge`.
27+
One can install in a `conda`/`mamba` environment via the `conda-forge` channel with:
2728

2829
.. code-block:: bash
2930
@@ -40,6 +41,8 @@ Contents
4041
:maxdepth: 2
4142

4243
quickstart
44+
tfsformat
45+
compatibility
4346
modules/index
4447

4548

doc/quickstart.rst

Lines changed: 17 additions & 12 deletions
Original file line numberDiff line numberDiff line change
@@ -4,10 +4,6 @@
44
Yes, 2 minutes.
55
That's how little it takes!
66

7-
.. hint::
8-
9-
You can click the function names in the code examples below to go directly to their documentation.
10-
117
Basic Usage
128
-----------
139

@@ -44,7 +40,7 @@ Compression
4440

4541
A **TFS** file being text-based, it benefits heavily from compression.
4642
Thankfully, `tfs-pandas` supports automatic reading and writing of various compression formats.
47-
Just use the API as you would normally, and the compression will be handled automatically:
43+
Just use the API as you would normally, and the compression will be handled automatically based on the extension:
4844

4945
.. autolink-preface:: import tfs
5046
.. code-block:: python
@@ -60,7 +56,7 @@ First though, one needs to install the package with the `hdf5` extra requirement
6056

6157
.. code-block:: bash
6258
63-
python -m pip install --upgrade tfs-pandas[hdf5]
59+
python -m pip install --upgrade "tfs-pandas[hdf5]"
6460
6561
Then, access the functionality from `tfs.hdf`.
6662

@@ -70,13 +66,22 @@ Then, access the functionality from `tfs.hdf`.
7066
from tfs.hdf import read_hdf, write_hdf
7167
7268
# Read a TfsDataFrame from an HDF5 file
73-
df = tfs.hdf.read("path_to_input.hdf5", key="key_in_hdf5_file")
69+
df = tfs.hdf.read_hdf("path_to_input.hdf5", key="key_in_hdf5_file")
7470
7571
# Write a TfsDataFrame to an HDF5 file
76-
tfs.hdf.write("path_to_output.hdf5", df, key="key_in_hdf5_file")
72+
tfs.hdf.write_hdf("path_to_output.hdf5", df, key="key_in_hdf5_file")
73+
74+
Validation
75+
----------
76+
77+
As **TFS** files typically come from the output of simulation codes, validation modes are available to ensure compatibility with said codes.
78+
This is done through the `tfs.frame.validate` function, or relevant arguments in both the reader and writer functions.
79+
80+
As validation modes and compatibility details are complex, validation warrants its own documentation page.
81+
Please refer to the :doc:`compatibility and validation guide <compatibility>` for more information.
7782

78-
Compatibility
79-
-------------
83+
Function Replacements
84+
---------------------
8085

8186
Finally, some replacement functions are provided for some `pandas` operations which, if used, would return a `pandas.DataFrame` instead of a `~.TfsDataFrame`.
8287

@@ -89,9 +94,9 @@ Finally, some replacement functions are provided for some `pandas` operations wh
8994
# This returns a pandas.DataFrame and makes you lose the headers
9095
result = pd.concat([df1, df2])
9196
92-
# Instead, use our own
97+
# Instead, use our own wrapper
9398
result = tfs.frame.concat([df1, df2]) # you can choose how to merge headers too
9499
assert isinstance(result, tfs.TfsDataFrame) # that's ok!
100+
assert getattr(result, "headers", None) is not None # headers are not lost
95101
96102
That's it!
97-
Happy using :)
