Commit 43cb566

[Fix] Series-like writing (#130)
1 parent 7500d77 commit 43cb566

4 files changed: +125 -92 lines changed

CHANGELOG.md

Lines changed: 102 additions & 89 deletions
@@ -1,39 +1,45 @@
11
# TFS-Pandas Changelog
22

3+
## Version 3.7.3
4+
5+
- Fixed:
6+
- Fixed a regression where the writing of a `pd.Series`-like object to disk was raising an error. It is now possible again.
7+
38
## Version 3.7.2
49

510
- Fixed:
6-
- fixing the issues with `pandas` >= `v2.1.0` (see `tfs-pandas` `v3.7.1`) by overwriting the `_constructor_from_mgr` function.
11+
- fixing the issues with `pandas` >= `v2.1.0` (see `tfs-pandas` `v3.7.1`) by overwriting the `_constructor_from_mgr` function.
712

813
## Version 3.7.1
914

1015
- Changed:
11-
- The dependency on `pandas` was restricted to avoid the latest version, `2.1.0` and above as a temporary workaround to an attribute access bug that arose with it.
16+
- The dependency on `pandas` was restricted to avoid the latest version, `2.1.0` and above as a temporary workaround to an attribute access bug that arose with it.
1217

1318
## Version 3.7.0
1419

1520
Minor API changes to the `TFSCollections`:
16-
- the old `write_to` and `get_filename` are renamed to `_write_to` and `_get_filename` as they
21+
22+
- the old `write_to` and `get_filename` are renamed to `_write_to` and `_get_filename` as they
1723
could only be accessed internally (due to the input parameters not available to the user).
1824
This also means, that - in case they are overwritten by a user's implementation - they need to be renamed there!!
1925

20-
- The column which is set as index can now also be defined manually, by overwriting the attribute `INDEX`, which defaults to `"NAME"`.
26+
- The column which is set as index can now also be defined manually, by overwriting the attribute `INDEX`, which defaults to `"NAME"`.
2127

22-
- New Functions of `TFSCollection` Instances:
23-
- `get_filename(name)`: Returns the associated filename to the property with name `name`.
24-
- `get_path(name)`: Return the actual file path of the property `name`
25-
- `flush()`: Write the current state of the TFSDataFrames into their respective files.
26-
- `write_tfs(filename, data_frame)`: Write the `data_frame` to `self.directory` with the given `filename`.
28+
- New Functions of `TFSCollection` Instances:
29+
- `get_filename(name)`: Returns the associated filename to the property with name `name`.
30+
- `get_path(name)`: Return the actual file path of the property `name`
31+
- `flush()`: Write the current state of the TFSDataFrames into their respective files.
32+
- `write_tfs(filename, data_frame)`: Write the `data_frame` to `self.directory` with the given `filename`.
2733

28-
- New Special Properties of `TFSCollection` Instances:
29-
- `defined_properties`: Tuple of strings of the defined properties on this instance.
30-
- `filenames` is a convenience wrapper for `get_filename()`:
31-
- When called (`filenames(exist: bool)`) returns a dictionary of the defined properties and their associated filenames.
34+
- New Special Properties of `TFSCollection` Instances:
35+
- `defined_properties`: Tuple of strings of the defined properties on this instance.
36+
- `filenames` is a convenience wrapper for `get_filename()`:
37+
- When called (`filenames(exist: bool)`) returns a dictionary of the defined properties and their associated filenames.
3238
The `exist` boolean filters between existing files or filenames for all properties.
33-
- Can also be used either `filenames.name` or `filenames[name]` to call `get_filename(name)` on the instance.
39+
- Can also be used either `filenames.name` or `filenames[name]` to call `get_filename(name)` on the instance.
3440

35-
- Moved the define-properties functions directly into the `Tfs`-attribute marker class.
36-
- Return of `None` for the `MaybeCall` class in case of attribute not found (instead of empty function, which didn't make sense).
41+
- Moved the define-properties functions directly into the `Tfs`-attribute marker class.
42+
- Return of `None` for the `MaybeCall` class in case of attribute not found (instead of empty function, which didn't make sense).
3743

3844
## Version 3.6.0
3945

@@ -56,177 +62,184 @@ Minor API changes to the `TFSCollections`:
5662
## Version 3.5.1
5763

5864
- Fixed:
59-
- Allow reading of empty lines in headers again.
65+
- Allow reading of empty lines in headers again.
6066

6167
## Version 3.5.0
6268

6369
- Fixed:
64-
- Any empty strings ("") in a file's columns will now properly be read as such and not converted to `NaN`.
70+
- Any empty strings ("") in a file's columns will now properly be read as such and not converted to `NaN`.
6571

6672
- Added:
6773
- It is now possible to only read the headers of a file by using a new function, `read_headers`. The function API is not exported at the top level of the package but is available to import from `tfs.reader`.
6874

6975
## Version 3.4.0
7076

7177
- Added:
72-
- The `read_tfs` and `write_tfs` functions can now handle reading / writing compressed files, see documentation for details.
78+
- The `read_tfs` and `write_tfs` functions can now handle reading / writing compressed files, see documentation for details.
7379

7480
## Version 3.3.1
7581

7682
- Changed:
77-
- Column types are now assigned at read time instead of later on, which should improve performance for large data frames.
83+
- Column types are now assigned at read time instead of later on, which should improve performance for large data frames.
7884

7985
## Version 3.3.0
8086

8187
- Added:
82-
- The option is now given to the user to skip data frame validation after reading from file / before writing to file. Validation is left "on" by default, but can be turned off with a boolean argument.
88+
- The option is now given to the user to skip data frame validation after reading from file / before writing to file. Validation is left "on" by default, but can be turned off with a boolean argument.
8389

8490
- Changes:
85-
- The `tfs.frame.validate` function has seen its internal logic reworked to be more efficient and users performing validation on large data frames should notice a significant performance improvement.
86-
- The documentation has been expanded and improved, with notably the addition of example code snippets.
91+
- The `tfs.frame.validate` function has seen its internal logic reworked to be more efficient and users performing validation on large data frames should notice a significant performance improvement.
92+
- The documentation has been expanded and improved, with notably the addition of example code snippets.
8793

8894
## Version 3.2.1
8995

9096
- Changed:
91-
- Allow spaces in header names.
97+
- Allow spaces in header names.
9298

9399
## Version 3.2.0
94100

95-
- Added:
96-
- HDF5 read/write.
101+
- Added:
102+
- HDF5 read/write.
97103

98104
- Changed:
99105
- The minimum required Python version is now `3.7`.
100106

101107
## Version 3.1.0
102108

103109
- Fixed:
104-
- Removed dependency on deprecated `numpy.str`
110+
- Removed dependency on deprecated `numpy.str`
105111

106112
- Changed:
107-
- No logging of error messages internally for reading files and checking dataframes.
113+
- No logging of error messages internally for reading files and checking dataframes.
108114
Instead logging is either moved to `debug`-level or all info is now in the error message itself
109115
to be handled externally by the user.
110116

111117
## Version 3.0.2
112118

113119
- Fixed:
114-
- String representation of empty headers is fixed (accidentally printed 'None' before).
120+
- String representation of empty headers is fixed (accidentally printed 'None' before).
115121

116122
## Version 3.0.1
117123

118124
- Fixed:
119-
- Merging functionality from `TfsDataFrame.append`, `TfsDataFrame.join`, `TfsDataFrame.merge` and `tfs.concat` do not crash anymore when encountering a `pandas.DataFrame` (or more for `tfs.concat`) in their input. Signatures have been updated and tests were added for this behavior.
125+
- Merging functionality from `TfsDataFrame.append`, `TfsDataFrame.join`, `TfsDataFrame.merge` and `tfs.concat` do not crash anymore when encountering a `pandas.DataFrame` (or more for `tfs.concat`) in their input. Signatures have been updated and tests were added for this behavior.
120126

121127
## Version 3.0.0
122128

123129
A long-standing issue where merging functionality used on `TfsDataFrame` (through `.merge` or `pandas.concat` for instance) would cause them to be cast back to `pandas.DataFrame` and lose their headers has been patched.
124130

125131
- Breaking changes:
126-
- The internal API has been reworked for clarity and consistency. Note that anyone previously using the high-level exports `tfs.read`, `tfs.write` and `tfs.TfsDataFrame` **will not be affected**.
132+
- The internal API has been reworked for clarity and consistency. Note that anyone previously using the high-level exports `tfs.read`, `tfs.write` and `tfs.TfsDataFrame` **will not be affected**.
127133

128134
- Added:
129-
- The `TfsDataFrame` class now has new `.append`, `.join` and `.merge` methods wrapping the inherited methods of the same name and fixing the aforementioned issue.
130-
- A `tfs.frame.concat` function, exported as `tfs.concat`, has been added to wrap `pandas.concat` and fix the aforementioned issue.
131-
- A `tfs.frame.merge_headers` function has been added.
132-
- Top level exports are now: `tfs.TfsDataFrame`, `tfs.read`, `tfs.write` and `tfs.concat`.
135+
- The `TfsDataFrame` class now has new `.append`, `.join` and `.merge` methods wrapping the inherited methods of the same name and fixing the aforementioned issue.
136+
- A `tfs.frame.concat` function, exported as `tfs.concat`, has been added to wrap `pandas.concat` and fix the aforementioned issue.
137+
- A `tfs.frame.merge_headers` function has been added.
138+
- Top level exports are now: `tfs.TfsDataFrame`, `tfs.read`, `tfs.write` and `tfs.concat`.
133139

134140
- Changes:
135-
- The `tfs.frame.validate` function is now a public-facing documented API and may be used stably.
136-
- The `write_tfs` function now appends an `EOL` (`\n`) at the end of the file when writing out for visual clarity and readability. This is purely cosmetic and **does not** change functionality / compatibility of the files.
137-
- Documentation and README have been updated and cleared up.
141+
- The `tfs.frame.validate` function is now a public-facing documented API and may be used stably.
142+
- The `write_tfs` function now appends an `EOL` (`\n`) at the end of the file when writing out for visual clarity and readability. This is purely cosmetic and **does not** change functionality / compatibility of the files.
143+
- Documentation and README have been updated and cleared up.
138144

139145
Please do refer to the documentation for the use of the new merging functionality to be aware of caveats, especially when merging headers.
140146

141-
142147
## Version 2.1.0
143148

144149
- Changes:
145-
- The parsing in `read_tfs` has been reworked to make use of `pandas`'s C engine, resulting in drastic performance improvements when loading files. No functionality was lost or changed.
150+
- The parsing in `read_tfs` has been reworked to make use of `pandas`'s C engine, resulting in drastic performance improvements when loading files. No functionality was lost or changed.
146151

147152
## Version 2.0.3
148153

149154
- Fixed:
150-
- Took care of a numpy deprecation warning when using `np.str`, which should not appear anymore for users.
155+
- Took care of a numpy deprecation warning when using `np.str`, which should not appear anymore for users.
151156

152157
- Changes:
153-
- Prior to version `2.0.3`, reading and writing would raise a `TfsFormatError` in case of non-unique indices or columns. From now on, this behavior is an option in `read_tfs` and `write_tfs` called `non_unique_behavior` which by default is set to log a warning. If explicitly asked by the user, the failed check will raise a `TfsFormatError`.
158+
- Prior to version `2.0.3`, reading and writing would raise a `TfsFormatError` in case of non-unique indices or columns. From now on, this behavior is an option in `read_tfs` and `write_tfs` called `non_unique_behavior` which by default is set to log a warning. If explicitly asked by the user, the failed check will raise a `TfsFormatError`.
154159

155160
## Version 2.0.2
161+
156162
- Fixed:
157-
- Proper error on non-string columns
158-
- Writing numeric-only mixed type dataframes bug
163+
- Proper error on non-string columns
164+
- Writing numeric-only mixed type dataframes bug
159165

160166
## Version 2.0.1
167+
161168
- Fixed:
162-
- No longer warns on MAD-X styled string column types (`%[num]s`).
163-
- Documentation is up-to-date, and plays nicely with `Sphinx`'s parsing.
164-
- Fix a wrong type hint.
169+
- No longer warns on MAD-X styled string column types (`%[num]s`).
170+
- Documentation is up-to-date, and plays nicely with `Sphinx`'s parsing.
171+
- Fix a wrong type hint.
165172

166173
## Version 2.0.0
174+
167175
- Breaking Changes:
168-
- `FixedColumn`, `FixedColumnCollection` and `FixedTfs` have been removed from the package
169-
- Objects are not converted to strings upon read anymore, and will raise an error
170-
- Minimum pandas version is 1.0
176+
- `FixedColumn`, `FixedColumnCollection` and `FixedTfs` have been removed from the package
177+
- Objects are not converted to strings upon read anymore, and will raise an error
178+
- Minimum pandas version is 1.0
171179

172180
- Fixed:
173-
- No longer writes an empty line to file in case of empty headers
174-
- "Planed" dataframes capitalize plane key attributes to be consistent with other `pylhc` packages, however they can be accessed with and without capitalizing your query.
181+
- No longer writes an empty line to file in case of empty headers
182+
- "Planed" dataframes capitalize plane key attributes to be consistent with other `pylhc` packages, however they can be accessed with and without capitalizing your query.
175183

176184
- Changes:
177-
- Minimum required `numpy` version is now 1.19
178-
- TfsDataFrames now automatically cast themselves to pandas datatypes using `.convert_dtypes()`
179-
- Lighter dependency matrix
180-
- Full testing of supported Python versions across linux, macOS and windows systems through Github Actions
185+
- Minimum required `numpy` version is now 1.19
186+
- TfsDataFrames now automatically cast themselves to pandas datatypes using `.convert_dtypes()`
187+
- Lighter dependency matrix
188+
- Full testing of supported Python versions across linux, macOS and windows systems through Github Actions
181189

182190
## Version 1.0.5
191+
183192
- Fixed:
184-
- Bug with testing for headers, also in pandas DataFrames
185-
- Same testing method for all data-frame comparisons
186-
- Some minor fixes
193+
- Bug with testing for headers, also in pandas DataFrames
194+
- Same testing method for all data-frame comparisons
195+
- Some minor fixes
187196

188197
- Added:
189-
- Testing of writing of pandas DataFrames
190-
198+
- Testing of writing of pandas DataFrames
191199

192200
## Version 1.0.4
193-
- Added:
194-
- support for pathlib Paths
195-
- strings with spaces support (all strings in data are quoted)
196-
- more validation checks (no spaces in header/columns)
197-
- nicer string representation
198-
- left-align of index-column
199201

200-
- Removed:
201-
- `.indx` from class (use `index="NAME"` instead)
202+
- Added:
203+
- support for pathlib Paths
204+
- strings with spaces support (all strings in data are quoted)
205+
- more validation checks (no spaces in header/columns)
206+
- nicer string representation
207+
- left-align of index-column
208+
209+
- Removed:
210+
- `.indx` from class (use `index="NAME"` instead)
202211

203-
- Fixed:
204-
- Writing of empty dataframes
205-
- Doc imports
206-
- Minor bugfixes
212+
- Fixed:
213+
- Writing of empty dataframes
214+
- Doc imports
215+
- Minor bugfixes
207216

208217
## Version 1.0.3
209-
- Fixed:
210-
- From relative to absolute imports (IMPORTANT FIX!!)
218+
219+
- Fixed:
220+
- From relative to absolute imports (IMPORTANT FIX!!)
211221

212222
## Version 1.0.2
213-
- Fixed:
214-
- Additional index column after writing is removed again
215-
- Renamed sigificant_numbers to significant_digits
216-
- significant_digits throws proper error if zero-error is given
217223

218-
- Added:
219-
- Fixed Dataframe Class
220-
- Type Annotations
224+
- Fixed:
225+
- Additional index column after writing is removed again
226+
- Renamed sigificant_numbers to significant_digits
227+
- significant_digits throws proper error if zero-error is given
228+
229+
- Added:
230+
- Fixed Dataframe Class
231+
- Type Annotations
221232

222233
## Version 1.0.1
223-
- Fixed:
224-
- Metaclass-Bug in Collections
225234

226-
- Added:
227-
- Additional Unit Tests
228-
- Versioning
229-
- Changelog
235+
- Fixed:
236+
- Metaclass-Bug in Collections
237+
238+
- Added:
239+
- Additional Unit Tests
240+
- Versioning
241+
- Changelog
230242

231243
## Version 1.0.0
232-
- Initial Release
244+
245+
- Initial Release

tests/test_writer.py

Lines changed: 16 additions & 1 deletion
@@ -2,7 +2,6 @@
 import pathlib
 import random
 import string
-import sys
 
 import numpy
 import pandas
@@ -36,6 +35,22 @@ def test_tfs_write_empty_columns_dataframe(self, tmp_path):
         assert_frame_equal(df, new)
         assert_dict_equal(df.headers, new.headers, compare_keys=True)
 
+    def test_tfs_write_series_like_dataframe(self, tmp_path):
+        """Write-read a pandas.Series-like to disk and make sure all goes right."""
+        df = pandas.Series([1,2,3,4,5])
+
+        write_location = tmp_path / "test.tfs"
+        test_headers = {"test": 1, "test_string": "test_write_series_like"}
+        write_tfs(write_location, df, headers_dict=test_headers, save_index=True)
+        assert write_location.is_file()
+
+        # Read data will be TfsDataFrame, so in pd.DataFrame-like form
+        # For the comparison we only compare the column (as Series-like) and accept that the
+        # user sees a little difference in the data format (Series vs DataFrame with 1 column)
+        new = read_tfs(write_location)
+        assert_series_equal(df, new["0"], check_names=False)
+        assert_dict_equal(test_headers, new.headers, compare_keys=True)
+
     def test_madx_reads_written_tfsdataframes(self, _bigger_tfs_dataframe, tmp_path):
         dframe = _bigger_tfs_dataframe
         dframe.headers["TYPE"] = "TWISS"  # MAD-X complains on TFS files with no "TYPE" header
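
The new test compares with `check_names=False` because the original `Series` is unnamed, whereas a column taken out of a frame carries its column label as its name. Below is a minimal sketch of that comparison using only `pandas`. No TFS file is written or read here: the in-memory conversion stands in for the write/read round trip, which is an assumption made purely for illustration.

```python
import pandas as pd
from pandas.testing import assert_series_equal

# An unnamed Series, as in the new test.
original = pd.Series([1, 2, 3, 4, 5])

# Stand-in for the round trip (assumption): wrap the Series in a
# one-column frame and turn the integer column label 0 into "0".
frame = pd.DataFrame(original)
frame.columns = frame.columns.astype(str)

# The column is addressed by the string label and is named after it.
column = frame["0"]

# The values and index match; only the Series name differs, so the
# name check is switched off for the comparison.
assert_series_equal(original, column, check_names=False)
```

With the default `check_names=True`, the same comparison would fail on the name mismatch alone, even though the data are identical.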

tfs/__init__.py

Lines changed: 1 addition & 1 deletion
@@ -10,7 +10,7 @@
 __title__ = "tfs-pandas"
 __description__ = "Read and write tfs files."
 __url__ = "https://github.com/pylhc/tfs"
-__version__ = "3.7.2"
+__version__ = "3.7.3"
 __author__ = "pylhc"
 __author_email__ = "[email protected]"
 __license__ = "MIT"

tfs/writer.py

Lines changed: 6 additions & 1 deletion
@@ -97,7 +97,12 @@ def write_tfs(
     """
     left_align_first_column = False
     tfs_file_path = pathlib.Path(tfs_file_path)
-
+
+    # Force a conversion from pd.Series-like to TfsDataFrame to avoid empty columns issues
+    if not isinstance(data_frame, (TfsDataFrame, pd.DataFrame)):
+        data_frame = TfsDataFrame(data_frame)
+        data_frame.columns = data_frame.columns.astype(str)  # need column names to be strings
+
     if validate:
         validate_frame(data_frame, f"to be written in {tfs_file_path.absolute()}", non_unique_behavior)
 
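
The guard added to `write_tfs` can be sketched outside the library as follows. This is a simplified illustration, not the library's code: the helper name `as_string_column_frame` is hypothetical, a plain `pandas.DataFrame` stands in for `TfsDataFrame`, and the sketch assumes the string cast applies only to inputs that had to be converted.

```python
import pandas as pd


def as_string_column_frame(data):
    """Hypothetical helper: return a frame whose column labels are strings.

    Frames are passed through untouched; anything else (for example a
    Series) is wrapped in a DataFrame and its labels are cast to str.
    """
    if not isinstance(data, pd.DataFrame):
        data = pd.DataFrame(data)
        # An unnamed Series yields the integer column label 0, which
        # becomes the string "0" after the cast.
        data.columns = data.columns.astype(str)
    return data


# A Series becomes a one-column frame with a string label.
frame = as_string_column_frame(pd.Series([1.5, 2.5]))

# An existing frame is returned as the same object.
existing = pd.DataFrame({"A": [1]})
same = as_string_column_frame(existing)
```

Converting before validation means the downstream checks always see a frame with string column names, which is what the in-code comment in the hunk asks for.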
