improve pdf file created by FOM #327

caglayantuna · 2025-02-28T14:46:35Z

This PR solves the issue #123 .

To do this, I improved the save_plot_convergence_model_parameters function. Now, it creates a pdf with several pages (6 plots per page) and it plots some additional parameters such as mixing matrix for all models and zeta from joint model. Since these new parameters have multiple files, I have also added legends for them specifically.

I add an example pdf which was created after fitting data with joint model.
convergence_parameters.pdf

NicolasGensollen

Thanks @caglayantuna !

I made some suggestion to improve the code readability.

Also, it would be nice to add some unit tests for these new functions.

NicolasGensollen · 2025-02-28T15:05:15Z

src/leaspy/io/logs/fit_output_manager.py

+                        / f"{parameter_name}.csv"
+                    )
+                    if os.path.exists(file_path):
+                        file_to_plot = file_path


Lines 191 to 211 can be replaced by:

files_to_plot = [ f for f in self.path_save_model_parameters_convergence.glob(f"{parameter_name}*.csv") ]

If needed the regexp can be made more robust but I think this should be enough.

NicolasGensollen · 2025-02-28T15:26:11Z

src/leaspy/io/logs/fit_output_manager.py

+                        file_to_plot = file_path
+
+                    if any(
+                        f"_{j}" in str(file) for j, file in enumerate(files_to_plot)


My previous suggestion gives you a list of files matching your query. This list can be of length zero, one or more. I suggest you handle these three cases rather than this strange condition.

# in this function you'd have: for i, parameter_name in enumerate(params_to_plot[page : page + 6]): files_to_plot = [ f for f in self.path_save_model_parameters_convergence.glob(f"{parameter_name}*.csv") ] _plot_parameter(files_to_plot, ax[i]) # with helper functions: def _plot_parameter(files_to_plot: Sequence[Path], ax: Axes): if len(files_to_plot) == 0: # Do something... elif len(files_to_plot) == 1: _plot_single_parameter_file(files_to_plot[0], ax) else: _plot_multiple_parameter_file(files_to_plot, ax) def _plot_single_parameter_file(file_to_plot: Path, ax: Axes) -> None: df = pd.read_csv(file_to_plot, index_col=0, header=None) ax.plot(df) def _plot_multiple_parameter_file(files_to_plot: Iterable[Path], ax: Axes) -> None: colormap = cm.viridis(np.linspace(0, 1, len(files_to_plot))) for idx, file in enumerate(files_to_plot): df = pd.read_csv(file, index_col=0, header=None) line = ax.plot(df, color=colormap[idx])[0] lines.append(line) ax.legend(lines, [f.stem for f in files_to_plot], loc="best")

Does that make sense ?

caglayantuna · 2025-03-07T14:24:03Z

Thanks @NicolasGensollen for your comments. I changed this PR after some discussions with the team. Review is not necessary for now. I am attaching latest pdf results to discuss about them.
convergence_parameters_joint.pdf
convergence_parameters_logistic.pdf

caglayantuna · 2025-03-20T13:25:05Z

This PR is ready for reviewing. I added a test function for FitOutputManager. This function tests some functions of FitOutputManager class except print functions. I couldn't find how to test print based functions.

Also, I added plot_sourcewise option to logs. I share 2 pdf results when plot_sourcewise is true.

convergence_parameters_joint_plot_sourcewise.pdf

convergence_parameters_logistic_plot_sourcewise.pdf

NicolasGensollen

Thanks @caglayantuna

I made a first pass. My general comment is that the function you added is very very long, and hence difficult to read and understand as a whole.
I think the code would read much better by extracting some logic in smaller functions.

NicolasGensollen · 2025-04-01T08:13:04Z

src/leaspy/io/logs/fit_output_manager.py

+
+        params_to_plot = list(model.state.tracked_variables - to_skip)
+
+        files = os.listdir(self.path_save_model_parameters_convergence)


self.path_save_model_parameters_convergence should be a Path object and not a string since it is a path.

Also, the logic used to get the files is duplicated, I'd turn it into a function:

def _get_files_related_to_parameters(self, parameters: Iterable[str]) - > list[Path]: return [ f for f in self.path_save_model_parameters_convergence.iterdir() if any(f.name.startswith(param) for param in parameters) ]

So getting the files to plot here would become:

files_to_plot = self._get_files_related_to_parameters(params_to_plot).sort()

Thanks Nicolas I added several helper functions to simplify this function

NicolasGensollen · 2025-04-01T08:22:17Z

src/leaspy/io/logs/fit_output_manager.py

+        # If plot sourcewise is true, new sourcewise csv files will be created
+        if self.plot_sourcewise:
+            new_files = []
+            for param_name in params_with_sources:


This would become:

related_files = self._get_files_related_to_parameters(params_with_sources)

NicolasGensollen · 2025-04-01T08:23:28Z

src/leaspy/io/logs/fit_output_manager.py

+                for source_idx in range(num_sources):
+                    combined_data = []
+
+                    for file_name in related_files:


This would also be simplified since the list holds the full path:

for file_path in related_files: ...

NicolasGensollen · 2025-04-01T08:28:10Z

src/leaspy/io/logs/fit_output_manager.py

+                        combined_data.append(df.iloc[:, source_idx])
+
+                    combined_df = pd.concat(combined_data, axis=1, join="inner")
+                    new_file_name = "sourcewise_" f"{param_name}_{source_idx + 1}.csv"


This looks weird, is this what you wanted to write:

Suggested change

new_file_name = "sourcewise_" f"{param_name}_{source_idx + 1}.csv"

new_file_name = f"sourcewise_{param_name}_{source_idx + 1}.csv"

thanks, since I added sourcewise later, I forgot to change "f". now , it is ok

NicolasGensollen · 2025-04-01T08:31:08Z

src/leaspy/io/logs/fit_output_manager.py

+                _, ax = plt.subplots(3, 2, figsize=(width, 3 * height_per_row))
+                ax = ax.flatten()
+
+                for i, file_name in enumerate(files_to_plot[page : page + 6]):


Suggested change

for i, file_name in enumerate(files_to_plot[page : page + 6]):

for i, file_path in enumerate(files_to_plot[page : page + 6]):

Thanks, done

NicolasGensollen · 2025-04-01T08:31:19Z

src/leaspy/io/logs/fit_output_manager.py

+                ax = ax.flatten()
+
+                for i, file_name in enumerate(files_to_plot[page : page + 6]):
+                    file_path = self.path_save_model_parameters_convergence / file_name


Suggested change

file_path = self.path_save_model_parameters_convergence / file_name

I removed it

NicolasGensollen · 2025-04-01T08:31:35Z

src/leaspy/io/logs/fit_output_manager.py

+
+                for i, file_name in enumerate(files_to_plot[page : page + 6]):
+                    file_path = self.path_save_model_parameters_convergence / file_name
+                    parameter_name = file_name.split(".csv")[0]


Suggested change

parameter_name = file_name.split(".csv")[0]

parameter_name = file_path.name.split(".csv")[0]

NicolasGensollen · 2025-04-01T08:48:22Z

src/leaspy/io/logs/fit_output_manager.py

+                                parameter_name = parameter_name[:-3]
+                            else:
+                                feature_index = int(parameter_name[-1])
+                                parameter_name = parameter_name[:-2]


This is very fragile and not generalizable. You need to use regular expressions for these kind of tasks.

Here is a proposition which could still be improved:

def _extract_parameter_name_and_index(parameter_name: str) -> tuple[Optional[str], Optional[int]]: match = re.search(r'^(.*?)(\d+)$', parameter_name) if match: return match.group(1).strip("_"), int(match.group(2)) return None, None

Which gives:

>>> _extract_parameter_name_and_index("mixing_matrix_10") ('mixing_matrix', 10) >>> _extract_parameter_name_and_index("mixing_matrix_1") ('mixing_matrix', 1) >>> _extract_parameter_name_and_index("mixing_matrix_10400") ('mixing_matrix', 10400) >>> _extract_parameter_name_and_index("mixing_matrix_10400_foo") (None, None) >>> _extract_parameter_name_and_index("mixing_ma") (None, None) >>> _extract_parameter_name_and_index("100_mixing_ma") (None, None) >>> _extract_parameter_name_and_index("v0_12") ('v0', 12)

Thank you very much for your suggestions, it is very useful. However, I couldn't find a solution when input is v0. This is a parameter name and 0 is not index. To fix this, I had to add specific condition for this parameter. If you have any suggestion, would be nice.

NicolasGensollen · 2025-04-01T08:54:02Z

src/leaspy/io/logs/fit_output_manager.py

+                    df_convergence = pd.read_csv(file_path, index_col=0, header=None)
+                    ax[i].plot(df_convergence)
+
+                    if parameter_name == "mixing_matrix":


This looks very hardcoded, would there be a way to have something more generic ?

Also, could you move that to a separate function, this function is enormous and hard to follow along. Something like this maybe:

def _set_title_for_parameter(ax, parameter_name: str): if parameter_name == "mixing_matrix": ax.set_title(...) ...

Normally, we could use params_with_sources instead of hardcoding. However, zeta in joint model has different structure. I couldn't find a better solution but I created a function as you suggested.

caglayantuna · 2025-04-10T08:13:40Z

Thanks @NicolasGensollen for your suggestions. They are really helpful. I changed the PR accordingly, but I still have some questions that you will see in the comments

NicolasGensollen

Thanks @caglayantuna !

I made another pass with more suggestions. Most of them go in the same direction of breaking down huge functions into smaller units which are easier to understand. The base idea of doing so, is that the multi-line operations will be abstracted into the function name in your brain. If the function has no nasty side effects, does only one thing, and has a well-chosen name, it helps a lot and reduces the amount of objects your brain needs to keep track of.

Btw, feel free to improve names if you have better ideas !

src/leaspy/algo/settings.py

src/leaspy/io/logs/fit_output_manager.py

NicolasGensollen · 2025-04-10T09:10:14Z

src/leaspy/io/logs/fit_output_manager.py

+                ax[i].legend(events, loc="best")
+        return ax
+
+    def _update_files_sourcewise(self, files_to_plot, params_with_sources, num_sources):


src/leaspy/io/logs/fit_output_manager.py

NicolasGensollen · 2025-04-10T09:21:29Z

src/leaspy/io/logs/fit_output_manager.py

+        Update the list of files by creating sourcewise files based on the parameters with sources.
+
+        This function processes parameters that have sources and generates new CSV files for each source.
+        These new files are added to the list of files to plot.


Two small comments here:

First of all, this function is doing multiple things, and it's clearly visible when reading the description. Because of that, it's long, and not easy to understand.

Second, it is mutating the files_to_plot list for no reason. A safer implementation would compute the new_files list and return that. The caller would then perform the aggregation of the files_to_plot and new_files. Doing that, this function doesn't even need the files_to_plot list. It could basically look like that:

def _compute_files_sourcewise(self, params_with_sources: list[str], num_sources: int) -> list[Path]: new_files = [] for param_name in params_with_sources: ... return new_files

The caller:

files_to_plot = [ file for file in files_to_plot if not any(file.name.startswith(param) for param in params_with_sources) ] files_to_plot.extend(self._compute_files_sourcewise(...))

WDYT ?

NicolasGensollen · 2025-04-10T09:39:45Z

src/leaspy/io/logs/fit_output_manager.py

+            for source_idx in range(num_sources):
+                combined_data = []
+
+                for file_path in related_files:


We have three imbricated for loop in this function. It's another sign that we are trying to do too much at once.

For each source, we read the related files one by one and concatenate everything together, this concatenation is then saved to disk and the path is added to the new_files list.

Here is a suggestion of how we could refactor this to help readability:

... for source_idx in range(num_sources): new_files.append( _concatenate_and_save_sourcewise_parameters( param_name, related_files, source_idx ) ) ... def _concatenate_and_save_sourcewise_parameters( self, parameter_name: str, files: Iterable[Path], source_idx: int ) -> Path: """Explain what I do...""" df = _concatenate_parameters(files, source_idx) file_name = f"sourcewise_{parameter_name}_{source_idx + 1}.csv" file_path = self.path_save_model_parameters_convergence / file_name df.to_csv(file_path, header=False) return file_path def _concatenate_parameters(files: Iterable[Path], source_idx: int) -> pd.DataFrame: """Explain me...""" return pd.concat( [ pd.read_csv(file_path, index_col=0, header=None).iloc[:, source_idx] for file_path in files ], axis=1, join="inner", )

improve doc and function improve FOM improve fom with tests add local and sourwise option

Co-authored-by: Gensollen <[email protected]>

caglayantuna · 2025-04-18T14:31:19Z

Hello @NicolasGensollen , thanks for your suggestions again. I changed the PR accordingly.

NicolasGensollen

LGTM, thanks @caglayantuna !

* improve pdf file improve doc and function improve FOM improve fom with tests add local and sourwise option * update according to review * update * update * add helper functions with docs * Apply suggestions from code review Co-authored-by: Gensollen <[email protected]> * improve readability --------- Co-authored-by: Caglayan Tuna <[email protected]> Co-authored-by: Gensollen <[email protected]>

NicolasGensollen reviewed Feb 28, 2025

View reviewed changes

caglayantuna force-pushed the fix_fom_joint branch 2 times, most recently from 78e44b5 to 65beca5 Compare March 20, 2025 12:19

NicolasGensollen reviewed Apr 1, 2025

View reviewed changes

caglayantuna force-pushed the fix_fom_joint branch from 65beca5 to ea3bae0 Compare April 7, 2025 14:45

NicolasGensollen reviewed Apr 10, 2025

View reviewed changes

caglayantuna added 5 commits April 18, 2025 14:08

improve pdf file

4d2a4f9

improve doc and function improve FOM improve fom with tests add local and sourwise option

update according to review

9a2c0d5

update

cf4cce6

update

ca924ae

add helper functions with docs

ce0bda7

caglayantuna force-pushed the fix_fom_joint branch from 3d89817 to ce0bda7 Compare April 18, 2025 12:20

caglayantuna and others added 2 commits April 18, 2025 14:24

Apply suggestions from code review

78c067f

Co-authored-by: Gensollen <[email protected]>

improve readability

7b5aae0

NicolasGensollen approved these changes Apr 28, 2025

View reviewed changes

NicolasGensollen merged commit 114cfac into aramis-lab:v2 Apr 28, 2025
9 checks passed

caglayantuna deleted the fix_fom_joint branch May 6, 2025 12:05

caglayantuna mentioned this pull request Jun 17, 2025

Error: No file as Zeta.csv for joint multivariate competition #123

Closed


		params_to_plot = list(model.state.tracked_variables - to_skip)

		files = os.listdir(self.path_save_model_parameters_convergence)

	new_file_name = "sourcewise_" f"{param_name}_{source_idx + 1}.csv"
	new_file_name = f"sourcewise_{param_name}_{source_idx + 1}.csv"

	for i, file_name in enumerate(files_to_plot[page : page + 6]):
	for i, file_path in enumerate(files_to_plot[page : page + 6]):

	parameter_name = file_name.split(".csv")[0]
	parameter_name = file_path.name.split(".csv")[0]

improve pdf file created by FOM #327

improve pdf file created by FOM #327

Uh oh!

Conversation

caglayantuna commented Feb 28, 2025

Uh oh!

NicolasGensollen left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

caglayantuna commented Mar 7, 2025

Uh oh!

caglayantuna commented Mar 20, 2025

Uh oh!

NicolasGensollen left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

caglayantuna commented Apr 10, 2025

Uh oh!

NicolasGensollen left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

caglayantuna commented Apr 18, 2025

Uh oh!

NicolasGensollen left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!