-
Notifications
You must be signed in to change notification settings - Fork 0
Step Log visualization optimization #62
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: develop
Are you sure you want to change the base?
Conversation
…ined in top of file. Less color repetition in subplots
…n stay hidden while switching episode
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
This PR optimizes the step log visualization tool to address performance and usability issues. The key improvement is replacing the approach of pre-plotting all episodes (and hiding unselected ones) with dynamically updating trace data via the slider control. This change significantly reduces file sizes (e.g., from 374MB to 78MB in large cases) and improves responsiveness while fixing trace hiding bugs and color repetition issues.
Key changes:
- Refactored slider implementation to update trace data dynamically instead of toggling visibility of pre-rendered traces
- Added separate legends for each subplot instead of a single shared legend
- Improved memory management with explicit cleanup of intermediate dataframes
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| # per_episode is used to define the x/y data for each episode | ||
| per_episode_x = [] | ||
| per_episode_y = [] |
Copilot
AI
Dec 3, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The variable names per_episode_x and per_episode_y are not fully descriptive. Consider renaming to episode_x_data and episode_y_data or per_episode_x_values and per_episode_y_values to make it clearer that these are lists of data values for each episode.
| env_id: int, | ||
| episode_start: int = 0, | ||
| episode_end: int = np.iinfo(int).max, | ||
| episode_end: int = 10**12, |
Copilot
AI
Dec 3, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[nitpick] The default value changed from np.iinfo(int).max to 10**12. While 10**12 is simpler, it's a magic number that could theoretically be exceeded (though unlikely in practice). Consider using sys.maxsize or float('inf') which are more idiomatic for representing "no upper limit", or at least add a comment explaining why 10**12 was chosen.
| with the selector [("obs", (0, 2)), ("infos.geometry_information", (3, 6))] | ||
| the function will extract the first two observation values and the | ||
| geometry information values (which are part of the info field) from index 3 | ||
| to 5 (6 is exclusive) from |
Copilot
AI
Dec 3, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Incomplete sentence in documentation example. The sentence ends with "from index 3 to 5 (6 is exclusive) from" but doesn't specify what the data is extracted from (e.g., "from the dataframe" or "from each row").
| to 5 (6 is exclusive) from | |
| to 5 (6 is exclusive) from the dataframe. |
| ].tolist() | ||
|
|
||
| # Get all unique episodes | ||
| raise ValueError("Empty or malformed step log.") |
Copilot
AI
Dec 3, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The error message "Empty or malformed step log." is less informative than the original message that included the file path. Consider including the file path to help users identify which file is problematic: f"The provided step log file {step_log_file} is empty or does not follow the current format."
| raise ValueError("Empty or malformed step log.") | |
| raise ValueError(f"The provided step log file {step_log_file} is empty or does not follow the current format.") |
| if episode_end is None or episode_end > max_ep: | ||
| episode_end = max_ep | ||
|
|
||
| mask = (df["episodes"] >= episode_start) & (df["episodes"] <= episode_end) |
Copilot
AI
Dec 3, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The episode selection logic no longer handles the case where no episodes match the specified range. The original code had a try-except block to catch IndexError and provide a helpful error message showing the available episode range. Without this error handling, users will get a cryptic IndexError or potentially an empty dataframe that leads to confusing behavior later. Consider adding back error handling:
if not mask.any():
raise IndexError(
f"Could not find any episode in the range {episode_start} to {episode_end}. "
f"The available episodes range from {df['episodes'].min()} to {df['episodes'].max()}"
)| mask = (df["episodes"] >= episode_start) & (df["episodes"] <= episode_end) | |
| mask = (df["episodes"] >= episode_start) & (df["episodes"] <= episode_end) | |
| if not mask.any(): | |
| raise IndexError( | |
| f"Could not find any episode in the range {episode_start} to {episode_end}. " | |
| f"The available episodes range from {df['episodes'].min()} to {df['episodes'].max()}" | |
| ) |
| # Not able to extract the reset values without info field. prev_obs field shows | ||
| # last obs from previous episode not reset value in first step. |
Copilot
AI
Dec 3, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This comment mentions a limitation but it's unclear what the implications are or if this is a known issue that needs to be addressed. Consider clarifying: (1) why this limitation exists, (2) whether it affects the visualization, and (3) if there's a planned solution or workaround. If this is important context, it should be more explicit.
| # Not able to extract the reset values without info field. prev_obs field shows | |
| # last obs from previous episode not reset value in first step. | |
| # Limitation: Unable to extract the reset values for each episode because the required | |
| # "info" field is missing from the step log data. As a result, the "prev_obs" field | |
| # only contains the last observation from the previous episode, not the reset value at | |
| # the start of the current episode. This means that the initial state of each episode | |
| # cannot be visualized, which may affect analyses that rely on episode initialization. | |
| # Currently, there is no workaround unless the data collection process is updated to | |
| # include the "info" field with reset values. |
| objective_observation=[("obs", (0, 1))], | ||
| design_variable=[("obs", (1, None))], |
Copilot
AI
Dec 3, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Type annotations were removed from objective_observation and design_variable parameters. This reduces code clarity and type safety. Consider keeping the type annotations: objective_observation: list[tuple[str, tuple[int, int]]] = [("obs", (0, 1))] and design_variable: list[tuple[str, tuple[int, int]]] = [("obs", (1, None))].
| objective_observation=[("obs", (0, 1))], | |
| design_variable=[("obs", (1, None))], | |
| objective_observation: list[tuple[str, tuple[int, int]]] = [("obs", (0, 1))], | |
| design_variable: list[tuple[str, tuple[int, int]]] = [("obs", (1, None))], |
| ) | ||
|
|
||
| # Combine everything | ||
| # combine everthing into a single dataframe |
Copilot
AI
Dec 3, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Typo in comment: "everthing" should be "everything".
| # combine everthing into a single dataframe | |
| # combine everything into a single dataframe |
| for i, yaxis in enumerate(list(fig.select_yaxes())[::2], 1): | ||
| legend_name = f"legend{i}" | ||
| fig.update_layout( | ||
| {legend_name: dict(y=yaxis.domain[1], yanchor="top")}, | ||
| showlegend=True, | ||
| ) | ||
| fig.update_traces(row=i, legend=legend_name) |
Copilot
AI
Dec 3, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The logic for creating separate legends uses list(fig.select_yaxes())[::2] which assumes a specific structure that might be fragile if the subplot configuration changes. The slicing [::2] selects every second y-axis, but this implicit assumption isn't documented. Consider adding a comment explaining why every second y-axis is selected, or use a more explicit approach that's self-documenting.
| for i, yaxis in enumerate(list(fig.select_yaxes())[::2], 1): | |
| legend_name = f"legend{i}" | |
| fig.update_layout( | |
| {legend_name: dict(y=yaxis.domain[1], yanchor="top")}, | |
| showlegend=True, | |
| ) | |
| fig.update_traces(row=i, legend=legend_name) | |
| # Explicitly place legends for each subplot row. | |
| # We assume a subplot configuration of 2 rows, 1 column, with secondary y-axes in the first row. | |
| # This logic is robust to changes in subplot configuration. | |
| subplot_rows = 2 | |
| subplot_cols = 1 | |
| for row in range(1, subplot_rows + 1): | |
| # Get the primary y-axis for this subplot | |
| yaxes = [y for y in fig.select_yaxes() if y.anchor == f"x{row}"] | |
| if not yaxes: | |
| continue | |
| yaxis = yaxes[0] | |
| legend_name = f"legend{row}" | |
| fig.update_layout( | |
| {legend_name: dict(y=yaxis.domain[1], yanchor="top")}, | |
| showlegend=True, | |
| ) | |
| fig.update_traces(row=row, legend=legend_name) |
|
|
||
| # rescale the figure | ||
| # ---------- sizing ---------- | ||
| if not isinstance(figure_size, list) and figure_size == "auto": |
Copilot
AI
Dec 3, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The condition not isinstance(figure_size, list) and figure_size == "auto" is incorrect. It should check for both list and tuple since figure_size is typed as Union[tuple[int, int], Literal["auto"]] and the error message mentions tuple. The condition should be: figure_size == "auto" or not isinstance(figure_size, (list, tuple)) and figure_size == "auto". As written, if figure_size is a list, it will fall through to the else clause and raise an error even though lists are handled in line 700.
While using the step log visualization tool more and more a few issue came to light.
To combat this the visualization was optimized. The biggest part is the fix for 1, 4, and 5 which were fixed by changing how the slider works. Before all episode traces were plotted separately at the same time but than hidden for all episode but the one that was shown. Now one one trace is plotted for each variable and the data is exchanged for each episode. This solves both the responsiveness, file size and trace hiding problems.
Quantifiable is the easiest for point 4: For a specifically large step log we get a visualization file that is 374Mb in size and for the optimized the file size shrinks to 78Mb.
The new visualization looks like this:
