
Output visualizer #296


Open · wants to merge 16 commits into base: develop
Conversation

elzorroartico

No description provided.

@Matistjati (Contributor) commented Apr 11, 2025

This is not ready to be merged. Known issues:

  • Incorrect arguments when calling the visualizer
  • The output visualizer uses Pillow, which is not available to problemtools (not sure; too tired to check right now). I don't think we want it as a dependency. The alternative is to include the library manually, but that is on the order of thousands of lines of source code, which is not nice either. Perhaps we could generate raw RGB images and use ImageMagick to convert them to PNG (see the sketch after this list)? I haven't yet investigated whether ImageMagick is sufficient for our SVG sanitization demands; I will take a look tomorrow.
  • SVGs are not sanitized. We may want to disable them until we have SVG sanitization up and running.
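
For the raw-image idea in the last bullet, something along these lines could work without Pillow: write a binary PPM by hand and let ImageMagick do the conversion. A rough sketch only, assuming ImageMagick's convert is on PATH; the file names and the gradient test pattern are made up for illustration and are not what the PR currently does.

#!/usr/bin/env python3
# Rough sketch: write a binary PPM (P6) with no imaging library, then shell
# out to ImageMagick to convert it to PNG. File names are illustrative.
import subprocess
import sys

def write_ppm(path, width, height, pixels):
    # pixels: bytes of length width*height*3, RGB, row-major
    with open(path, 'wb') as f:
        f.write(f'P6\n{width} {height}\n255\n'.encode())
        f.write(pixels)

def main():
    feedback_dir = sys.argv[1] if len(sys.argv) > 1 else '.'
    w, h = 64, 64
    # Simple gradient standing in for real visualization output
    pixels = bytes(v for y in range(h) for x in range(w)
                   for v in (x * 255 // (w - 1), y * 255 // (h - 1), 0))
    write_ppm(f'{feedback_dir}/vis.ppm', w, h, pixels)
    subprocess.run(['convert', f'{feedback_dir}/vis.ppm',
                    f'{feedback_dir}/vis.png'], check=True)

if __name__ == '__main__':
    main()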

Some added thoughts:

  • Before this PR, the feedback dir did not persist outside of the function outputvalidator.validate. The current system is that you can optionally pass a feedback directory to it, which it won't delete. Each submission on each testcase gets its own feedback dir in the root of the "problem verification directory". On new submissions, we delete old ones if they exist. The reason for this architecture is that it's useful for both multipass and output visualizers.
  • What mem/time limits should we place on it? Currently, the spec doesn't comment on this. IMO, reasonable defaults are enough. Given that we produce PNGs, maybe a 5-10s time limit and 1 GB of RAM? IMO, 1 GB of RAM suffices; a raw 1920x1080 image is ~6 MB.
  • Currently, we don't run the output visualizer on the secret test data. It feels unnecessary, but may be nice for debugging?
  • We always generate the images, even if the flag to save them to disk is not passed, in order to sanity check them (check for a proper header and ensure the output visualizer returns exit code 0; a rough sketch follows this list). I think this is good.
  • The folder structure when it outputs the images is arbitrary and there may be a better one
  • We don't delete old folder structures of outputted images, because that feels scary and may not be what the user wants
  • There's also some discussion over at Output visualizer on interactive/multipass problem problem-package-format#437
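
On the memory figure above: 1920 x 1080 pixels at 3 bytes per pixel is roughly 6.2 MB, hence the ~6 MB estimate. And to make the sanity check concrete, the idea is roughly the following (a sketch, not the code in this PR; names are made up):

# Sketch of the sanity check: the visualizer must exit with code 0 and the
# produced file must start with the PNG signature. Names are illustrative.
PNG_SIGNATURE = b'\x89PNG\r\n\x1a\n'

def looks_like_png(path):
    with open(path, 'rb') as f:
        return f.read(8) == PNG_SIGNATURE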

@gkreitz (Contributor) left a comment


I took a quick look before I noticed there was already feedback from @Matistjati that needs to be addressed, so I'm holding off on looking in more detail. Luckily, I think the comments I added do not overlap with his.


def main():
    if len(sys.argv) != 3:
        print("Usage: output_visualizer.py <submission_output> <feedback_dir>")

Example of created file structure when run on different:
different_images
├── different.c

What happens if I have different.c both as an accepted and as a wrong_answer submission?

self.warning(f'Wrong number of visualizers.\nExpected: 1\nActual: {len(self._visualizer)}')

# Checks if a file's extension is allowed, and if so validates its header
def check_is_valid_image(self, file) -> bool:

There are libraries and standard tools to do this type of file type detection. I'd prefer we use them, if possible, instead of re-implementing that logic ourselves. There's a Python package called python-magic. (An uglier option would be to just use file --mime-type from the shell, but that feels hacky).
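
For reference, a minimal sketch of the python-magic route (the allow-list of MIME types here is an assumption for illustration, not something the format prescribes):

import magic  # python-magic, a wrapper around libmagic

# Hypothetical allow-list; which image types to permit is a policy decision,
# not something libmagic decides.
ALLOWED_MIME_TYPES = {'image/png', 'image/jpeg', 'image/svg+xml'}

def check_is_valid_image(path) -> bool:
    # Detect the type from the file contents rather than from its extension
    return magic.from_file(path, mime=True) in ALLOWED_MIME_TYPES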

@gkreitz (Contributor) commented Apr 12, 2025

The output visualizer uses Pillow, which is not available to problemtools (not sure; too tired to check right now). I don't think we want it as a dependency. The alternative is to include the library manually, but that is on the order of thousands of lines of source code, which is not nice either.

This feels like a problem in the standard to me. How is one even intended to sanely write an output visualizer? The only option I can see is to dump a huge dependency in the validator directory. And, as you point out, we definitely do not want to do that in our examples folder.

Perhaps we could generate raw RGB images and use ImageMagick to convert them to PNG?

To do this safely, you'd need to include the ImageMagick source (i.e., the same issue as above with using PIL). Sure, problemtools may ensure it's there as a dependency, but there's no such guarantee for sandboxes on judge systems (and at some point, problemtools should also sandbox its runs).

Examples included in problemtools should be good examples for users to base stuff off of, so it's not a good sign for the standard if we can't come up with a good example for an output visualizer.

@Matistjati (Contributor)

To do this safely, you'd need to include the ImageMagick source (i.e., the same issue as above with using PIL). Sure, problemtools may ensure it's there as a dependency, but there's no such guarantee for sandboxes on judge systems (and at some point, problemtools should also sandbox its runs).

Good point, the examples should actually work outside of problemtools.

@elzorroartico I've thought more about it, and I think we do want to run the output visualizers on the secret data. Currently, the structure is basically {shortname}_images/{submission_name}/{testcase}/res.png. It's pretty natural then to run the visualizer on the data in secret and sample, and replace {submission_name} with either secret or sample. So for example, we could have hello_images/secret/01/vis.png.

I propose a change to the {submission_name} part. As Gunnar pointed out, we get problems if multiple submissions are named the same thing. I think the most reasonable solution is to include the submission folder it's in, so that, for example, accepted.joshua.cpp and wrong_answer.joshua.cpp would create unique folders.

In general, I also think we need to be more careful with filenames. Some annoying examples:

  • Test data with deep nesting: test cases can be nested arbitrarily deeply in folders
  • Test cases with the same name across different subgroups
  • Submissions consisting of a folder, especially one named secret or sample...

Perhaps the solution is to prepend a unique counter to all filenames? Although at that point, the order is going to be nondeterministic due to multithreading. I can't think of a better solution than to keep {submission_name} as I proposed above, and change {testcase} to the full path relative to either sample or secret. I think Fredrik considered requiring all testcase names to be unique; I'll make a ticket later today. But I don't think we can count on that.
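
To illustrate the naming scheme I have in mind (the function and folder names below are hypothetical, not final; for the secret/sample runs the middle component would simply be secret or sample):

from pathlib import Path

def image_output_dir(shortname, submission_dir, submission_name, testcase_rel):
    # submission_dir: e.g. 'accepted' or 'wrong_answer'; testcase_rel: the test
    # case path relative to sample/ or secret/, so nesting and duplicate names
    # across groups stay distinguishable.
    return Path(f'{shortname}_images') / f'{submission_dir}.{submission_name}' / testcase_rel

# e.g. hello_images/accepted.joshua.cpp/group1/01
print(image_output_dir('hello', 'accepted', 'joshua.cpp', Path('group1/01')))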
