
Future tasks for match frequency-based highlight extraction #2595

@tsaglam


As soon as possible (after #2563 is merged):

  • Remove less-performant strategies, keeping complete matches and windowing.
  • Remove the superfluous weighting function.
  • Check and adapt default parameters if necessary.
  • Parallelize the frequency calculation.¹
  • Check if FrequencyAnalysisOptions can be simplified with the record builder.
  • Deduplicate FrequencyStrategy and FrequencyAnalysisStrategy by deleting the enum.
  • Dissolve FrequencyUtil, if possible.
  • Dissolve MatchFrequency.
  • Ensure that the analysis only runs when the flag is set (and only creates overhead/objects when the flag is enabled).
  • Ensure that it actually overrides the result (and that the syntax in JPlag.run makes this clear).
  • Make Submission.getSimilarityDivisor() non-public again, if possible.
  • Check whether the renaming bug (isFrequencyAnalysisEnabled) has been resolved.
  • Check whether public methods should be private (e.g., MatchWeightCalculator.weightAllMatches is only called from tests).
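A minimal sketch of the flag gating and explicit override described in the list above. All names here (Comparison, RunResult, FrequencyGatedRun, analysisFactory) are hypothetical stand-ins, not JPlag's actual API. The analysis is only constructed through a factory when the flag is enabled, and the weighted result visibly replaces the original one at the return site.

```java
import java.util.List;
import java.util.function.Supplier;
import java.util.function.UnaryOperator;

// Hypothetical stand-ins for illustration; these are not JPlag's real types.
record Comparison(String first, String second, double similarity) {}

record RunResult(List<Comparison> comparisons, boolean frequencyWeighted) {}

final class FrequencyGatedRun {
    private FrequencyGatedRun() {}

    static RunResult run(List<Comparison> comparisons, boolean frequencyAnalysisEnabled,
                         Supplier<UnaryOperator<List<Comparison>>> analysisFactory) {
        if (!frequencyAnalysisEnabled) {
            // Flag not set: the factory is never called, so no analysis objects are created.
            return new RunResult(comparisons, false);
        }
        // Flag set: build the analysis lazily and let its output replace the original result.
        UnaryOperator<List<Comparison>> analysis = analysisFactory.get();
        return new RunResult(analysis.apply(comparisons), true);
    }
}
```

Passing a factory instead of a ready-made analyzer keeps the disabled path free of extra allocations; how JPlag.run should actually express the override remains open in this issue.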

In the future:

  • Ensure compatibility with asymmetric matches (currently, only the tokens of the first submission in a match are used).
  • Instead of enabling frequency analysis via a CLI flag, it could be enabled by default and fed into a separate similarity metric (this is only viable if performance is acceptable for large datasets).
  • Particularly rare matches could be highlighted in the comparison view to draw attention to them during human inspection.
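For the highlighting idea, a rough sketch of how rare matches might be selected. It assumes a hypothetical Match record holding the matched tokens of the first submission (mirroring the current behavior noted above) and a precomputed map from token sequence to frequency; RareMatchSelector and the maxFrequency threshold are illustrative, not existing JPlag classes or options.

```java
import java.util.List;
import java.util.Map;

// Hypothetical match: start index and matched token types in the first submission.
record Match(int startInFirst, List<String> tokens) {}

final class RareMatchSelector {
    private RareMatchSelector() {}

    // Keeps matches whose token sequence occurs in at most maxFrequency comparisons.
    // Sequences missing from the map are treated as frequency 0, i.e., as rare.
    static List<Match> rareMatches(List<Match> matches, Map<List<String>, Integer> frequencies,
                                   int maxFrequency) {
        return matches.stream()
                .filter(match -> frequencies.getOrDefault(match.tokens(), 0) <= maxFrequency)
                .toList();
    }
}
```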

Footnotes

  1. This should be trivial for operations that work on a per-match or per-comparison basis.
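A minimal sketch of the per-comparison parallelization the footnote refers to. As a simplifying assumption (not necessarily JPlag's definition), the frequency of a matched token sequence is taken to be the number of comparisons it occurs in; the input shape (one list of matched token sequences per comparison) and the class name ParallelMatchFrequency are hypothetical. Each comparison is processed independently on a parallel stream, and counts are combined with ConcurrentHashMap.merge, which is atomic.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

final class ParallelMatchFrequency {
    private ParallelMatchFrequency() {}

    // Each element of matchesPerComparison holds the matched token sequences of one comparison.
    static Map<List<String>, Integer> countFrequencies(List<List<List<String>>> matchesPerComparison) {
        Map<List<String>, Integer> frequencies = new ConcurrentHashMap<>();
        matchesPerComparison.parallelStream().forEach(comparison -> {
            // Deduplicate first so that a comparison contributes at most 1 per sequence.
            Set<List<String>> distinct = Set.copyOf(comparison);
            // merge is atomic on ConcurrentHashMap, so concurrent increments are not lost.
            distinct.forEach(sequence -> frequencies.merge(sequence, 1, Integer::sum));
        });
        return frequencies;
    }
}
```

Because each comparison is independent, no further synchronization is needed; a subsequent sequential step (e.g., weighting) can read the finished map.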

Labels: enhancement, major, report-viewer