Can JPlag Support Incremental Comparison with a Persistent Reference Repository (e.g., a Pre-built Fingerprint Index Stored on Disk)? #2539

@ElgebarOri

Feature Description

Hi JPlag team,

First of all, thank you for developing and maintaining such a powerful and widely-used code similarity detection tool!

I'm currently using JPlag to detect code reuse or potential plagiarism in a large codebase (e.g., a course with many student submissions over multiple semesters). I have a feature request / question regarding incremental comparison.

Currently, JPlag requires all submissions (both historical and new) to be provided together in a single run for comparison. This works well for batch processing, but becomes inefficient when:

The historical codebase (e.g., past submissions) is very large.
New submissions arrive incrementally (e.g., weekly assignments).
We want to avoid re-parsing and re-processing the entire historical dataset every time.

🚀 Feature Request:
Is it possible to support a mode where:

A persistent reference repository or fingerprint index can be pre-built from a large codebase and stored on local disk.
For future checks, users can submit only new code files, and JPlag compares them against the pre-built index on disk without requiring the full original set.

This would significantly improve performance and usability in long-term or large-scale deployments.
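
To make the requested mode concrete, here is a minimal sketch in Python of the "build once, query many times" idea. It is only an illustration, not JPlag's algorithm or API: it assumes a naive whitespace tokenizer, hashed k-grams as fingerprints, a JSON file as the on-disk index, and Jaccard similarity as the score. The names `fingerprints`, `build_index`, `compare_against_index`, and the constant `K` are all hypothetical.

```python
import hashlib
import json
from pathlib import Path

K = 5  # hypothetical k-gram length, counted in whitespace-separated tokens


def fingerprints(source: str, k: int = K) -> set[str]:
    """Hash every run of k consecutive tokens.

    Splitting on whitespace is a stand-in for a real language-aware tokenizer.
    Inputs shorter than k tokens yield an empty set.
    """
    tokens = source.split()
    return {
        hashlib.sha1(" ".join(tokens[i:i + k]).encode("utf-8")).hexdigest()[:16]
        for i in range(len(tokens) - k + 1)
    }


def build_index(submissions: dict[str, str], index_path: Path) -> None:
    """Fingerprint the historical submissions once and persist the result."""
    index = {name: sorted(fingerprints(code)) for name, code in submissions.items()}
    index_path.write_text(json.dumps(index), encoding="utf-8")


def compare_against_index(new_code: str, index_path: Path) -> dict[str, float]:
    """Score one new submission against the stored index.

    Only the new submission is tokenized and hashed here; the historical
    sources are never read again, only their stored fingerprints.
    """
    index = json.loads(index_path.read_text(encoding="utf-8"))
    new_fp = fingerprints(new_code)
    scores = {}
    for name, stored in index.items():
        old_fp = set(stored)
        union = new_fp | old_fp
        # Jaccard similarity: shared fingerprints over all fingerprints.
        scores[name] = len(new_fp & old_fp) / len(union) if union else 0.0
    return scores
```

In this sketch, `build_index` would run once per semester over the archived submissions, and `compare_against_index` would run for each incoming submission. A real implementation would need to persist whatever intermediate representation JPlag's comparison actually works on, which may be quite different from a set of hashes.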

Use Case

🔍 Alternatives / Workarounds:
I understand this may require substantial changes to the current architecture. As a workaround, I currently merge new submissions with the full historical set before each JPlag run, but this becomes slow and resource-intensive as the historical set grows.

Are there any plans or existing tools/plugins that support such a capability? Or would you consider this as a future enhancement?

Thank you for your time and consideration!

Best regards,
ElgebarOri

Labels: enhancement, question
