Feature Description
Hi JPlag team,
First of all, thank you for developing and maintaining such a powerful and widely used code similarity detection tool!
I'm currently using JPlag to detect code reuse or potential plagiarism across a large body of code (e.g., a course with many student submissions over multiple semesters). I have a feature request (or question) regarding incremental comparison.
Currently, JPlag requires all submissions (both historical and new) to be provided together in a single run for comparison. This works well for batch processing, but becomes inefficient when:
The historical codebase (e.g., past submissions) is very large.
New submissions arrive incrementally (e.g., weekly assignments).
The entire historical dataset has to be re-parsed and re-processed on every run, which we would like to avoid.
🚀 Feature Request:
Is it possible to support a mode where:
A persistent reference repository or fingerprint index can be pre-built from a large codebase and stored on local disk.
For future checks, users can submit only new code files, and JPlag compares them against the pre-built index on disk without requiring the full original set.
This would significantly improve performance and usability in long-term or large-scale deployments.
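To make the request more concrete, here is a minimal sketch of a "build once, query many times" index. It is purely hypothetical: `FingerprintIndex`, its methods, and the crude regex tokenizer are assumptions for illustration, not part of JPlag's API. It also swaps in a simpler technique: it hashes every window of k consecutive tokens (k-gram fingerprinting) instead of running JPlag's Greedy String Tiling on language-aware token streams.

```java
import java.io.*;
import java.nio.file.*;
import java.util.*;

/**
 * Hypothetical persistent k-gram fingerprint index (illustration only, not JPlag's API).
 * Built once from historical submissions, saved to disk, then queried with new files.
 */
public class FingerprintIndex implements Serializable {
    private static final long serialVersionUID = 1L;

    private final int k; // number of consecutive tokens hashed into one fingerprint
    // fingerprint hash -> ids of the indexed submissions that contain it
    private final HashMap<Integer, HashSet<String>> postings = new HashMap<>();

    public FingerprintIndex(int k) {
        this.k = k;
    }

    // Crude tokenizer (assumption): splits on non-word characters and lowercases,
    // so whitespace and formatting changes do not alter the fingerprints.
    static List<String> tokenize(String source) {
        List<String> tokens = new ArrayList<>();
        for (String t : source.split("\\W+")) {
            if (!t.isEmpty()) {
                tokens.add(t.toLowerCase());
            }
        }
        return tokens;
    }

    // Distinct hashes of every window of k consecutive tokens.
    // List.hashCode() and String.hashCode() are fully specified, so the hashes stay
    // stable across JVM runs, which matters for an index persisted to disk.
    Set<Integer> fingerprints(String source) {
        List<String> tokens = tokenize(source);
        Set<Integer> hashes = new HashSet<>();
        for (int i = 0; i + k <= tokens.size(); i++) {
            hashes.add(tokens.subList(i, i + k).hashCode());
        }
        return hashes;
    }

    // Indexes one historical submission (the expensive step, done once).
    public void add(String submissionId, String source) {
        for (int h : fingerprints(source)) {
            postings.computeIfAbsent(h, key -> new HashSet<>()).add(submissionId);
        }
    }

    // For each indexed submission, the fraction of the new file's fingerprints it shares.
    // Submissions sharing no fingerprint are omitted from the result.
    public Map<String, Double> query(String source) {
        Set<Integer> hashes = fingerprints(source);
        Map<String, Integer> shared = new HashMap<>();
        for (int h : hashes) {
            for (String id : postings.getOrDefault(h, new HashSet<>())) {
                shared.merge(id, 1, Integer::sum);
            }
        }
        Map<String, Double> scores = new HashMap<>();
        for (Map.Entry<String, Integer> e : shared.entrySet()) {
            scores.put(e.getKey(), e.getValue() / (double) hashes.size());
        }
        return scores;
    }

    public void save(Path file) throws IOException {
        try (ObjectOutputStream out = new ObjectOutputStream(Files.newOutputStream(file))) {
            out.writeObject(this);
        }
    }

    public static FingerprintIndex load(Path file) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(Files.newInputStream(file))) {
            return (FingerprintIndex) in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        // Build the index from "historical" submissions and store it on disk.
        FingerprintIndex index = new FingerprintIndex(3);
        index.add("2023/alice", "int sum = 0; for (int i = 0; i < n; i++) { sum += a[i]; }");
        index.add("2023/bob", "while (queue.isEmpty() == false) { process(queue.poll()); }");
        Path file = Files.createTempFile("fingerprint-index", ".bin");
        index.save(file);

        // Later run: load the index and check only the new submission against it.
        FingerprintIndex reloaded = FingerprintIndex.load(file);
        System.out.println(reloaded.query("int total = 0; for (int i = 0; i < n; i++) { total += a[i]; }"));
    }
}
```

In the `main` example, the new file (a copy of `2023/alice` with `sum` renamed to `total`) still shares most of its fingerprints with `2023/alice`, and it is checked without re-reading any historical source. Because this tokenizer keeps identifier names, renaming lowers the score; JPlag's language-aware tokens would not be affected that way. Hashing also discards the exact match positions that JPlag's reports rely on, and hash collisions can cause false matches, so an index like this would probably work best as a pre-filter that shortlists historical submissions for a full JPlag comparison. Java serialization is used only to keep the sketch short.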
Use Case
🔍 Alternatives / Workarounds:
I understand this may require substantial changes to the current architecture. As a workaround, I'm currently merging new submissions with the full historical set before running JPlag, but each run gets slower and more resource-intensive as the historical set grows.
Are there any plans, or existing tools/plugins, that support such a capability? Or would you consider adding this as a future enhancement?
Thank you for your time and consideration!
Best regards,
ElgebarOri