Status: Open
Labels: good first issue, help wanted, testing
Description
The new models ("standard_v2_x" and "standard_v3_0") support 200+ content types: https://github.com/google/magika/tree/main/assets/models/standard_v3_0/README.md
Ideally, we would have at least one "basic" sample for each supported content type (see /tests_data/basic/*).
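To find out which content types already have basic samples, and how many, you can walk the directory tree. The following is a minimal sketch, assuming the tests_data/basic/<content_type>/<sample files> layout described in this issue; the function name `count_basic_samples` is hypothetical and not part of the Magika repository.

```python
from __future__ import annotations

from pathlib import Path


def count_basic_samples(root: str | Path) -> dict[str, int]:
    """Count sample files per content type in a tests_data/basic-style tree.

    Assumed (hypothetical) layout: <root>/<content_type>/<sample files>.
    """
    counts: dict[str, int] = {}
    for ct_dir in sorted(Path(root).iterdir()):
        if not ct_dir.is_dir():
            continue  # skip stray files at the top level
        # Count only regular files sitting directly inside the content-type directory.
        counts[ct_dir.name] = sum(1 for p in ct_dir.iterdir() if p.is_file())
    return counts
```

Run it from a checkout, for example `count_basic_samples("tests_data/basic")`. Directories with a count of 0 are the first candidates for new samples.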
This issue is a call to action: external help is very welcome!
Important aspects to keep in mind:
- Content types for which we have no samples yet should be prioritized. Among these, prioritize more common content types rather than niche ones.
- The "basic" test samples (in tests_data/basic/<content_type>/*) should be easy to recognize. The goal of these samples is to check that the model does a reasonable job on clear-cut inputs, not to exercise corner cases.
- It's OK to group several test samples in a single PR.
- The PR should state the origin of each sample.
- The samples should NOT be taken from existing projects or online resources, because it would be very hard to properly document the origin of such files. Instead, they should be written or created manually by the PR author.
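To apply the first guideline (prioritize content types that have no samples yet), you can compare the supported labels against the directories that already contain at least one file. The following is a minimal sketch under two assumptions: the `supported` list is copied by hand from the model README and ordered by the contributor's own sense of how common each type is, and the name `missing_content_types` is hypothetical.

```python
from __future__ import annotations

from pathlib import Path


def missing_content_types(supported: list[str], root: str | Path) -> list[str]:
    """Return supported content types that have no basic sample yet.

    A content type counts as covered if <root>/<content_type>/ exists and
    contains at least one regular file.
    """
    root = Path(root)
    covered = {
        d.name
        for d in root.iterdir()
        if d.is_dir() and any(p.is_file() for p in d.iterdir())
    }
    # Keep the caller's order, so it can encode priority (common types first).
    return [ct for ct in supported if ct not in covered]
```

Keeping the input order, instead of sorting, lets the output double as a to-do list. An empty directory counts as missing, since it holds no sample to test.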