refactor(Taxonomy): replace TAXONOMY_URLS with TAXONOMY_MAPPING and a single method to build the url #475

Merged
raphodn merged 5 commits into develop from raphodn/get-taxonomy-oxf-refactor
May 7, 2026

Conversation

@raphodn
Member

@raphodn raphodn commented Apr 25, 2026

Description

Following #466

Solution

  • TaxonomyType type: add a new dataset_filename key & dataset_path property
  • replace TAXONOMY_URLS with TAXONOMY_MAPPING
  • add a new _generate_file_path that uses both TAXONOMY_MAPPING and TaxonomyType.dataset_path

Related issue(s)

origin = "origin"
language = "language"
other_nutritional_substance = "other_nutritional_substance"
dataset_filename: str
Member Author

added to avoid mypy error

src/openfoodfacts/types.py:901: error: "TaxonomyType" has no attribute "dataset_filename"  [attr-defined]
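
For context, a minimal sketch of why the bare annotation helps, assuming a str-based Enum whose `__new__` attaches an extra attribute. The class name `TaxonomyTypeSketch` and its two members are illustrative only, not the real definition in types.py. An annotation without a value does not create an enum member; it only declares the attribute so that a type checker knows it exists.

```python
from enum import Enum


class TaxonomyTypeSketch(str, Enum):
    # Hypothetical, trimmed-down stand-in for TaxonomyType.
    # A bare annotation (no assigned value) is not turned into an enum
    # member; it only declares the attribute for static type checkers.
    dataset_filename: str

    def __new__(cls, value: str, dataset_filename: str):
        obj = str.__new__(cls, value)
        obj._value_ = value
        # This attribute is set dynamically, which is what the
        # annotation above makes visible to mypy.
        obj.dataset_filename = dataset_filename
        return obj

    category = ("category", "categories.full.json")
    label = ("label", "labels.full.json")


# Only the two tuple-valued names become members.
print([member.name for member in TaxonomyTypeSketch])
```

At runtime the annotation changes nothing: iterating the enum still yields only `category` and `label`.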

Contributor

@Freso Freso left a comment

One comment/suggestion, but overall LGTM!

Comment thread src/openfoodfacts/types.py Outdated
Comment on lines +896 to +930
def __new__(cls, value: str, dataset_filename: str):
    """
    Override __new__ to allow storing the dataset filename
    associated with each taxonomy type.
    """
    obj = str.__new__(cls, value)
    obj._value_ = value
    obj.dataset_filename = dataset_filename
    return obj

category = ("category", "categories.full.json")
ingredient = ("ingredient", "ingredients.full.json")
label = ("label", "labels.full.json")
brand = ("brand", "brands.full.json")
packaging_shape = ("packaging_shape", "packaging_shapes.full.json")
packaging_material = ("packaging_material", "packaging_materials.full.json")
packaging_recycling = ("packaging_recycling", "packaging_recycling.full.json")
country = ("country", "countries.full.json")
store = ("store", "stores.full.json")
nova_group = ("nova_group", "nova_groups.full.json")
packaging = ("packaging", "packaging.full.json")
additive = ("additive", "additives.full.json")
vitamin = ("vitamin", "vitamins.full.json")
mineral = ("mineral", "minerals.full.json")
amino_acid = ("amino_acid", "amino_acids.full.json")
nucleotide = ("nucleotide", "nucleotides.full.json")
allergen = ("allergen", "allergens.full.json")
state = ("state", "states.full.json")
data_quality = ("data_quality", "data_quality.full.json")
origin = ("origin", "origins.full.json")
language = ("language", "languages.full.json")
other_nutritional_substance = (
    "other_nutritional_substance",
    "other_nutritional_substances.full.json",
)
Contributor

Would it make sense/work to flip these two blocks around?

Suggested change
category = ("category", "categories.full.json")
ingredient = ("ingredient", "ingredients.full.json")
label = ("label", "labels.full.json")
brand = ("brand", "brands.full.json")
packaging_shape = ("packaging_shape", "packaging_shapes.full.json")
packaging_material = ("packaging_material", "packaging_materials.full.json")
packaging_recycling = ("packaging_recycling", "packaging_recycling.full.json")
country = ("country", "countries.full.json")
store = ("store", "stores.full.json")
nova_group = ("nova_group", "nova_groups.full.json")
packaging = ("packaging", "packaging.full.json")
additive = ("additive", "additives.full.json")
vitamin = ("vitamin", "vitamins.full.json")
mineral = ("mineral", "minerals.full.json")
amino_acid = ("amino_acid", "amino_acids.full.json")
nucleotide = ("nucleotide", "nucleotides.full.json")
allergen = ("allergen", "allergens.full.json")
state = ("state", "states.full.json")
data_quality = ("data_quality", "data_quality.full.json")
origin = ("origin", "origins.full.json")
language = ("language", "languages.full.json")
other_nutritional_substance = (
    "other_nutritional_substance",
    "other_nutritional_substances.full.json",
)
def __new__(cls, value: str, dataset_filename: str):
    """
    Override __new__ to allow storing the dataset filename
    associated with each taxonomy type.
    """
    obj = str.__new__(cls, value)
    obj._value_ = value
    obj.dataset_filename = dataset_filename
    return obj
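
As a quick check on the "would it work" part of the question, here is a small sketch suggesting that the order should not matter: the enum machinery builds the members only after the whole class body has been evaluated, so a `__new__` defined after the members is still applied to them. The class below is a hypothetical two-member stand-in, not the actual TaxonomyType.

```python
from enum import Enum


class FlippedOrderSketch(str, Enum):
    # Hypothetical stand-in: members are listed first, __new__ comes last.
    dataset_filename: str

    category = ("category", "categories.full.json")
    country = ("country", "countries.full.json")

    def __new__(cls, value: str, dataset_filename: str):
        obj = str.__new__(cls, value)
        obj._value_ = value
        obj.dataset_filename = dataset_filename
        return obj


# The custom __new__ was still used to build each member.
print(FlippedOrderSketch.country.dataset_filename)  # countries.full.json
```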

Member Author

yes that's a good idea

Member Author

done here: 09a4cba

Comment thread src/openfoodfacts/types.py Outdated
other_nutritional_substance = "other_nutritional_substance"
dataset_filename: str

def __new__(cls, value: str, dataset_filename: str):
@raphodn raphodn requested a review from raphael0202 May 6, 2026 12:58
Base automatically changed from raphodn/pre-commit-ruff-fix to raphodn/dependencies-pre-commit May 7, 2026 07:48
Base automatically changed from raphodn/dependencies-pre-commit to develop May 7, 2026 07:50
@teolemon teolemon moved this from Todo to In Progress in 🧃🛠️ Open Food Facts SDK tracking May 7, 2026
@raphodn raphodn force-pushed the raphodn/get-taxonomy-oxf-refactor branch from 9a3d0ac to 44df653 Compare May 7, 2026 08:27
@sonarqubecloud
sonarqubecloud Bot commented May 7, 2026

@raphodn raphodn merged commit 0f83128 into develop May 7, 2026
10 checks passed
@raphodn raphodn deleted the raphodn/get-taxonomy-oxf-refactor branch May 7, 2026 09:05
origin = "origin"
language = "language"
other_nutritional_substance = "other_nutritional_substance"
dataset_filename: str
Contributor

@raphodn I'm really not a fan of storing additional data in an Enum. TaxonomyType is used by other projects such as Robotoff, which don't really care about the dataset_filename. Can we have a dict mapping TaxonomyType to a dataset_filename instead?
In order to ensure every TaxonomyType has an associated dataset_filename, we could add a unit test that fails in case one is missing.
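
A minimal sketch of that alternative, assuming a plain str-valued enum and a module-level dict. The names `TaxonomyTypeSketch`, `DATASET_FILENAMES`, and the test function are hypothetical, and only three members are shown. The check at the end is the kind of unit test that would fail if a member had no filename.

```python
from enum import Enum


class TaxonomyTypeSketch(str, Enum):
    # Hypothetical subset of TaxonomyType, with plain string values only.
    category = "category"
    packaging = "packaging"
    country = "country"


# Hypothetical mapping kept outside the enum, so other consumers of the
# enum never see dataset-specific data.
DATASET_FILENAMES: dict[TaxonomyTypeSketch, str] = {
    TaxonomyTypeSketch.category: "categories.full.json",
    TaxonomyTypeSketch.packaging: "packaging.full.json",
    TaxonomyTypeSketch.country: "countries.full.json",
}


def test_every_taxonomy_type_has_a_dataset_filename() -> None:
    # Fails as soon as a member is added without a mapping entry.
    missing = [t.name for t in TaxonomyTypeSketch if t not in DATASET_FILENAMES]
    assert missing == [], f"missing dataset filenames: {missing}"


test_every_taxonomy_type_has_a_dataset_filename()
```

With this layout the enum stays a plain list of names, and the coverage guarantee moves from the class definition into the test suite.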

Member Author

Aaah, I shouldn't have merged. Can I force-push to drop the commit from the develop branch and reopen the PR?

Member Author

@raphodn raphodn May 7, 2026

It'd be better if every filename mapped perfectly to the enum, but there's at least one plural exception 😅

  • no plural: packaging_recycling, packaging, data_quality
  • different plural: category, country
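
To illustrate why the filename cannot simply be derived from the enum value, the sketch below applies a naive "append s" rule to the value/filename pairs listed in the enum in this PR and collects the ones that break it. The helper name `naive_filename` and the `PAIRS` dict are made up for this illustration.

```python
# Value -> dataset filename pairs, copied from the enum in this PR.
PAIRS = {
    "category": "categories.full.json",
    "ingredient": "ingredients.full.json",
    "label": "labels.full.json",
    "brand": "brands.full.json",
    "packaging_shape": "packaging_shapes.full.json",
    "packaging_material": "packaging_materials.full.json",
    "packaging_recycling": "packaging_recycling.full.json",
    "country": "countries.full.json",
    "store": "stores.full.json",
    "nova_group": "nova_groups.full.json",
    "packaging": "packaging.full.json",
    "additive": "additives.full.json",
    "vitamin": "vitamins.full.json",
    "mineral": "minerals.full.json",
    "amino_acid": "amino_acids.full.json",
    "nucleotide": "nucleotides.full.json",
    "allergen": "allergens.full.json",
    "state": "states.full.json",
    "data_quality": "data_quality.full.json",
    "origin": "origins.full.json",
    "language": "languages.full.json",
    "other_nutritional_substance": "other_nutritional_substances.full.json",
}


def naive_filename(value: str) -> str:
    # Hypothetical rule: pluralize by appending "s".
    return f"{value}s.full.json"


# Values whose real filename differs from the naive guess.
exceptions = [v for v, filename in PAIRS.items() if naive_filename(v) != filename]
print(exceptions)
# ['category', 'packaging_recycling', 'country', 'packaging', 'data_quality']
```

These are the same five names as in the two bullets above, so an explicit mapping of some kind is needed either way.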

Member Author

OK, I opened a PR instead: #479

4 participants