refactor(Taxonomy): replace TAXONOMY_URLS with TAXONOMY_MAPPING and a single method to build the url by raphodn · Pull Request #475 · openfoodfacts/openfoodfacts-python

raphodn · 2026-04-25T11:16:08Z

Description

Following #466

Solution

TaxonomyType type: add a new dataset_filename key & dataset_path property
replace TAXONOMY_URLS with TAXONOMY_MAPPING
add a new _generate_file_path, which uses both TAXONOMY_MAPPING & TaxonomyType.dataset_path
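The description does not show the new code, so the following is only a minimal sketch of the general idea: one lookup table plus one URL-building function, instead of one hard-coded URL per taxonomy. Everything here is an assumption for illustration: the flavor codes come from #465's title, while the static host names, the `/data/taxonomies/` path, and the names `FLAVOR_STATIC_HOSTS`, `DATASET_FILENAMES` and `build_taxonomy_url` are hypothetical and are not the SDK's actual `TAXONOMY_MAPPING` or `_generate_file_path`.

```python
# Hypothetical flavor -> static host table (assumed hosts, illustration only).
FLAVOR_STATIC_HOSTS = {
    "off": "https://static.openfoodfacts.org",
    "obf": "https://static.openbeautyfacts.org",
    "opff": "https://static.openpetfoodfacts.org",
    "opf": "https://static.openproductsfacts.org",
}

# Hypothetical taxonomy -> dataset filename table (small subset).
DATASET_FILENAMES = {
    "category": "categories.full.json",
    "label": "labels.full.json",
    "packaging": "packaging.full.json",
}


def build_taxonomy_url(flavor: str, taxonomy: str) -> str:
    """Build the download URL of one taxonomy for one flavor (sketch)."""
    if flavor not in FLAVOR_STATIC_HOSTS:
        raise ValueError(f"unsupported flavor: {flavor!r}")
    if taxonomy not in DATASET_FILENAMES:
        raise ValueError(f"unknown taxonomy: {taxonomy!r}")
    # A single code path replaces one hard-coded URL per taxonomy.
    host = FLAVOR_STATIC_HOSTS[flavor]
    return f"{host}/data/taxonomies/{DATASET_FILENAMES[taxonomy]}"
```

For example, `build_taxonomy_url("opff", "packaging")` yields the pet-food packaging URL without a dedicated constant for it.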

Related issue(s)

Extend get_taxonomy to other flavors (obf, opf, opff) #465

raphodn · 2026-04-25T11:19:17Z

-    origin = "origin"
-    language = "language"
-    other_nutritional_substance = "other_nutritional_substance"
+    dataset_filename: str


Added to avoid this mypy error:

src/openfoodfacts/types.py:901: error: "TaxonomyType" has no attribute "dataset_filename" [attr-defined]
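A minimal, self-contained sketch of the pattern under discussion (a str-mixin Enum whose `__new__` stores an extra attribute, as in the Python enum HOWTO), reduced to two members for illustration. The bare `dataset_filename: str` annotation does not create an enum member; it only declares to type checkers such as mypy that each member carries that attribute. Here `__new__` sits below the members, the order later adopted in 09a4cba: the enum metaclass looks `__new__` up in the finished class namespace, so its position in the class body does not matter.

```python
from enum import Enum


class TaxonomyType(str, Enum):
    # Annotation only (no value): not an enum member, just a declaration
    # that every member has a `dataset_filename` string attribute.
    dataset_filename: str

    category = ("category", "categories.full.json")
    packaging = ("packaging", "packaging.full.json")

    def __new__(cls, value: str, dataset_filename: str):
        # Build the str part of the member from the plain value only.
        obj = str.__new__(cls, value)
        # Make the plain string (not the tuple) the enum value, so that
        # lookups such as TaxonomyType("category") keep working.
        obj._value_ = value
        obj.dataset_filename = dataset_filename
        return obj
```

Without the `obj._value_ = value` line, the enum would try to derive the value from the whole tuple, which fails for a `str` mixin.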

Freso

One comment/suggestion, but overall LGTM!

Freso · 2026-04-25T11:56:47Z

+    def __new__(cls, value: str, dataset_filename: str):
+        """
+        Override __new__ to allow storing the dataset filename
+        associated with each taxonomy type.
+        """
+        obj = str.__new__(cls, value)
+        obj._value_ = value
+        obj.dataset_filename = dataset_filename
+        return obj
+
+    category = ("category", "categories.full.json")
+    ingredient = ("ingredient", "ingredients.full.json")
+    label = ("label", "labels.full.json")
+    brand = ("brand", "brands.full.json")
+    packaging_shape = ("packaging_shape", "packaging_shapes.full.json")
+    packaging_material = ("packaging_material", "packaging_materials.full.json")
+    packaging_recycling = ("packaging_recycling", "packaging_recycling.full.json")
+    country = ("country", "countries.full.json")
+    store = ("store", "stores.full.json")
+    nova_group = ("nova_group", "nova_groups.full.json")
+    packaging = ("packaging", "packaging.full.json")
+    additive = ("additive", "additives.full.json")
+    vitamin = ("vitamin", "vitamins.full.json")
+    mineral = ("mineral", "minerals.full.json")
+    amino_acid = ("amino_acid", "amino_acids.full.json")
+    nucleotide = ("nucleotide", "nucleotides.full.json")
+    allergen = ("allergen", "allergens.full.json")
+    state = ("state", "states.full.json")
+    data_quality = ("data_quality", "data_quality.full.json")
+    origin = ("origin", "origins.full.json")
+    language = ("language", "languages.full.json")
+    other_nutritional_substance = (
+        "other_nutritional_substance",
+        "other_nutritional_substances.full.json",
+    )


Would it make sense (and work) to flip these two blocks around?

Suggested change

-    def __new__(cls, value: str, dataset_filename: str):
-        """
-        Override __new__ to allow storing the dataset filename
-        associated with each taxonomy type.
-        """
-        obj = str.__new__(cls, value)
-        obj._value_ = value
-        obj.dataset_filename = dataset_filename
-        return obj
-
-    category = ("category", "categories.full.json")
-    ingredient = ("ingredient", "ingredients.full.json")
-    label = ("label", "labels.full.json")
-    brand = ("brand", "brands.full.json")
-    packaging_shape = ("packaging_shape", "packaging_shapes.full.json")
-    packaging_material = ("packaging_material", "packaging_materials.full.json")
-    packaging_recycling = ("packaging_recycling", "packaging_recycling.full.json")
-    country = ("country", "countries.full.json")
-    store = ("store", "stores.full.json")
-    nova_group = ("nova_group", "nova_groups.full.json")
-    packaging = ("packaging", "packaging.full.json")
-    additive = ("additive", "additives.full.json")
-    vitamin = ("vitamin", "vitamins.full.json")
-    mineral = ("mineral", "minerals.full.json")
-    amino_acid = ("amino_acid", "amino_acids.full.json")
-    nucleotide = ("nucleotide", "nucleotides.full.json")
-    allergen = ("allergen", "allergens.full.json")
-    state = ("state", "states.full.json")
-    data_quality = ("data_quality", "data_quality.full.json")
-    origin = ("origin", "origins.full.json")
-    language = ("language", "languages.full.json")
-    other_nutritional_substance = (
-        "other_nutritional_substance",
-        "other_nutritional_substances.full.json",
-    )
+    category = ("category", "categories.full.json")
+    ingredient = ("ingredient", "ingredients.full.json")
+    label = ("label", "labels.full.json")
+    brand = ("brand", "brands.full.json")
+    packaging_shape = ("packaging_shape", "packaging_shapes.full.json")
+    packaging_material = ("packaging_material", "packaging_materials.full.json")
+    packaging_recycling = ("packaging_recycling", "packaging_recycling.full.json")
+    country = ("country", "countries.full.json")
+    store = ("store", "stores.full.json")
+    nova_group = ("nova_group", "nova_groups.full.json")
+    packaging = ("packaging", "packaging.full.json")
+    additive = ("additive", "additives.full.json")
+    vitamin = ("vitamin", "vitamins.full.json")
+    mineral = ("mineral", "minerals.full.json")
+    amino_acid = ("amino_acid", "amino_acids.full.json")
+    nucleotide = ("nucleotide", "nucleotides.full.json")
+    allergen = ("allergen", "allergens.full.json")
+    state = ("state", "states.full.json")
+    data_quality = ("data_quality", "data_quality.full.json")
+    origin = ("origin", "origins.full.json")
+    language = ("language", "languages.full.json")
+    other_nutritional_substance = (
+        "other_nutritional_substance",
+        "other_nutritional_substances.full.json",
+    )
+
+    def __new__(cls, value: str, dataset_filename: str):
+        """
+        Override __new__ to allow storing the dataset filename
+        associated with each taxonomy type.
+        """
+        obj = str.__new__(cls, value)
+        obj._value_ = value
+        obj.dataset_filename = dataset_filename
+        return obj

Yes, that's a good idea.

done here: 09a4cba

raphodn · 2026-04-25T13:18:55Z

-    other_nutritional_substance = "other_nutritional_substance"
+    dataset_filename: str
+
+    def __new__(cls, value: str, dataset_filename: str):


logic taken from https://docs.python.org/3/howto/enum.html#when-to-use-new-vs-init

sonarqubecloud · 2026-05-07T09:03:42Z

Quality Gate passed

Issues
0 New issues
0 Accepted issues

Measures
0 Security Hotspots
0.0% Coverage on New Code
0.0% Duplication on New Code

See analysis details on SonarQube Cloud

raphael0202 · 2026-05-07T11:29:03Z

-    origin = "origin"
-    language = "language"
-    other_nutritional_substance = "other_nutritional_substance"
+    dataset_filename: str


@raphodn I'm really not a fan of storing additional data in an Enum. TaxonomyType is used by other projects such as Robotoff, which don't care about the dataset_filename. Could we use a dict mapping each TaxonomyType to its dataset_filename instead?
In order to ensure every TaxonomyType has an associated dataset_filename, we could add a unit test that fails in case one is missing.
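A minimal sketch of the alternative suggested above, assuming the enum stays a plain str Enum and the filenames live in a separate dict, guarded by a pytest-style test that fails as soon as a member has no entry. `DATASET_FILENAMES` and the test name are hypothetical, and only three members are shown.

```python
from enum import Enum


class TaxonomyType(str, Enum):
    """Plain enum: consumers such as Robotoff see no SDK-specific data."""

    category = "category"
    packaging = "packaging"
    country = "country"


# Hypothetical mapping kept outside the enum.
DATASET_FILENAMES: dict[TaxonomyType, str] = {
    TaxonomyType.category: "categories.full.json",
    TaxonomyType.packaging: "packaging.full.json",
    TaxonomyType.country: "countries.full.json",
}


def test_every_taxonomy_type_has_a_dataset_filename() -> None:
    # Adding a new enum member without a filename makes this test fail.
    missing = [t for t in TaxonomyType if t not in DATASET_FILENAMES]
    assert not missing, f"missing dataset filenames: {missing}"
```

The trade-off is one extra table to maintain, in exchange for an enum that other projects can import without carrying SDK concerns.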

Aaah, I shouldn't have merged. Can I force-push to drop the commit from the develop branch and reopen the PR?

It would be nicer if every filename followed directly from the enum value, but there's at least one plural exception 😅

no plural: packaging_recycling, packaging, data_quality

different plural: category, country
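For comparison, a sketch of deriving the filename from the enum value with a default `<value>s` rule plus an override table for the exceptions listed above. The names `FILENAME_STEM_OVERRIDES` and `dataset_filename` are hypothetical; whether a rule-plus-exceptions scheme beats an explicit mapping is a judgment call, since the explicit mapping is easier to grep and review.

```python
# Hypothetical override table: taxonomies whose filename stem is not "<value>s".
FILENAME_STEM_OVERRIDES = {
    # no plural
    "packaging_recycling": "packaging_recycling",
    "packaging": "packaging",
    "data_quality": "data_quality",
    # different plural
    "category": "categories",
    "country": "countries",
}


def dataset_filename(value: str) -> str:
    """Return the dataset filename for a taxonomy value (sketch)."""
    # Default rule: append "s"; the override table wins when it has an entry.
    stem = FILENAME_STEM_OVERRIDES.get(value, f"{value}s")
    return f"{stem}.full.json"
```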

OK, I opened a PR instead: #479

raphodn requested a review from a team as a code owner April 25, 2026 11:16

github-project-automation Bot added this to 🧃🛠️ Open Food Facts SDK tracking and 🐍 Python SDK - Keep on par with the API Apr 25, 2026

github-project-automation Bot moved this to In progress in 🐍 Python SDK - Keep on par with the API Apr 25, 2026

github-project-automation Bot moved this to Todo in 🧃🛠️ Open Food Facts SDK tracking Apr 25, 2026

github-actions Bot assigned raphodn Apr 25, 2026

raphodn mentioned this pull request Apr 25, 2026

feat(Taxonomy): allow fetching taxonomies from other flavors (obf, opff, opf) #466

Merged

raphodn commented Apr 25, 2026


Freso approved these changes Apr 25, 2026


raphodn commented Apr 25, 2026


raphodn requested a review from raphael0202 May 6, 2026 12:58

Base automatically changed from raphodn/pre-commit-ruff-fix to raphodn/dependencies-pre-commit May 7, 2026 07:48

Base automatically changed from raphodn/dependencies-pre-commit to develop May 7, 2026 07:50

teolemon added the Taxonomies label May 7, 2026

teolemon moved this from Todo to In Progress in 🧃🛠️ Open Food Facts SDK tracking May 7, 2026

raphodn added 4 commits May 7, 2026 10:27

TaxonomyType: add dataset_filename key & dataset_path property

e2d9473

start simplifying TAXONOMY_URLS

fdf7f8a

new _generate_file_path

eb8909d

fix mypy

44df653

raphodn force-pushed the raphodn/get-taxonomy-oxf-refactor branch from 9a3d0ac to 44df653 May 7, 2026 08:27

put __new__ below

09a4cba

raphodn merged commit 0f83128 into develop May 7, 2026
10 checks passed

raphodn deleted the raphodn/get-taxonomy-oxf-refactor branch May 7, 2026 09:05

github-project-automation Bot moved this from In Progress to Done in 🧃🛠️ Open Food Facts SDK tracking May 7, 2026

github-project-automation Bot moved this from In progress to Done in 🐍 Python SDK - Keep on par with the API May 7, 2026

openfoodfacts-bot mentioned this pull request May 7, 2026

chore(develop): release 5.1.0 #471

Open

raphael0202 reviewed May 7, 2026


raphodn mentioned this pull request May 7, 2026

refactor(Taxonomy): for dataset filepath, use mapping instead of overriding enum #479

Open

raphodn merged 5 commits into develop from raphodn/get-taxonomy-oxf-refactor

4 participants

Uh oh!

Conversation

raphodn commented Apr 25, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Description

Solution

Related issue(s)

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Freso left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

sonarqubecloud Bot commented May 7, 2026

Quality Gate passed

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

raphodn May 7, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

raphodn commented Apr 25, 2026 •

edited

Loading

raphodn May 7, 2026 •

edited

Loading