feat: Unit standardization / conversion #7121
michael-genson wants to merge 33 commits into mealie-next
export type StandardizedUnitType
  = | "fluid_ounce"
    | "cup"
    | "ounce"
    | "pound"
    | "milliliter"
    | "liter"
    | "gram"
    | "kilogram";
Our arbitrary list of unit types. Pint understands all of these.
There's no difference in defining a unit as 1 * liter vs 1000 * milliliter other than user preference.
def populate_standards() -> None:
    bind = op.get_bind()

    session = orm.Session(bind)

    # We aren't using most of the functionality of this class, so we pass dummy args
    repo = RepositoryUnit(None, None, None, None, group_id=None)  # type: ignore

    stmt = sa.select(IngredientUnitModel)
    units = session.execute(stmt).scalars().all()
    if not units:
        return

    # Manually build repo._standardized_unit_map with all locales
    repo._standardized_unit_map = {}
    for locale in LOCALE_CONFIG:
        locale_file = IngredientUnitsSeeder.get_file(locale)
        for unit_key, unit in IngredientUnitsSeeder.load_file(locale_file).items():
            for prop in ["name", "plural_name", "abbreviation"]:
                val = unit.get(prop)
                if val and isinstance(val, str):
                    repo._standardized_unit_map[val.strip().lower()] = unit_key

    for unit in units:
        unit_data = {
            "name": unit.name,
            "plural_name": unit.plural_name,
            "abbreviation": unit.abbreviation,
            "plural_abbreviation": unit.plural_abbreviation,
        }

        standardized_data = repo._add_standardized_unit(unit_data)
        std_q = standardized_data.get("standard_quantity")
        std_u = standardized_data.get("standard_unit")
        if std_q and std_u:
            logger.info(f"Found unit '{unit.name}', which is standardized as '{std_q} * {std_u}'")
            unit.standard_quantity = std_q
            unit.standard_unit = std_u

    session.commit()
    session.close()
Check all units against all languages and automatically standardize known ones. This assumes there isn't an identical string between languages with different definitions. I think this is a fair assumption (and users can manually fix if needed).
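The assumption above (no string maps to different unit keys across locales) could be checked mechanically. A minimal sketch, assuming seed data shaped like the dicts the seeder loads; the `SEED` contents and the `find_collisions` helper are hypothetical and not part of this PR:

```python
from collections import defaultdict

# Hypothetical stand-in for the per-locale seed files:
# {locale: {unit_key: {property: value}}}
SEED = {
    "en-US": {
        "teaspoon": {"name": "teaspoon", "plural_name": "teaspoons", "abbreviation": "tsp"},
        "liter": {"name": "liter", "plural_name": "liters", "abbreviation": "l"},
    },
    "de-DE": {
        "teaspoon": {"name": "Teelöffel", "plural_name": "Teelöffel", "abbreviation": "TL"},
        "liter": {"name": "Liter", "plural_name": "Liter", "abbreviation": "l"},
    },
}

PROPS = ("name", "plural_name", "abbreviation")


def find_collisions(seed):
    """Return normalized strings that map to more than one unit key."""
    seen = defaultdict(set)
    for units in seed.values():
        for unit_key, unit in units.items():
            for prop in PROPS:
                val = unit.get(prop)
                if val and isinstance(val, str):
                    # Same normalization as the migration: strip + lowercase
                    seen[val.strip().lower()].add(unit_key)
    return {text: keys for text, keys in seen.items() if len(keys) > 1}


collisions = find_collisions(SEED)
```

An empty result means the "last locale wins" overwrite in the map cannot change which unit key a string resolves to.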
@property
def standardized_unit_map(self) -> dict[str, str]:
    """A map of potential known units to its standardized name in our seed data"""

    if self._standardized_unit_map is None:
        from .seed.seeders import IngredientUnitsSeeder

        ctx = get_locale_context()
        if ctx:
            locale = ctx[1].key
        else:
            locale = None

        self._standardized_unit_map = {}
        locale_file = IngredientUnitsSeeder.get_file(locale=locale)
        for unit_key, unit in IngredientUnitsSeeder.load_file(locale_file).items():
            for prop in ["name", "plural_name", "abbreviation"]:
                val = unit.get(prop)
                if val and isinstance(val, str):
                    self._standardized_unit_map[val.strip().lower()] = unit_key

    return self._standardized_unit_map
For a given locale, check values against "name", "plural_name" and "abbreviation" from our seed data (our seed data doesn't define a plural abbreviation for some reason).
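To illustrate the matching, here is a small standalone sketch of the same normalization applied on lookup. `build_unit_map`, `lookup`, and the sample seed entry are hypothetical names for illustration; the real map is built inside the repository property:

```python
def build_unit_map(seed_units):
    """Index name, plural_name and abbreviation (stripped, lowercased) by unit key."""
    unit_map = {}
    for unit_key, unit in seed_units.items():
        for prop in ("name", "plural_name", "abbreviation"):
            val = unit.get(prop)
            if val and isinstance(val, str):
                unit_map[val.strip().lower()] = unit_key
    return unit_map


def lookup(unit_map, text):
    """Return the seed unit key for user-entered text, or None if unknown."""
    return unit_map.get(text.strip().lower())


# Hypothetical seed entry for a single locale
seed_units = {
    "tablespoon": {"name": "tablespoon", "plural_name": "tablespoons", "abbreviation": "tbsp"},
}
unit_map = build_unit_map(seed_units)
```

Casing and surrounding whitespace are ignored, but a plural abbreviation such as "tbsps" would not match because the seed data has no such field to index.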
match standardized_unit_key:
    case "teaspoon":
        data["standard_quantity"] = 1 / 6
        data["standard_unit"] = StandardizedUnitType.FLUID_OUNCE
    case "tablespoon":
        data["standard_quantity"] = 1 / 2
        data["standard_unit"] = StandardizedUnitType.FLUID_OUNCE
    case "cup":
        data["standard_quantity"] = 1
        data["standard_unit"] = StandardizedUnitType.CUP
    case "fluid-ounce":
        data["standard_quantity"] = 1
        data["standard_unit"] = StandardizedUnitType.FLUID_OUNCE
    case "pint":
        data["standard_quantity"] = 2
        data["standard_unit"] = StandardizedUnitType.CUP
    case "quart":
        data["standard_quantity"] = 4
        data["standard_unit"] = StandardizedUnitType.CUP
    case "gallon":
        data["standard_quantity"] = 16
        data["standard_unit"] = StandardizedUnitType.CUP
    case "milliliter":
        data["standard_quantity"] = 1
        data["standard_unit"] = StandardizedUnitType.MILLILITER
    case "liter":
        data["standard_quantity"] = 1
        data["standard_unit"] = StandardizedUnitType.LITER
    case "pound":
        data["standard_quantity"] = 1
        data["standard_unit"] = StandardizedUnitType.POUND
    case "ounce":
        data["standard_quantity"] = 1
        data["standard_unit"] = StandardizedUnitType.OUNCE
    case "gram":
        data["standard_quantity"] = 1
        data["standard_unit"] = StandardizedUnitType.GRAM
    case "kilogram":
        data["standard_quantity"] = 1
        data["standard_unit"] = StandardizedUnitType.KILOGRAM
    case "milligram":
        data["standard_quantity"] = 1 / 1000
        data["standard_unit"] = StandardizedUnitType.GRAM
    case _:
        continue
Big hardcoded standardization logic. These are ultimately the units we automatically standardize.
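As a sanity check on the volume factors, the same mapping can be written as a table and reduced to fluid ounces. This is only a sketch: the `VOLUME_STANDARDS` table and `to_fluid_ounces` helper are hypothetical, it uses `Fraction` instead of the floats in the PR to keep the arithmetic exact, and it assumes US customary sizes (1 cup = 8 fluid ounces):

```python
from fractions import Fraction

# Same factors as the match statement above, volume units only
VOLUME_STANDARDS = {
    "teaspoon": (Fraction(1, 6), "fluid_ounce"),
    "tablespoon": (Fraction(1, 2), "fluid_ounce"),
    "fluid-ounce": (Fraction(1), "fluid_ounce"),
    "cup": (Fraction(1), "cup"),
    "pint": (Fraction(2), "cup"),
    "quart": (Fraction(4), "cup"),
    "gallon": (Fraction(16), "cup"),
}

# Assumption: US customary cup
FLUID_OUNCES_PER = {"fluid_ounce": Fraction(1), "cup": Fraction(8)}


def to_fluid_ounces(unit_key):
    """Size of one `unit_key` expressed in fluid ounces."""
    quantity, standard_unit = VOLUME_STANDARDS[unit_key]
    return quantity * FLUID_OUNCES_PER[standard_unit]


tsp_per_tbsp = to_fluid_ounces("tablespoon") / to_fluid_ounces("teaspoon")
```

The familiar kitchen ratios fall out of the table: 3 teaspoons per tablespoon, 16 fluid ounces per pint, 128 per gallon.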
@model_validator(mode="after")
def validate_standardization_fields(self):
    # If one is set, the other must be set.
    # If quantity is <= 0, it's considered not set.
    if not self.standard_unit:
        self.standard_quantity = self.standard_unit = None
    elif not ((self.standard_quantity or 0) > 0):
        self.standard_quantity = self.standard_unit = None

    return self
A standard quantity without a unit is useless, and vice versa, so we drop partial definitions.
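The effect of the validator is easy to see in isolation. A minimal sketch using a plain dataclass as a stand-in for the Pydantic model (the `UnitStandard` class is hypothetical; only the two field names come from the PR):

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class UnitStandard:
    standard_quantity: Optional[float] = None
    standard_unit: Optional[str] = None

    def __post_init__(self):
        # Mirror the validator: a missing unit or a quantity <= 0
        # means the whole definition is dropped.
        if not self.standard_unit or not ((self.standard_quantity or 0) > 0):
            self.standard_quantity = None
            self.standard_unit = None


partial = UnitStandard(standard_quantity=2, standard_unit=None)
zero = UnitStandard(standard_quantity=0, standard_unit="cup")
valid = UnitStandard(standard_quantity=1.5, standard_unit="cup")
```

Only `valid` keeps its values; `partial` and `zero` both end up with the two fields set to `None`.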
@@ -69,7 +81,20 @@ def merge_items(
         Attributes of the `to_item` take priority over the `from_item`, except extras with overlapping keys
         """
 
-        to_item.quantity += from_item.quantity
+        to_item_unit = to_item.unit or self.data_matcher.units_by_id.get(to_item.unit_id)
+        from_item_unit = from_item.unit or self.data_matcher.units_by_id.get(from_item.unit_id)
+        if to_item_unit and to_item_unit.standard_unit and from_item_unit and from_item_unit.standard_unit:
+            merged_qty, merged_unit = merge_quantity_and_unit(
+                from_item.quantity or 0, from_item_unit, to_item.quantity or 0, to_item_unit
+            )
+            to_item.quantity = merged_qty
+            to_item.unit_id = merged_unit.id
+            to_item.unit = merged_unit
+
+        else:
+            # No conversion needed, just sum the quantities
+            to_item.quantity += from_item.quantity
 
The only user-facing part of this PR, aside from data management: merging shopping list items.
from typing import TYPE_CHECKING, Literal, overload

from pint import Quantity, Unit, UnitRegistry

if TYPE_CHECKING:
    from mealie.schema.recipe.recipe_ingredient import CreateIngredientUnit


class UnitNotFound(Exception):
    """Raised when trying to access a unit not found in the unit registry."""

    def __init__(self, message: str = "Unit not found in unit registry"):
        self.message = message
        super().__init__(self.message)

    def __str__(self):
        return f"{self.message}"


class UnitConverter:
    def __init__(self):
        self.ureg = UnitRegistry()

    def _resolve_ounce(self, unit_1: Unit, unit_2: Unit) -> tuple[Unit, Unit]:
        """
        Often times "ounce" is used in place of "fluid ounce" in recipes.
        When trying to convert/combine ounces with a volume, we can assume it should have been a fluid ounce.
        This function will convert ounces to fluid ounces if the other unit is a volume.
        """

        OUNCE = self.ureg("ounce")
        FL_OUNCE = self.ureg("fluid_ounce")
        VOLUME = "[length] ** 3"

        if unit_1 == OUNCE and unit_2.dimensionality == VOLUME:
            return FL_OUNCE, unit_2
        if unit_2 == OUNCE and unit_1.dimensionality == VOLUME:
            return unit_1, FL_OUNCE

        return unit_1, unit_2

    @overload
    def parse(self, unit: str | Unit, strict: Literal[False] = False) -> str | Unit: ...

    @overload
    def parse(self, unit: str | Unit, strict: Literal[True]) -> Unit: ...

    def parse(self, unit: str | Unit, strict: bool = False) -> str | Unit:
        """
        Parse a string unit into a pint.Unit.

        If strict is False (default), returns a pint.Unit if it exists, otherwise returns the original string.
        If strict is True, raises UnitNotFound instead of returning a string.
        If the input is already a parsed pint.Unit, returns it as-is.
        """
        if isinstance(unit, Unit):
            return unit

        try:
            return self.ureg(unit).units
        except Exception as e:
            if strict:
                raise UnitNotFound(f"Unit '{unit}' not found in unit registry") from e
            return unit

    def can_convert(self, unit: str | Unit, to_unit: str | Unit) -> bool:
        """Whether or not a given unit can be converted into another unit."""

        unit = self.parse(unit)
        to_unit = self.parse(to_unit)

        if not (isinstance(unit, Unit) and isinstance(to_unit, Unit)):
            return False

        unit, to_unit = self._resolve_ounce(unit, to_unit)
        return unit.is_compatible_with(to_unit)

    def convert(self, quantity: float, unit: str | Unit, to_unit: str | Unit) -> tuple[float, Unit]:
        """
        Convert a quantity and a unit into another unit.

        Returns tuple[quantity, unit]
        """

        unit = self.parse(unit, strict=True)
        to_unit = self.parse(to_unit, strict=True)
        unit, to_unit = self._resolve_ounce(unit, to_unit)

        qty = quantity * unit
        converted = qty.to(to_unit)
        return float(converted.magnitude), converted.units

    def merge(self, quantity_1: float, unit_1: str | Unit, quantity_2: float, unit_2: str | Unit) -> tuple[float, Unit]:
        """Merge two quantities together"""

        unit_1 = self.parse(unit_1, strict=True)
        unit_2 = self.parse(unit_2, strict=True)
        unit_1, unit_2 = self._resolve_ounce(unit_1, unit_2)

        q1 = quantity_1 * unit_1
        q2 = quantity_2 * unit_2

        out: Quantity = q1 + q2
        return float(out.magnitude), out.units


def merge_quantity_and_unit[T: CreateIngredientUnit](
    qty_1: float, unit_1: T, qty_2: float, unit_2: T
) -> tuple[float, T]:
    """
    Merge a quantity and unit.

    Returns tuple[quantity, unit]
    """

    if not (unit_1.standard_quantity and unit_1.standard_unit and unit_2.standard_quantity and unit_2.standard_unit):
        raise ValueError("Both units must contain standardized unit data")

    PINT_UNIT_1_TXT = "_mealie_unit_1"
    PINT_UNIT_2_TXT = "_mealie_unit_2"

    uc = UnitConverter()

    # pre-process units to account for ounce -> fluid_ounce conversion
    unit_1_standard = uc.parse(unit_1.standard_unit, strict=True)
    unit_2_standard = uc.parse(unit_2.standard_unit, strict=True)
    unit_1_standard, unit_2_standard = uc._resolve_ounce(unit_1_standard, unit_2_standard)

    # create custom unit definition so pint can handle them natively
    uc.ureg.define(f"{PINT_UNIT_1_TXT} = {unit_1.standard_quantity} * {unit_1_standard}")
    uc.ureg.define(f"{PINT_UNIT_2_TXT} = {unit_2.standard_quantity} * {unit_2_standard}")

    pint_unit_1 = uc.parse(PINT_UNIT_1_TXT)
    pint_unit_2 = uc.parse(PINT_UNIT_2_TXT)

    merged_q, merged_u = uc.merge(qty_1, pint_unit_1, qty_2, pint_unit_2)

    # Convert to the bigger unit if quantity >= 1, else the smaller unit
    merged_q, merged_u = uc.convert(merged_q, merged_u, max(pint_unit_1, pint_unit_2))
    if abs(merged_q) < 1:
        merged_q, merged_u = uc.convert(merged_q, merged_u, min(pint_unit_1, pint_unit_2))

    if str(merged_u) == PINT_UNIT_1_TXT:
        return merged_q, unit_1
    else:
        return merged_q, unit_2
The meat of the PR, which handles all the actual unit conversions
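The rule for picking the merged unit (use the bigger unit unless the total would be below 1, in which case use the smaller one) can be sketched without pint. This is an illustration only: the `Unit` tuple, the `ML_PER_STANDARD` table, and the `merge` helper are hypothetical stand-ins, and the real implementation delegates the arithmetic to pint:

```python
from typing import NamedTuple


class Unit(NamedTuple):
    name: str
    standard_quantity: float
    standard_unit: str


# Hypothetical base-unit table; pint supplies this in the real code
ML_PER_STANDARD = {"milliliter": 1.0, "liter": 1000.0}


def size_in_ml(unit):
    """Size of one `unit` in milliliters."""
    return unit.standard_quantity * ML_PER_STANDARD[unit.standard_unit]


def merge(qty_1, unit_1, qty_2, unit_2):
    """Sum two quantities and express the result in whichever unit reads better."""
    total_ml = qty_1 * size_in_ml(unit_1) + qty_2 * size_in_ml(unit_2)
    bigger, smaller = sorted((unit_1, unit_2), key=size_in_ml, reverse=True)
    merged = total_ml / size_in_ml(bigger)
    if abs(merged) < 1:
        # Fall back to the smaller unit so we don't show e.g. "0.5 liter"
        return total_ml / size_in_ml(smaller), smaller
    return merged, bigger


liter = Unit("liter", 1, "liter")
ml = Unit("ml", 1, "milliliter")
```

With these definitions, `merge(1, liter, 500, ml)` comes out in liters, while `merge(0.25, liter, 250, ml)` falls back to milliliters because the total is under one liter.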
            return unit_1, FL_OUNCE

        return unit_1, unit_2
The magic "ounce -> fluid ounce" converter (more info in the PR body)
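The decision itself is small enough to show on plain strings. A sketch under the assumption that each unit can be tagged as mass or volume; the `DIMENSION` table and `resolve_ounce` function are hypothetical, whereas the PR compares pint dimensionalities instead:

```python
# Hypothetical dimension tags for the units in our standardized list
DIMENSION = {
    "ounce": "mass",
    "pound": "mass",
    "gram": "mass",
    "kilogram": "mass",
    "fluid_ounce": "volume",
    "cup": "volume",
    "milliliter": "volume",
    "liter": "volume",
}


def resolve_ounce(unit_1, unit_2):
    """Treat a bare "ounce" as "fluid_ounce" when the other unit is a volume."""
    if unit_1 == "ounce" and DIMENSION.get(unit_2) == "volume":
        return "fluid_ounce", unit_2
    if unit_2 == "ounce" and DIMENSION.get(unit_1) == "volume":
        return unit_1, "fluid_ounce"
    return unit_1, unit_2
```

Pairing "ounce" with "cup" swaps in "fluid_ounce", while "ounce" with "pound" is left alone because both are already mass units.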
YES, we absolutely do. Also, it's wasting huge amounts of vertical space because it adds space for potential hints, labels, etc. that are usually hidden.

Yeah, something as simple as putting it in a grid and specifying the cols (defaulting to 12) would probably work quite well. Definitely something I can take up in the future.

What this PR does / why we need it:
Enables standardization of units, allowing for things such as automatic unit conversion. This PR specifically implements:
To the user the only new useful feature of this PR is automatically merging shopping list units:

Going forward, unit standardization will enable several features, such as:
How is the data stored?
At the data layer standard units are just strings. The understanding is that anything stored as a "standard unit" is something understood by pint, but we don't actually validate this (the unit conversion processing has error handling if we get bad data, so it doesn't matter). On the frontend we have an arbitrary hardcoded list:
Pint supports hundreds of units so this is obviously not comprehensive, but I think this covers everything a Mealie user would need (includes a few different volume and mass units, in both imperial and metric systems). We could always add/remove options, too, since this list is not stored in the db.
How is this data used?
The new UnitConverter class handles conversions between units. It's a wrapper around pint and enables parsing units as strings or pint.Unit objects, as well as associated quantities. When we receive a Mealie unit we tell pint what this unit means:
...and now pint understands what mealie-unit-1 and mealie-unit-2 are. This is done at runtime so we don't have to manage any external definitions file.
How is this data managed?
Since units, unlike foods, are pretty universally standardized in the context of recipes, I've implemented several automatic standardization methods:
So existing users don't have to do anything to standardize their data for the most part, unless they have unexpected units they want to standardize. Similarly, new users using seed data also don't have to do this manually.
For users with unexpected units, or users who want to tweak things for some reason, these fields are added to the data management page:

Where the second field is a dropdown of our arbitrary known units:

Which issue(s) this PR fixes:
Several discussions either ask for or otherwise depend on this:
Special notes for your reviewer:
I added some gross-looking special handling for ounce. Ounce is a weird one because oftentimes (at least here in the US) recipes use "ounce" for both mass and volume (e.g. "1 cup = 8 ounces"). This is technically incorrect because ounces are a mass unit, whereas we should be using "fluid ounces" (e.g. "1 cup = 8 fluid ounces"). The special handling looks at the two units it wants to combine, and if one is literally "ounce" and the other is a volume unit, assumes we actually meant "fluid ounce".
Users can, of course, use "ounce" and "fluid ounce" correctly by defining both and diligently updating their recipes to use the correct unit... but I'm certainly not going to do that (and I'm sure most users agree). AFAIK there are no other cases where we want this kind of logic other than this exact use case.
Testing
Comprehensive backend tests. I also manually tested a few edge cases around the data management page.