
Guidance for Integrating KBLaM with Food/Nutrition Knowledge Base #82

@maksvavken

Hello KBLaM team,

First of all, thank you for making this project open and accessible — it's a really exciting approach to structured knowledge injection.

I'm a student working on a domain-specific use case involving a large food and nutrition database. It contains thousands of entities covering products, nutritional values, food groups, and other attributes such as:

  • Macronutrients (e.g. ENERC, FAT, CHO, PROT)
  • Micronutrients (e.g. VITC, CA, FE)
  • Numeric values (e.g. 12.2 g sugar, 0.6 g saturated fat, 61.0 g carbohydrates)
  • Categorization tags (e.g. "cereal products", "vegetables")

Training Setup

I trained for 2,000 steps with Meta-Llama-3-8B-Instruct as the base model. The food database was transformed into the expected KBLaM format like this:

{"name": "White pepper", "property": "FOOD_GROUP", "value": "Seasoning"}
{"name": "White pepper", "property": "CA", "value": "265.0"}
{"name": "Beef soup", "property": "THIA", "value": "0.022"}
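For reference, the conversion can be sketched like this. It is a minimal sketch that assumes the source table is available as one dictionary per food, with nutrient codes as keys. The `rows` sample and the helper names `rows_to_triples` / `to_jsonl` are hypothetical, and the output simply mirrors the records shown above.

```python
import json

# Hypothetical sample input: one "wide" row per food, nutrient codes as keys.
rows = [
    {"name": "White pepper", "FOOD_GROUP": "Seasoning", "CA": 265.0},
    {"name": "Beef soup", "THIA": 0.022},
]

def rows_to_triples(rows):
    """Flatten wide rows into one {name, property, value} record per attribute."""
    triples = []
    for row in rows:
        name = row["name"]
        for prop, value in row.items():
            # Skip the entity name itself and missing measurements.
            if prop == "name" or value is None:
                continue
            triples.append({"name": name, "property": prop, "value": str(value)})
    return triples

def to_jsonl(triples):
    """Serialize triples as JSON Lines, one record per line."""
    return "\n".join(json.dumps(t, ensure_ascii=False) for t in triples)

print(to_jsonl(rows_to_triples(rows)))
```

Values are stored as strings, matching the examples, so any unit or scale information has to come from the property name or be added to the value text.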

Observed Issues

Despite the training, I'm encountering several challenges:

  • Poor retrieval quality for health-related queries
    e.g. “Which foods are good for diabetics?” often retrieves items with high sugar or refined carbs.

  • Abbreviations like ENERC, FIBT, FASAT are not well understood
    (Note: I plan to map these to full names in the next training run.)

  • Generated outputs are often incoherent: the model sometimes repeats the user's prompt or hallucinates answers.

  • Numeric values (e.g. 12.6 g) seem particularly hard for the model to use.
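The abbreviation mapping planned for the next run could look roughly like this. It is a minimal sketch that assumes the codes are INFOODS-style tagnames. The `PROPERTY_INFO` table, the units, the "per 100 g" basis, and `expand_triple` are all hypothetical placeholders that should be checked against the database's own documentation. I have not verified that spelling out units in the value text actually helps KBLaM's encoder handle numbers.

```python
# Hypothetical lookup: INFOODS-style tagname -> (readable name, unit).
# Units here are placeholders; replace them with what the database reports.
PROPERTY_INFO = {
    "ENERC": ("energy", "kcal"),
    "FAT": ("total fat", "g"),
    "FASAT": ("saturated fatty acids", "g"),
    "CHO": ("carbohydrate", "g"),
    "PROT": ("protein", "g"),
    "FIBT": ("total dietary fibre", "g"),
    "VITC": ("vitamin C", "mg"),
    "CA": ("calcium", "mg"),
    "FE": ("iron", "mg"),
    "THIA": ("thiamin", "mg"),
}

def expand_triple(triple, basis="per 100 g"):
    """Swap an abbreviated property for its full name and attach unit and basis to the value."""
    info = PROPERTY_INFO.get(triple["property"])
    if info is None:
        # Non-nutrient properties such as FOOD_GROUP pass through unchanged.
        return dict(triple)
    full_name, unit = info
    return {
        "name": triple["name"],
        "property": full_name,
        "value": f"{triple['value']} {unit} {basis}",
    }

print(expand_triple({"name": "White pepper", "property": "CA", "value": "265.0"}))
```

Keeping the unit and basis inside the value string means each triple stays self-describing, at the cost of slightly longer values for the encoder.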

Request for Guidance

Could you please advise on best practices for integrating a numeric-heavy, structured KB like this into KBLaM?

Specifically:

  • Handling Numeric Data: Any suggestions for how to better encode or structure numerical values so that the model can reason over them effectively?

  • Downstream Fine-Tuning: Should I augment the training with open-ended QA examples (e.g. “What foods are good for a Mediterranean diet?”)?

I understand that numerical information may be difficult for the current compression method, as noted in the paper, but any insights or advice would be greatly appreciated.

Thank you again for your great work and for your time!
