28 changes: 28 additions & 0 deletions GSoC2018/README.md
@@ -0,0 +1,28 @@
# GSoC 2018 Projects

Google Summer of Code 2018 projects for OpenFoodFacts AI.

## Projects

### Table Detection
Object detection system that locates tables in food packaging images.

## Installation

Install the required dependencies:

```bash
pip install -r requirements.txt
```

## Usage

See the individual project directories for specific usage instructions:

- `table_detection/` - Table detection model and utilities
- `GSoC2018_poc/` - Proof of concept implementations

## Files

- `resize.py` - Image resizing utilities
- `table_detection/utils/` - Visualization and label utilities for object detection
5 changes: 5 additions & 0 deletions GSoC2018/requirements.txt
@@ -0,0 +1,5 @@
tensorflow>=2.0.0
opencv-python
numpy
matplotlib
Pillow
7 changes: 7 additions & 0 deletions ai-emlyon/requirements.txt
@@ -0,0 +1,7 @@
pandas>=1.3.0
numpy>=1.20.0
matplotlib>=3.0.0
seaborn>=0.11.0
scikit-learn>=1.0.0
xgboost>=1.5.0
jupyter>=1.0.0
40 changes: 40 additions & 0 deletions circular-net-model/README.md
@@ -0,0 +1,40 @@
# Circular Model

Machine learning model for detecting circular patterns in product images, with barcode generation utilities.

## Features

- Circular pattern detection in product images
- Barcode generation and processing
- Image dataset downloading from OpenFoodFacts
- Jupyter notebook with model training pipeline

## Installation

Install the required dependencies:

```bash
pip install -r requirements.txt
```

## Usage

### Download Images
```bash
python download_images.py
```
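
As a rough illustration of what downloading involves, the sketch below builds the image folder URL for a product barcode. It assumes the publicly documented Open Food Facts layout, in which a 13-digit barcode is split into 3/3/3/4-digit path segments. `IMAGE_BASE_URL` and `product_image_folder` are hypothetical names, not taken from `download_images.py`, and shorter barcodes are deliberately not handled.

```python
import re

# Assumed base URL and folder layout; verify against download_images.py before use.
IMAGE_BASE_URL = "https://images.openfoodfacts.org/images/products"


def product_image_folder(barcode: str) -> str:
    """Return the image folder URL for a 13-digit barcode (split as 3/3/3/4 digits)."""
    if not re.fullmatch(r"[0-9]{13}", barcode):
        raise ValueError("this sketch only handles 13-digit barcodes")
    parts = [barcode[0:3], barcode[3:6], barcode[6:9], barcode[9:]]
    return f"{IMAGE_BASE_URL}/{'/'.join(parts)}"


if __name__ == "__main__":
    print(product_image_folder("3435660768163"))
```

The function only builds a URL and makes no network request, so it can be checked offline before being passed to `requests`.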

### Generate Barcodes
```bash
python generate_barcode.py
```
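
The barcode format used by `generate_barcode.py` is not stated here. Assuming it is EAN-13, which is common on retail food products, the final digit is a checksum over the first twelve. The sketch below uses hypothetical function names that are not taken from the script.

```python
def ean13_check_digit(first_twelve: str) -> int:
    """Compute the EAN-13 check digit for the first 12 digits of a barcode."""
    if len(first_twelve) != 12 or any(c not in "0123456789" for c in first_twelve):
        raise ValueError("expected exactly 12 ASCII digits")
    # Weights alternate 1, 3, 1, 3, ... starting from the leftmost digit.
    total = sum(int(d) * (3 if i % 2 else 1) for i, d in enumerate(first_twelve))
    return (10 - total % 10) % 10


def to_ean13(first_twelve: str) -> str:
    """Append the check digit to form a full 13-digit code."""
    return first_twelve + str(ean13_check_digit(first_twelve))


if __name__ == "__main__":
    print(to_ean13("400638133393"))  # prints 4006381333931
```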

### Model Training
Open and run the `circular_model.ipynb` notebook for model training and evaluation.

## Files

- `circular_model.ipynb` - Main Jupyter notebook with model implementation
- `download_images.py` - Script to download images from OpenFoodFacts
- `generate_barcode.py` - Barcode generation utilities
- `images/` - Directory for storing downloaded images
7 changes: 7 additions & 0 deletions circular-net-model/requirements.txt
@@ -0,0 +1,7 @@
requests>=2.25.0
jupyter
matplotlib
numpy
pandas
tensorflow>=2.0.0
Pillow
29 changes: 29 additions & 0 deletions data-quality/README.md
@@ -0,0 +1,29 @@
# Data Quality

Tools and scripts for analyzing and improving data quality in the OpenFoodFacts database.

## Features

- Language switching for ingredient lists
- Data quality analysis and reporting
- Database consistency checks
- Automated data cleaning utilities

## Installation

Install the required dependencies:

```bash
pip install -r requirements.txt
```

## Usage

### Switch Ingredient Language
```bash
python switch_ingredient_lang.py
```

## Files

- `switch_ingredient_lang.py` - Script to switch ingredient language codes and analyze data quality
6 changes: 6 additions & 0 deletions data-quality/requirements.txt
@@ -0,0 +1,6 @@
openfoodfacts>=0.2.0
requests>=2.25.0
typer>=0.12.0
tqdm>=4.60.0
redis>=4.0.0
backoff>=2.0.0
39 changes: 39 additions & 0 deletions front-image-classification/README.md
@@ -0,0 +1,39 @@
# Front Image Classification

Image classification system for categorizing front-facing product images using machine learning.

## Features

- Product front image classification
- Training pipeline with data augmentation
- CLI for training and inference
- Integration with OpenFoodFacts database
- Support for multiple ML backends

## Installation

This project uses inline script metadata (PEP 723): each script declares its dependencies in a comment block inside the file, so a PEP 723-aware runner can install them before executing the script.

For manual installation:

```bash
pip install typer tqdm Pillow ultralytics albumentations opencv-python numpy openfoodfacts duckdb torch
```
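
For readers unfamiliar with PEP 723, the sketch below shows what an inline metadata block looks like and how a tool might read it. The package list in `EXAMPLE_SCRIPT` is illustrative only and may differ from what `train.py` declares. `read_inline_dependencies` is a hypothetical helper, and its dependency extraction is deliberately naive; a real tool would parse the block as TOML.

```python
import re

# Hypothetical script text; the real train.py may declare different packages.
EXAMPLE_SCRIPT = '''\
# /// script
# requires-python = ">=3.9"
# dependencies = [
#     "typer",
#     "tqdm",
#     "Pillow",
# ]
# ///
print("training...")
'''

# Block pattern taken from the PEP 723 reference implementation.
BLOCK_RE = re.compile(
    r"(?m)^# /// (?P<type>[a-zA-Z0-9-]+)$\s(?P<content>(^#(| .*)$\s)+)^# ///$"
)


def read_inline_dependencies(script_text: str) -> list:
    """Return the dependency strings declared in a script's PEP 723 block."""
    match = BLOCK_RE.search(script_text)
    if match is None or match.group("type") != "script":
        return []
    # Strip the leading "# " (or bare "#") from each metadata line.
    lines = [
        line[2:] if line.startswith("# ") else line[1:]
        for line in match.group("content").splitlines()
    ]
    toml_text = "\n".join(lines)
    # Naive extraction: quoted strings inside the `dependencies = [...]` array.
    deps = re.search(r"^dependencies\s*=\s*\[(.*?)\]", toml_text, re.S | re.M)
    return re.findall(r'"([^"]+)"', deps.group(1)) if deps else []


if __name__ == "__main__":
    print(read_inline_dependencies(EXAMPLE_SCRIPT))
```

Runners that understand PEP 723, such as `uv run` or `pipx run`, read this block themselves, so a helper like this is only needed for inspection.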

## Usage

### Training
```bash
python train.py
```

### CLI
```bash
python cli.py --help
```

## Files

- `train.py` - Main training script with inline dependencies
- `cli.py` - Command-line interface for the classifier
- `ml_commons.py` - Common ML utilities and data transformations
14 changes: 14 additions & 0 deletions front-image-classification/requirements.txt
@@ -0,0 +1,14 @@
# Front Image Classification uses PEP 723 inline script dependencies
# Dependencies are defined directly in the Python script files

# For manual installation, install these packages:
typer>=0.12.0
tqdm>=4.60.0
Pillow>=8.0.0
ultralytics>=8.0.0
albumentations>=1.0.0
opencv-python>=4.5.0
numpy>=1.20.0
openfoodfacts>=0.2.0
duckdb>=0.8.0
torch>=1.12.0
36 changes: 36 additions & 0 deletions genai-features/README.md
@@ -0,0 +1,36 @@
# GenAI Features

Generative AI experiments and feature development for OpenFoodFacts data analysis.

## Features

- Analysis of recent changes in OpenFoodFacts data
- Jupyter notebooks for data exploration
- Generative AI model experiments
- Data pipeline prototypes

## Installation

Install the required dependencies:

```bash
pip install -r requirements.txt
```

## Usage

### Explore Recent Changes
Open and run the notebook:
```bash
jupyter notebook notebooks/explore_recent_changes.ipynb
```

## Structure

- `notebooks/` - Jupyter notebooks for data exploration and analysis
- `dataset/` - Dataset-related files and configurations
- `prompts/` - Prompt templates and configurations for generative AI models

## Files

- `dataset/recent_changes.txt` - Configuration for recent changes data source
7 changes: 7 additions & 0 deletions genai-features/requirements.txt
@@ -0,0 +1,7 @@
jupyter>=1.0.0
pandas>=1.3.0
numpy>=1.20.0
matplotlib>=3.0.0
requests>=2.25.0
openfoodfacts>=0.2.0
transformers>=4.0.0
57 changes: 57 additions & 0 deletions ingredient_extraction/README.md
@@ -0,0 +1,57 @@
# Ingredient Extraction

Machine learning models and tools for extracting structured ingredient information from product text.

## Features

- Dataset generation for ingredient extraction tasks
- Model training and fine-tuning pipelines
- LayoutLM-based document understanding
- Model analysis and evaluation tools
- Streamlit demo interface

## Installation

Install the required dependencies:

```bash
pip install -r requirements.txt
```

## Usage

### Dataset Generation
```bash
cd dataset-generation
python generate_dataset.py
```

### Model Training
```bash
cd train
python train_model.py
```

### LayoutLM Training
```bash
cd train-layoutlm
python train_layoutlm.py
```

### Model Analysis
```bash
cd model-analysis
python evaluate_model.py
```

### Demo
```bash
streamlit run model-analysis/streamlit_demo.py
```

## Structure

- `dataset-generation/` - Scripts for creating training datasets
- `train/` - Standard model training pipeline
- `train-layoutlm/` - LayoutLM-specific training code
- `model-analysis/` - Model evaluation and analysis tools
11 changes: 11 additions & 0 deletions ingredient_extraction/requirements.txt
@@ -0,0 +1,11 @@
transformers>=4.20.0
torch>=1.12.0
datasets>=2.0.0
streamlit>=1.20.0
layoutlm>=0.1.0
openfoodfacts>=0.2.0
pandas>=1.3.0
numpy>=1.20.0
matplotlib>=3.0.0
scikit-learn>=1.0.0
tqdm>=4.60.0
52 changes: 52 additions & 0 deletions language_identification/README.md
@@ -0,0 +1,52 @@
# Language Identification

Machine learning models for automatic language identification in product text data.

## Features

- Language detection for product ingredients and descriptions
- Training pipelines for language classification models
- Data extraction and preprocessing scripts
- Model evaluation and metrics calculation
- Inference utilities for production use

## Installation

This project uses [Poetry](https://python-poetry.org/) for dependency management.

```bash
poetry install
```

Or install with pip:
```bash
pip install -r requirements.txt
```

## Usage

### Extract Data
```bash
poetry run python scripts/01_extract_data.py
```

### Calculate Metrics
```bash
poetry run python scripts/03_calculate_metrics.py
```

### Run Inference
```bash
poetry run python scripts/inference.py
```

## Project Structure

- `scripts/` - Data processing and model training scripts
  - `01_extract_data.py` - Data extraction from OpenFoodFacts
  - `03_calculate_metrics.py` - Model evaluation metrics
  - `inference.py` - Model inference utilities

## Dependencies

See `pyproject.toml` for the complete list of dependencies.