MIT SGI 2025 topology control project.
A modular, config-driven ML pipeline for 3D shape classification using PyTorch. Features auto-discovery of mesh files, configurable train/val splitting, and comprehensive artifact management. Please see part 1 and part 2 of our article on the SGI 2025 Blog for details.
- Modular Pipeline: Data processing → Model building → Training → Evaluation
- Auto-Discovery: Automatically finds and processes mesh files in raw data directory
- Config-Driven: Fully configurable via YAML files
- Artifact Management: Saves experiment results, model info, and training plots
- 3D Shape Processing: Converts meshes to point clouds with signed distance fields
- Flexible Architecture: Supports configurable models
This project requires Python 3.10. You can set up the environment using either conda or venv:
Using conda:

```bash
conda create --name topologycontrol python=3.10
conda activate topologycontrol
```

Using venv:

```bash
python3.10 -m venv topologycontrol
source topologycontrol/bin/activate    # On Linux/Mac
# or
topologycontrol\Scripts\activate       # On Windows
```

Then install the dependencies:

```bash
python -m pip install -r requirements.txt
```

The `requirements.txt` includes:
- `torch` - PyTorch for deep learning
- `numpy` - Numerical computing
- `meshio` - Mesh file I/O
- `polyscope` - 3D visualization
- `matplotlib` - Plotting and visualization
- `pyyaml` - YAML configuration parsing
- `libigl` - Geometry processing (if available)
Note: Some packages like `triangle` may need to be installed separately if geometry processing fails.
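A quick way to confirm the environment is usable is a small import check. The module names below follow `requirements.txt` (`pyyaml` imports as `yaml`, `libigl` as `igl`); the script itself is a hypothetical helper, not part of the repository:

```python
# check_env.py -- hypothetical helper: verify the core dependencies import correctly
import importlib

required = ["torch", "numpy", "meshio", "polyscope", "matplotlib", "yaml"]
optional = ["igl"]  # libigl bindings; geometry processing may be limited without them

for name in required:
    importlib.import_module(name)
    print(f"OK: {name}")

for name in optional:
    try:
        importlib.import_module(name)
        print(f"OK: {name}")
    except ImportError:
        print(f"Missing (optional): {name}")
```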
```
topology-control/
├── main.py                       # Main pipeline entry point
├── config/
│   └── config.yaml               # Configuration file
├── data/
│   ├── raw/                      # Raw mesh files (.obj)
│   └── processed/                # Processed data (train/val splits)
├── src/
│   ├── CPipelineOrchestrator.py  # Main pipeline controller
│   ├── CDataProcessor.py         # Data processing and mesh handling
│   ├── CArchitectureManager.py   # Model architecture definitions
│   ├── CModelTrainer.py          # Training and validation logic
│   ├── CEvaluator.py             # Model evaluation
│   ├── CGeometryUtils.py         # 3D geometry utilities
│   └── CArtifactManager.py       # Experiment artifact management
└── artifacts/                    # Generated experiment artifacts
```
- Place your mesh files (`.obj` format) in `data/raw/`
- Configure the pipeline in `config/config.yaml`
- Run the pipeline:

```bash
python main.py
```

Edit `config/config.yaml` to customize:
- Data paths: Raw and processed data directories
- Model settings: Architecture, input/output dimensions
- Training parameters: Learning rate, batch size, epochs
- Processing options: Point cloud sampling, train/val split ratio
- Pipeline control: Skip specific steps for debugging
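For illustration, the settings above can be read with `pyyaml`; the keys used here match the example configuration shown next, though the pipeline itself may consume them differently:

```python
import yaml

# Load the pipeline configuration (path assumed relative to the repo root)
with open("config/config.yaml") as f:
    config = yaml.safe_load(f)

model_cfg = config["model_config"]
trainer_cfg = config["trainer_config"]

print("Model:", model_cfg["model_name"], "with input_dim", model_cfg["input_dim"])
print("Training for", trainer_cfg["num_epochs"], "epochs at lr", trainer_cfg["learning_rate"])
```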
Example `config.yaml`:

```yaml
# Basic setup
home: /path/to/topology-control

# Model configuration
model_config:
  skip_building: false
  model_name: mlp
  input_dim: 3000        # 1000 points × 3 coordinates
  hidden_dims: [512, 256, 128]
  output_dim: 1
  max_points: 1000       # Fixed number of points per shape

# Training parameters
trainer_config:
  skip_training: false
  learning_rate: 0.001
  batch_size: 32
  num_epochs: 50
  optimizer: adam
  loss_function: mse
```
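For illustration only (the actual `CArchitectureManager` and `CModelTrainer` implementations may differ), the `mlp` settings above map naturally onto a small PyTorch model, optimizer, and loss:

```python
import torch
import torch.nn as nn

def build_mlp(input_dim=3000, hidden_dims=(512, 256, 128), output_dim=1):
    """Build a simple MLP matching the model_config fields above."""
    dims = [input_dim, *hidden_dims]
    layers = []
    for in_d, out_d in zip(dims[:-1], dims[1:]):
        layers += [nn.Linear(in_d, out_d), nn.ReLU()]
    layers.append(nn.Linear(dims[-1], output_dim))
    return nn.Sequential(*layers)

model = build_mlp()
optimizer = torch.optim.Adam(model.parameters(), lr=0.001)  # trainer_config: adam, lr 0.001
loss_fn = nn.MSELoss()                                      # trainer_config: mse

# One dummy step: batch of 32 shapes, 1000 points x 3 coordinates flattened to 3000
x = torch.randn(32, 3000)
y = torch.randn(32, 1)
loss = loss_fn(model(x), y)
loss.backward()
optimizer.step()
```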
The pipeline automatically:

- Discovers all `.obj` files in `data/raw/`
- Converts meshes to point clouds with signed distance fields (see the sketch after this list)
- Splits data into train/validation sets
- Saves processed data to `data/processed/train/` and `data/processed/val/`
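The actual conversion lives in `CDataProcessor.py` / `CGeometryUtils.py` and is not shown here. As a rough, hypothetical sketch of the idea, one could sample query points around each mesh and compute signed distances with `meshio` and the libigl Python bindings (assuming `igl.signed_distance` is available in your libigl version):

```python
import numpy as np
import meshio
import igl  # libigl Python bindings

def mesh_to_sdf_samples(path, max_points=1000, seed=0):
    """Sample points in the mesh's bounding box and attach signed distances."""
    mesh = meshio.read(path)
    V = mesh.points.astype(np.float64)
    F = mesh.cells_dict["triangle"].astype(np.int64)

    rng = np.random.default_rng(seed)
    lo, hi = V.min(axis=0), V.max(axis=0)
    P = rng.uniform(lo, hi, size=(max_points, 3))  # fixed-size point cloud

    sdf, _, _ = igl.signed_distance(P, V, F)       # signed distance per query point
    return P, sdf

points, sdf = mesh_to_sdf_samples("data/raw/example.obj")  # hypothetical file name
```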
Each experiment generates timestamped artifacts in `artifacts/experiment_YYYYMMDD_HHMMSS/`:

- `pipeline_summary.txt` - Overall pipeline execution summary
- `model_architecture.txt` - Model structure and parameter counts
- `training_results.txt` - Training metrics and loss curves
- `error_report.txt` - Error details if the pipeline fails
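A minimal sketch of how such a timestamped directory can be created (illustrative; `CArtifactManager` may organize things differently):

```python
from datetime import datetime
from pathlib import Path

# Create artifacts/experiment_YYYYMMDD_HHMMSS/ and write a summary file
stamp = datetime.now().strftime("%Y%m%d_%H%M%S")
exp_dir = Path("artifacts") / f"experiment_{stamp}"
exp_dir.mkdir(parents=True, exist_ok=True)

(exp_dir / "pipeline_summary.txt").write_text("pipeline completed\n")
```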
Common issues:

- `ModuleNotFoundError`: Ensure all dependencies are installed
- Tensor size mismatch: Check that the `max_points` configuration matches the model `input_dim` (a quick check is sketched below)
- No mesh files found: Verify `.obj` files are in the `data/raw/` directory
- CUDA errors: Set the device explicitly or ensure GPU drivers are updated
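Since each shape is `max_points` points with 3 coordinates flattened into the MLP input, a small consistency check against the config can catch the size mismatch early (illustrative):

```python
import yaml

with open("config/config.yaml") as f:
    cfg = yaml.safe_load(f)["model_config"]

# Each shape is max_points points with 3 coordinates, flattened into the MLP input
assert cfg["input_dim"] == cfg["max_points"] * 3, (
    f"input_dim {cfg['input_dim']} != max_points*3 = {cfg['max_points'] * 3}"
)
```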
If you encounter package conflicts:
```bash
# Clean conda environment
conda remove --name topologycontrol --all
conda create --name topologycontrol python=3.10
conda activate topologycontrol
pip install -r requirements.txt
```

If processing fails:
- Ensure mesh files are in valid `.obj` format (a quick probe is sketched below)
- Check file permissions in the data directories
- Verify sufficient disk space for processed data
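As a quick, illustrative probe (not part of the repository) that a file loads as a triangle mesh before running the full pipeline:

```python
# check_mesh.py -- hypothetical helper
# Usage: python check_mesh.py data/raw/shape.obj  (placeholder path)
import sys
import meshio

mesh = meshio.read(sys.argv[1])
tris = mesh.cells_dict.get("triangle")
if tris is None or len(tris) == 0:
    print("No triangle faces found -- the pipeline may not handle this file")
else:
    print(f"{len(mesh.points)} vertices, {len(tris)} triangles")
```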
To add a new model architecture (see the sketch after this list):

- Implement the model class in `CArchitectureManager.py`
- Add the model configuration to `config.yaml`
- Update the `get_model()` method to handle the new architecture
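The sketch below shows the general shape of such an addition. The real `get_model()` signature and registration mechanism are not documented here, so the class name and the dispatch shown in the comment are hypothetical:

```python
import torch.nn as nn

class PointNetLite(nn.Module):
    """Hypothetical example architecture to implement in CArchitectureManager.py."""
    def __init__(self, input_dim, output_dim):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(input_dim, 256), nn.ReLU(),
            nn.Linear(256, output_dim),
        )

    def forward(self, x):
        return self.net(x)

# Inside a (hypothetical) get_model(model_config):
#     if model_config["model_name"] == "pointnet_lite":
#         return PointNetLite(model_config["input_dim"], model_config["output_dim"])
```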
To extend data processing:

- Modify `CDataProcessor.py` for new data formats
- Update `CGeometryUtils.py` for new geometry operations
- Adjust the dataset class in `CModelTrainer.py` if needed
MIT License - See LICENSE file for details.