Description
Feature request
I am currently implementing an autologging integration for BERTopic with MLflow for managing ML experiments. This would automatically log BERTopic's training parameters (e.g., embedding model, UMAP/HDBSCAN settings), metrics, artifacts, and the fitted model via MLflow's PyFunc flavor during fit_transform calls. The goal is to simplify experiment tracking in BERTopic workflows without manual logging.
Motivation
Approach: Monkey-patching BERTopic.fit_transform using MLflow's safe_patch for safe integration.
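To make the approach concrete, here is a minimal sketch of the wrapping logic. A plain decorator stands in for MLflow's safe_patch (which additionally guards the run so logging errors never break training), and DummyTopicModel stands in for BERTopic; the logged names (n_documents, n_topics) follow the metrics listed below, but everything else is illustrative:

```python
import functools

# Stand-in for an MLflow tracking client: collects what would be logged.
logged = {"params": {}, "metrics": {}}

def autolog_fit_transform(cls):
    """Patch cls.fit_transform so params/metrics are logged around the
    original call (simplified stand-in for mlflow's safe_patch)."""
    original = cls.fit_transform

    @functools.wraps(original)
    def patched(self, documents, *args, **kwargs):
        # Log parameters before training.
        logged["params"]["n_documents"] = len(documents)
        topics, probs = original(self, documents, *args, **kwargs)
        # Log metrics derived from the fitted result (-1 is BERTopic's
        # outlier topic).
        logged["metrics"]["n_topics"] = len(set(topics) - {-1})
        return topics, probs

    cls.fit_transform = patched
    return cls

# Dummy model standing in for BERTopic, so the sketch is self-contained.
class DummyTopicModel:
    def fit_transform(self, documents):
        # Pretend every document lands in topic 0 except one outlier.
        topics = [0] * (len(documents) - 1) + [-1]
        return topics, [1.0] * len(documents)

autolog_fit_transform(DummyTopicModel)
topics, probs = DummyTopicModel().fit_transform(["doc a", "doc b", "doc c"])
```

The real integration would apply the same pattern via safe_patch so that users who never call mlflow.autolog() see unchanged behavior.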
What Gets Logged:
Parameters: Embedding model name, UMAP (n_neighbors, n_components, ...), HDBSCAN (min_cluster_size, ...), vectorizer type.
Metrics: n_documents, avg_doc_length, n_topics, n_outliers, avg/max/min_topic_size, vocab_size, embedding_dim, diversity, coherence (c_v, c_npmi, u_mass via gensim), per-topic coherence.
Artifacts: topic_info.csv, metrics.json, per_topic_coherence.csv, embeddings.npy and the full model as PyFunc.
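For the scalar metrics above, most can be derived directly from fit_transform's output. A small illustrative helper (not part of BERTopic's API; -1 denotes BERTopic's outlier topic, and avg_doc_length is measured in whitespace tokens here):

```python
from collections import Counter
import statistics

def topic_metrics(topics, documents):
    """Compute the simple dataset/topic-size metrics from a list of
    per-document topic assignments (illustrative helper)."""
    sizes = Counter(t for t in topics if t != -1)  # exclude outlier topic
    return {
        "n_documents": len(documents),
        "avg_doc_length": statistics.mean(len(d.split()) for d in documents),
        "n_topics": len(sizes),
        "n_outliers": sum(1 for t in topics if t == -1),
        "avg_topic_size": statistics.mean(sizes.values()) if sizes else 0,
        "max_topic_size": max(sizes.values(), default=0),
        "min_topic_size": min(sizes.values(), default=0),
    }

docs = ["one two", "three four", "five six", "outlier doc"]
metrics = topic_metrics([0, 0, 1, -1], docs)
```

Coherence and diversity would come from gensim's CoherenceModel and the topic representations rather than from the assignments alone.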
Flavor Support: Registered as an MLflow flavor (bertopic) with @autologging_integration, enabling mlflow.autolog().
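The registration pattern can be sketched as follows; this is a simplified stand-in for MLflow's @autologging_integration decorator and registry, with the flavor name "bertopic" taken from the proposal and everything else illustrative:

```python
# Simplified stand-in for MLflow's autologging-integration registry:
# decorating autolog() registers it under a flavor name so that a single
# top-level call (conceptually mlflow.autolog()) can enable every
# registered integration at once.
AUTOLOGGING_INTEGRATIONS = {}

def autologging_integration(name):
    def decorator(autolog_fn):
        AUTOLOGGING_INTEGRATIONS[name] = autolog_fn
        return autolog_fn
    return decorator

@autologging_integration("bertopic")
def autolog(disable=False):
    # In the real flavor this would patch BERTopic.fit_transform;
    # here we just record whether autologging is enabled.
    autolog.enabled = not disable

def enable_all_autologging():
    # What a global autolog() call conceptually does across flavors.
    for fn in AUTOLOGGING_INTEGRATIONS.values():
        fn()

enable_all_autologging()
```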
MLflow issue: mlflow/mlflow#16792 (comment)
Your contribution
Seeking Feedback:
Is this something you would be interested in merging into BERTopic's core as an optional MLflow submodule, or would it be better as an external package (like mlflow-scikit-learn or mlflow-txtai)?
Should I pursue this as a separate repo with a lazy import in MLflow's __init__.py (via PR to MLflow), or integrate it directly into BERTopic? Pros/cons from your perspective?
I would like to hear your thoughts on this. Thank you!