AI Model Distillation for Financial Data Developer Example

A production-ready developer example demonstrating how to distill large language models into smaller, cost-efficient models for financial workloads using the NVIDIA Data Flywheel Blueprint.

Built on NVIDIA NeMo Microservices, this example shows how to automatically fine-tune and evaluate student models for financial news classification, achieving teacher-model accuracy while reducing inference costs by up to 98%.

The purpose of this Developer Example is two-fold:

To provide a working reference implementation demonstrating how to use the Data Flywheel Blueprint for financial services use cases.
To educate the community on practical model distillation techniques: what works, what doesn't, and how to apply these methods to your own domain.

You can get started quickly and achieve similar results using your own infrastructure by following the Quickstart guide.

Financial Use Case: News Event Classification
What is a Data Flywheel?
How to Use This Developer Example
Real-World Results: Financial News Classification
Technical Details
Next Steps
Contributing
License
Disclaimer

Financial Use Case: News Event Classification

Demonstrates model distillation on financial news headlines classification (13 event categories: market movements, earnings, regulatory changes, etc.).

The Workflow:

Teacher: Generate labeled data using Llama 3.3 Nemotron 49B (or Llama 3.3 70B).
Distill: Transfer knowledge to smaller models (Llama 3.2 1B/3B, Llama 3.1 8B).
Evaluate: Use F1-score metrics to measure classification accuracy.
Deploy: Serve cost-efficient models matching teacher performance.

What is a Data Flywheel?

Note: Built on the NVIDIA Data Flywheel Blueprint. Code adapted for financial services with F1-score evaluation.

A data flywheel uses production data (LLM logs, feedback, labels) to reduce latency and cost of GenAI systems:

flowchart TD
  app[Your App] --prompts/responses/feedback--> logs[Log service]
  logs --Create Datasets--> orch["Orchestrator"]
  orch --> exp1["Exp #1"]
  orch --> exp2["Exp #2"]
  orch --> expN["Exp #N"]
  exp1 --> results
  exp2 --> results
  expN --> results

Production traffic flows to a centralized logging service. From there, evaluation and fine-tuning datasets are created for offline experiments. Key decisions include model selection, dataset curation, fine-tuning techniques, and evaluation metrics.

Where the NeMo microservices Come In

NeMo Microservices provides programmatic control of datasets, fine-tuning, evaluation, and inference. Automates experiment exploration with sensible defaults, surfacing promising candidates for further analysis.

flowchart TD

app["Your application"] --Prompt/completion logs--> log_store["Log Store"]
log_store --Datasets--> datasets["NeMo Datastore"]
datasets --"Fine-tuning datasets"--> customizer["NeMo Customizer"]
datasets --"Eval datasets"--> evaluator["NeMo Evaluator"]

subgraph NIMs["Loop across ALL NIMs"]
  customizer --"Customized model"--> NIM
  evaluator --> NIM
  NIM --> evaluator
end

evaluator --> results["Flywheel Results"]

Automated process using NeMo microservices:

Ingest: Pull data from log store and de-duplicate by task.
Curate: Create eval/fine-tuning datasets using stratified splitting for balanced representation.
Store: Manage datasets in NeMo Datastore.
Train: Launch fine-tuning jobs (NeMo Customizer using LoRA).
Score: Run F1-score evaluations (NeMo Evaluator).

How to Use This Developer Example

This implementation uses an effective approach: routing production traffic to fine-tuning, using teacher model responses as ground truth, with no manual labeling required. This works well for classification tasks, structured outputs, and domain-specific workflows with consistent patterns, but may not suit open-ended creative generation or highly regulated outputs requiring human review.

To get started: Follow the Quickstart Guide to deploy with the provided financial news dataset and see the workflow in action.

To adapt for your use case: Learn how to instrument your application and prepare your data by reading the Data Logging Guide. This covers the required log schema, application instrumentation examples, and data preparation steps.

Real-World Results: Financial News Classification

Results from financial news headlines dataset with 13 event categories:

Dataset Size	Model	Base F1-Score	Customized F1-Score
5K samples	Llama 3.2 1B	0.36	0.85
10K samples	Llama 3.2 1B	0.34	0.89
25K samples	Llama 3.2 1B	0.32	0.95
25K samples	Llama 3.2 3B	0.72	0.95

Key Findings:

Fine-tuned 1B models achieve 0.95+ F1-score, matching 70B teacher model performance
Approximately 98% inference cost reduction by replacing 70B with fine-tuned 1B models
Performance improves with more training data (flywheel effect)
Similar cost reductions observed in NVIDIA internal testing (HR chatbot: 98.6% reduction, Qwen-2.5-32b replacing Llama-3.1-70b: 50%+ reduction)

Techniques demonstrated here apply to other financial workloads: document analysis, compliance checking, trade analysis, customer support.

Technical Details

This developer example demonstrates NeMo Microservices capabilities on financial classification tasks, providing a foundation for production-ready model distillation. The system orchestrates multi-stage workflows including dataset creation, model deployment, F1-score evaluation, LoRA fine-tuning, and automated resource management.

For complete technical architecture, software components, workflow details, and design philosophy, see the Architecture Overview.

Minimum System Requirements

Requirement Type	Details
Minimum GPU	Self-hosted LLM Judge: 6× (NVIDIA H100, or A100 GPUs) Remote LLM Judge: 2× (NVIDIA H100, or A100 GPUs)
Cluster	Single-node NVIDIA GPU cluster on Linux with cluster-admin permissions
Disk Space	At least 200 GB free
Software	Python 3.11 Docker Engine Docker Compose v2
Services	Elasticsearch 8.12.2 MongoDB 7.0 Redis 7.2 FastAPI (API server) Celery (task processing)
Resource	Minimum Memory: 1GB (512MB reserved for Elasticsearch) Storage: Varies by log volume/model size Network: Ports 8000 (API), 9200 (Elasticsearch), 27017 (MongoDB), 6379 (Redis)
Development	Docker Compose for local dev with hot reloading Supports macOS (Darwin) and Linux Optional: GPU support for model inference
Production	Kubernetes cluster (recommended) Resources scale with workload Persistent volume support for data storage

Security and Compliance for Financial Services

When deploying this example in financial services production environments, consider:

Data Privacy: The reference implementation logs raw production traffic. For financial data, implement PII redaction and data governance controls before production use.
Model Validation: F1-scores measure statistical similarity, not business correctness. Always validate model outputs against compliance requirements.
Audit Trails: All experiments are logged in MLflow. Implement additional audit logging for regulatory compliance.
Access Control: Secure API endpoints, Elasticsearch, and MLflow with appropriate authentication and authorization.

See SECURITY.md for reporting security vulnerabilities.

Task Serialization Safeguard

Why only one Flywheel run at a time? When the Flywheel kicks off a run it may need to spin up multiple NIMs and customization jobs, each of which can claim one or more GPUs. The reference implementation does not yet discover the number of free GPUs in the cluster, so it uses a simple guardrail: all invocations of run_nim_workflow_dag are serialized.

The task is bound to a dedicated Celery queue (parent_queue). In the docker-compose.yaml there is a worker dedicated to this queue whose concurrency is set to 1. There is a second worker bound to the default celery queue which can handle running other tasks (e.g. evals) in parallel.
Inside the task we wait for the full DAG to complete via async_result.get(...) before returning.
The call to create a job (i.e. POST /jobs) will not block, however. It will return immediately with a Job ID

This ensures that only one Flywheel experiment can allocate GPUs at any given time, preventing accidental overallocation that would lead to failed NIM deployments or customizations.

Roadmap – Automatic GPU introspection and smarter scheduling are planned for a future version of the Blueprint so multiple Flywheel runs can execute in parallel when resources permit.

Next Steps

Getting Started

Follow the Quickstart Guide to run the financial news classification example
Review the Architecture Overview to understand the system design
Check the Audience Guide for role-specific guidance

Documentation & Resources

Complete Documentation: Documentation Guide for role-based navigation and comprehensive documentation index
Configuration: Configuration Guide for environment variables, model integration, and evaluation settings
Integration: Data Logging for AI Apps for instrumenting your application
Production: Production Deployment Guide and Helm Installation
Troubleshooting: FAQ & Troubleshooting for common issues and solutions

External Resources

Contributing

Install development dependencies:
```
uv sync --dev
```
This command installs all dependencies needed to build the container and run the tests.
Start required services:
```
./scripts/run.sh
```
This starts the necessary services via docker compose that are required for testing.
Run the tests:
- For unit tests (requires MongoDB from docker compose):
```
uv run pytest
```
- For integration tests (with mocked NeMo microservices components):
```
uv run pytest -m integration
```
Clean up after development:
- Stop all services:
```
./scripts/stop.sh
```
- (Optional) Clear all database volumes:
```
./scripts/clear_all_volumes.sh
```

If you modify the API, regenerate the openapi.json with the following command:

uv run python scripts/generate_openapi.py

License

Use of the jupyter notebook and scripts is governed by the Apache 2.0 License. Use of the Nemo Evaluator and Customizer microservices is governed by the NVIDIA Software License Agreement and Product-Specific Terms for NVIDIA AI Products. Use of the Llama-3.1-8B-Instruct, Llama-3.1-70B-Instruct, Llama-3.2-1B-Instruct, and Llama 3.3-70B-Instruct models is governed by the NVIDIA Community Model License. Use of the Llama-3.3-Nemotron-Super-49B-v1 and Llama-3.2-3B-Instruct models is governed by the NVIDIA Open Model License Agreement.

Additional Information

Llama 3.1 Community License Agreement for Llama-3.1-8B-Instruct and Llama-3.1-70B-Instruct. Llama 3.2 Community License Agreement for Llama-3.2-1B-Instruct and Llama-3.2-3B-Instruct. Llama 3.3 Community License Agreement for Llama 3.3-70B-Instruct and Llama-3.3-Nemotron-Super-49B-v1. Built with Llama.

Disclaimer

The AI Model Distillation for Financial Data developer example and Data Flywheel Blueprint are shared as reference and is provided "as is". The security in the production environment is the responsibility of the end users deploying it. When deploying in a production environment, please have security experts review any potential risks and threats; define the trust boundaries, implement logging and monitoring capabilities,secure the communication channels, integrate AuthN & AuthZ with appropriate controls.

Name		Name	Last commit message	Last commit date
Latest commit History 185 Commits
.github		.github
Maintainers		Maintainers
architecture		architecture
config		config
customize		customize
data		data
deploy		deploy
docs		docs
evaluate		evaluate
notebooks		notebooks
scripts		scripts
src		src
tests		tests
third_party		third_party
.gitignore		.gitignore
ARCHITECTURE.md		ARCHITECTURE.md
CHANGELOG.md		CHANGELOG.md
CONTRIBUTING.md		CONTRIBUTING.md
LICENSE		LICENSE
LICENSE-3rd-party.txt		LICENSE-3rd-party.txt
LICENSE-dataset		LICENSE-dataset
README.md		README.md
SECURITY.md		SECURITY.md
conftest.py		conftest.py
demo-values.yaml		demo-values.yaml
kyverno-wandb-injection.yaml		kyverno-wandb-injection.yaml
logs.txt		logs.txt
openapi.json		openapi.json
pyproject.toml		pyproject.toml
pytest.ini		pytest.ini
requirements.txt		requirements.txt
uv.lock		uv.lock

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

AI Model Distillation for Financial Data Developer Example

Financial Use Case: News Event Classification

What is a Data Flywheel?

Where the NeMo microservices Come In

How to Use This Developer Example

Real-World Results: Financial News Classification

Technical Details

Minimum System Requirements

Security and Compliance for Financial Services

Task Serialization Safeguard

Next Steps

Getting Started

Documentation & Resources

External Resources

Contributing

License

Additional Information

Disclaimer

About

Licenses found

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Uh oh!

Folders and files

Latest commit

History

Repository files navigation

AI Model Distillation for Financial Data Developer Example

Financial Use Case: News Event Classification

What is a Data Flywheel?

Where the NeMo microservices Come In

How to Use This Developer Example

Real-World Results: Financial News Classification

Technical Details

Minimum System Requirements

Security and Compliance for Financial Services

Task Serialization Safeguard

Next Steps

Getting Started

Documentation & Resources

External Resources

Contributing

License

Additional Information

Disclaimer

About

Resources

License

Licenses found

Contributing

Security policy

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages