Move pandas, datasets and optuna as optional deps #8274

Open · wants to merge 5 commits into main
README.md (7 changes: 6 additions & 1 deletion)

@@ -26,11 +26,16 @@ DSPy stands for Declarative Self-improving Python. Instead of brittle prompts, y

## Installation


Basic installation:
```bash
pip install dspy
```

To include packages necessary for DSPy optimizers,
```bash
pip install dspy[optimize]
```

To install the very latest from `main`:

```bash
docs/docs/index.md (2 changes: 1 addition & 1 deletion)

@@ -23,7 +23,7 @@ Instead of wrangling prompts or training jobs, DSPy (Declarative Self-improving
!!! info "Getting Started I: Install DSPy and set up your LM"

```bash
-    > pip install -U dspy
+    > pip install -U "dspy[optimize]"
```

=== "OpenAI"
docs/docs/tutorials/agents/index.ipynb (2 changes: 1 addition & 1 deletion)

@@ -8,7 +8,7 @@
"\n",
"Let's walk through a quick example of setting up a `dspy.ReAct` agent with a couple of tools and optimizing it to conduct advanced browsing for multi-hop search.\n",
"\n",
"Install the latest DSPy via `pip install -U dspy` and follow along.\n",
"Install the latest DSPy via `pip install -U dspy[optimize]` and follow along.\n",
"\n",
"<details>\n",
"<summary>Recommended: Set up MLflow Tracing to understand what's happening under the hood.</summary>\n",
docs/docs/tutorials/audio/index.ipynb (2 changes: 1 addition & 1 deletion)

@@ -18,7 +18,7 @@
"Ensure you're using the latest DSPy version:\n",
"\n",
"```shell\n",
"pip install -U dspy\n",
"pip install -U dspy[optimize]\n",
"```\n",
"\n",
"To handle audio data, install the following dependencies:\n",
docs/docs/tutorials/classification_finetuning/index.ipynb (2 changes: 1 addition & 1 deletion)

@@ -12,7 +12,7 @@
"\n",
"### Install dependencies and download data\n",
"\n",
"Install the latest DSPy via `pip install -U dspy>=2.6.0` and follow along (or `uv pip`, if you prefer). This tutorial depends on DSPy >= 2.6.0.\n",
"Install the latest DSPy via `pip install -U dspy[optimize]` and follow along (or `uv pip`, if you prefer). This tutorial depends on DSPy >= 2.6.0.\n",
"\n",
"This tutorial requires a local GPU at the moment for inference, though we plan to support ollama serving for finetuned models as well.\n",
"\n",
docs/docs/tutorials/entity_extraction/index.ipynb (2 changes: 1 addition & 1 deletion)

@@ -24,7 +24,7 @@
"outputs": [],
"source": [
"# Install the latest version of DSPy\n",
"%pip install -U dspy-ai\n",
"%pip install -U dspy[optimize]\n",
"# Install the Hugging Face datasets library to load the CoNLL-2003 dataset\n",
"%pip install datasets"
]
docs/docs/tutorials/image_generation_prompting/index.ipynb (2 changes: 1 addition & 1 deletion)

@@ -23,7 +23,7 @@
"source": [
"You can install DSPy via:\n",
"```bash\n",
"pip install -U dspy\n",
"pip install -U dspy[optimize]\n",
"```\n",
"\n",
"For this example, we'll use Flux Pro from FAL. You can get an API key [here](https://fal.com/flux-pro)\n",
docs/docs/tutorials/math/index.ipynb (2 changes: 1 addition & 1 deletion)

@@ -8,7 +8,7 @@
"\n",
"Let's walk through a quick example of setting up a `dspy.ChainOfThought` module and optimizing it for answering algebra questions.\n",
"\n",
"Install the latest DSPy via `pip install -U dspy` and follow along.\n",
"Install the latest DSPy via `pip install -U dspy[optimize]` and follow along.\n",
"\n",
"<details>\n",
"<summary>Recommended: Set up MLflow Tracing to understand what's happening under the hood.</summary>\n",
docs/docs/tutorials/mcp/index.md (2 changes: 1 addition & 1 deletion)

@@ -20,7 +20,7 @@ to [MCP servers built by the community](https://modelcontextprotocol.io/examples
Before starting, let's install the required dependencies:

```shell
-pip install -U dspy mcp
+pip install -U dspy[mcp]
```

## MCP Server Setup
docs/docs/tutorials/multihop_search/index.ipynb (2 changes: 1 addition & 1 deletion)

@@ -8,7 +8,7 @@
"\n",
"Let's walk through a quick example of building a `dspy.Module` with multiple sub-modules. We'll do this for the task for multi-hop search.\n",
"\n",
"Install the latest DSPy via `pip install -U dspy` and follow along."
"Install the latest DSPy via `pip install -U dspy[optimize]` and follow along."
]
},
{
docs/docs/tutorials/rag/index.ipynb (2 changes: 1 addition & 1 deletion)

@@ -8,7 +8,7 @@
"\n",
"Let's walk through a quick example of **basic question answering** with and without **retrieval-augmented generation** (RAG) in DSPy. Specifically, let's build **a system for answering Tech questions**, e.g. about Linux or iPhone apps.\n",
"\n",
"Install the latest DSPy via `pip install -U dspy` and follow along. If you're looking instead for a conceptual overview of DSPy, this [recent lecture](https://www.youtube.com/live/JEMYuzrKLUw) is a good place to start."
"Install the latest DSPy via `pip install -U dspy[optimize]` and follow along. If you're looking instead for a conceptual overview of DSPy, this [recent lecture](https://www.youtube.com/live/JEMYuzrKLUw) is a good place to start."
]
},
{
docs/docs/tutorials/tool_use/index.ipynb (2 changes: 1 addition & 1 deletion)

@@ -8,7 +8,7 @@
"\n",
"Let's walk through a quick example of building and prompt-optimizing a DSPy agent for advanced tool use. We'll do this for the challenging task [ToolHop](https://arxiv.org/abs/2501.02506) but with an even stricter evaluation criteria.\n",
"\n",
"Install the latest DSPy via `pip install -U dspy` and follow along. You will also need to `pip install func_timeout`."
"Install the latest DSPy via `pip install -U dspy[optimize]` and follow along. You will also need to `pip install func_timeout`."
]
},
{
dspy/evaluate/evaluate.py (16 changes: 10 additions & 6 deletions)

@@ -1,5 +1,6 @@
import logging
import types
+import importlib
from typing import TYPE_CHECKING, Any, Callable, List, Optional, Tuple, Union

if TYPE_CHECKING:
@@ -178,12 +179,15 @@ def process_item(example):
logger.info(f"Average Metric: {ncorrect} / {ntotal} ({round(100 * ncorrect / ntotal, 1)}%)")

if display_table:
-    # Rename the 'correct' column to the name of the metric object
-    metric_name = metric.__name__ if isinstance(metric, types.FunctionType) else metric.__class__.__name__
-    # Construct a pandas DataFrame from the results
-    result_df = self._construct_result_table(results, metric_name)
-
-    self._display_result_table(result_df, display_table, metric_name)
+    if importlib.util.find_spec("pandas") is not None:
+        # Rename the 'correct' column to the name of the metric object
+        metric_name = metric.__name__ if isinstance(metric, types.FunctionType) else metric.__class__.__name__
+        # Construct a pandas DataFrame from the results
+        result_df = self._construct_result_table(results, metric_name)
+
+        self._display_result_table(result_df, display_table, metric_name)
+    else:
+        logger.warning("Skipping table display since `pandas` is not installed.")

if return_all_scores and return_outputs:
    return round(100 * ncorrect / ntotal, 2), results, [score for *_, score in results]
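As an aside on the pattern used in this hunk: table rendering now probes for `pandas` with `importlib.util.find_spec` and falls back to a warning when the package is absent, instead of importing it unconditionally. Below is a minimal, standalone sketch of that optional-dependency guard; the `render_result_table` helper and the `(example, prediction, score)` row layout are illustrative assumptions, not DSPy's actual API.

```python
import importlib.util
import logging

logger = logging.getLogger(__name__)


def render_result_table(results, metric_name):
    """Build a pandas DataFrame from (example, prediction, score) rows, if pandas is available."""
    if importlib.util.find_spec("pandas") is None:
        # pandas now lives in the optional `optimize` extra, so it may be absent.
        logger.warning("Skipping table display since `pandas` is not installed.")
        return None

    import pandas as pd  # safe to import: find_spec confirmed the package is present

    rows = [
        {"example": str(example), "prediction": str(prediction), metric_name: score}
        for example, prediction, score in results
    ]
    return pd.DataFrame(rows)
```

Compared with importing pandas at module load time, this guard keeps a bare `pip install dspy` lean and only warns when a table is actually requested; installing `dspy[optimize]` restores the previous behavior.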
pyproject.toml (11 changes: 8 additions & 3 deletions)

@@ -25,13 +25,10 @@ dependencies = [
"backoff>=2.2",
"joblib~=1.3",
"openai>=0.28.1",
"pandas>=2.1.1",
"regex>=2023.10.3",
"ujson>=5.8.0",
"tqdm>=4.66.1",
"datasets>=2.14.6",
"requests>=2.31.0",
"optuna>=3.4.0",
"pydantic>=2.0",
"magicattr>=0.1.6",
"litellm>=1.60.3",
@@ -48,6 +45,11 @@
]

[project.optional-dependencies]
+optimize = [
+    "datasets>=2.14.6",
+    "pandas>=2.1.1",
+    "optuna>=3.4.0",
+]
anthropic = ["anthropic>=0.18.0,<1.0.0"]
weaviate = ["weaviate-client~=4.5.4"]
aws = ["boto3~=1.34.78"]
@@ -66,6 +68,9 @@ dev = [
]
test_extras = [
"mcp; python_version >= '3.10'",
"datasets>=2.14.6",

Review comment (Collaborator): This is open for discussion. Do you think it's better to have these go to dev or test_extras? My first impression is that they belong in dev, but I'm not very sure.

Reply (@TomeHirata, Collaborator and Author, May 28, 2025): I think test_extras should be a combination of all the extra dependencies that are installed when running pytest. To my understanding, dev usually contains only packages that help development, such as linters and test libraries.

"pandas>=2.1.1",
"optuna>=3.4.0",
]

[tool.setuptools.packages.find]
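The test changes below gate pandas/optuna-dependent tests behind a `@pytest.mark.extra` marker. This diff does not show how the marker is registered or deselected, so the following `conftest.py` is only a plausible sketch under those assumptions (not the PR's actual setup): it registers the marker and auto-skips marked tests when the optional packages are not importable.

```python
# conftest.py (sketch, not part of this PR): register the custom "extra" marker and
# skip marked tests automatically when the optional dependencies are missing.
import importlib.util

import pytest


def pytest_configure(config):
    # Declaring the marker avoids PytestUnknownMarkWarning during collection.
    config.addinivalue_line(
        "markers", "extra: tests that need the optional pandas/datasets/optuna packages"
    )


def pytest_collection_modifyitems(config, items):
    missing = [pkg for pkg in ("pandas", "datasets", "optuna") if importlib.util.find_spec(pkg) is None]
    if not missing:
        return
    skip_extra = pytest.mark.skip(reason=f"optional packages not installed: {', '.join(missing)}")
    for item in items:
        if "extra" in item.keywords:
            item.add_marker(skip_extra)
```

With a setup along these lines, `pytest -m "not extra"` mirrors a bare `pip install dspy` environment, while installing the `optimize` extra (or the `test_extras` group) lets the full suite run.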
tests/datasets/test_dataset.py (36 changes: 18 additions & 18 deletions)

@@ -1,9 +1,7 @@
import pytest
import tempfile
-import unittest
import uuid

-import pandas as pd

from dspy import Example
from dspy.datasets.dataset import Dataset

@@ -15,6 +13,7 @@

class CSVDataset(Dataset):
    def __init__(self, file_path, input_keys=None, *args, **kwargs) -> None:
+       import pandas as pd
        super().__init__(input_keys=input_keys, *args, **kwargs)
        df = pd.read_csv(file_path)
        data = df.to_dict(orient="records")
@@ -28,21 +27,22 @@ def __init__(self, file_path, input_keys=None, *args, **kwargs) -> None:
]


-class TestCSVDataset(unittest.TestCase):
-    def test_input_keys(self):
-        with tempfile.NamedTemporaryFile(mode="w+", suffix=".csv") as tmp_file:
-            tmp_file.write(dummy_data)
-            tmp_file.flush()
-            dataset = CSVDataset(tmp_file.name, input_keys=["content", "question"])
-            self.assertIsNotNone(dataset.train)
+@pytest.fixture
+def csv_file():
+    with tempfile.NamedTemporaryFile(mode="w+", suffix=".csv") as tmp_file:
+        tmp_file.write(dummy_data)
+        tmp_file.flush()
+        yield tmp_file.name

-            for example in dataset.train:
-                inputs = example.inputs()
-                self.assertIsNotNone(inputs)
-                self.assertIn("content", inputs)
-                self.assertIn("question", inputs)
-                self.assertEqual(set(example._input_keys), {"content", "question"})
+@pytest.mark.extra
+def test_input_keys(csv_file):
+    dataset = CSVDataset(csv_file, input_keys=["content", "question"])
+    assert dataset.train is not None

-if __name__ == "__main__":
-    unittest.main()
+    for example in dataset.train:
+        inputs = example.inputs()
+        assert inputs is not None
+        assert "content" in inputs
+        assert "question" in inputs
+        assert set(example._input_keys) == {"content", "question"}
tests/evaluate/test_evaluate.py (4 changes: 3 additions & 1 deletion)

@@ -2,7 +2,6 @@
import threading
from unittest.mock import patch

-import pandas as pd
import pytest

import dspy
@@ -55,7 +54,9 @@ def test_evaluate_call():
assert score == 100.0


+@pytest.mark.extra
def test_construct_result_df():
+    import pandas as pd
    devset = [new_example("What is 1+1?", "2"), new_example("What is 2+2?", "4")]
    ev = Evaluate(
        devset=devset,
@@ -145,6 +146,7 @@ def test_evaluate_call_bad():
assert score == 0.0


+@pytest.mark.extra
@pytest.mark.parametrize(
    "program_with_example",
    [
tests/primitives/test_module.py (1 change: 1 addition & 0 deletions)

@@ -71,6 +71,7 @@ def test_save_and_load_with_json(tmp_path):
assert new_model.predict.demos[0] == model.predict.demos[0].toDict()


+@pytest.mark.extra
def test_save_and_load_with_pkl(tmp_path):
    import datetime

tests/utils/test_saving.py (2 changes: 2 additions & 0 deletions)

@@ -1,4 +1,5 @@
import logging
+import pytest
from unittest.mock import patch

import dspy
@@ -55,6 +56,7 @@ class MySignature(dspy.Signature):
assert predict.signature == loaded_predict.signature


+@pytest.mark.extra
def test_save_compiled_model(tmp_path):
    predict = dspy.Predict("question->answer")
    dspy.settings.configure(lm=DummyLM([{"answer": "blue"}, {"answer": "white"}] * 10))