SciDER: Scientific Data-centric End-to-end Researcher

Installation

You can install the project using pip:

# from git
pip install git+https://github.com/leonardodalinky/SciDER
# locally
pip install -e .

Example Usage:

from scider.default.models import register_gemini_medium_high_models
from scider.workflows import run_full_workflow

# 1. Register the models you want to use
register_gemini_medium_high_models()
# 2. Run the full workflow
wf = run_full_workflow(
    data_path="/path/to/data/",
    workspace_path="/path/to/workspace/",
    user_query="Discover insights about RAG",
)
# 3. Print the final summary produced by the workflow
print(wf.final_summary)

Workflows

SciDER provides seven workflows in scider.workflows:

Workflow	Description
`IdeationWorkflow`	Generate research ideas from literature search.
`DataWorkflow`	Analyze a dataset and produce a structured summary.
`HypoDataWorkflow`	Generate synthetic data from a feature description, then analyze it.
`ExperimentWorkflow`	Implement and run an experiment given a data summary.
`WritingWorkflow`	Turn SciDER outputs (data summary, experiment log, ideas) into a venue-formatted LaTeX/PDF paper.
`FullWorkflow`	Data analysis -> experiment execution -> (optional) paper writing.
`FullWorkflowWithIdeation`	Ideation -> (optional) data analysis -> (optional) experiment -> (optional) paper writing. Each phase can be skipped via flags.

Each workflow has a class form (FooWorkflow) and a convenience function (run_foo_workflow).

Paper Writing & Templates

The WritingWorkflow (and the optional paper-writing phase of the full workflows) generates a publication-ready LaTeX paper. SciDER ships 7 venue templates — NeurIPS, ACL, ICML, ICLR, AAAI, IEEE, and ACM. The official style files (.sty/.bst/.cls) are auto-downloaded and cached on first use, and matching \usepackage lines are activated automatically. Select a template in the Web UI or pass paper_template_dir_path to the workflow.

Configuration

The project is configured using environment variables. You can set these variables in a .env file at the root of the project. A template .env.template is provided for reference. You can also set them directly in your shell or terminal session.
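As a rough illustration of the .env format, the sketch below parses KEY=VALUE lines and reports which required variables are still empty. It is not SciDER's loader: `parse_env_text` and `missing_keys` are hypothetical helper names, the sample values are placeholders, and real .env loaders handle more syntax (exports, multi-line values, escapes).

```python
def parse_env_text(text: str) -> dict:
    """Parse KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        value = value.strip()
        # Drop one pair of matching surrounding quotes, if present.
        if len(value) >= 2 and value[0] == value[-1] and value[0] in ('"', "'"):
            value = value[1:-1]
        values[key.strip()] = value
    return values


def missing_keys(required: list, env: dict) -> list:
    """Return the required variable names that are absent or empty."""
    return [name for name in required if not env.get(name)]


# Placeholder content standing in for a real .env file.
sample = """
# provider keys
GEMINI_API_KEY="g-123"
ANTHROPIC_API_KEY=
"""
env = parse_env_text(sample)
print(missing_keys(["GEMINI_API_KEY", "ANTHROPIC_API_KEY"], env))
# → ['ANTHROPIC_API_KEY']
```

A check like this can run before a workflow starts, so a missing key is reported up front rather than at the first model call.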

Model catalog

SciDER uses a unified model catalog so you can mix providers per agent role. model_settings/catalog.yaml is the single source of truth for every model SciDER knows about (provider, LiteLLM id, capabilities, required env vars). model_settings/role_defaults.yaml then assigns a model to each role — e.g. Claude for experiment, Gemini for data — with inline param overrides:

defaults:
  experiment: claude-opus-4-6[reasoning_effort=medium]
  data: gemini-2.5-pro[reasoning_effort=medium]

Provide any combination of provider keys (ANTHROPIC_API_KEY, GEMINI_API_KEY, OPENAI_API_KEY, ...) in .env. Frontend selections on the Settings page override these defaults.
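To make the inline override syntax concrete, here is a minimal sketch that splits a role default such as claude-opus-4-6[reasoning_effort=medium] into a model name and a parameter dictionary. The function name `parse_model_spec` is hypothetical, and support for several comma-separated parameters is an assumption; SciDER's own parser may differ.

```python
import re

# model name, optionally followed by [key=value,...]
_SPEC_RE = re.compile(r"^(?P<model>[^\[\]]+)(?:\[(?P<params>[^\[\]]*)\])?$")


def parse_model_spec(spec: str):
    """Split 'model[key=value,...]' into (model, params)."""
    match = _SPEC_RE.match(spec.strip())
    if match is None:
        raise ValueError(f"malformed model spec: {spec!r}")
    params = {}
    raw_params = match.group("params") or ""
    # Assumption: multiple overrides are separated by commas.
    for item in raw_params.split(","):
        if not item.strip():
            continue
        key, sep, value = item.partition("=")
        if not sep:
            raise ValueError(f"expected key=value, got {item!r}")
        params[key.strip()] = value.strip()
    return match.group("model").strip(), params


print(parse_model_spec("claude-opus-4-6[reasoning_effort=medium]"))
# → ('claude-opus-4-6', {'reasoning_effort': 'medium'})
print(parse_model_spec("gemini-2.5-pro"))
# → ('gemini-2.5-pro', {})
```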

Web UI

The web UI is a Streamlit application. It supports:

Live token/cost usage tracking (updated during runs and pauses)
Cancel and pause controls for in-progress workflows
Test-API-connection buttons on the Settings page
Evolutionary idea search toggle
Venue paper-template picker for the full workflow

Run locally

From the project root:

bash streamlit-client/run.sh

Or manually:

uv sync --extra streamlit
uv run python -m streamlit run streamlit-client/app.py --server.port 7860

Then open http://localhost:7860. On first launch a Settings page appears for configuring providers, API keys, and per-role model assignments.

Run with Docker

Deploy the web UI using the Dockerfile at the project root.

1. Create a .env file at the project root (copy from .env.template) and fill in your API keys.
2. Build the image:

docker build -t scider:latest .

3. Run the container:

docker run -d \
  --name scider \
  -p 7860:7860 \
  --env-file .env \
  scider:latest

Access the UI at http://localhost:7860.

UI Example:

[Screenshots: "Select workflow type and Get started"; "Case study selection and Full workflow"]

Coding Backend

The experiment agent delegates code implementation to a coding subagent. Two backends are available, selectable via the CODING_AGENT_VERSION environment variable:

Backend	Value	Description
Claude Agent SDK (default)	`claude_sdk`	Delegates to Claude Agent SDK. Requires `pip install claude-agent-sdk` and `ANTHROPIC_API_KEY`.
Native	`native`	SciDER's built-in coding agent. Uses the `experiment_coding` model role with any LiteLLM-supported provider. No external dependencies. Pick this if you want a non-Claude provider (Gemini, GPT, etc.).

Set CODING_AGENT_VERSION in .env to switch backends.
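The selection rule in the table can be sketched as a small preflight check. This is an illustration under assumptions, not SciDER's code: `resolve_coding_backend` is a hypothetical name, and failing early when ANTHROPIC_API_KEY is missing is a design choice of this sketch.

```python
import os
from typing import Mapping, Optional

VALID_BACKENDS = ("claude_sdk", "native")


def resolve_coding_backend(environ: Optional[Mapping[str, str]] = None) -> str:
    """Return the backend named by CODING_AGENT_VERSION (default: claude_sdk)."""
    env = os.environ if environ is None else environ
    value = env.get("CODING_AGENT_VERSION", "").strip() or "claude_sdk"
    if value not in VALID_BACKENDS:
        raise ValueError(
            f"CODING_AGENT_VERSION must be one of {VALID_BACKENDS}, got {value!r}"
        )
    # The Claude Agent SDK backend requires an Anthropic key.
    if value == "claude_sdk" and not env.get("ANTHROPIC_API_KEY"):
        raise RuntimeError("claude_sdk backend requires ANTHROPIC_API_KEY")
    return value


print(resolve_coding_backend({"CODING_AGENT_VERSION": "native"}))
# → native
```

Passing a plain dictionary instead of reading os.environ keeps the check easy to test.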

Skills

Skills are markdown files with YAML frontmatter that inject domain-specific guidance into an agent's system prompt. Modeled after Claude Code, they can be either preloaded (full content injected) or on-demand (listed by name, loaded via the Skill tool when needed).

Discovery

On startup, SciDER walks up from the workspace directory to the filesystem root (plus ~), scanning .scider/skills/ at each level. Closer directories override identically-named skills from parents. Supported layouts:

.scider/skills/
├── my-skill/
│   ├── SKILL.md              # directory format — can bundle reference files
│   └── references/
│       └── usage.md
└── another.md                # single-file format
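The lookup order described above can be sketched with pathlib. This is an approximation, not SciDER's implementation: the function names are hypothetical, skills are keyed by directory or file name rather than by the frontmatter name field, and the home directory is assumed to have the lowest precedence.

```python
from pathlib import Path


def candidate_skill_dirs(workspace: Path, home: Path) -> list:
    """Existing .scider/skills directories, closest to the workspace first."""
    workspace = workspace.resolve()
    roots = [workspace, *workspace.parents]
    home = home.resolve()
    if home not in roots:
        roots.append(home)  # assumption: ~ is consulted last
    dirs = [root / ".scider" / "skills" for root in roots]
    return [d for d in dirs if d.is_dir()]


def discover_skills(workspace: Path, home: Path) -> dict:
    """Map skill name -> markdown file; closer directories win on name clashes."""
    found = {}
    for skills_dir in candidate_skill_dirs(workspace, home):
        for entry in sorted(skills_dir.iterdir()):
            if entry.is_dir() and (entry / "SKILL.md").is_file():
                name, path = entry.name, entry / "SKILL.md"  # directory format
            elif entry.is_file() and entry.suffix == ".md":
                name, path = entry.stem, entry  # single-file format
            else:
                continue
            # Keep the first hit: it came from the closest directory.
            found.setdefault(name, path)
    return found
```

Calling discover_skills(Path.cwd(), Path.home()) lists the skills visible from the current directory under these assumptions.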

Frontmatter fields:

---
name: my-skill
description: One-line summary shown in the on-demand listing.
allowed_agents: [data, experiment]   # omit → available to all agents
preload_for: [data]                  # omit → on-demand only (must be called via Skill tool)
---

For directory-format skills, SciDER automatically injects Base directory for this skill: <absolute path> at the top of the content so the model can resolve relative file references (e.g. references/usage.md) via the Read tool.
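A minimal sketch of that injection step, assuming the frontmatter is delimited by two --- lines as in the example above. The helper names are hypothetical, and the frontmatter is returned as raw text rather than parsed as YAML.

```python
from pathlib import Path


def split_frontmatter(text: str):
    """Return (frontmatter, body); frontmatter is '' without a leading --- block."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return "", text
    for index in range(1, len(lines)):
        if lines[index].strip() == "---":
            return "\n".join(lines[1:index]), "\n".join(lines[index + 1:])
    return "", text  # unterminated block: treat everything as body


def with_base_directory(body: str, skill_md: Path) -> str:
    """Prefix the skill body with the absolute directory that holds SKILL.md."""
    base = skill_md.resolve().parent
    return f"Base directory for this skill: {base}\n\n{body.lstrip()}"


frontmatter, body = split_frontmatter(
    "---\nname: my-skill\n---\nSee references/usage.md for details.\n"
)
print(with_base_directory(body, Path(".scider/skills/my-skill/SKILL.md")))
```

With the base directory at the top, a relative reference such as references/usage.md can be joined to an absolute path before it is read.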

Dynamic Registration

You can also register skills programmatically, overriding frontmatter fields:

from scider.core.skills import SkillRegistry

# Single directory
SkillRegistry.instance().register_skill_dirs(
    "path/to/my-skill",
    allow=["experiment", "native_coding"],
    preload_for=["experiment"],
)

# Multiple directories at once
SkillRegistry.instance().register_skill_dirs(
    ["path/to/skill-a", "path/to/skill-b"],
    allow=["data"],
)

allow restricts which agents see the skill; preload_for controls which agents get the full content in their system prompt. Both accept a Literal of the valid agent names (ideation, data, experiment, experiment_coding, native_coding, critic, paper_search, writing, approval) for static type checking. Passing None for either keeps the value from the SKILL.md frontmatter.
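The semantics above can be summarized in two small functions. These are hypothetical helpers written for illustration, not part of the SciDER API; letting the allow list take priority over preload_for is an assumption of this sketch.

```python
from typing import Optional, Sequence


def merge_override(frontmatter_value, override):
    """None keeps the SKILL.md frontmatter value; anything else replaces it."""
    return frontmatter_value if override is None else override


def skill_mode(
    agent: str,
    allowed_agents: Optional[Sequence[str]],
    preload_for: Optional[Sequence[str]],
) -> str:
    """Classify how one agent sees a skill: hidden, preloaded, or on-demand."""
    # An omitted allow list means the skill is available to all agents.
    if allowed_agents is not None and agent not in allowed_agents:
        return "hidden"
    # An omitted preload list means the skill is on-demand only.
    if preload_for is not None and agent in preload_for:
        return "preloaded"
    return "on-demand"


allowed = merge_override(["data", "experiment"], None)  # frontmatter kept
preload = merge_override(["data"], ["experiment"])      # overridden at registration
for agent in ("data", "experiment", "writing"):
    print(agent, skill_mode(agent, allowed, preload))
# → data on-demand
# → experiment preloaded
# → writing hidden
```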

Development Guide

First, install pre-commit:

pip install pre-commit

Then install its git hooks so code is formatted on each commit:

pre-commit install

Then, copy .env.template to .env and fill in the necessary values.

Finally, sync dependencies with the extra that matches your setup:

# for cpu
uv sync --extra cpu

# for mac
uv sync --extra mac

# for gpu
uv sync --extra cu128

# streamlit client
uv sync --extra streamlit

Run tests with:

uv run pytest tests/

Benchmarks

See BENCHMARKS for details on the benchmarks we have conducted to evaluate SciDER's performance.

Feedback and Contributions

We welcome contributions to improve SciDER. Please open an issue or submit a pull request on our GitHub repository.

Feedback on the project is also greatly appreciated. You can fill out the feedback form to rate this app and help improve the project.

Reference

If you find SciDER useful in your research, please consider citing our paper:

@article{lin2026scider,
  title={SciDER: Scientific Data-centric End-to-end Researcher},
  author={Lin, Ke and Aijaz, Owais and Lu, Yilin and Bhat, Shreyas and Guo, Xuehang and Oliva, Junier},
  journal={arXiv preprint arXiv:2603.01421},
  year={2026},
  doi={10.48550/arXiv.2603.01421}
}

Paper: arXiv:2603.01421 · DOI: 10.48550/arXiv.2603.01421
