| name | research-publishing |
|---|---|
| description | Use when the user wants to prepare code for open-source release, create reproducible research artifacts, or structure a repository for publication. Triggers on phrases like "publish code", "open source release", "reproducibility", "research repository", "code release", or "prepare for publication". |
You are helping a researcher prepare their code and artifacts for public release alongside a paper submission.
Before any changes, audit the current state:
- Sensitive content scan:
  - API keys, tokens, credentials (grep for common patterns)
  - Hardcoded paths specific to the researcher's machine
  - Internal URLs or private infrastructure references
  - Personally identifiable information in comments or data
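The scan above can be sketched as a small script. The regexes below are illustrative, not exhaustive — a real audit should also run a dedicated tool such as detect-secrets or gitleaks:

```python
import re
from pathlib import Path

# Illustrative patterns only; extend for your own secret formats.
SECRET_PATTERNS = {
    "aws_key": re.compile(r"AKIA[0-9A-Z]{16}"),
    "generic_token": re.compile(r"(?i)(api[_-]?key|token|secret)\s*[=:]\s*['\"][^'\"]{8,}"),
    "home_path": re.compile(r"/home/[a-z0-9_]+/|/Users/[A-Za-z0-9_]+/"),
}

def scan_file(path: Path) -> list[tuple[int, str]]:
    """Return (line_number, pattern_name) hits for one file."""
    hits = []
    try:
        text = path.read_text(errors="ignore")
    except OSError:
        return hits
    for lineno, line in enumerate(text.splitlines(), start=1):
        for name, pattern in SECRET_PATTERNS.items():
            if pattern.search(line):
                hits.append((lineno, name))
    return hits

def scan_repo(root: str) -> dict[str, list[tuple[int, str]]]:
    """Scan all .py files under root; add more suffixes as needed."""
    return {
        str(p): h
        for p in Path(root).rglob("*.py")
        if (h := scan_file(p))
    }
```

Run this before the first public commit, and again on the final tree, since secrets tend to reappear from merged branches.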
- Dependency audit:
  - List all dependencies with pinned versions
  - Identify any proprietary or restricted-license dependencies
  - Check for abandoned or unmaintained dependencies
  - Verify all dependencies are pip/conda installable
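One way to produce a fully pinned dependency list from the current environment is a short standard-library sketch (`pip freeze` achieves the same from the shell):

```python
from importlib.metadata import distributions

def pinned_requirements() -> list[str]:
    """Return sorted 'name==version' lines for every installed distribution."""
    return sorted(
        f"{dist.metadata['Name']}=={dist.version}"
        for dist in distributions()
        if dist.metadata["Name"]  # skip distributions with broken metadata
    )

# To write the pinned file:
# from pathlib import Path
# Path("requirements.txt").write_text("\n".join(pinned_requirements()) + "\n")
```

Pin from a clean environment that contains only what the project actually imports, otherwise the file will carry the researcher's unrelated local packages.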
- Code organization:
  - Identify dead code, debugging artifacts, scratch files
  - Find duplicated code that should be unified
  - Check for overly complex code that can be simplified
A publishable research repository should have:
```
project/
  README.md             # Installation, usage, citation
  LICENSE               # Must have an explicit license
  requirements.txt      # or pyproject.toml with pinned deps
  setup.py / setup.cfg  # Package installation
  src/                  # Source code
  scripts/              # Training, evaluation, inference scripts
  configs/              # Configuration files
  data/                 # Sample data or download instructions
  checkpoints/          # Download instructions (not actual weights)
  results/              # Key result files referenced in paper
```
For each experiment in the paper, verify that:
- Configuration file exists and matches paper's hyperparameters
- Random seeds are set and documented
- Training command is documented end-to-end
- Evaluation command produces the reported numbers
- Data preprocessing steps are scripted (not manual)
- Hardware requirements are documented (GPU type, memory, time)
- Dependencies are version-pinned
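A common seed-setting helper for the checklist above (standard library only; the NumPy/PyTorch lines are commented out because they apply only if the project uses those libraries):

```python
import os
import random

def set_seed(seed: int = 42) -> None:
    """Seed the common sources of randomness; call once at startup."""
    random.seed(seed)
    os.environ["PYTHONHASHSEED"] = str(seed)
    # If the project uses NumPy / PyTorch, also seed those —
    # uncomment only the lines that apply:
    # import numpy as np; np.random.seed(seed)
    # import torch; torch.manual_seed(seed)
    # torch.backends.cudnn.deterministic = True  # slower but reproducible
```

Document the seed value in each config file so readers can tell a reproduction run from a fresh one.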
A research README must include:
- Title + one-line description
- Paper link (arXiv, venue page)
- Visual (architecture diagram, key result figure, or demo GIF)
- Installation (step-by-step, tested on clean environment)
- Quick start (inference on a single example, < 5 commands)
- Training (full reproduction commands)
- Evaluation (reproduce paper numbers)
- Model zoo / checkpoints (download links with expected metrics)
- Citation (BibTeX block)
- License
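A skeleton following that structure (all angle-bracket names are placeholders, not real links):

````markdown
# <Project Name>: <one-line description>

[Paper (arXiv)](<arxiv-link>) | [Project page](<page-link>)

![Key result figure](<figure-path>)

## Installation
<steps tested in a clean environment>

## Quick start
<single-example inference, fewer than 5 commands>

## Training
## Evaluation
## Model zoo
## Citation
<BibTeX block>

## License
````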
Apply minimal, targeted cleanup:
- Remove debugging prints, commented-out code, scratch experiments
- Replace hardcoded paths with configurable paths (env vars or args)
- Add docstrings to public functions (not internal helpers)
- Ensure the main entry points are clearly documented
- Do NOT refactor working code for style — it adds risk for no benefit
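Replacing a hardcoded path can be sketched as an env-var default with a CLI override (the flag and variable names here are illustrative):

```python
import argparse
import os

def get_data_root(argv=None):
    """Resolve the data directory: CLI flag > DATA_ROOT env var > default."""
    parser = argparse.ArgumentParser()
    parser.add_argument(
        "--data-root",
        default=os.environ.get("DATA_ROOT", "./data"),
        help="Dataset directory (env: DATA_ROOT)",
    )
    # parse_known_args lets this coexist with the script's other flags
    args, _ = parser.parse_known_args(argv)
    return args.data_root
```

This keeps the old behavior as the default while letting reviewers point the code at their own data without editing source.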
Guide the user through license choice:
| License | Allows commercial use | Requires attribution | Copyleft |
|---|---|---|---|
| MIT | Yes | Yes | No |
| Apache 2.0 | Yes | Yes | No (patent grant) |
| GPL 3.0 | Yes | Yes | Yes (derivative works) |
| CC BY 4.0 | Yes | Yes | No (for non-code) |
| CC BY-NC 4.0 | No | Yes | No (for non-code) |
Default recommendation: MIT for code, CC BY 4.0 for datasets/models.
Before publishing:
- Clone into a fresh directory
- Follow README installation steps exactly
- Run quick start commands
- Run evaluation to verify numbers match paper
- Check that no sensitive information is in git history
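The git-history check can be sketched by searching every commit's diff for a literal string (if a secret does turn up, it must be purged by rewriting history, e.g. with git filter-repo, not just deleted in a new commit):

```python
import subprocess

def history_contains(pattern: str, repo: str = ".") -> bool:
    """Search the full git history (all branches) for a literal string."""
    out = subprocess.run(
        ["git", "log", "-p", "--all"],
        cwd=repo, capture_output=True, text=True, check=True,
    )
    return pattern in out.stdout
```

Run it once per secret found by the working-tree scan, since a value removed from the current files may still live in an old commit.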
Produce:
- Audit report: sensitive content found, dependency issues, dead code
- Action list: specific files to modify/remove/add
- README draft: following the structure above
- Reproducibility checklist: per-experiment verification status