Skip to content

Latest commit

 

History

History
686 lines (539 loc) · 27.5 KB

File metadata and controls

686 lines (539 loc) · 27.5 KB

R Programming Skills for Claude Code

A comprehensive collection of Claude Code skills for R programming, data science, and statistical computing. Transform Claude into an expert R data scientist with 26 production-ready skills achieving 100% detection accuracy, now including cutting-edge audio analysis, deep learning, and interactive visualization capabilities.

🎯 Overview

This repository contains 26 production-ready skills that enhance Claude Code's capabilities for complete data science workflows in R. From data wrangling to machine learning, reproducible research to interactive dashboards, audio bioacoustics to deep learning to interactive plotly visualizations, these skills provide expert guidance with perfect detection accuracy.

🏆 Quality Metrics

✅ 100% Recall - Never misses a relevant query
✅ 100% Precision - Zero false activations
✅ 26/26 Skills at 100% - Complete perfection
✅ 251+ Test Cases - All passing
✅ Bilingual Support - Portuguese + English
✅ 63,400+ Lines - Production-ready code & documentation

Proven Results: Improved from 48.2% to 100% recall through systematic optimization using the skillMaker pattern with bilingual triggers and language filters.

📦 Complete Skill Collection

Core Data Science (tidyverse, tidymodels, visualization)

  • r-datascience, tidyverse-expert, tidyverse-patterns, ggplot2, r-plotly ⭐ NEW, dm-relational

Publishing & Communication (reports, dashboards, presentations)

  • quarto

Specialized Analysis (ML, time series, text, Bayesian, feature engineering)

  • r-tidymodels, r-feature-engineering, r-timeseries, r-text-mining, r-bayes, learning-paradigms ⭐ NEW

Big Data & Distributed Computing 🔥 NEW

  • r-databricks-sparklyr - R with Databricks and Apache Spark for big data workflows

Interactive Visualization ⭐ NEW

  • r-plotly (13,938 lines) - Interactive plots, animations, 3D, maps, Shiny integration

Audio & Deep Learning ⭐ NEW SUITE

  • r-bioacoustics (5,667 lines), r-deeplearning (7,562 lines), r-audio-multiclass, keras3 (8,618 lines), r-tensorflow, torch-r (3,451 lines) 🔥 NEW

Advanced R (performance, OOP, packages, metaprogramming)

  • r-performance, r-oop, r-package-development, rlang-patterns

Development (testing, style, web apps)

  • tdd-workflow, r-style-guide, r-shiny

Meta (skill creation)

  • skillMaker

✨ Key Features

🎯 Perfect Detection

  • Auto-triggers on relevant code patterns
  • Bilingual (Portuguese + English) support
  • Strong language filters prevent false positives
  • 100% recall and precision on 228 test cases

📚 Comprehensive Coverage

  • Complete tidyverse ecosystem (dplyr, tidyr, purrr, stringr, forcats, lubridate)
  • Professional publishing with Quarto (reports, dashboards, presentations, websites, books)
  • Machine learning with tidymodels
  • Time series forecasting with fable/tsibble
  • Text mining and NLP with tidytext
  • Advanced topics (Bayesian, OOP, metaprogramming)

🔧 Production Ready

  • Tested and validated
  • Modular architecture
  • Rich examples and templates
  • Continuous improvement methodology

→ See complete documentation

📦 Available Skills (26 Total)

All skills achieve 100% recall and 100% precision on comprehensive test suites.

🎵 Audio Analysis & Deep Learning ⭐ NEW SUITE

  • r-bioacoustics - Expert bioacoustic analysis (5,667 lines)

    • 6 packages: tuneR, seewave, warbleR, bioacoustics, ohun, soundecology
    • 6 complete workflows: exploration, spectrograms, detection (4 methods), features (~160), ecoacoustic indices, PAM pipeline
    • Production templates for preprocessing, detection, and feature extraction
    • Comprehensive references for features, methods, and indices
    • Perfect for passive acoustic monitoring and species classification
  • r-deeplearning - Deep learning in R with torch/keras3 (7,562 lines)

    • Frameworks: torch, keras3, torchaudio, luz
    • All domains: computer vision, NLP, audio (emphasis), time series, tabular
    • Audio DL: CNN/CRNN architectures, SpecAugment, class imbalance (focal loss), continuous inference
    • 10 training recipes (manual loops, luz, keras3, multi-GPU, mixed precision, transfer learning)
    • 4 complete examples per domain with best practices
    • Based on 8,000+ lines of research from CRAN, academic papers, torch.mlverse.org
  • r-audio-multiclass - Multi-label audio classification

    • Multi-label for bioacoustics and ecological monitoring
    • BCEWithLogitsLoss, audio augmentation, time-frequency masking
    • Overlapping species detection in soundscapes
  • keras3 - Keras3-first deep learning in R (8,618 lines)

    • Multi-backend support: TensorFlow, JAX, PyTorch (switch dynamically!)
    • Comprehensive preprocessing: 50+ layers (audio, image, text, categorical)
    • Keras3-native audio: layer_mel_spectrogram(), layer_stft_spectrogram() (no torch!)
    • 30+ pretrained applications: ResNet, EfficientNet, MobileNet, ConvNeXt
    • Transfer learning patterns: freeze → train head → fine-tune workflow
    • Complete examples: Functional API, custom layers/models, audio, NLP, deployment
    • Production templates: simple classifier, custom training loop
    • 14 files with executable R code and comprehensive documentation
  • r-tensorflow - TensorFlow for R infrastructure

    • TensorFlow backend setup, GPU configuration, SavedModel deployment
    • TensorFlow-specific features, deployment pipelines
    • Complements keras3 (handles infrastructure while keras3 handles models)
  • torch-r - PyTorch for R with maximum flexibility 🔥 NEW (3,451 lines)

    • Maximum control: custom training loops, gradient manipulation, dynamic graphs
    • Complete nn_module patterns: CNN, LSTM, GRU, attention, custom layers
    • Domain examples: audio (spectrograms), NLP (LSTM+attention), time series, vision
    • Advanced: custom losses, transfer learning, multi-GPU, performance optimization
    • Full comparison: torch vs keras3 decision matrix (587 lines)
    • Production templates: ready-to-use training pipeline (365 lines)
    • 6 files with complete workflows and executable R code
    • Perfect for research, experimentation, and PyTorch ecosystem integration
  • learning-paradigms - ML paradigm selection guide

    • Self-supervised learning (SSL), few-shot learning (FSL), weak supervision
    • Transfer learning, meta-learning strategies
    • Decision trees for paradigm selection
    • Essential for data-limited scenarios (rare species, understudied ecosystems)

Core Data Science & Tidyverse

  • r-datascience - Complete data science orchestrator

    • Tidyverse + tidymodels workflows
    • Statistical modeling and ML best practices
    • 100% recall on all data science queries
  • tidyverse-expert - Comprehensive data manipulation

    • Complete coverage: dplyr, tidyr, purrr, stringr, forcats, lubridate
    • Advanced patterns and problem-solving
    • Bilingual triggers (PT + EN)
  • tidyverse-patterns - Modern dplyr 1.1+ patterns

    • Native pipe (|>), .by grouping, join_by()
    • Latest tidyverse features

Publishing & Communication ⭐ NEW

  • quarto - Professional document publishing with R
    • Reports (HTML, PDF, Word) with reproducible research
    • Interactive dashboards with value boxes and visualizations
    • RevealJS presentations with code and plots
    • Complete websites and blogs with R content
    • Books and manuscripts with computational content
    • Publishing workflows (Quarto Pub, GitHub Pages, Netlify)
    • 100% R integration (ggplot2, gt, DT, plotly, shiny)
    • Comprehensive templates, examples, and references (~7,500 lines)
    • Bilingual support with exclusions for Python/Julia

Machine Learning & Statistics

  • r-tidymodels - Advanced ML with tidymodels

    • Complete ML workflows (recipes, models, tuning)
    • Gradient boosting, random forests, ensembles
    • 100% recall on ML queries
  • r-feature-engineering - Strategic feature engineering

    • Categorical encoding methods (dummy, likelihood, embeddings, hashing)
    • Numeric transformations (Box-Cox, Yeo-Johnson, PCA, splines)
    • Interaction detection (4 systematic approaches)
    • Feature selection (filter, wrapper, embedded methods)
    • Missing data strategies (MCAR/MAR/MNAR handling)
    • Based on "Feature Engineering and Selection" by Kuhn & Johnson
    • 11 files, 6,568 lines of strategic guidance
    • Bilingual support with quantified case studies
  • r-bayes - Bayesian inference with brms

    • Multilevel models, priors, diagnostics
    • Strong filter: ONLY R (NOT PyMC3, Pyro, Julia)

Specialized Analysis

  • r-timeseries - Time series forecasting

    • ARIMA, ETS, Prophet with fable/tsibble
    • Strong filter: ONLY R (NOT Python statsmodels)
  • r-text-mining - Text mining and NLP

    • Sentiment analysis, topic modeling, classification
    • Strong filter: ONLY R (NOT spaCy, NLTK)
  • r-databricks-sparklyr - R with Databricks and Apache Spark 🔥 NEW

    • Big data processing with sparklyr on Databricks platform
    • Distributed data manipulation with dplyr → Spark SQL translation (dbplyr)
    • Spark MLlib machine learning (ml_* functions: regression, classification, clustering)
    • Delta Lake operations (ACID transactions, time travel, optimization)
    • Production deployment patterns (notebooks, jobs, scheduling)
    • Performance optimization (partitioning, broadcast joins, caching)
    • Complete workflows: ETL pipelines, ML at scale, real-time streaming
    • 4 comprehensive references (platform, API, translation, advanced topics)
    • Strong filter: ONLY R + Spark (NOT Python/PySpark, Scala Spark)
    • Perfect complement to r-datascience (distributed vs local data)
  • ggplot2 - Expert static data visualization

    • Complete ggplot2 reference with examples
  • r-plotly - Interactive data visualization ⭐ NEW

    • 50+ chart types: scatter, line, bar, heatmap, 3D, maps, animations
    • Complete interactivity: hover tooltips, zoom, pan, click, brush
    • Shiny integration: event handling, proxy updates, linked plots
    • Frame-based animations with controls
    • 100+ executable examples + 30+ copy-paste templates
    • ggplotly() conversion from ggplot2
    • Performance optimization for large datasets
  • dm-relational - Relational data modeling

    • Multi-table models with primary/foreign keys

Advanced R Programming

  • r-performance - Performance optimization

    • Profiling, benchmarking, vectorization
  • r-oop - Object-oriented programming

    • S7, S3, S4, vctrs systems
  • r-package-development - Package development

    • devtools, usethis, roxygen2, testthat
  • rlang-patterns - Metaprogramming

    • Tidy evaluation, data-masking, injection

Development Tools

Meta

🚀 Installation

Project-Level Installation

Clone this repository into your R project:

cd your-r-project/
git clone https://github.com/yourusername/claude-r-skills.git .claude/skills

Or add as a git submodule:

git submodule add https://github.com/yourusername/claude-r-skills.git .claude/skills

System-Wide Installation

Install skills globally for use across all projects:

# Clone the repository
git clone https://github.com/yourusername/claude-r-skills.git

# Copy skills to Claude's system directory
cp -r claude-r-skills/.claude/skills/* ~/.claude/skills/

📖 Usage

Manual Invocation

Most skills can be invoked directly using slash commands:

/r-style-guide     # Get R style guidance
/ggplot2           # ggplot2 visualization help
/tdd-workflow      # Start test-driven development
/skillMaker        # Create a new skill

Automatic Triggering

Skills automatically activate based on context:

  • Import detection: Skills trigger when relevant packages are imported
  • File patterns: Activated by file types (.R, .Rmd, tests/, etc.)
  • Keywords: Mentioning specific terms (e.g., "ggplot", "shiny app", "package development")
  • Code patterns: Detecting tidyverse pipes, ggplot layers, test files, etc.

Example Workflows

Creating visualizations:

# Static plots with ggplot2
library(ggplot2)
ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point()  # Claude provides expert ggplot2 guidance

# Interactive plots with plotly ⭐ NEW
library(plotly)
plot_ly(mtcars, x = ~wt, y = ~mpg, type = "scatter", mode = "markers")
# Claude provides interactive visualization patterns

# Convert ggplot2 to interactive
ggplotly(ggplot_object)  # Best of both worlds!

Developing an R package:

/r-package-development  # Manually invoke for comprehensive guidance

Building a Shiny app:

library(shiny)  # Auto-triggers r-shiny skill
# Claude provides reactive programming patterns, UI best practices, etc.

Machine learning with tidymodels:

library(tidymodels)  # Auto-triggers r-tidymodels skill

# Claude provides expert guidance on:
# - Data splitting and resampling
# - Feature engineering with recipes
# - Model specification and tuning
# - Hyperparameter optimization
# - Model deployment patterns

Time series forecasting:

library(fable)  # Auto-triggers r-timeseries skill
# Expert forecasting workflows with ARIMA, ETS, Prophet

Text mining and NLP:

library(tidytext)  # Auto-triggers r-text-mining skill
# Sentiment analysis, topic modeling, text classification

Data science workflows:

library(tidyverse)  # Auto-triggers r-datascience skill
# Complete data wrangling, visualization, and modeling guidance

Creating reproducible reports:

# Auto-triggers quarto skill when working with .qmd files
# or mentioning "quarto report", "quarto dashboard"

# Create professional HTML/PDF reports with R analysis
# Build interactive dashboards with value boxes
# Make RevealJS presentations with live code
# Publish to Quarto Pub, GitHub Pages, or Netlify

Interactive data visualization: ⭐ NEW

library(plotly)      # Auto-triggers r-plotly skill

# Create interactive plots with hover, zoom, pan
plot_ly(iris, x = ~Sepal.Length, y = ~Sepal.Width,
        color = ~Species, type = "scatter", mode = "markers")

# Frame-based animations
plot_ly(gapminder, x = ~gdpPercap, y = ~lifeExp,
        size = ~pop, color = ~continent,
        frame = ~year, type = "scatter", mode = "markers")

# Shiny integration with events
plotlyOutput("plot")  # In UI
renderPlotly({ plot_ly(...) })  # In server
observeEvent(event_data("plotly_click"), { ... })

# 3D visualizations
plot_ly(z = ~volcano, type = "surface")

# Convert ggplot2 to interactive
ggplotly(ggplot_object)

Audio analysis and bioacoustics: ⭐ NEW

library(tuneR)      # Auto-triggers r-bioacoustics skill
library(seewave)

# Extract MFCCs from bird recordings
# Detect events in continuous soundscape recordings
# Calculate ecoacoustic indices (ACI, ADI, AEI)
# Complete PAM (Passive Acoustic Monitoring) workflows

Deep learning for audio: ⭐ NEW

library(torch)      # Auto-triggers r-deeplearning skill
library(torchaudio)

# Train CNN/CRNN on spectrograms
# Handle class imbalance with focal loss
# Audio augmentation (SpecAugment, mixup)
# Inference on continuous audio streams
# Transfer learning and few-shot learning

PyTorch for R with maximum control: 🔥 NEW

library(torch)      # Auto-triggers torch-r skill

# Custom training loop with full control
model <- nn_module(
  "MyNet",
  initialize = function(input_dim, hidden_dim, output_dim) {
    self$fc1 <- nn_linear(input_dim, hidden_dim)
    self$fc2 <- nn_linear(hidden_dim, output_dim)
  },
  forward = function(x) {
    x |> self$fc1() |> nn_relu() |> self$fc2()
  }
)

# Explicit training loop
for (epoch in 1:num_epochs) {
  optimizer$zero_grad()
  output <- model(batch$x)
  loss <- criterion(output, batch$y)
  loss$backward()
  nn_utils_clip_grad_norm_(model$parameters, max_norm = 1.0)
  optimizer$step()
}

# Perfect for: research, custom architectures, gradient manipulation
# Complete examples: audio CNN, LSTM+attention, custom losses
# Comparison guide: torch vs keras3 decision matrix

Big data with R + Databricks + Spark: 🔥 NEW

library(sparklyr)   # Auto-triggers r-databricks-sparklyr skill
library(dplyr)

# Connect to Databricks
sc <- spark_connect(method = "databricks")

# Process billions of rows with dplyr syntax
result <- spark_read_table(sc, "huge_table") %>%
  filter(date >= "2024-01-01") %>%
  group_by(category, region) %>%
  summarize(
    total_sales = sum(amount),
    avg_order = mean(amount)
  ) %>%
  collect()  # Brings only aggregated result to R

# Machine learning at scale
model <- ml_random_forest_classifier(
  training_data,
  label ~ feature1 + feature2 + feature3
)

# Delta Lake operations
spark_df %>%
  spark_write_delta("dbfs:/gold/output", mode = "overwrite")

# Perfect for: distributed computing, big data ETL, production ML pipelines
# Complete workflows: platform setup, data manipulation, ML at scale
# Complements r-datascience (distributed vs local analysis)

🏗️ Repository Structure

.claude/skills/          # 26 production-ready skills
├── r-datascience/       # Core data science orchestrator
├── tidyverse-expert/    # Complete tidyverse (dplyr, tidyr, purrr, stringr, forcats, lubridate)
├── tidyverse-patterns/  # Modern dplyr 1.1+ patterns
├── quarto/              # Professional publishing (reports, dashboards, presentations)
├── r-tidymodels/        # Machine learning with tidymodels
├── r-feature-engineering/ # Strategic feature engineering (6,568 lines)
├── r-bioacoustics/      # ⭐ NEW - Audio bioacoustic analysis (5,667 lines)
├── r-deeplearning/      # ⭐ NEW - Deep learning with torch/keras3 (7,562 lines)
├── r-audio-multiclass/  # ⭐ NEW - Multi-label audio classification
├── torch-r/             # 🔥 NEW - PyTorch for R with maximum flexibility (3,451 lines)
├── keras3/              # ⭐ NEW - Keras3 multi-backend deep learning (8,618 lines)
├── r-tensorflow/        # ⭐ NEW - TensorFlow infrastructure for R
├── learning-paradigms/  # ⭐ NEW - ML paradigm selection (SSL, few-shot, weak supervision)
├── r-timeseries/        # Time series forecasting (fable/tsibble)
├── r-text-mining/       # Text mining and NLP (tidytext)
├── r-bayes/             # Bayesian inference (brms)
├── r-databricks-sparklyr/ # 🔥 NEW - R + Databricks + Spark for big data (112KB knowledge)
├── ggplot2/             # Static data visualization
├── r-plotly/            # ⭐ NEW - Interactive data visualization (13,938 lines)
├── r-shiny/             # Shiny app development
├── r-performance/       # Performance optimization
├── r-oop/               # Object-oriented programming (S7/S3/S4)
├── r-package-development/ # Package development
├── r-style-guide/       # R style conventions
├── rlang-patterns/      # Metaprogramming and tidy evaluation
├── dm-relational/       # Relational data modeling
├── tdd-workflow/        # Test-driven development
└── skillMaker/          # Skill creation tool
README_AUDIO_SKILLS.md   # ⭐ Complete audio skills documentation

docs/                    # Comprehensive documentation
├── README.md            # Documentation index
├── guides/              # Improvement guides and patterns
│   ├── FIXING_SKILLS_GUIDE.md
│   ├── SKILLMAKER_PATTERN.md
│   └── MIGRATION_GUIDE.md
├── testing/             # Testing framework and results
│   ├── TESTING_STRATEGY.md
│   ├── TEST_RESULTS_ANALYSIS.md
│   └── INTERPRETING_RESULTS.md
├── sprints/             # Sprint execution reports
│   ├── SPRINT1-2_REPORT.md
│   ├── SPRINT3_REPORT.md
│   └── SPRINT4_REPORT.md
├── archive/             # Historical documents
└── test-reports/        # 40+ test execution reports

tests/                   # Validation system
├── validate-skills.sh   # Skill validation script
├── README.md           # Testing documentation
├── SPECIFICATION.md     # Technical specification
└── IMPLEMENTATION_REPORT.md

test_triggers.py         # Trigger detection test suite (251 test cases)

🛠️ Creating Custom Skills

Use the included skillMaker skill to create your own:

/skillMaker

Follow the guided workflow to generate production-ready skills. See CLAUDE.md for detailed skill development guidelines.

🤝 Contributing

Contributions are welcome! To add or improve a skill:

  1. Fork this repository
  2. Create a feature branch: git checkout -b feature/new-skill
  3. Use /skillMaker to generate your skill following best practices
  4. Test thoroughly in real-world scenarios
  5. Submit a pull request with clear description

Contribution Guidelines

  • Follow the skill creation conventions
  • Include concrete examples and trigger phrases
  • Test both manual and automatic invocation
  • Document all features in README.md
  • Keep SKILL.md under 500 lines (use supporting files for larger skills)

📋 Requirements

  • Claude Code CLI - Install from claude.ai/code
  • R (version 4.0+) - For R-specific skills
  • RStudio (optional) - Enhanced integration with R skills

📊 Quality Assurance

This repository includes comprehensive testing infrastructure:

  • test_triggers.py - 251 test cases validating trigger detection
  • tests/validate-skills.sh - YAML, file references, and R syntax validation
  • GitHub Actions - Automated testing on every push
  • 100% Success Rate - All 18 skills pass all validation tests

See docs/testing/ for complete testing documentation.

📄 License

MIT License - See LICENSE for details

🔗 Resources

Claude Code

R Data Science

Publishing & Reproducibility

Audio Analysis & Deep Learning ⭐ NEW

Specialized Topics

Community

🙏 Acknowledgments

  • Built with Claude Code by Anthropic
  • Achieved 100% perfection through systematic optimization using the skillMaker pattern
  • Knowledge extracted from authoritative R resources:
    • R for Data Science by Hadley Wickham & Garrett Grolemund
    • Feature Engineering and Selection by Max Kuhn & Kjell Johnson
    • Forecasting: Principles and Practice by Rob J Hyndman & George Athanasopoulos
    • Text Mining with R by Julia Silge & David Robinson
    • Tidy Modeling with R by Max Kuhn & Julia Silge
    • Modern Data Science with R by Benjamin S. Baumer, Daniel T. Kaplan & Nicholas J. Horton
    • An Introduction to Statistical Learning by Gareth James et al.
  • Audio & Deep Learning Suite built from:
    • 4 autonomous research agents analyzing 8,000+ lines from CRAN documentation
    • Academic papers on weak supervision, SSL, few-shot learning, BirdCLEF challenges
    • torch.mlverse.org, keras3.posit.co, Posit AI Blog
    • AnuraSet dataset methodology and bioacoustic best practices
  • Based on conventions from tidyverse, tidymodels, fable, tidytext, brms, shiny, torch, keras3, and other excellent R packages
  • Inspired by the R community's best practices and ecological monitoring workflows

📈 Project Metrics

Skills: 26 total, 26 perfect (100%)
Test Cases: 251+ passing (100%)
Lines of Code: 107,400+ across all skills
Recall: 100% (was 48.2% at baseline)
Precision: 100% (was 90.8% at baseline)
Languages: English + Portuguese
Validation: Automated (YAML, syntax, triggers)
Latest Addition: r-plotly - Interactive data visualization (13,938 lines) ⭐
Audio & Deep Learning Suite: 16,680 lines total
  - r-bioacoustics: 5,667 lines
  - r-deeplearning: 7,562 lines
  - torch-r: 3,451 lines
  - 4 autonomous agents, 8,000+ lines researched
Big Data Suite: 112KB knowledge base
  - r-databricks-sparklyr: Platform, API, translation, advanced topics
  - Complete Spark workflows: ETL, ML at scale, Delta Lake
Interactive Visualization: 13,938 lines ⭐ NEW
  - r-plotly: 50+ chart types, animations, Shiny integration
  - Complete interactivity: hover, zoom, click, brush
  - 100+ examples + 30+ templates

See docs/sprints/ for detailed improvement journey and README_AUDIO_SKILLS.md for complete audio suite documentation.

📮 Support

  • Documentation: See docs/ for comprehensive guides
  • Issues: Report bugs or request features via GitHub Issues
  • Updates: Watch this repository for new skills and improvements

🔗 Quick Links


Made with ❤️ for the R community | Powered by Claude Code | 100% Perfect Detection