Random helpers for the mskilab ecosystem - A collection of utility functions for bioinformatics analysis and system monitoring.
You can install the development version of jrtools from GitHub:
# Install devtools if you haven't already
if (!requireNamespace("devtools", quietly = TRUE)) install.packages("devtools")
# Install jrtools
devtools::install_github("jrafailov/jrtools")
Assess local memory usage on mskilab servers. This function provides a tree view of running processes and their memory consumption.
Parameters:
- huntdown: Optional username to filter results for a specific user
Returns:
- If huntdown is NULL: A data.table of all users and their total memory usage in GB
- If huntdown is specified: A detailed data.table of that user's processes with PID, status, and memory usage
Examples:
library(jrtools)
# Get memory usage for all users
hunt()
# Get detailed process information for a specific user
hunt("username")
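The per-user totals that hunt() reports can be approximated from a shell for a quick check without R. The following is a minimal sketch of that aggregation only, not the package's implementation: the function name mem_by_user is hypothetical, it assumes a Linux ps that reports resident memory (rss) in KiB, and it does not reproduce the process tree view.

```shell
# Hypothetical helper: sum resident memory per user.
# Reads "USER RSS_KIB" pairs on stdin, one process per line,
# and prints "USER<TAB>GiB" sorted from largest to smallest.
mem_by_user() {
  LC_ALL=C awk '
    NF >= 2 { total_kb[$1] += $2 }          # accumulate KiB per user
    END {
      for (user in total_kb)
        printf "%s\t%.2f\n", user, total_kb[user] / (1024 * 1024)
    }
  ' | LC_ALL=C sort -k2,2nr
}

# Typical use on a live system (headers suppressed with the trailing "="):
#   ps -e -o user= -o rss= | mem_by_user
```

Keeping the aggregation separate from the ps call makes it possible to run the helper on saved process listings as well as on a live system.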
Parallelize concatenation of data.tables stored as RDS files. Useful for combining large datasets split across multiple files.
Parameters:
- filepaths: Vector of RDS file paths containing data.tables
- cores: Number of cores to use for parallel processing (default: 1)
Returns: Combined data.table from all input files
Examples:
# Combine multiple RDS files containing data.tables
files <- c("data1.rds", "data2.rds", "data3.rds")
combined_data <- concat_file_paths(files, cores = 4)
Parse VCF files with SnpEff annotations, with special support for methylation analysis fields. This function processes VCF files and extracts mutation annotations, allelic depth information, and methylation-related metadata.
Key Features:
- Supports both standard VCF processing and methylation analysis workflows
- Handles multiple genotype formats (AD, AU/GU/CU/TU/TAR/TIR)
- Extracts SnpEff annotations including gene, transcript, and consequence information
- Supports filtering for coding variants only
- Includes special handling for methylation analysis fields (CpG, METH_PROB, V_HAT, etc.)
Generate random alphanumeric strings.
Parameters:
- n: Number of strings to generate (default: 1)
- length: Length of each string (default: 12)
Returns: Vector of random strings
Robust wrapper around tryCatch that works well with parallel processing functions.
Utility function for path normalization that handles special cases like /dev/null.
- data.table: For efficient data manipulation
- ps: For process monitoring
- magrittr: For pipe operator support
Additional suggested packages for full functionality:
- VariantAnnotation: For VCF processing
- parallel: For multicore processing
- stringr: For string operations
A comprehensive collection of bash functions and aliases for managing SLURM jobs on HPC clusters. These tools enhance the standard SLURM commands with additional functionality for monitoring and managing computational workflows.
Key Features:
- Queue Monitoring: Enhanced squeue aliases with custom formatting and real-time watching
  - sqwatch: Watch all user jobs with detailed formatting
  - sqrun: Monitor only running jobs
  - sqpend: Monitor pending jobs
  - squeuelab: Display queue status for all lab members
- Job Management:
  - scancelgrep: Cancel jobs matching a pattern with confirmation
  - reassign_jobs: Reassign jobs to different partitions (useful for Nextflow workflows)
- Job Inspection:
  - job_stderr: Follow stderr output of running jobs in real time
  - job_cwd: Change to the working directory of a specific job
- Resource Monitoring:
  - Enhanced hunt function for memory and CPU usage analysis
  - Per-user resource summaries with core-equivalent calculations
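To illustrate the "cancel jobs matching a pattern with confirmation" idea, here is a minimal bash sketch. It is not the code shipped in slurmtools: the function names match_job_ids and cancel_matching_sketch are hypothetical, and the sketch assumes the pattern should be matched against the job name only. The squeue options used (-u, -h, and -o with %i for job ID and %j for job name) and scancel itself are standard SLURM.

```shell
# Hypothetical helper: print the IDs of jobs whose name matches a pattern.
# Reads "JOBID NAME" lines on stdin; the pattern is treated as an awk regex.
match_job_ids() {
  awk -v pat="$1" '{
    id = $1
    name = $0
    sub(/^[^ ]+ +/, "", name)     # drop the job ID, keep the job name
    if (name ~ pat) print id
  }'
}

# Hypothetical wrapper: list matching jobs, ask for confirmation, then cancel.
cancel_matching_sketch() {
  local pattern="$1" ids answer
  ids="$(squeue -u "$USER" -h -o '%i %j' | match_job_ids "$pattern")"
  if [ -z "$ids" ]; then
    echo "No jobs match '$pattern'." >&2
    return 0
  fi
  echo "Jobs to cancel:"
  echo "$ids"
  printf 'Cancel these jobs? [y/N] '
  read -r answer
  case "$answer" in
    y|Y) echo "$ids" | xargs scancel ;;
    *)   echo "Nothing cancelled." ;;
  esac
}
```

Matching on the name rather than the whole squeue line avoids cancelling a job merely because the pattern happens to occur in its numeric ID.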
Usage:
# Source the tools in your bash profile
source /path/to/jrtools/scripts/slurmtools
# Monitor your running jobs
sqrun
# Cancel all jobs matching a pattern
scancelgrep "my_analysis"
# Follow stderr of a specific job
job_stderr 12345
# Check memory usage across the cluster
hunt --mem
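Job inspection helpers such as job_stderr and job_cwd need the job's stderr path and working directory. One plausible way to obtain them is from the StdErr and WorkDir fields printed by scontrol show job. The sketch below rests on that assumption and is not the repository's implementation; the names job_field, follow_job_stderr_sketch, and cd_job_workdir_sketch are hypothetical.

```shell
# Hypothetical helper: extract a field that starts its own line in
# `scontrol show job` output (as StdErr= and WorkDir= do). Reads stdin.
job_field() {
  awk -v key="$1" '
    { sub(/^[ \t]+/, "") }                       # strip leading indentation
    index($0, key "=") == 1 {                    # line begins with KEY=
      print substr($0, length(key) + 2)          # everything after "="
      exit
    }
  '
}

# Follow the stderr file of a job (assumes the file is readable from this node).
follow_job_stderr_sketch() {
  local job_id="$1" err_file
  err_file="$(scontrol show job "$job_id" | job_field StdErr)"
  if [ -z "$err_file" ]; then
    echo "No StdErr path found for job $job_id" >&2
    return 1
  fi
  tail -f "$err_file"
}

# Change to a job's working directory. This must be a sourced shell function,
# because a standalone script cannot change its parent shell's directory.
cd_job_workdir_sketch() {
  local job_id="$1" work_dir
  work_dir="$(scontrol show job "$job_id" | job_field WorkDir)"
  [ -n "$work_dir" ] && cd "$work_dir"
}
```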
Curated VS Code settings and extensions optimized for bioinformatics and data science workflows, particularly on remote HPC systems.
Configuration Files:
- settings.json: Core editor settings with R/Python optimization
- keybindings.json: Custom keyboard shortcuts for efficient coding
- vscode-extensions.json: Recommended extensions list for bioinformatics
- nyu___settings.json: NYU-specific configuration settings
Key Extensions Included:
- Language Support: R, Python, MATLAB, LaTeX, Groovy
- Data Science: CSV editing, PDF viewing, debugging tools
- Development: GitHub integration, Copilot, GitLens
- Remote Work: SSH remote development tools
- Themes: Multiple dark themes optimized for long coding sessions
Setup:
# Copy settings to your VS Code config
cp vscode-config/settings.json ~/.config/Code/User/
cp vscode-config/keybindings.json ~/.config/Code/User/
# Install recommended extensions one ID at a time (requires VS Code CLI)
code --install-extension <extension-id>
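The VS Code CLI installs extensions by ID through --install-extension, so installing a whole list means looping over the IDs. The sketch below shows one way to do that. The function name install_extensions and the CODE_BIN override are hypothetical, and the layout of vscode-extensions.json is not documented here, so the helper takes plain extension IDs on stdin rather than parsing the file itself.

```shell
# Hypothetical helper: install every extension ID read from stdin.
# Set CODE_BIN to use a different binary (or `echo` for a dry run).
install_extensions() {
  local code_bin="${CODE_BIN:-code}"
  local ext_id
  while IFS= read -r ext_id || [ -n "$ext_id" ]; do
    # Skip blank lines and comment lines.
    case "$ext_id" in
      ''|'#'*) continue ;;
    esac
    # Redirect stdin so the command cannot consume the remaining IDs.
    "$code_bin" --install-extension "$ext_id" </dev/null
  done
}
```

If the JSON file follows the VS Code workspace convention of a top-level "recommendations" array (an assumption), the IDs could be fed in with jq, for example: jq -r '.recommendations[]' vscode-config/vscode-extensions.json | install_extensions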
This package is particularly useful for:
- System Administration: Monitor memory usage and processes on shared computing servers
- Bioinformatics Pipelines: Process and combine large genomic datasets
- VCF Analysis: Parse mutation calls with rich annotation data
- Methylation Analysis: Special support for TAPS and other methylation sequencing workflows
- HPC Cluster Management: Streamlined SLURM job monitoring and management
- Remote Development: Optimized VS Code setup for bioinformatics workflows
MIT License
Johnathan Rafailov ([email protected])
This package is designed for internal use within the mskilab ecosystem, but contributions and suggestions are welcome. Please feel free to open issues or submit pull requests.