NCBI BLASTp Wrapper (R) — WIP

Small R pipeline that runs offline BLASTp pairwise alignments for curated pairs of protein sequences.

Project status: early setup / scaffold. Expect breaking changes.

Requirements

- NCBI BLAST+ (tested with 2.12.0+)
- R ≥ 4.0 with package: Biostrings
- (Optional) RStudio

The original scaffold targeted BLAST+ 2.12.0+ and R 4.0.3.
See the library(Biostrings) call in the script for package requirements.
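
One way to confirm that the command-line requirements are installed is to look for them on PATH. The sketch below is illustrative only: `require_tool` is a hypothetical helper, not something shipped in this repo.

```shell
#!/bin/sh
# Hypothetical helper (not in this repo): report whether a command is on PATH.
require_tool() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "found: $1"
    else
        echo "missing: $1" >&2
        return 1
    fi
}

# Check both requirements and report everything that is missing,
# instead of stopping at the first failure.
status=0
for tool in blastp Rscript; do
    require_tool "$tool" || status=1
done
[ "$status" -eq 0 ] || echo "install the missing tools before running the pipeline" >&2
```

Once both tools are found, `blastp -version` and `Rscript --version` print the installed versions, and `Rscript -e 'library(Biostrings)'` exits with status 0 when the package is available.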

Repo layout

.
├─ ncbi_blastp_wrap.R        # main pipeline
├─ prepare_input.sh          # builds ./data from *.fasta in repo root
├─ transcript_type_info.csv  # mapping of pairs: name, prin, alt
└─ data/                     # example multi-FASTA files

Input expectations

transcript_type_info.csv with columns:

| name | prin | alt |
| --- | --- | --- |
| ESRRB_1 | ENST00000512784_domains.fasta | ENST00000505752_domains.fasta |
| ESRRB_2 | ENST00000512784_domains.fasta | ENST00000644823_domains.fasta |

./data/<name>.fasta multi-FASTA files whose sequence headers contain the transcript IDs used above (e.g., >zf-C4_1_ENST00000512784).
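
Before running the pipeline it can help to sanity-check the mapping file. The sketch below assumes transcript_type_info.csv is comma-separated on disk with a header row of name,prin,alt (change the `-F` value if your copy is tab-separated). `check_mapping` is a hypothetical helper written in shell for illustration; it is not part of ncbi_blastp_wrap.R.

```shell
# Hypothetical helper: validate the mapping file and list the pairs it defines.
# Assumption: comma-separated values, first line is the header name,prin,alt.
check_mapping() (
    csv=$1
    awk -F',' '
        { sub(/\r$/, "") }   # tolerate Windows line endings
        NR == 1 {
            if ($1 != "name" || $2 != "prin" || $3 != "alt") {
                print "bad header: " $0
                bad = 1
            }
            next
        }
        NF != 3 || $1 == "" || $2 == "" || $3 == "" {
            print "bad row " NR ": " $0
            bad = 1
            next
        }
        { print $1 ": " $2 " vs " $3 }
        END { exit bad + 0 }
    ' "$csv"
)
```

Calling `check_mapping transcript_type_info.csv` prints one `name: prin vs alt` line per valid row and returns a non-zero status if the header or any row is malformed.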

Quick start

Install BLAST+ and the R dependency (Biostrings).

Prepare data, using either option:

- Put your multi-FASTA files under ./data, or
- Place *.fasta files in the repo root and run:
```
bash prepare_input.sh
```

Configure ncbi_blastp_wrap.R if needed:

path_to_domain_files <- "data/" (default)

Run:

Rscript ncbi_blastp_wrap.R
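
For readers who want to see what the data-preparation step amounts to, here is a minimal sketch of the documented behaviour of prepare_input.sh (building ./data from *.fasta files). It is an assumption about the script, not a copy of it: the real script may move files instead of copying them, or do additional work. `collect_fasta` is a hypothetical name.

```shell
# Hypothetical stand-in for prepare_input.sh:
# copy every *.fasta found in a source directory into <source>/data.
collect_fasta() (
    src=${1:-.}
    mkdir -p "$src/data"
    count=0
    for f in "$src"/*.fasta; do
        [ -e "$f" ] || continue   # the glob stays literal when nothing matches
        cp "$f" "$src/data/"
        count=$((count + 1))
    done
    echo "copied $count file(s) into $src/data"
)
```

Run from the repo root, `collect_fasta .` leaves the original files in place and reports how many were copied.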

What it does (pipeline)

Splits each multi-FASTA into per-sequence files and tags them as query or subject based on transcript_type_info.csv.
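
The splitting step can be pictured with a short shell sketch. The pipeline itself does this in R; the version below is only an illustration, with a hypothetical function name (`split_fasta`) and an assumed naming scheme (one file per record, named after the first word of its header). It does not attempt the query/subject tagging.

```shell
# Hypothetical helper: write each record of a multi-FASTA to <outdir>/<header>.fasta.
# Assumption: the first word of the header line is usable as a file name
# once unusual characters are replaced with underscores.
split_fasta() (
    input=$1
    outdir=$2
    mkdir -p "$outdir"
    awk -v dir="$outdir" '
        /^>/ {
            if (out != "") close(out)          # finish the previous record
            id = substr($0, 2)                 # drop the leading ">"
            sub(/[ \t].*$/, "", id)            # keep the first word only
            gsub(/[^A-Za-z0-9._-]/, "_", id)   # make it safe as a file name
            out = dir "/" id ".fasta"
        }
        out != "" { print > out }              # header and sequence lines
    ' "$input"
)
```

For example, `split_fasta data/example.fasta split/` would turn a record headed `>zf-C4_1_ENST00000512784` into `split/zf-C4_1_ENST00000512784.fasta` (the input file name here is a placeholder).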

For each name, finds the matching query/subject pair and runs:

blastp -query <query.fasta> -subject <subject.fasta> -outfmt 0
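
Wrapped in a small shell function, one such call might look like the sketch below. `run_pair` is a hypothetical helper, and deriving the query and subject parts of the report name from the input file names is an assumption. `-out` is the standard BLAST+ option for writing the report to a file instead of stdout.

```shell
# Hypothetical helper: align one query/subject pair and save the text report.
# Assumption: the report is named Alignment_<query>_<subject>_out.txt, where
# <query> and <subject> are the input file names without the .fasta suffix.
run_pair() (
    query=$1
    subject=$2
    outdir=${3:-alignment}
    mkdir -p "$outdir"
    q=$(basename "$query" .fasta)
    s=$(basename "$subject" .fasta)
    out="$outdir/Alignment_${q}_${s}_out.txt"
    # -outfmt 0 is the default pairwise text report.
    blastp -query "$query" -subject "$subject" -outfmt 0 -out "$out" || exit 1
    echo "$out"   # print the report path so callers can collect it
)
```

The function prints the path of the report it wrote and returns a non-zero status if blastp fails, so a caller looping over the rows of the mapping file can stop or log on the first failed pair.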

Saves text outputs under ./alignment/ (folder name may appear as alingment/ in early versions).

Outputs

For each domain/family name, a folder is created containing:
- alignment/Alignment_<query>_<subject>_out.txt (pairwise BLASTp report, -outfmt 0).

Roadmap / known issues (early stage)

- Define or replace open_input_files(); fix save_metafile() variable names/scope.
- Normalize output folder to alignment/.
- Add argument parsing (input dir, CSV path, outdir, -outfmt).
- Add unit tests and CI for R/BLAST+ presence.
- Example notebook / vignette.

License

TBD.
