🚀 A lightweight and convenient application that downloads any GitHub repository as a single, well-structured text file.

mezentsevs/github-repo-to-single-file

GitHub repo to single file

About 'GitHub repo to single file'

This is a GitHub repository to single file downloader, written for educational and demonstration purposes.

A lightweight and convenient application that downloads any GitHub repository as a single, well-structured text file. The output format is: === filename === followed by the complete file content.

Based on tech stack:

Getting Started

  1. Install:
git clone [repository-url]
cd /path/to/github-repo-to-single-file/
npm install
  2. Configuration (copy the .env.example file to .env and configure your settings):
cp .env.example .env
GITHUB_TOKEN=your_github_token_here (optional)

INCLUDE_PATTERNS=
EXCLUDE_PATTERNS=*.log,*lock*,*.tmp,*.temp,*.swp,*.swo,*.jpg,*.jpeg,*.png,*.gif,*.bmp,*.webp,*.svg,*.ico,*.mp4,*.avi,*.mov,*.mkv,*.mp3,*.wav,*.pdf,*.doc,*.docx,*.xls,*.xlsx,*.zip,*.tar,*.gz,*.rar
OUTPUT_DIR=./output
API_DELAY_MS=100

Important Notes

  • The target repository is NOT specified here. It is provided as a command-line argument when running the application.
  • The .env file should never be committed to version control.

About GitHub Token

The GITHUB_TOKEN is optional but recommended for two main reasons:

  1. Access to Private Repositories:

    • Without a token: Can only download PUBLIC repositories.
    • With a token (with 'repo' scope): Can download BOTH public and PRIVATE repositories.
  2. Higher API Rate Limits:

    | Condition     | Requests per Hour | Experience                        |
    |---------------|-------------------|-----------------------------------|
    | Without Token | 60 requests       | Frequent blocks, slower downloads |
    | With Token    | 5000 requests     | Stable, faster, more reliable     |

Using a token allows you to decrease the API_DELAY_MS setting for faster downloads without hitting rate limits.
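As a rough check of how API_DELAY_MS relates to these hourly limits, the arithmetic can be sketched as follows. This is an illustration only: the function names are hypothetical, and it assumes one API request per delay interval and ignores the time each request itself takes.

```typescript
const MS_PER_HOUR = 60 * 60 * 1000;

// Smallest delay (ms) that keeps a continuous run within an hourly limit.
function minDelayMs(requestsPerHour: number): number {
  return Math.ceil(MS_PER_HOUR / requestsPerHour);
}

// Upper bound on the number of requests issued in one hour at a given delay.
function maxRequestsPerHour(delayMs: number): number {
  return Math.floor(MS_PER_HOUR / delayMs);
}

console.log(minDelayMs(60));          // 60000
console.log(minDelayMs(5000));        // 720
console.log(maxRequestsPerHour(100)); // 36000
```

Under these assumptions, the delay only matters for runs that issue more requests than the hourly limit; a run that needs fewer requests than the limit finishes before the limit is reached, whatever the delay.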

Additional Benefits:

  • Fewer interruptions: With the higher limit, authenticated requests are much less likely to be throttled mid-download.
  • Traceability: GitHub logs requests under your account.

When You DON'T Need a Token:

  • You are downloading only 1-2 small public repositories per day.
  • The repository has very few files (fewer than 60).
  • You don't mind longer delays between requests (API_DELAY_MS=1000+).

How to Create a Token (2 minutes):

  1. Go to: https://github.com/settings/tokens
  2. Click "Generate new token" → "Generate new token (classic)"
  3. Set expiration (e.g., 30 days)
  4. Select scopes:
    • For PUBLIC repos only: Check public_repo
    • For PUBLIC & PRIVATE repos: Check repo
  5. Generate token and IMMEDIATELY copy it to your .env file.

Usage

The application provides two main commands for downloading repositories:

  1. BUILD AND RUN (for development or first use after changes):
npm start facebook/react

This command will: 1) build the TypeScript code, 2) run the downloader.

  2. FAST DOWNLOAD (for repeated use after initial build):
npm run download facebook/react

This command runs the already built application directly, skipping the build step for faster execution.

Both commands accept the same arguments:

npm start <repository> [branch] [output-file]
npm run download <repository> [branch] [output-file]
  • repository: Required. Must be in the format "owner/repo_name".
  • branch: Optional. The branch to download (defaults to "main").
  • output-file: Optional. The name for the output text file (defaults to "owner_repo_branch.txt").
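The argument rules above can be sketched as a small parser. This is a minimal sketch, not the application's actual code: `parseArgs` and `DownloadArgs` are hypothetical names, and the exact validation of "owner/repo_name" (exactly one slash, both parts non-empty) is an assumption.

```typescript
interface DownloadArgs {
  owner: string;
  repo: string;
  branch: string;
  outputFile: string;
}

// Parses "<repository> [branch] [output-file]" and applies the documented defaults.
function parseArgs(args: string[]): DownloadArgs {
  const repository = args[0];
  const branchArg = args[1];
  const outputArg = args[2];
  if (repository === undefined) {
    throw new Error("Usage: <repository> [branch] [output-file]");
  }

  // Assumed validation: exactly one "/" separating two non-empty parts.
  const slash = repository.indexOf("/");
  const owner = repository.slice(0, slash);
  const repo = repository.slice(slash + 1);
  if (slash <= 0 || repo.length === 0 || repo.indexOf("/") !== -1) {
    throw new Error(`Repository must be in the format "owner/repo_name", got "${repository}"`);
  }

  const branch = branchArg !== undefined ? branchArg : "main";
  const outputFile = outputArg !== undefined ? outputArg : `${owner}_${repo}_${branch}.txt`;
  return { owner, repo, branch, outputFile };
}

console.log(parseArgs(["facebook/react"]).outputFile); // facebook_react_main.txt
```

In a Node.js entry point, the input would typically be `process.argv.slice(2)`.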

Examples:

npm start facebook/react
npm run download facebook/react main
npm run download facebook/react main react_source_code.txt
npm start vuejs/core next
npm run download microsoft/TypeScript main typescript_code.txt

What the application does

  • Retrieves the file tree of the specified repository and branch via the GitHub API.
  • Applies the include and exclude pattern filters to the file list.
  • Downloads the content of all matching files.
  • Creates a single output file with a clear structure, saving it to the configured OUTPUT_DIR.
  • Displays download progress and a final summary.
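The first step, retrieving the file tree, can be sketched against GitHub's Git Trees endpoint. This is a sketch under stated assumptions rather than the application's actual code: the helper names are hypothetical, the tree is assumed to be requested recursively, and only `blob` entries (files) are assumed to be kept.

```typescript
// Subset of the fields returned by GET /repos/{owner}/{repo}/git/trees/{tree_sha}.
interface TreeEntry {
  path: string;
  type: string; // "blob" for files, "tree" for directories
  size?: number;
}

interface TreeResponse {
  tree: TreeEntry[];
  truncated: boolean; // true when GitHub cut the listing short
}

// URL of the recursive tree listing for a branch.
function treeUrl(owner: string, repo: string, branch: string): string {
  return `https://api.github.com/repos/${owner}/${repo}/git/trees/${branch}?recursive=1`;
}

// Request headers; the Authorization header is added only when a token is set.
function requestHeaders(token?: string): Record<string, string> {
  const headers: Record<string, string> = { Accept: "application/vnd.github+json" };
  if (token) {
    headers["Authorization"] = `Bearer ${token}`;
  }
  return headers;
}

// Keeps only file entries and returns their repository-relative paths.
function blobPaths(response: TreeResponse): string[] {
  return response.tree
    .filter((entry) => entry.type === "blob")
    .map((entry) => entry.path);
}
```

The URL and headers would be passed to an HTTP client such as `fetch`, and the resulting paths would then go through the include and exclude filters before any file content is downloaded.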

Pattern System - Complete Guide

The application supports a flexible pattern-matching system with three pattern types:

  1. SIMPLE PATTERNS (file name matching):
  • *.log - Matches files ending with .log
  • *lock* - Matches files containing "lock" anywhere in the name (package-lock.json, yarn.lock)
  • *.jpg - Matches files ending with .jpg
  2. GLOB PATTERNS (path matching with wildcards):
  • **/*.js - Matches .js files in ANY directory and subdirectory
  • src/**/* - Matches ALL files inside src/ and all its subdirectories
  • docs/*.md - Matches .md files in docs/ directory (not in subdirectories)
  • tests/**/*.test.js - Matches .test.js files in tests/ and all subdirectories
  3. EXACT PATTERNS (full path matching):
  • package.json - Matches exactly package.json in root
  • README.md - Matches exactly README.md in root
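One way to read the three pattern types is sketched below. It is an interpretation, not the application's actual matcher: the function names are hypothetical, and the rule for telling the types apart (no wildcard means exact, a wildcard without "/" means file name, anything else means full path) is an assumption.

```typescript
// Escapes characters that have a special meaning in a regular expression.
function escapeRegExp(text: string): string {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Converts a glob into an anchored regular expression:
// "**/" matches zero or more directories, "**" matches anything,
// and "*" matches any characters except "/".
function globToRegExp(glob: string): RegExp {
  let source = "";
  let i = 0;
  while (i < glob.length) {
    if (glob.slice(i, i + 3) === "**/") {
      source += "(?:.*/)?";
      i += 3;
    } else if (glob.slice(i, i + 2) === "**") {
      source += ".*";
      i += 2;
    } else if (glob.charAt(i) === "*") {
      source += "[^/]*";
      i += 1;
    } else {
      source += escapeRegExp(glob.charAt(i));
      i += 1;
    }
  }
  return new RegExp(`^${source}$`);
}

// Assumed dispatch between exact, simple, and glob patterns.
function matchesPattern(pattern: string, path: string): boolean {
  if (pattern.indexOf("*") === -1) {
    return path === pattern; // exact: full path
  }
  if (pattern.indexOf("/") === -1) {
    const name = path.slice(path.lastIndexOf("/") + 1);
    return globToRegExp(pattern).test(name); // simple: file name only
  }
  return globToRegExp(pattern).test(path); // glob: full path
}

console.log(matchesPattern("docs/*.md", "docs/guide.md"));     // true
console.log(matchesPattern("docs/*.md", "docs/api/guide.md")); // false
```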

Pattern Priority and Logic

  • If INCLUDE_PATTERNS is empty → ALL files are considered for inclusion
  • If INCLUDE_PATTERNS has patterns → a file must match AT LEAST ONE pattern to be included
  • If a file matches ANY pattern in EXCLUDE_PATTERNS → it is excluded (even if it matches an include pattern)
  • Include patterns are checked first, then exclude patterns
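The include and exclude rules above can be sketched as a single filter. This is a minimal sketch with hypothetical names; the pattern matcher is passed in as a parameter, and `suffixMatcher` is a stand-in for the demo that only understands patterns of the form "*.ext".

```typescript
type Matcher = (pattern: string, path: string) => boolean;

// Keeps a path when it passes the include check and matches no exclude pattern.
function selectFiles(
  paths: string[],
  include: string[],
  exclude: string[],
  matches: Matcher
): string[] {
  return paths.filter((path) => {
    // An empty include list means every file is a candidate.
    const included = include.length === 0 || include.some((p) => matches(p, path));
    // Exclusion wins even when an include pattern also matches.
    const excluded = exclude.some((p) => matches(p, path));
    return included && !excluded;
  });
}

// Demo-only matcher: "*.ext" matches paths ending in ".ext".
const suffixMatcher: Matcher = (pattern, path) => {
  const suffix = pattern.slice(1);
  return suffix.length > 0 && path.slice(-suffix.length) === suffix;
};

const files = ["a.ts", "a.test.ts", "b.md"];
console.log(selectFiles(files, ["*.ts"], ["*.test.ts"], suffixMatcher)); // [ 'a.ts' ]
```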

Common Pattern Examples

Example 1: Download only source code files

INCLUDE_PATTERNS=**/*.js,**/*.ts,**/*.jsx,**/*.tsx,**/*.py,**/*.java,**/*.go
EXCLUDE_PATTERNS=**/*.test.*,**/*.spec.*,**/node_modules/**,**/dist/**

Example 2: Download documentation only

INCLUDE_PATTERNS=**/*.md,**/*.txt,docs/**/*
EXCLUDE_PATTERNS=

Example 3: Exclude all media and binary files (default setup)

INCLUDE_PATTERNS=
EXCLUDE_PATTERNS=*.log,*lock*,*.tmp,*.temp,*.swp,*.swo,*.jpg,*.jpeg,*.png,*.gif,*.bmp,*.webp,*.svg,*.ico,*.mp4,*.avi,*.mov,*.mkv,*.mp3,*.wav,*.pdf,*.doc,*.docx,*.xls,*.xlsx,*.zip,*.tar,*.gz,*.rar

Example 4: Download specific project structure

INCLUDE_PATTERNS=src/**/*.js,src/**/*.ts,public/**/*,package.json,README.md
EXCLUDE_PATTERNS=**/*.test.js,**/*.spec.js,**/node_modules/**

Example 5: Complex filtering

INCLUDE_PATTERNS=src/**/*
EXCLUDE_PATTERNS=src/**/*.test.*,src/**/*.spec.*,src/**/test/**,src/**/*.min.*

Advanced Usage Examples

Example 1: Download React repository source files only

INCLUDE_PATTERNS=src/**/*.js,src/**/*.jsx,src/**/*.ts,src/**/*.tsx
EXCLUDE_PATTERNS=**/*.test.*,**/*.spec.*,**/node_modules/**,**/dist/**
Command: npm run download facebook/react

Example 2: Download Python project without tests and virtual environments

INCLUDE_PATTERNS=**/*.py
EXCLUDE_PATTERNS=**/*test*.py,**/tests/**,**/venv/**,**/.env**
Command: npm start django/django

Example 3: Download specific folder structure

INCLUDE_PATTERNS=lib/**/*,include/**/*,src/**/*.cpp,src/**/*.hpp,CMakeLists.txt,README.md
EXCLUDE_PATTERNS=**/test/**,**/examples/**,**/*.o,**/*.so
Command: npm run download openssl/openssl

Output File Structure

The generated file will have the following format:

GitHub Repository: owner/repo
Branch: main
Generated: 2024-01-15T10:30:00.000Z
Total Files: 42
================================================================================

=== src/index.js ===
Size: 1234 characters
------------------------------------------------------------
// Actual file content here...
...

=== README.md ===
Size: 567 characters
------------------------------------------------------------
# Project Title
...
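A formatter that produces this layout could look like the sketch below. The names are hypothetical, and the separator widths (80 and 60 characters) and the use of character count for "Size" are read off the sample above rather than taken from the application's code.

```typescript
interface DownloadedFile {
  path: string;
  content: string;
}

// Returns a string made of `count` copies of `ch`.
function repeatChar(ch: string, count: number): string {
  return new Array(count + 1).join(ch);
}

// Builds the header followed by one "=== path ===" section per file.
function formatOutput(
  repository: string,
  branch: string,
  files: DownloadedFile[],
  generatedAt: Date
): string {
  const lines: string[] = [
    `GitHub Repository: ${repository}`,
    `Branch: ${branch}`,
    `Generated: ${generatedAt.toISOString()}`,
    `Total Files: ${files.length}`,
    repeatChar("=", 80),
    "",
  ];
  files.forEach((file) => {
    lines.push(
      `=== ${file.path} ===`,
      `Size: ${file.content.length} characters`,
      repeatChar("-", 60),
      file.content,
      ""
    );
  });
  return lines.join("\n");
}

const text = formatOutput(
  "owner/repo",
  "main",
  [{ path: "README.md", content: "# Project Title" }],
  new Date("2024-01-15T10:30:00.000Z")
);
console.log(text);
```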

Environment Variables (.env) Summary

  • GITHUB_TOKEN: Your GitHub Personal Access Token (optional).
  • INCLUDE_PATTERNS: Comma-separated list of patterns. If empty, all files are included (subject to exclude rules).
  • EXCLUDE_PATTERNS: Comma-separated list of patterns for files/directories to exclude.
  • OUTPUT_DIR: Directory where the output file will be saved (default: ./output).
  • API_DELAY_MS: Delay in milliseconds between consecutive API requests to avoid rate limiting (default: 100).
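How these variables might be read and defaulted is sketched below. `Config`, `parsePatternList`, and `loadConfig` are hypothetical names, and the handling of whitespace, empty entries, and invalid numbers is an assumption; only the variable names and default values come from the list above.

```typescript
interface Config {
  githubToken: string | undefined;
  includePatterns: string[];
  excludePatterns: string[];
  outputDir: string;
  apiDelayMs: number;
}

// Splits a comma-separated list, trimming whitespace and dropping empty entries.
function parsePatternList(raw: string | undefined): string[] {
  if (!raw) {
    return [];
  }
  return raw
    .split(",")
    .map((pattern) => pattern.trim())
    .filter((pattern) => pattern.length > 0);
}

// Reads settings from an environment-like object and applies the documented defaults.
function loadConfig(env: Record<string, string | undefined>): Config {
  const delay = parseInt(env["API_DELAY_MS"] || "", 10);
  return {
    githubToken: env["GITHUB_TOKEN"] || undefined,
    includePatterns: parsePatternList(env["INCLUDE_PATTERNS"]),
    excludePatterns: parsePatternList(env["EXCLUDE_PATTERNS"]),
    outputDir: env["OUTPUT_DIR"] || "./output",
    // Fall back to 100 ms when the value is missing or not a valid number.
    apiDelayMs: isFinite(delay) && delay >= 0 ? delay : 100,
  };
}

console.log(loadConfig({ EXCLUDE_PATTERNS: "*.log,*lock*", API_DELAY_MS: "250" }));
```

In a Node.js process, the argument would typically be `process.env` after the .env file has been loaded.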

Pattern Tips

  • Use ** to match any directory level (including zero levels)
  • Use * to match any characters within a file/directory name
  • Patterns are case-sensitive on Linux/macOS
  • You can mix simple and glob patterns in the same list
  • Test patterns with simple examples before applying to large repositories

Notes

  • The application automatically builds after installation (postinstall script).
  • The application handles GitHub API rate limits and will automatically wait if necessary.
  • Files that fail to download will have a placeholder message in the output.
  • The output directory will be created if it doesn't exist.
  • Ensure your .env file is listed in your .gitignore.
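  • Standard development artifacts (like node_modules, dist, build, coverage) are usually listed in .gitignore and never committed, so they normally do not appear in the GitHub tree API response, and the .git directory is never part of it. If a repository does commit such directories, add them to EXCLUDE_PATTERNS.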
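The automatic wait on rate limits mentioned in these notes can be sketched from GitHub's rate-limit response headers, x-ratelimit-remaining and x-ratelimit-reset (the reset time in UTC epoch seconds). The function name and the choice to wait only when no requests remain are assumptions, not the application's confirmed behavior.

```typescript
// Returns how long to pause (ms) before the next request.
// `remaining` and `resetEpochSeconds` come from the x-ratelimit-remaining and
// x-ratelimit-reset headers of the previous response; `nowMs` is Date.now().
function rateLimitWaitMs(remaining: number, resetEpochSeconds: number, nowMs: number): number {
  if (remaining > 0) {
    return 0; // quota left: no extra wait needed
  }
  // Quota exhausted: wait until the reset time, never a negative duration.
  return Math.max(0, resetEpochSeconds * 1000 - nowMs);
}

console.log(rateLimitWaitMs(0, 1700000000, 1699999990000)); // 10000
```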

Troubleshooting

  • If no files are downloaded: Check your INCLUDE_PATTERNS - if set, files must match at least one pattern.
  • If too many files are downloaded: Adjust your EXCLUDE_PATTERNS to be more specific.
  • If a pattern doesn't work: Use simpler patterns and test incrementally.
  • If you get "Cannot find module" errors: Run npm run build to compile TypeScript code.

That's it! Thank you!

License

'GitHub repo to single file' is open-source software licensed under the MIT license.

Third-Party Licenses

This project uses third-party software components. Their respective licenses can be found in the LICENSE-3rd-party.md file.
