Azpype 🚀 [beta]

A Python wrapper for AzCopy that feels native and gets out of your way.

Why Azpype?

Performance: AzCopy, written in Go, significantly outperforms Python's Azure SDK for bulk transfers. Go's goroutines provide true parallelism for file I/O and network operations, while Python's GIL limits concurrency. For large-scale transfers, AzCopy can be 5-10x faster.

Python Integration: But switching between Python and bash scripts breaks your workflow. Azpype solves this by wrapping AzCopy in a native Python interface. Now you can:

Write pure Python scripts with data processing before and after transfers
Capture and parse output programmatically
Handle errors with try/except blocks
Integrate with your existing Python data pipeline

Additional Benefits:

Zero-configuration setup - Bundles the right AzCopy binary for your platform
Smart defaults - YAML config for common settings, override with kwargs when needed
Rich logging - Structured logs with loguru, daily rotation, and visual command output
Built-in validation - Checks auth, network, and paths before executing
Job management - List, resume, and recover failed transfers programmatically

Installation

pip install azpype

That's it. Azpype automatically:

Downloads the appropriate AzCopy binary (v10.18.1) for your platform
Creates a config directory at ~/.azpype/
Sets up a default configuration file

Quick Start

Basic Copy Operation

from azpype.commands.copy import Copy

# Upload a local directory to Azure Blob Storage
Copy(
    source="./data",
    destination="https://myaccount.blob.core.windows.net/mycontainer/"
).execute()

# Download from Azure to local
Copy(
    source="https://myaccount.blob.core.windows.net/mycontainer/data/",
    destination="./downloads"
).execute()

Working with Return Values

The execute() method returns an AzCopyStdoutParser object with parsed attributes - no manual string parsing needed!

# Execute returns a parsed object with useful attributes
result = Copy(
    source="./data",
    destination="https://myaccount.blob.core.windows.net/mycontainer/"
).execute()

# Access structured data directly
print(f"Job ID: {result.job_id}")
print(f"Files transferred: {result.number_of_file_transfers_completed}")
print(f"Files skipped: {result.number_of_file_transfers_skipped}")
print(f"Bytes transferred: {result.total_bytes_transferred}")
print(f"Elapsed time: {result.elapsed_time} minutes")
print(f"Final status: {result.final_job_status}")

# Use exit code for flow control
if result.exit_code == 0:
    print("Transfer successful!")
else:
    print(f"Transfer failed: {result.stdout}")

Available Attributes

The parser automatically extracts these attributes from AzCopy output:

Attribute	Type	Description
`exit_code`	int	Command exit code (0 = success)
`job_id`	str	Unique job identifier for resuming
`elapsed_time`	float	Transfer duration in minutes
`final_job_status`	str	Status like "Completed", "CompletedWithSkipped", "Failed"
`number_of_file_transfers`	int	Total files attempted
`number_of_file_transfers_completed`	int	Successfully transferred files
`number_of_file_transfers_skipped`	int	Files skipped (already exist, etc.)
`number_of_file_transfers_failed`	int	Failed file transfers
`total_bytes_transferred`	int	Total data transferred in bytes
`total_number_of_transfers`	int	Total transfer operations
`stdout`	str	Raw command output if needed
`raw_stdout`	str	Unprocessed output with ANSI codes

Real-World Example: Pipeline Integration

def smart_sync_with_monitoring(local_path, remote_path):
    """
    Sync data and monitor transfer metrics
    """
    result = Copy(
        source=local_path,
        destination=remote_path,
        overwrite="ifSourceNewer",
        recursive=True
    ).execute()
    
    # Make decisions based on parsed results
    if result.exit_code != 0:
        raise Exception(f"Transfer failed: {result.final_job_status}")
    
    if result.number_of_file_transfers_failed > 0:
        print(f"Warning: {result.number_of_file_transfers_failed} files failed")
        # Could trigger retry logic here
    
    if result.number_of_file_transfers_skipped == result.number_of_file_transfers:
        print("All files already up-to-date")
        return "NO_CHANGES"
    
    # Report transfer metrics
    gb_transferred = result.total_bytes_transferred / (1024**3)
    transfer_rate = gb_transferred / (result.elapsed_time / 60)  # GB/hour
    
    print(f"Transferred {gb_transferred:.2f} GB at {transfer_rate:.2f} GB/hour")
    print(f"Completed: {result.number_of_file_transfers_completed} files")
    
    return result.job_id  # Return for potential resume operations

Authentication

Service Principal (Recommended)

Set these environment variables:

import os

os.environ["AZCOPY_TENANT_ID"] = "your-tenant-id"
os.environ["AZCOPY_SPA_APPLICATION_ID"] = "your-app-id"  
os.environ["AZCOPY_SPA_CLIENT_SECRET"] = "your-secret"
os.environ["AZCOPY_AUTO_LOGIN_TYPE"] = "SPN"

Or use a .env file:

# .env
AZCOPY_TENANT_ID=your-tenant-id
AZCOPY_SPA_APPLICATION_ID=your-app-id
AZCOPY_SPA_CLIENT_SECRET=your-secret
AZCOPY_AUTO_LOGIN_TYPE=SPN

from dotenv import load_dotenv
load_dotenv()

from azpype.commands.copy import Copy
Copy(source, destination).execute()

SAS Token

Pass the token directly (without the leading ?):

Copy(
    source="./data",
    destination="https://myaccount.blob.core.windows.net/mycontainer/",
    sas_token="sv=2021-12-02&ss=b&srt=sco&sp=rwdlacyx..."
).execute()

Configuration System

Azpype uses a two-level configuration system:

1. YAML Config File (Defaults)

Located at ~/.azpype/copy_config.yaml:

# Overwrite strategy at destination
overwrite: 'ifSourceNewer'  # Options: 'true', 'false', 'prompt', 'ifSourceNewer'

# Recursive copy for directories
recursive: true

# Create MD5 hashes during upload
put-md5: true

# Number of parallel transfers
concurrency: 16

2. Runtime Overrides (kwargs)

Override any config value at runtime:

Copy(
    source="./data",
    destination="https://...",
    overwrite="true",           # Override YAML setting
    concurrency=32,              # Increase parallelism
    dry_run=True,               # Test without copying
    exclude_pattern="*.tmp"     # Add exclusion pattern
).execute()

Common Usage Patterns

Upload with Patterns

# Upload only Python files
Copy(
    source="./project",
    destination="https://myaccount.blob.core.windows.net/code/",
    include_pattern="*.py",
    recursive=True
).execute()

# Exclude temporary files
Copy(
    source="./data",
    destination="https://myaccount.blob.core.windows.net/backup/",
    exclude_pattern="*.tmp;*.log;*.cache",
    recursive=True
).execute()

Sync with Overwrite Control

# Only upload newer files
Copy(
    source="./local-data",
    destination="https://myaccount.blob.core.windows.net/data/",
    overwrite="ifSourceNewer",
    recursive=True
).execute()

# Never overwrite existing files
Copy(
    source="./archive",
    destination="https://myaccount.blob.core.windows.net/archive/",
    overwrite="false"
).execute()

Dry Run Testing

# See what would be copied without actually transferring
Copy(
    source="./large-dataset",
    destination="https://myaccount.blob.core.windows.net/datasets/",
    dry_run=True
).execute()

Job Management

Resume failed or cancelled transfers:

from azpype.commands.jobs import Jobs

jobs = Jobs()

# List all jobs
exit_code, output = jobs.list()

# Resume a specific job
jobs.resume(job_id="abc123-def456")

# Find and resume the last failed job
job_id = jobs.last_failed()
if job_id:
    jobs.resume(job_id=job_id)

# Auto-recover (find and resume last failed)
jobs.recover_last_failed()

Logging

Azpype provides rich logging with automatic rotation:

Location: ~/.azpype/azpype_YYYY-MM-DD.log
Rotation: Daily, with 7-day retention and gzip compression
Console output: Color-coded with progress indicators
Command details: Full command, exit codes, and stdout/stderr captured

Example log output:

2025-08-15 19:09:29 | INFO | COPY | Starting copy operation
2025-08-15 19:09:29 | INFO | COPY | ========== COMMAND EXECUTION ==========
2025-08-15 19:09:29 | INFO | COPY | Command: azcopy copy ./data https://...
2025-08-15 19:09:29 | INFO | COPY | Exit Code: 0
2025-08-15 19:09:29 | INFO | COPY | STDOUT:
2025-08-15 19:09:29 | INFO | COPY |   Job abc123 has started
2025-08-15 19:09:29 | INFO | COPY |   100.0%, 10 Done, 0 Failed, 0 Pending

Available Options

Common options for the Copy command:

Option	Type	Description
`overwrite`	str	How to handle existing files: 'true', 'false', 'prompt', 'ifSourceNewer'
`recursive`	bool	Include subdirectories
`include_pattern`	str	Include only matching files (wildcards supported)
`exclude_pattern`	str	Exclude matching files (wildcards supported)
`dry_run`	bool	Preview what would be copied without transferring
`concurrency`	int	Number of parallel transfers
`block_size_mb`	float	Block size for large files (in MiB)
`put_md5`	bool	Create MD5 hashes during upload
`check_length`	bool	Verify file sizes after transfer
`as_subdir`	bool	Place folder sources as subdirectories

License

MIT

Name		Name	Last commit message	Last commit date
Latest commit History 131 Commits
.github/workflows		.github/workflows
azpype		azpype
tests		tests
.gitignore		.gitignore
LICENSE		LICENSE
MANIFEST.in		MANIFEST.in
NOTICE.txt		NOTICE.txt
README.md		README.md
pyproject.toml		pyproject.toml
requirements.txt		requirements.txt
setup.py		setup.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Azpype 🚀 [beta]

Why Azpype?

Installation

Quick Start

Basic Copy Operation

Working with Return Values

Available Attributes

Real-World Example: Pipeline Integration

Authentication

Service Principal (Recommended)

SAS Token

Configuration System

1. YAML Config File (Defaults)

2. Runtime Overrides (kwargs)

Common Usage Patterns

Upload with Patterns

Sync with Overwrite Control

Dry Run Testing

Job Management

Logging

Available Options

License

About

Uh oh!

Releases

Packages

Uh oh!

Uh oh!

Contributors

Uh oh!

Languages

License

yusuf-jkhan1/azpype

Folders and files

Latest commit

History

Repository files navigation

Azpype 🚀 [beta]

Why Azpype?

Installation

Quick Start

Basic Copy Operation

Working with Return Values

Available Attributes

Real-World Example: Pipeline Integration

Authentication

Service Principal (Recommended)

SAS Token

Configuration System

1. YAML Config File (Defaults)

2. Runtime Overrides (kwargs)

Common Usage Patterns

Upload with Patterns

Sync with Overwrite Control

Dry Run Testing

Job Management

Logging

Available Options

License

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Uh oh!

Contributors

Uh oh!

Languages

Packages