HuggingFace Integration Guide

This guide covers the comprehensive HuggingFace Hub integration in pure-tokenizers, including loading tokenizers, authentication, caching, and troubleshooting.

Overview
Supported Models
Basic Usage
Authentication
Token Security Best Practices
Configuration Options
Cache System
Rate Limiting and Retry Logic
Migration from Python
Advanced Usage
Troubleshooting
Performance Considerations

Overview

Pure-tokenizers provides seamless integration with HuggingFace Hub, allowing you to load tokenizers directly from any HuggingFace model repository without manual downloads or file management.

Key Benefits

Zero Configuration: Automatically downloads and caches tokenizers
Offline Support: Cached tokenizers work without internet connection
Authentication: Access private and gated models
Version Control: Load specific model revisions (branches/tags/commits)
Smart Caching: Efficient storage with automatic cache management

Supported Models

Pure-tokenizers supports any model on HuggingFace Hub that includes a tokenizer.json file. This includes:

Popular Models

BERT Family: bert-base-uncased, bert-large-cased, distilbert-base-uncased
GPT Family: gpt2, gpt2-medium, gpt2-large, gpt2-xl
T5 Family: google/flan-t5-base, google/flan-t5-large
Sentence Transformers: sentence-transformers/all-MiniLM-L6-v2
RoBERTa: roberta-base, roberta-large
BART: facebook/bart-base, facebook/bart-large
Llama: meta-llama/Llama-2-7b-hf (requires authentication)

Model ID Format

Model IDs follow the pattern owner/model-name or just model-name for official models:

bert-base-uncased (official model)
google/flan-t5-base (organization model)
username/custom-model (user model)

Basic Usage

Loading a Public Model

package main

import (
    "fmt"
    "log"
    "github.com/amikos-tech/pure-tokenizers"
)

func main() {
    // Load a public model tokenizer
    tokenizer, err := tokenizers.FromHuggingFace("bert-base-uncased")
    if err != nil {
        log.Fatal(err)
    }
    defer tokenizer.Close()

    // Tokenize text
    text := "Hello, how are you?"
    encoding, err := tokenizer.Encode(text, tokenizers.WithAddSpecialTokens())
    if err != nil {
        log.Fatal(err)
    }

    fmt.Printf("Tokens: %v\n", encoding.Tokens)
    fmt.Printf("Token IDs: %v\n", encoding.IDs)
}

Loading with Options

tokenizer, err := tokenizers.FromHuggingFace("gpt2",
    tokenizers.WithHFRevision("main"),              // Specific branch/tag
    tokenizers.WithHFTimeout(30 * time.Second),     // Custom timeout
    tokenizers.WithHFCacheDir("/custom/cache"),     // Custom cache location
)

Authentication

Setting Up Authentication

HuggingFace uses tokens for authentication. Get your token from HuggingFace Settings.

Environment Variable (Recommended)

export HF_TOKEN=hf_xxxxxxxxxxxxxxxxxxxxxxxxx

The library automatically reads the HF_TOKEN environment variable.

Programmatic Authentication

tokenizer, err := tokenizers.FromHuggingFace("meta-llama/Llama-2-7b-hf",
    tokenizers.WithHFToken("hf_xxxxxxxxxxxxxxxxxxxxxxxxx"),
)

Accessing Gated Models

Some models (like Llama) require accepting terms on HuggingFace before access:

Visit the model page (e.g., https://huggingface.co/meta-llama/Llama-2-7b-hf)
Accept the license terms
Use your HF token for authentication

// Ensure you have accepted the model's terms on HuggingFace
tokenizer, err := tokenizers.FromHuggingFace("meta-llama/Llama-2-7b-hf",
    tokenizers.WithHFToken(os.Getenv("HF_TOKEN")),
)
if err != nil {
    // Common errors:
    // - "401 Unauthorized": Invalid token
    // - "403 Forbidden": Terms not accepted
    log.Fatal(err)
}

Token Security Best Practices

Secure Token Storage

Never commit tokens to version control. Follow these security practices:

1. Use Environment Variables

# .env file (add to .gitignore)
HF_TOKEN=hf_xxxxxxxxxxxxxxxxxxxxxxxxx

# Load in your application
import "github.com/joho/godotenv"

func init() {
    if err := godotenv.Load(); err != nil {
        log.Println("No .env file found")
    }
}

2. Use Secret Management Systems

// Example with AWS Secrets Manager
func getTokenFromSecretsManager() (string, error) {
    // Implementation depends on your cloud provider
    // AWS, GCP, Azure, Vault, etc.
    return secretsManager.GetSecret("hf-token")
}

// Use in production
token, err := getTokenFromSecretsManager()
if err != nil {
    return fmt.Errorf("failed to retrieve token: %w", err)
}
tokenizer, err := tokenizers.FromHuggingFace("model-id",
    tokenizers.WithHFToken(token),
)

3. Git Security

# Add to .gitignore
.env
*.token
*_token.txt
secrets/

# Use git-secrets to prevent accidental commits
git secrets --install
git secrets --register-aws  # Registers common token patterns

4. Token Rotation

// Implement token rotation for production systems
type TokenProvider interface {
    GetToken() (string, error)
    RefreshToken() error
}

type RotatingTokenProvider struct {
    mu            sync.RWMutex
    currentToken  string
    lastRotated   time.Time
    rotationPeriod time.Duration
}

func (p *RotatingTokenProvider) GetToken() (string, error) {
    p.mu.RLock()
    defer p.mu.RUnlock()

    if time.Since(p.lastRotated) > p.rotationPeriod {
        go p.RefreshToken() // Async refresh
    }

    return p.currentToken, nil
}

5. Audit and Monitoring

// Log token usage (without exposing the token)
func logTokenUsage(modelID string, tokenHash string) {
    log.Printf("Token %s used for model %s at %s",
        tokenHash[:8], // Only log first 8 chars
        modelID,
        time.Now().Format(time.RFC3339),
    )
}

Configuration Options

All Available Options

tokenizer, err := tokenizers.FromHuggingFace("model-id",
    // Authentication
    tokenizers.WithHFToken(token),

    // Version control
    tokenizers.WithHFRevision("main"),       // branch, tag, or commit hash

    // Cache management
    tokenizers.WithHFCacheDir("/path/to/cache"),
    tokenizers.WithHFOfflineMode(true),      // Use cached only, no downloads

    // Network configuration
    tokenizers.WithHFTimeout(60 * time.Second),

    // Library configuration (if needed)
    tokenizers.WithLibraryPath("/path/to/libtokenizers.so"),
)

Option Details

WithHFToken

Provides authentication for private or gated models.

tokenizers.WithHFToken("hf_xxxxxxxxx")

WithHFRevision

Loads a specific version of the model. Can be:

Branch name: "main", "development"
Tag: "v1.0.0"
Commit hash: "abc123def456"

tokenizers.WithHFRevision("v2.0.0")

WithHFCacheDir

Overrides the default cache directory.

tokenizers.WithHFCacheDir("/custom/cache/path")

WithHFOfflineMode

Forces offline mode - only uses cached tokenizers, no network requests.

tokenizers.WithHFOfflineMode(true)

WithHFTimeout

Sets custom timeout for downloads (default: 30 seconds).

tokenizers.WithHFTimeout(60 * time.Second)

Cache System

Cache Locations

Tokenizers are cached in platform-specific directories for optimal performance:

Default Locations

macOS: ~/Library/Caches/tokenizers/lib/hf/models/
Linux: ~/.cache/tokenizers/lib/hf/models/ or $XDG_CACHE_HOME/tokenizers/lib/hf/models/
Windows: %APPDATA%/tokenizers/lib/hf/models/

Cache Permissions

HuggingFace cache directories are created with 0750 permissions.
Cached tokenizer.json files are written with 0600 (owner read/write only).

Cache Structure

~/.cache/tokenizers/lib/hf/models/
├── bert-base-uncased/
│   ├── main/
│   │   └── tokenizer.json
│   └── metadata.json
├── gpt2/
│   ├── main/
│   │   └── tokenizer.json
│   └── metadata.json
└── meta-llama--Llama-2-7b-hf/  # Note: "/" replaced with "--"
    ├── main/
    │   └── tokenizer.json
    └── metadata.json

Cache Management

Query Cache Information

// Get cache info for a specific model
info, err := tokenizers.GetHFCacheInfo("bert-base-uncased")
if err != nil {
    log.Fatal(err)
}

// info contains:
// - "path": Full path to cached tokenizer
// - "size": File size in bytes
// - "modified": Last modification time
// - "revision": Cached revision
fmt.Printf("Cache info: %+v\n", info)

Clear Cache

// Clear cache for specific model
if err := tokenizers.ClearHFModelCache("bert-base-uncased"); err != nil {
    log.Printf("Failed to clear model cache: %v", err)
    // Handle error appropriately - cache clearing is often non-critical
}

// Clear entire HuggingFace cache
if err := tokenizers.ClearHFCache(); err != nil {
    log.Printf("Failed to clear HuggingFace cache: %v", err)
    // Continue execution - cache operations are typically non-blocking
}

Offline Mode

Use cached tokenizers without network access:

// This will return an error if the model is not already cached
tokenizer, err := tokenizers.FromHuggingFace("bert-base-uncased",
    tokenizers.WithHFOfflineMode(true),
)

HuggingFace Hub Cache Integration

Pure-tokenizers can also read from the standard HuggingFace cache if present:

// Set HF_HOME to use existing HuggingFace cache
if err := os.Setenv("HF_HOME", "/path/to/huggingface/cache"); err != nil {
    log.Printf("Warning: Failed to set HF_HOME: %v", err)
}

// The library will check this cache before downloading
tokenizer, err := tokenizers.FromHuggingFace("model-id")
if err != nil {
    return fmt.Errorf("failed to load tokenizer: %w", err)
}
defer tokenizer.Close()

Cache Management Strategies for Production

Cache Size Management

import (
    "path/filepath"
    "os"
    "time"
)

// GetCacheSize calculates total cache size
func GetCacheSize(cacheDir string) (int64, error) {
    var size int64
    err := filepath.Walk(cacheDir, func(_ string, info os.FileInfo, err error) error {
        if err != nil {
            return err
        }
        if !info.IsDir() {
            size += info.Size()
        }
        return nil
    })
    return size, err
}

// CleanOldCache removes models not accessed in the last N days
func CleanOldCache(cacheDir string, maxAgeDays int) error {
    cutoff := time.Now().AddDate(0, 0, -maxAgeDays)

    return filepath.Walk(cacheDir, func(path string, info os.FileInfo, err error) error {
        if err != nil {
            return err
        }

        // Check tokenizer.json files
        if filepath.Base(path) == "tokenizer.json" && info.ModTime().Before(cutoff) {
            modelDir := filepath.Dir(filepath.Dir(path)) // Go up two levels
            log.Printf("Removing old cached model: %s", modelDir)
            return os.RemoveAll(modelDir)
        }
        return nil
    })
}

// Production cache management example
func ManageCacheInProduction() error {
    cacheDir := tokenizers.GetHFCacheDir()

    // Check cache size
    size, err := GetCacheSize(cacheDir)
    if err != nil {
        return fmt.Errorf("failed to get cache size: %w", err)
    }

    // If cache exceeds 10GB, clean models older than 30 days
    const maxCacheSize = 10 * 1024 * 1024 * 1024 // 10GB
    if size > maxCacheSize {
        if err := CleanOldCache(cacheDir, 30); err != nil {
            log.Printf("Cache cleanup failed: %v", err)
            // Don't fail the application for cache cleanup errors
        }
    }

    return nil
}

Cache Warming Strategy

// WarmCache pre-loads frequently used models during application startup
func WarmCache(models []string) {
    var wg sync.WaitGroup

    for _, modelID := range models {
        wg.Add(1)
        go func(model string) {
            defer wg.Done()

            // Try to load the model to ensure it's cached
            tok, err := tokenizers.FromHuggingFace(model)
            if err != nil {
                log.Printf("Failed to warm cache for %s: %v", model, err)
                return
            }
            tok.Close()
            log.Printf("Successfully cached %s", model)
        }(modelID)
    }

    wg.Wait()
}

// Use during application initialization
func init() {
    criticalModels := []string{
        "bert-base-uncased",
        "gpt2",
        "distilbert-base-uncased",
    }
    WarmCache(criticalModels)
}

Rate Limiting and Retry Logic

HuggingFace API Rate Limits

HuggingFace Hub implements rate limiting to ensure fair usage:

Anonymous requests: ~100 requests per hour
Authenticated requests: ~1000 requests per hour (varies by account type)
Rate limit headers: X-RateLimit-Limit, X-RateLimit-Remaining, X-RateLimit-Reset

Built-in Retry Strategy

Pure-tokenizers implements intelligent retry logic with exponential backoff:

// Default retry configuration
const (
    HFMaxRetries = 3                    // Maximum number of retry attempts
    HFRetryDelay = 1 * time.Second      // Base delay between retries
    HFMaxRetryAfterDelay = 5 * time.Minute // Maximum delay from Retry-After header
)

How Retry Logic Works

Exponential Backoff with Jitter

// The library automatically implements this retry strategy:
// Attempt 1: Immediate
// Attempt 2: 1s + jitter (0-250ms)
// Attempt 3: 2s + jitter (0-500ms)
// Attempt 4: 4s + jitter (0-1s)

Retry-After Header Handling

// When HuggingFace returns 429 (Too Many Requests), it may include a Retry-After header
// The library respects this header and waits accordingly:
// - Numeric value: delay in seconds
// - HTTP date: specific time to retry after
// - Maximum wait capped at 5 minutes to prevent abuse

Non-Retryable Errors The following errors are not retried:

401 Unauthorized - Invalid or missing token
403 Forbidden - No access to model
404 Not Found - Model doesn't exist

Custom Retry Implementation

For advanced use cases, implement custom retry logic:

type RetryConfig struct {
    MaxAttempts int
    BaseDelay   time.Duration
    MaxDelay    time.Duration
    Multiplier  float64
}

func LoadWithCustomRetry(modelID string, config RetryConfig) (*tokenizers.Tokenizer, error) {
    var lastErr error

    for attempt := 0; attempt < config.MaxAttempts; attempt++ {
        // Calculate delay with exponential backoff
        if attempt > 0 {
            delay := time.Duration(float64(config.BaseDelay) * math.Pow(config.Multiplier, float64(attempt-1)))
            if delay > config.MaxDelay {
                delay = config.MaxDelay
            }

            // Add jitter to prevent thundering herd
            jitter := time.Duration(rand.Float64() * float64(delay) * 0.1)
            time.Sleep(delay + jitter)

            log.Printf("Retry attempt %d/%d after %v", attempt+1, config.MaxAttempts, delay+jitter)
        }

        tokenizer, err := tokenizers.FromHuggingFace(modelID,
            tokenizers.WithHFTimeout(30*time.Second),
        )
        if err == nil {
            return tokenizer, nil
        }

        lastErr = err

        // Check if error is retryable
        if strings.Contains(err.Error(), "401") ||
           strings.Contains(err.Error(), "403") ||
           strings.Contains(err.Error(), "404") {
            return nil, err // Don't retry these errors
        }
    }

    return nil, fmt.Errorf("failed after %d attempts: %w", config.MaxAttempts, lastErr)
}

Monitoring Rate Limits

// Track rate limit usage in production
type RateLimitMonitor struct {
    mu          sync.Mutex
    requests    []time.Time
    windowSize  time.Duration
}

func (m *RateLimitMonitor) RecordRequest() {
    m.mu.Lock()
    defer m.mu.Unlock()

    now := time.Now()
    m.requests = append(m.requests, now)

    // Clean old requests outside the window
    cutoff := now.Add(-m.windowSize)
    i := 0
    for i < len(m.requests) && m.requests[i].Before(cutoff) {
        i++
    }
    m.requests = m.requests[i:]
}

func (m *RateLimitMonitor) GetRequestRate() float64 {
    m.mu.Lock()
    defer m.mu.Unlock()

    if len(m.requests) == 0 {
        return 0
    }

    elapsed := time.Since(m.requests[0])
    if elapsed.Seconds() == 0 {
        return float64(len(m.requests))
    }
    return float64(len(m.requests)) / elapsed.Seconds()
}

// Use in production
monitor := &RateLimitMonitor{
    windowSize: time.Hour,
}

// Before each request
monitor.RecordRequest()
if monitor.GetRequestRate() > 15 { // 15 requests per second threshold
    log.Printf("Warning: High request rate: %.2f req/s", monitor.GetRequestRate())
    time.Sleep(time.Second) // Throttle
}

Best Practices for Rate Limiting

Cache Aggressively: Reduce API calls by caching tokenizers locally
Batch Operations: Load multiple models in parallel when possible
Use Offline Mode: For production, pre-cache models and use offline mode
Implement Circuit Breakers: Prevent cascading failures

type CircuitBreaker struct {
    mu            sync.Mutex
    failures      int
    lastFailTime  time.Time
    state         string // "closed", "open", "half-open"
    threshold     int
    timeout       time.Duration
}

func (cb *CircuitBreaker) Call(fn func() error) error {
    cb.mu.Lock()
    defer cb.mu.Unlock()

    // Check circuit state
    if cb.state == "open" {
        if time.Since(cb.lastFailTime) > cb.timeout {
            cb.state = "half-open"
            cb.failures = 0
        } else {
            return fmt.Errorf("circuit breaker is open")
        }
    }

    // Execute function
    err := fn()
    if err != nil {
        cb.failures++
        cb.lastFailTime = time.Now()

        if cb.failures >= cb.threshold {
            cb.state = "open"
            return fmt.Errorf("circuit breaker opened after %d failures: %w", cb.failures, err)
        }
        return err
    }

    // Success - reset failures
    if cb.state == "half-open" {
        cb.state = "closed"
    }
    cb.failures = 0
    return nil
}

Migration from Python

Python (Transformers Library)

from transformers import AutoTokenizer

# Load tokenizer
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

# With authentication
tokenizer = AutoTokenizer.from_pretrained(
    "private-model",
    token="hf_xxxxx"
)

# With specific revision
tokenizer = AutoTokenizer.from_pretrained(
    "model-id",
    revision="v2.0.0"
)

# Tokenize
tokens = tokenizer("Hello world", return_tensors="pt")

Go (Pure-Tokenizers)

import "github.com/amikos-tech/pure-tokenizers"

// Load tokenizer
tokenizer, err := tokenizers.FromHuggingFace("bert-base-uncased")

// With authentication
tokenizer, err := tokenizers.FromHuggingFace("private-model",
    tokenizers.WithHFToken("hf_xxxxx"),
)

// With specific revision
tokenizer, err := tokenizers.FromHuggingFace("model-id",
    tokenizers.WithHFRevision("v2.0.0"),
)

// Tokenize
encoding, err := tokenizer.Encode("Hello world",
    tokenizers.WithAddSpecialTokens(),
)

Key Differences

Error Handling: Go requires explicit error checking
Resource Management: Use defer tokenizer.Close() in Go
Options: Go uses functional options pattern
Return Format: Go returns structured Encoding type

Advanced Usage

Batch Processing

func processBatch(tokenizer *tokenizers.Tokenizer, texts []string) error {
    for _, text := range texts {
        encoding, err := tokenizer.Encode(text,
            tokenizers.WithAddSpecialTokens(),
            tokenizers.WithReturnAttentionMask(),
        )
        if err != nil {
            return err
        }

        // Process encoding
        processEncoding(encoding)
    }
    return nil
}

Custom Retry Logic

func loadWithRetry(modelID string, maxRetries int) (*tokenizers.Tokenizer, error) {
    var lastErr error

    for i := 0; i < maxRetries; i++ {
        tokenizer, err := tokenizers.FromHuggingFace(modelID,
            tokenizers.WithHFTimeout(30 * time.Second),
        )
        if err == nil {
            return tokenizer, nil
        }

        lastErr = err

        // Check if error is retryable
        if strings.Contains(err.Error(), "rate limit") {
            time.Sleep(time.Duration(i+1) * 5 * time.Second)
            continue
        }

        return nil, err
    }

    return nil, fmt.Errorf("failed after %d retries: %w", maxRetries, lastErr)
}

Preloading Models

// Preload models during initialization
func init() {
    models := []string{
        "bert-base-uncased",
        "gpt2",
        "distilbert-base-uncased",
    }

    for _, model := range models {
        go func(m string) {
            tok, err := tokenizers.FromHuggingFace(m)
            if err != nil {
                log.Printf("Failed to preload %s: %v", m, err)
                return
            }
            tok.Close()
            log.Printf("Preloaded %s", m)
        }(model)
    }
}

Troubleshooting

Common Issues and Solutions

1. Authentication Errors

Error: 401 Unauthorized

Solution: Check your HF token is valid
- Verify token at https://huggingface.co/settings/tokens
- Ensure token has read permissions
- Check token is correctly set in environment or code

Error: 403 Forbidden

Solution: Accept model terms on HuggingFace
- Visit the model page on HuggingFace
- Accept the license/terms
- Wait a few minutes for propagation

2. Network Issues

Error: timeout or connection refused

Solution: Check network and proxy settings
- Verify internet connectivity
- Check if behind corporate proxy
- Increase timeout with WithHFTimeout
- Use offline mode if model is cached

3. Cache Issues

Error: permission denied

Solution: Check cache directory permissions
- Verify write permissions to cache directory
- Use WithHFCacheDir to specify writable location
- Clear corrupted cache with ClearHFModelCache

4. Model Not Found

Error: 404 Not Found

Solution: Verify model ID and availability
- Check model exists on HuggingFace
- Verify correct model ID format (owner/model)
- Check if model is public or requires authentication

5. Rate Limiting

Error: 429 Too Many Requests

Solution: Handle rate limits
- Implement exponential backoff
- Cache models locally for reuse
- Consider using offline mode when possible
- Spread requests over time

Debug Tips

Enable Verbose Logging

import "log"

// Set debug logging
log.SetFlags(log.LstdFlags | log.Lshortfile)

// Log all operations
tokenizer, err := tokenizers.FromHuggingFace("model-id")
if err != nil {
    log.Printf("Failed to load tokenizer: %+v", err)
}

Check Cache Status

// Verify what's cached
info, err := tokenizers.GetHFCacheInfo("model-id")
if err != nil {
    log.Printf("Model not cached: %v", err)
} else {
    log.Printf("Model cached at: %s", info["path"])
}

Test Connectivity

// Test with a small, public model first
testTokenizer, err := tokenizers.FromHuggingFace("bert-base-uncased")
if err != nil {
    log.Fatal("Cannot connect to HuggingFace: ", err)
}
testTokenizer.Close()

Performance Considerations

Caching Strategy

Models are cached after first download
Subsequent loads are near-instantaneous
Cache is persistent across application restarts

Memory Management

// Always close tokenizers when done
tokenizer, err := tokenizers.FromHuggingFace("model-id")
if err != nil {
    return err
}
defer tokenizer.Close()  // Important: releases memory

Concurrent Usage

// Tokenizers are thread-safe for reading
var tokenizer *tokenizers.Tokenizer

func init() {
    var err error
    tokenizer, err = tokenizers.FromHuggingFace("bert-base-uncased")
    if err != nil {
        log.Fatal(err)
    }
}

// Safe to use from multiple goroutines
func processText(text string) (*tokenizers.Encoding, error) {
    return tokenizer.Encode(text)
}

Best Practices

Cache frequently used models: Load once, reuse many times
Use offline mode in production: Avoid network dependencies
Implement proper error handling: Network calls can fail
Set appropriate timeouts: Balance between reliability and speed
Clean up resources: Always use defer tokenizer.Close()

FilesExpand file tree

HUGGINGFACE.md

Latest commit

History