This guide covers the comprehensive HuggingFace Hub integration in pure-tokenizers, including loading tokenizers, authentication, caching, and troubleshooting.
- Overview
- Supported Models
- Basic Usage
- Authentication
- Token Security Best Practices
- Configuration Options
- Cache System
- Rate Limiting and Retry Logic
- Migration from Python
- Advanced Usage
- Troubleshooting
- Performance Considerations
Pure-tokenizers provides seamless integration with HuggingFace Hub, allowing you to load tokenizers directly from any HuggingFace model repository without manual downloads or file management.
- Zero Configuration: Automatically downloads and caches tokenizers
- Offline Support: Cached tokenizers work without internet connection
- Authentication: Access private and gated models
- Version Control: Load specific model revisions (branches/tags/commits)
- Smart Caching: Efficient storage with automatic cache management
Pure-tokenizers supports any model on HuggingFace Hub that includes a tokenizer.json file. This includes:
- BERT Family: bert-base-uncased, bert-large-cased, distilbert-base-uncased
- GPT Family: gpt2, gpt2-medium, gpt2-large, gpt2-xl
- T5 Family: google/flan-t5-base, google/flan-t5-large
- Sentence Transformers: sentence-transformers/all-MiniLM-L6-v2
- RoBERTa: roberta-base, roberta-large
- BART: facebook/bart-base, facebook/bart-large
- Llama: meta-llama/Llama-2-7b-hf (requires authentication)
Model IDs follow the pattern owner/model-name or just model-name for official models:
- bert-base-uncased (official model)
- google/flan-t5-base (organization model)
- username/custom-model (user model)
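The model ID pattern above can be checked before any network call is made. The following is a minimal sketch using only the standard library; `splitModelID` is a hypothetical helper introduced here for illustration and is not part of pure-tokenizers, and the library's own validation rules may differ.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// splitModelID is a hypothetical helper (not part of pure-tokenizers).
// It accepts "model-name" or "owner/model-name" and returns the two parts;
// a bare model name yields an empty owner.
func splitModelID(id string) (owner, name string, err error) {
	parts := strings.Split(id, "/")
	switch {
	case len(parts) == 1 && parts[0] != "":
		return "", parts[0], nil
	case len(parts) == 2 && parts[0] != "" && parts[1] != "":
		return parts[0], parts[1], nil
	default:
		return "", "", errors.New("model ID must be model-name or owner/model-name: " + id)
	}
}

func main() {
	for _, id := range []string{"bert-base-uncased", "google/flan-t5-base", "a/b/c"} {
		owner, name, err := splitModelID(id)
		if err != nil {
			fmt.Println("invalid:", err)
			continue
		}
		fmt.Printf("owner=%q name=%q\n", owner, name)
	}
}
```

Rejecting malformed IDs early gives a clearer error than a 404 from the Hub would.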
package main
import (
"fmt"
"log"
"github.com/amikos-tech/pure-tokenizers"
)
func main() {
// Load a public model tokenizer
tokenizer, err := tokenizers.FromHuggingFace("bert-base-uncased")
if err != nil {
log.Fatal(err)
}
defer tokenizer.Close()
// Tokenize text
text := "Hello, how are you?"
encoding, err := tokenizer.Encode(text, tokenizers.WithAddSpecialTokens())
if err != nil {
log.Fatal(err)
}
fmt.Printf("Tokens: %v\n", encoding.Tokens)
fmt.Printf("Token IDs: %v\n", encoding.IDs)
}

tokenizer, err := tokenizers.FromHuggingFace("gpt2",
tokenizers.WithHFRevision("main"), // Specific branch/tag
tokenizers.WithHFTimeout(30 * time.Second), // Custom timeout
tokenizers.WithHFCacheDir("/custom/cache"), // Custom cache location
)

HuggingFace uses tokens for authentication. Get your token from HuggingFace Settings (https://huggingface.co/settings/tokens).

export HF_TOKEN=hf_xxxxxxxxxxxxxxxxxxxxxxxxx

The library automatically reads the HF_TOKEN environment variable. You can also pass the token explicitly:

tokenizer, err := tokenizers.FromHuggingFace("meta-llama/Llama-2-7b-hf",
tokenizers.WithHFToken("hf_xxxxxxxxxxxxxxxxxxxxxxxxx"),
)

Some models (like Llama) require accepting terms on HuggingFace before access:
- Visit the model page (e.g., https://huggingface.co/meta-llama/Llama-2-7b-hf)
- Accept the license terms
- Use your HF token for authentication
// Ensure you have accepted the model's terms on HuggingFace
tokenizer, err := tokenizers.FromHuggingFace("meta-llama/Llama-2-7b-hf",
tokenizers.WithHFToken(os.Getenv("HF_TOKEN")),
)
if err != nil {
// Common errors:
// - "401 Unauthorized": Invalid token
// - "403 Forbidden": Terms not accepted
log.Fatal(err)
}

Never commit tokens to version control. Follow these security practices:

# .env file (add to .gitignore)
HF_TOKEN=hf_xxxxxxxxxxxxxxxxxxxxxxxxx

// Load in your application
import (
	"log"

	"github.com/joho/godotenv"
)

func init() {
	if err := godotenv.Load(); err != nil {
		log.Println("No .env file found")
	}
}

// Example with AWS Secrets Manager
func getTokenFromSecretsManager() (string, error) {
// Implementation depends on your cloud provider
// AWS, GCP, Azure, Vault, etc.
return secretsManager.GetSecret("hf-token")
}
// Use in production
token, err := getTokenFromSecretsManager()
if err != nil {
return fmt.Errorf("failed to retrieve token: %w", err)
}
tokenizer, err := tokenizers.FromHuggingFace("model-id",
tokenizers.WithHFToken(token),
)

# Add to .gitignore
.env
*.token
*_token.txt
secrets/

# Use git-secrets to prevent accidental commits
git secrets --install
git secrets --register-aws           # Registers AWS credential patterns only
git secrets --add 'hf_[A-Za-z0-9]+'  # Also match HF-style tokens (pattern assumed from the hf_ prefix)

// Implement token rotation for production systems
type TokenProvider interface {
GetToken() (string, error)
RefreshToken() error
}
type RotatingTokenProvider struct {
mu sync.RWMutex
currentToken string
lastRotated time.Time
rotationPeriod time.Duration
}
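func (p *RotatingTokenProvider) GetToken() (string, error) {
	p.mu.RLock()
	defer p.mu.RUnlock()
	if time.Since(p.lastRotated) > p.rotationPeriod {
		// Async refresh. RefreshToken (not shown) must take the write lock
		// and update currentToken and lastRotated.
		go p.RefreshToken()
	}
	return p.currentToken, nil
}

// Log token usage (without exposing the token)
func logTokenUsage(modelID string, tokenHash string) {
	// Truncate defensively so a short hash cannot cause a slice panic
	prefix := tokenHash
	if len(prefix) > 8 {
		prefix = prefix[:8]
	}
	log.Printf("Token %s used for model %s at %s",
		prefix, // Only the first 8 chars of the hash, never the token itself
		modelID,
		time.Now().Format(time.RFC3339),
	)
}

tokenizer, err := tokenizers.FromHuggingFace("model-id",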
// Authentication
tokenizers.WithHFToken(token),
// Version control
tokenizers.WithHFRevision("main"), // branch, tag, or commit hash
// Cache management
tokenizers.WithHFCacheDir("/path/to/cache"),
tokenizers.WithHFOfflineMode(true), // Use cached only, no downloads
// Network configuration
tokenizers.WithHFTimeout(60 * time.Second),
// Library configuration (if needed)
tokenizers.WithLibraryPath("/path/to/libtokenizers.so"),
)

WithHFToken provides authentication for private or gated models.

tokenizers.WithHFToken("hf_xxxxxxxxx")

WithHFRevision loads a specific version of the model. The revision can be:
- Branch name: "main", "development"
- Tag: "v1.0.0"
- Commit hash: "abc123def456"

tokenizers.WithHFRevision("v2.0.0")

WithHFCacheDir overrides the default cache directory.

tokenizers.WithHFCacheDir("/custom/cache/path")

WithHFOfflineMode forces offline mode: only cached tokenizers are used and no network requests are made.

tokenizers.WithHFOfflineMode(true)

WithHFTimeout sets a custom timeout for downloads (default: 30 seconds).

tokenizers.WithHFTimeout(60 * time.Second)

Tokenizers are cached in platform-specific directories:
- macOS: ~/Library/Caches/tokenizers/lib/hf/models/
- Linux: ~/.cache/tokenizers/lib/hf/models/ or $XDG_CACHE_HOME/tokenizers/lib/hf/models/
- Windows: %APPDATA%/tokenizers/lib/hf/models/

- HuggingFace cache directories are created with 0750 permissions.
- Cached tokenizer.json files are written with 0600 (owner read/write only).
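To make the permission modes above concrete, here is a minimal sketch that writes a placeholder file with the same modes using only the standard library. `writeCachedFile` and `permsAfterWrite` are hypothetical names, and this is not the library's actual implementation. On Unix-like systems the process umask can restrict the directory mode further.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// writeCachedFile is a hypothetical helper, not the library's code.
// It creates dir with mode 0750 and writes the file with mode 0600,
// the modes described above.
func writeCachedFile(dir, name string, data []byte) (string, error) {
	if err := os.MkdirAll(dir, 0o750); err != nil {
		return "", err
	}
	path := filepath.Join(dir, name)
	if err := os.WriteFile(path, data, 0o600); err != nil {
		return "", err
	}
	return path, nil
}

// permsAfterWrite writes a placeholder file under a temporary root and
// reports the permission bits of the file and of its directory.
func permsAfterWrite() (fileMode, dirMode os.FileMode, err error) {
	root, err := os.MkdirTemp("", "hf-cache-demo")
	if err != nil {
		return 0, 0, err
	}
	defer os.RemoveAll(root)

	dir := filepath.Join(root, "gpt2", "main")
	path, err := writeCachedFile(dir, "tokenizer.json", []byte("{}"))
	if err != nil {
		return 0, 0, err
	}
	fi, err := os.Stat(path)
	if err != nil {
		return 0, 0, err
	}
	di, err := os.Stat(dir)
	if err != nil {
		return 0, 0, err
	}
	return fi.Mode().Perm(), di.Mode().Perm(), nil
}

func main() {
	fileMode, dirMode, err := permsAfterWrite()
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("file=%v dir=%v\n", fileMode, dirMode)
}
```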
~/.cache/tokenizers/lib/hf/models/
├── bert-base-uncased/
│ ├── main/
│ │ └── tokenizer.json
│ └── metadata.json
├── gpt2/
│ ├── main/
│ │ └── tokenizer.json
│ └── metadata.json
└── meta-llama--Llama-2-7b-hf/ # Note: "/" replaced with "--"
├── main/
│ └── tokenizer.json
└── metadata.json
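The layout above implies a simple mapping from model ID and revision to a file path. The sketch below shows that mapping under the assumption that the only transformation is replacing "/" with "--", as the tree notes. `cacheDirName` and `tokenizerPath` are hypothetical helpers, not library functions.

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// cacheDirName mirrors the layout shown above: "/" in the model ID
// becomes "--" so the ID can be used as a single directory name.
func cacheDirName(modelID string) string {
	return strings.ReplaceAll(modelID, "/", "--")
}

// tokenizerPath is a hypothetical helper that builds
// <cacheRoot>/<model dir>/<revision>/tokenizer.json.
func tokenizerPath(cacheRoot, modelID, revision string) string {
	return filepath.Join(cacheRoot, cacheDirName(modelID), revision, "tokenizer.json")
}

func main() {
	root := filepath.Join("cache", "tokenizers", "lib", "hf", "models")
	for _, id := range []string{"gpt2", "meta-llama/Llama-2-7b-hf"} {
		fmt.Println(tokenizerPath(root, id, "main"))
	}
}
```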
// Get cache info for a specific model
info, err := tokenizers.GetHFCacheInfo("bert-base-uncased")
if err != nil {
log.Fatal(err)
}
// info contains:
// - "path": Full path to cached tokenizer
// - "size": File size in bytes
// - "modified": Last modification time
// - "revision": Cached revision
fmt.Printf("Cache info: %+v\n", info)

// Clear cache for specific model
if err := tokenizers.ClearHFModelCache("bert-base-uncased"); err != nil {
log.Printf("Failed to clear model cache: %v", err)
// Handle error appropriately - cache clearing is often non-critical
}
// Clear entire HuggingFace cache
if err := tokenizers.ClearHFCache(); err != nil {
log.Printf("Failed to clear HuggingFace cache: %v", err)
// Continue execution - cache operations are typically non-blocking
}

Use cached tokenizers without network access:
// This will return an error if the model is not already cached
tokenizer, err := tokenizers.FromHuggingFace("bert-base-uncased",
tokenizers.WithHFOfflineMode(true),
)

Pure-tokenizers can also read from the standard HuggingFace cache if present:
// Set HF_HOME to use existing HuggingFace cache
if err := os.Setenv("HF_HOME", "/path/to/huggingface/cache"); err != nil {
log.Printf("Warning: Failed to set HF_HOME: %v", err)
}
// The library will check this cache before downloading
tokenizer, err := tokenizers.FromHuggingFace("model-id")
if err != nil {
return fmt.Errorf("failed to load tokenizer: %w", err)
}
defer tokenizer.Close()

import (
	"fmt"
	"log"
	"os"
	"path/filepath"
	"time"

	"github.com/amikos-tech/pure-tokenizers"
)
// GetCacheSize calculates total cache size
func GetCacheSize(cacheDir string) (int64, error) {
var size int64
err := filepath.Walk(cacheDir, func(_ string, info os.FileInfo, err error) error {
if err != nil {
return err
}
if !info.IsDir() {
size += info.Size()
}
return nil
})
return size, err
}
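// CleanOldCache removes models whose cached tokenizer.json was last modified
// more than maxAgeDays ago
func CleanOldCache(cacheDir string, maxAgeDays int) error {
	cutoff := time.Now().AddDate(0, 0, -maxAgeDays)
	// Collect stale model directories first: deleting during the walk makes
	// later entries fail to stat, which would abort the traversal
	stale := map[string]struct{}{}
	err := filepath.Walk(cacheDir, func(path string, info os.FileInfo, err error) error {
		if err != nil {
			return err
		}
		// Check tokenizer.json files
		if filepath.Base(path) == "tokenizer.json" && info.ModTime().Before(cutoff) {
			stale[filepath.Dir(filepath.Dir(path))] = struct{}{} // Go up two levels
		}
		return nil
	})
	if err != nil {
		return err
	}
	for modelDir := range stale {
		log.Printf("Removing old cached model: %s", modelDir)
		if err := os.RemoveAll(modelDir); err != nil {
			return err
		}
	}
	return nil
}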
// Production cache management example
func ManageCacheInProduction() error {
cacheDir := tokenizers.GetHFCacheDir()
// Check cache size
size, err := GetCacheSize(cacheDir)
if err != nil {
return fmt.Errorf("failed to get cache size: %w", err)
}
// If cache exceeds 10GB, clean models older than 30 days
const maxCacheSize = 10 * 1024 * 1024 * 1024 // 10GB
if size > maxCacheSize {
if err := CleanOldCache(cacheDir, 30); err != nil {
log.Printf("Cache cleanup failed: %v", err)
// Don't fail the application for cache cleanup errors
}
}
return nil
}

// WarmCache pre-loads frequently used models during application startup
func WarmCache(models []string) {
var wg sync.WaitGroup
for _, modelID := range models {
wg.Add(1)
go func(model string) {
defer wg.Done()
// Try to load the model to ensure it's cached
tok, err := tokenizers.FromHuggingFace(model)
if err != nil {
log.Printf("Failed to warm cache for %s: %v", model, err)
return
}
tok.Close()
log.Printf("Successfully cached %s", model)
}(modelID)
}
wg.Wait()
}
// Use during application initialization
func init() {
criticalModels := []string{
"bert-base-uncased",
"gpt2",
"distilbert-base-uncased",
}
WarmCache(criticalModels)
}

HuggingFace Hub implements rate limiting to ensure fair usage:
- Anonymous requests: ~100 requests per hour
- Authenticated requests: ~1000 requests per hour (varies by account type)
- Rate limit headers: X-RateLimit-Limit, X-RateLimit-Remaining, X-RateLimit-Reset
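If you make your own HTTP calls to the Hub, the headers listed above can be read from the response. The sketch below assumes that all three values are plain integers; whether X-RateLimit-Reset is a Unix timestamp or a number of seconds is not stated here, so the raw value is kept as-is. `rateLimitInfo` and `parseRateLimit` are hypothetical names, not part of pure-tokenizers.

```go
package main

import (
	"fmt"
	"net/http"
	"strconv"
)

// rateLimitInfo holds the three header values listed above.
// Reset is the raw header value; its unit is an assumption to verify.
type rateLimitInfo struct {
	Limit, Remaining, Reset int64
}

// parseRateLimit reads the X-RateLimit-* headers, assuming each one
// carries a plain base-10 integer. It fails if any header is missing.
func parseRateLimit(h http.Header) (rateLimitInfo, error) {
	var info rateLimitInfo
	fields := []struct {
		name string
		dst  *int64
	}{
		{"X-RateLimit-Limit", &info.Limit},
		{"X-RateLimit-Remaining", &info.Remaining},
		{"X-RateLimit-Reset", &info.Reset},
	}
	for _, f := range fields {
		raw := h.Get(f.name) // Get canonicalizes the header name
		if raw == "" {
			return rateLimitInfo{}, fmt.Errorf("missing header %s", f.name)
		}
		v, err := strconv.ParseInt(raw, 10, 64)
		if err != nil {
			return rateLimitInfo{}, fmt.Errorf("header %s: %w", f.name, err)
		}
		*f.dst = v
	}
	return info, nil
}

func main() {
	h := http.Header{}
	h.Set("X-RateLimit-Limit", "1000")
	h.Set("X-RateLimit-Remaining", "997")
	h.Set("X-RateLimit-Reset", "1800")

	info, err := parseRateLimit(h)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("%d of %d requests remaining\n", info.Remaining, info.Limit)
}
```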
Pure-tokenizers implements intelligent retry logic with exponential backoff:
// Default retry configuration
const (
HFMaxRetries = 3 // Maximum number of retry attempts
HFRetryDelay = 1 * time.Second // Base delay between retries
HFMaxRetryAfterDelay = 5 * time.Minute // Maximum delay from Retry-After header
)

- Exponential Backoff with Jitter

// The library automatically implements this retry strategy:
// Attempt 1: Immediate
// Attempt 2: 1s + jitter (0-250ms)
// Attempt 3: 2s + jitter (0-500ms)
// Attempt 4: 4s + jitter (0-1s)

- Retry-After Header Handling

// When HuggingFace returns 429 (Too Many Requests), it may include a Retry-After header
// The library respects this header and waits accordingly:
// - Numeric value: delay in seconds
// - HTTP date: specific time to retry after
// - Maximum wait capped at 5 minutes to prevent abuse

- Non-Retryable Errors

The following errors are not retried:
- 401 Unauthorized - Invalid or missing token
- 403 Forbidden - No access to model
- 404 Not Found - Model doesn't exist
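The three rules above can be sketched with the standard library alone. This illustrates the described behavior and is not the library's internal code: all function names are hypothetical, the jitter is assumed to be uniform between 0 and 25% of the delay (which matches the ranges listed), and treating 429 and 5xx responses as retryable is an assumption, since only the non-retryable codes are named.

```go
package main

import (
	"fmt"
	"math/rand"
	"net/http"
	"strconv"
	"time"
)

const (
	baseDelay     = 1 * time.Second // mirrors HFRetryDelay above
	maxRetryAfter = 5 * time.Minute // mirrors HFMaxRetryAfterDelay above
)

// demoNow is a fixed reference time so the example is deterministic.
var demoNow = time.Date(2015, time.October, 21, 7, 27, 0, 0, time.UTC)

// backoffBase returns the un-jittered delay before a 1-based attempt:
// attempt 1 is immediate, then 1s, 2s, 4s, doubling each time.
func backoffBase(attempt int) time.Duration {
	if attempt <= 1 {
		return 0
	}
	return baseDelay << (attempt - 2)
}

// withJitter adds a random extra of up to 25% of d (assumed distribution).
func withJitter(d time.Duration) time.Duration {
	if d <= 0 {
		return 0
	}
	return d + time.Duration(rand.Int63n(int64(d)/4+1))
}

// parseRetryAfter accepts either delay seconds or an HTTP date and
// clamps the result to the range [0, maxRetryAfter].
func parseRetryAfter(value string, now time.Time) (time.Duration, bool) {
	var d time.Duration
	if secs, err := strconv.Atoi(value); err == nil {
		d = time.Duration(secs) * time.Second
	} else if t, err := http.ParseTime(value); err == nil {
		d = t.Sub(now)
	} else {
		return 0, false
	}
	if d < 0 {
		d = 0
	}
	if d > maxRetryAfter {
		d = maxRetryAfter
	}
	return d, true
}

// isRetryable never retries 401, 403, or 404. Retrying 429 and 5xx
// is an assumption made for this sketch.
func isRetryable(status int) bool {
	switch status {
	case http.StatusUnauthorized, http.StatusForbidden, http.StatusNotFound:
		return false
	}
	return status == http.StatusTooManyRequests || status >= 500
}

func main() {
	for attempt := 1; attempt <= 4; attempt++ {
		base := backoffBase(attempt)
		fmt.Printf("attempt %d: base %v, with jitter %v\n", attempt, base, withJitter(base))
	}
	for _, v := range []string{"120", "3600", "Wed, 21 Oct 2015 07:28:00 GMT", "soon"} {
		d, ok := parseRetryAfter(v, demoNow)
		fmt.Printf("Retry-After %q: wait %v (valid=%v)\n", v, d, ok)
	}
	for _, s := range []int{401, 403, 404, 429, 503} {
		fmt.Printf("status %d retryable=%v\n", s, isRetryable(s))
	}
}
```

Passing the current time into `parseRetryAfter` as a parameter keeps the HTTP-date branch testable without depending on the wall clock.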
For advanced use cases, implement custom retry logic:
type RetryConfig struct {
MaxAttempts int
BaseDelay time.Duration
MaxDelay time.Duration
Multiplier float64
}
func LoadWithCustomRetry(modelID string, config RetryConfig) (*tokenizers.Tokenizer, error) {
var lastErr error
for attempt := 0; attempt < config.MaxAttempts; attempt++ {
// Calculate delay with exponential backoff
if attempt > 0 {
delay := time.Duration(float64(config.BaseDelay) * math.Pow(config.Multiplier, float64(attempt-1)))
if delay > config.MaxDelay {
delay = config.MaxDelay
}
// Add jitter to prevent thundering herd
jitter := time.Duration(rand.Float64() * float64(delay) * 0.1)
time.Sleep(delay + jitter)
log.Printf("Retry attempt %d/%d after %v", attempt+1, config.MaxAttempts, delay+jitter)
}
tokenizer, err := tokenizers.FromHuggingFace(modelID,
tokenizers.WithHFTimeout(30*time.Second),
)
if err == nil {
return tokenizer, nil
}
lastErr = err
// Check if error is retryable
if strings.Contains(err.Error(), "401") ||
strings.Contains(err.Error(), "403") ||
strings.Contains(err.Error(), "404") {
return nil, err // Don't retry these errors
}
}
return nil, fmt.Errorf("failed after %d attempts: %w", config.MaxAttempts, lastErr)
}

// Track rate limit usage in production
type RateLimitMonitor struct {
mu sync.Mutex
requests []time.Time
windowSize time.Duration
}
func (m *RateLimitMonitor) RecordRequest() {
m.mu.Lock()
defer m.mu.Unlock()
now := time.Now()
m.requests = append(m.requests, now)
// Clean old requests outside the window
cutoff := now.Add(-m.windowSize)
i := 0
for i < len(m.requests) && m.requests[i].Before(cutoff) {
i++
}
m.requests = m.requests[i:]
}
func (m *RateLimitMonitor) GetRequestRate() float64 {
	m.mu.Lock()
	defer m.mu.Unlock()
	// A single sample spans no measurable interval; dividing by the time since
	// it was recorded would report a huge rate on the very first request
	if len(m.requests) < 2 {
		return 0
	}
	elapsed := time.Since(m.requests[0])
	if elapsed <= 0 {
		return 0
	}
	return float64(len(m.requests)) / elapsed.Seconds()
}
// Use in production
monitor := &RateLimitMonitor{
windowSize: time.Hour,
}
// Before each request
monitor.RecordRequest()
if monitor.GetRequestRate() > 15 { // 15 requests per second threshold
log.Printf("Warning: High request rate: %.2f req/s", monitor.GetRequestRate())
time.Sleep(time.Second) // Throttle
}

- Cache Aggressively: Reduce API calls by caching tokenizers locally
- Batch Operations: Load multiple models in parallel when possible
- Use Offline Mode: For production, pre-cache models and use offline mode
- Implement Circuit Breakers: Prevent cascading failures
type CircuitBreaker struct {
mu sync.Mutex
failures int
lastFailTime time.Time
state string // "closed", "open", "half-open"
threshold int
timeout time.Duration
}
func (cb *CircuitBreaker) Call(fn func() error) error {
cb.mu.Lock()
defer cb.mu.Unlock()
// Check circuit state
if cb.state == "open" {
if time.Since(cb.lastFailTime) > cb.timeout {
cb.state = "half-open"
cb.failures = 0
} else {
return fmt.Errorf("circuit breaker is open")
}
}
// Execute function
err := fn()
if err != nil {
cb.failures++
cb.lastFailTime = time.Now()
if cb.failures >= cb.threshold {
cb.state = "open"
return fmt.Errorf("circuit breaker opened after %d failures: %w", cb.failures, err)
}
return err
}
// Success - reset failures
if cb.state == "half-open" {
cb.state = "closed"
}
cb.failures = 0
return nil
}

from transformers import AutoTokenizer
# Load tokenizer
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
# With authentication
tokenizer = AutoTokenizer.from_pretrained(
"private-model",
token="hf_xxxxx"
)
# With specific revision
tokenizer = AutoTokenizer.from_pretrained(
"model-id",
revision="v2.0.0"
)
# Tokenize
tokens = tokenizer("Hello world", return_tensors="pt")

import "github.com/amikos-tech/pure-tokenizers"
// Load tokenizer
tokenizer, err := tokenizers.FromHuggingFace("bert-base-uncased")
// With authentication
tokenizer, err := tokenizers.FromHuggingFace("private-model",
tokenizers.WithHFToken("hf_xxxxx"),
)
// With specific revision
tokenizer, err := tokenizers.FromHuggingFace("model-id",
tokenizers.WithHFRevision("v2.0.0"),
)
// Tokenize
encoding, err := tokenizer.Encode("Hello world",
tokenizers.WithAddSpecialTokens(),
)

- Error Handling: Go requires explicit error checking
- Resource Management: Use defer tokenizer.Close() in Go
- Options: Go uses functional options pattern
- Return Format: Go returns structured Encoding type
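For readers coming from Python keyword arguments, the functional options pattern mentioned above may be unfamiliar. The sketch below shows the general shape of the pattern. Every name in it (`loadConfig`, `Option`, `WithRevision`, `WithTimeout`, `WithOffline`, `newLoadConfig`) is hypothetical and chosen for illustration; it is not how pure-tokenizers defines its `WithHF...` options internally. The 30-second default follows the documented download timeout, while the "main" default revision is an assumption.

```go
package main

import (
	"fmt"
	"time"
)

// loadConfig collects the settings that options can change.
// Hypothetical type for illustration only.
type loadConfig struct {
	revision string
	timeout  time.Duration
	offline  bool
}

// Option mutates a loadConfig; each With... function returns one.
type Option func(*loadConfig)

func WithRevision(rev string) Option {
	return func(c *loadConfig) { c.revision = rev }
}

func WithTimeout(d time.Duration) Option {
	return func(c *loadConfig) { c.timeout = d }
}

func WithOffline(on bool) Option {
	return func(c *loadConfig) { c.offline = on }
}

// newLoadConfig starts from defaults and applies options in order,
// so later options override earlier ones.
func newLoadConfig(opts ...Option) loadConfig {
	cfg := loadConfig{revision: "main", timeout: 30 * time.Second}
	for _, opt := range opts {
		opt(&cfg)
	}
	return cfg
}

func main() {
	fmt.Printf("%+v\n", newLoadConfig())
	fmt.Printf("%+v\n", newLoadConfig(WithRevision("v2.0.0"), WithTimeout(time.Minute), WithOffline(true)))
}
```

Callers pass only the options they care about, and new options can be added later without changing the function signature, which plays the role that optional keyword arguments play in `from_pretrained`.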
func processBatch(tokenizer *tokenizers.Tokenizer, texts []string) error {
for _, text := range texts {
encoding, err := tokenizer.Encode(text,
tokenizers.WithAddSpecialTokens(),
tokenizers.WithReturnAttentionMask(),
)
if err != nil {
return err
}
// Process encoding
processEncoding(encoding)
}
return nil
}

func loadWithRetry(modelID string, maxRetries int) (*tokenizers.Tokenizer, error) {
var lastErr error
for i := 0; i < maxRetries; i++ {
tokenizer, err := tokenizers.FromHuggingFace(modelID,
tokenizers.WithHFTimeout(30 * time.Second),
)
if err == nil {
return tokenizer, nil
}
lastErr = err
// Check if error is retryable
if strings.Contains(err.Error(), "rate limit") {
time.Sleep(time.Duration(i+1) * 5 * time.Second)
continue
}
return nil, err
}
return nil, fmt.Errorf("failed after %d retries: %w", maxRetries, lastErr)
}

// Preload models during initialization
func init() {
models := []string{
"bert-base-uncased",
"gpt2",
"distilbert-base-uncased",
}
for _, model := range models {
go func(m string) {
tok, err := tokenizers.FromHuggingFace(m)
if err != nil {
log.Printf("Failed to preload %s: %v", m, err)
return
}
tok.Close()
log.Printf("Preloaded %s", m)
}(model)
}
}

Error: 401 Unauthorized
Solution: Check your HF token is valid
- Verify token at https://huggingface.co/settings/tokens
- Ensure token has read permissions
- Check token is correctly set in environment or code
Error: 403 Forbidden
Solution: Accept model terms on HuggingFace
- Visit the model page on HuggingFace
- Accept the license/terms
- Wait a few minutes for propagation
Error: timeout or connection refused
Solution: Check network and proxy settings
- Verify internet connectivity
- Check if behind corporate proxy
- Increase timeout with WithHFTimeout
- Use offline mode if model is cached
Error: permission denied
Solution: Check cache directory permissions
- Verify write permissions to cache directory
- Use WithHFCacheDir to specify writable location
- Clear corrupted cache with ClearHFModelCache
Error: 404 Not Found
Solution: Verify model ID and availability
- Check model exists on HuggingFace
- Verify correct model ID format (owner/model)
- Check if model is public or requires authentication
Error: 429 Too Many Requests
Solution: Handle rate limits
- Implement exponential backoff
- Cache models locally for reuse
- Consider using offline mode when possible
- Spread requests over time
import "log"
// Set debug logging
log.SetFlags(log.LstdFlags | log.Lshortfile)
// Log all operations
tokenizer, err := tokenizers.FromHuggingFace("model-id")
if err != nil {
log.Printf("Failed to load tokenizer: %+v", err)
}

// Verify what's cached
info, err := tokenizers.GetHFCacheInfo("model-id")
if err != nil {
log.Printf("Model not cached: %v", err)
} else {
log.Printf("Model cached at: %s", info["path"])
}

// Test with a small, public model first
testTokenizer, err := tokenizers.FromHuggingFace("bert-base-uncased")
if err != nil {
log.Fatal("Cannot connect to HuggingFace: ", err)
}
testTokenizer.Close()

- Models are cached after first download
- Subsequent loads are near-instantaneous
- Cache is persistent across application restarts
// Always close tokenizers when done
tokenizer, err := tokenizers.FromHuggingFace("model-id")
if err != nil {
return err
}
defer tokenizer.Close() // Important: releases memory

// Tokenizers are thread-safe for reading
var tokenizer *tokenizers.Tokenizer
func init() {
var err error
tokenizer, err = tokenizers.FromHuggingFace("bert-base-uncased")
if err != nil {
log.Fatal(err)
}
}
// Safe to use from multiple goroutines
func processText(text string) (*tokenizers.Encoding, error) {
return tokenizer.Encode(text)
}

- Cache frequently used models: Load once, reuse many times
- Use offline mode in production: Avoid network dependencies
- Implement proper error handling: Network calls can fail
- Set appropriate timeouts: Balance between reliability and speed
- Clean up resources: Always use defer tokenizer.Close()

- Examples - Working code examples
- Cache Management - Detailed cache documentation
- HuggingFace Hub - Browse available models
- API Reference - Complete API documentation