Modern async Python SDK for Pharia Data API with full TypedDict support. Type-safe, developer-friendly interface for managing stages, datasets, files, connectors, and repositories. Python 3.12+

Aleph-Alpha/pharia_data_sdk

Pharia Data SDK

Modern Python SDK for the Pharia Data API

Type-safe • Async-first • Made for humans


📘 Quick Start • 📚 Examples • 🧪 Tests


✨ Features

  • 🚀 Async/await - Built on modern async Python
  • 🎯 Type-safe - Full TypedDict support for autocomplete
  • 🧩 Intuitive API - Clean, resource-based interface
  • 📦 Batteries included - Stages, files, datasets, connectors, and more
  • 🔧 Flexible - Easy configuration and customization

⚠️ Stability

This SDK follows the same stability guarantees as the Go programming language:

  • Before 1.0.0: Breaking changes may occur between minor versions
  • After 1.0.0: Code that works with 1.x will continue to work with all future 1.x releases
  • Semantic versioning will be strictly followed after 1.0.0
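
The policy above can be turned into a small upgrade check. The sketch below is illustrative only: it assumes plain MAJOR.MINOR.PATCH version strings, and it assumes that patch releases within the same 0.x minor version are compatible, which the policy does not explicitly promise. The function names are hypothetical and not part of the SDK.

```python
def parse_version(version: str) -> tuple[int, int, int]:
    """Split a plain 'MAJOR.MINOR.PATCH' string into integers."""
    major, minor, patch = (int(part) for part in version.split("."))
    return major, minor, patch


def is_safe_upgrade(current: str, candidate: str) -> bool:
    """Return True if moving from `current` to `candidate` should not break code.

    Before 1.0.0 a minor bump may be breaking, so only the same minor is accepted.
    From 1.0.0 on, any later release with the same major version is accepted.
    """
    cur, cand = parse_version(current), parse_version(candidate)
    if cand < cur:
        return False  # downgrades are out of scope for this check
    if cur[0] != cand[0]:
        return False  # a major bump may always be breaking
    if cur[0] == 0:
        return cur[1] == cand[1]  # pre-1.0: stay within the same minor
    return True
```

For example, `is_safe_upgrade("0.3.1", "0.4.0")` is rejected under this reading of the policy, while `is_safe_upgrade("1.2.0", "1.9.3")` is accepted.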

📦 Installation

# Install directly from GitHub using uv (recommended)
uv pip install git+https://github.com/Aleph-Alpha/pharia_data_sdk.git

# Or add to your project dependencies
uv add git+https://github.com/Aleph-Alpha/pharia_data_sdk.git

# For development (clone and install)
git clone https://github.com/Aleph-Alpha/pharia_data_sdk.git
cd pharia_data_sdk
uv sync

⚙️ Configuration

The SDK requires two environment variables:

| Variable | Description |
| --- | --- |
| PHARIA_DATA_API_BASE_URL | API base URL |
| PHARIA_API_KEY | API authentication key |
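
Because the client reads both variables from the environment, it can help to fail early with a clear message when one is missing. The following is a minimal sketch using only the standard library; `missing_env_vars` is a hypothetical helper, not an SDK function, and how the SDK itself reacts to a missing variable is not covered here.

```python
import os
from collections.abc import Mapping

# The two variable names listed in the table above.
REQUIRED_VARS = ("PHARIA_DATA_API_BASE_URL", "PHARIA_API_KEY")


def missing_env_vars(env: Mapping[str, str] = os.environ) -> list[str]:
    """Return the required variable names that are unset or empty in `env`."""
    return [name for name in REQUIRED_VARS if not env.get(name)]


if __name__ == "__main__":
    missing = missing_env_vars()
    if missing:
        raise SystemExit(f"Missing environment variables: {', '.join(missing)}")
    print("Configuration looks complete")
```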

🚀 Quick Start

import asyncio
from pharia import Client

async def main():
    # Client automatically reads from environment variables
    client = Client()

    # List all stages (v1 API)
    stages = await client.v1.stages.list(page=0, size=10)
    print(f"Found {stages['total']} stages")

    # Create a stage with embedding
    stage = await client.v1.stages.semantic.create(
        name="My Semantic Search Stage",
        embedding_model="luminous-base",
        representation="asymmetric"
    )

    # Create a search store (beta API)
    search_store = await client.beta.search_stores.semantic.create(
        name="My Search Store",
        embedding_model="luminous-base",
        representation="asymmetric",
        max_chunk_size_tokens=512,
        chunk_overlap_tokens=128
    )

asyncio.run(main())
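
The `page=0, size=10` call and the `total` key in the Quick Start suggest zero-based pagination. As a rough sketch under that assumption, the helper below walks every page of any list call that accepts `page` and `size` keywords and returns a dict with a `total` count. `fetch_all_pages` and `fake_list` are hypothetical names; `fake_list` is a stand-in so the example runs without network access, and any response key other than `total` is made up.

```python
import asyncio
import math
from typing import Any, Awaitable, Callable


def page_indices(total: int, size: int) -> range:
    """Zero-based page numbers needed to cover `total` items."""
    if size <= 0:
        raise ValueError("size must be positive")
    return range(math.ceil(total / size))


async def fetch_all_pages(
    list_page: Callable[..., Awaitable[dict[str, Any]]],
    size: int = 10,
) -> list[dict[str, Any]]:
    """Fetch page 0, read `total`, then fetch the remaining pages in order."""
    first = await list_page(page=0, size=size)
    pages = [first]
    for page in page_indices(first["total"], size)[1:]:
        pages.append(await list_page(page=page, size=size))
    return pages


async def fake_list(page: int, size: int) -> dict[str, Any]:
    """Stand-in for a list call; only the `total` key mirrors the Quick Start."""
    return {"total": 25, "page": page, "size": size}


if __name__ == "__main__":
    pages = asyncio.run(fetch_all_pages(fake_list, size=10))
    print(f"Fetched {len(pages)} pages")
```

With a real client, `client.v1.stages.list` could be passed as `list_page`, assuming it keeps the keyword arguments shown in the Quick Start.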

🎯 API Resources

V1 API (client.v1.*)

| Resource | Description |
| --- | --- |
| client.v1.stages | Create and manage data stages |
| client.v1.files | Upload and manage files |
| client.v1.datasets | Dataset operations |
| client.v1.repositories | Repository management |
| client.v1.connectors | External data connectors |

Beta API (client.beta.*)

| Resource | Description |
| --- | --- |
| client.beta.search_stores | Create and manage search stores |

💡 Creating Stages with Embeddings

The SDK provides specialized methods for different embedding types:

from pharia import Client

client = Client()  # Reads PHARIA_DATA_API_BASE_URL and PHARIA_API_KEY from env

# Simple stage (no embedding)
stage = await client.v1.stages.create(name="Simple Stage")

# Instruct embedding
stage = await client.v1.stages.instruct.create(
    name="Instruct Stage",
    embedding_model="pharia-1-embedding-256-control",
    instruction_document="Represent this document for retrieval",
    instruction_query="Represent this query for retrieval"
)

# Semantic embedding
stage = await client.v1.stages.semantic.create(
    name="Semantic Stage",
    embedding_model="luminous-base",
    representation="asymmetric"
)

# VLLM embedding
stage = await client.v1.stages.vllm.create(
    name="VLLM Stage",
    embedding_model="qwen3-embedding-8b"
)

🔍 Creating Search Stores (Beta)

Search stores provide standalone semantic search capabilities:

from pharia import Client

client = Client()

# Semantic search store
search_store = await client.beta.search_stores.semantic.create(
    name="My Semantic Search Store",
    embedding_model="luminous-base",
    representation="asymmetric",
    max_chunk_size_tokens=512,
    chunk_overlap_tokens=128
)

# Instruct search store
search_store = await client.beta.search_stores.instruct.create(
    name="My Instruct Search Store",
    embedding_model="pharia-1-embedding-256-control",
    instruction_document="Represent this document for retrieval",
    instruction_query="Represent this query for retrieval",
    max_chunk_size_tokens=512,
    chunk_overlap_tokens=128
)
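
To build intuition for `max_chunk_size_tokens` and `chunk_overlap_tokens`, the sketch below computes token spans for a simple sliding window. This is an assumption made for illustration: the service's actual chunking strategy is not described in this README and may differ. `chunk_spans` is a hypothetical helper, not an SDK function.

```python
def chunk_spans(n_tokens: int, max_chunk_size: int, overlap: int) -> list[tuple[int, int]]:
    """Return (start, end) token spans for a sliding window with overlap.

    Each chunk holds at most `max_chunk_size` tokens and shares `overlap`
    tokens with the previous chunk, so the window advances by their difference.
    """
    if max_chunk_size <= 0 or not 0 <= overlap < max_chunk_size:
        raise ValueError("need 0 <= overlap < max_chunk_size")
    stride = max_chunk_size - overlap
    spans: list[tuple[int, int]] = []
    start = 0
    while start < n_tokens:
        end = min(start + max_chunk_size, n_tokens)
        spans.append((start, end))
        if end == n_tokens:
            break  # the last chunk reached the end of the document
        start += stride
    return spans


if __name__ == "__main__":
    # Values from the examples above: 512-token chunks with 128 tokens of overlap.
    print(chunk_spans(1000, 512, 128))
```

Under this model, a 1000-token document with the settings above yields three chunks, each starting 384 tokens after the previous one.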

🛡️ Type Safety

Full TypedDict support for type checking and IDE autocomplete:

from pharia import CreateRepositoryInput
from pharia import CreateStageInput
from pharia import DestinationType
from pharia import MediaType
from pharia import Modality
from pharia import TransformationName

# Type-safe inputs (all snake_case with enums)
stage_input: CreateStageInput = {
    "name": "My Stage",
    "triggers": [{
        "name": "my-trigger",
        "transformation_name": TransformationName.DOCUMENT_TO_TEXT,
        "destination_type": DestinationType.DATA_PLATFORM_REPOSITORY,
        "repository_id": "repo-id"
    }]
}

stage = await client.v1.stages.create(**stage_input)

# Type-safe repository creation with enums
repository = await client.v1.repositories.create(
    name="My Repository",
    media_type=MediaType.JSONLINES,
    modality=Modality.TEXT
)

All types and enums are defined in pharia/models.py.
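
For readers new to the pattern, here is a self-contained sketch of how a TypedDict and an enum combine with keyword unpacking. Every name and value in it (`ExampleModality`, `ExampleRepositoryInput`, `create_repository`, the `"text"` string) is hypothetical and exists only for illustration; the SDK's real definitions live in pharia/models.py and may differ.

```python
from enum import Enum
from typing import TypedDict


class ExampleModality(str, Enum):
    """Hypothetical enum; the member values are made up for this sketch."""
    TEXT = "text"
    IMAGE = "image"


class ExampleRepositoryInput(TypedDict):
    """Hypothetical input shape; a type checker validates keys and value types."""
    name: str
    modality: ExampleModality


def create_repository(*, name: str, modality: ExampleModality) -> dict[str, str]:
    """Stand-in for an API call: returns the payload it would send."""
    return {"name": name, "modality": modality.value}


# A type checker flags a misspelled key or a plain string where the enum is expected.
repo_input: ExampleRepositoryInput = {
    "name": "My Repository",
    "modality": ExampleModality.TEXT,
}

payload = create_repository(**repo_input)
```

At runtime a TypedDict is an ordinary dict, so the benefit comes from static checkers and editor autocomplete rather than from runtime validation.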

📚 Examples

Check out the examples directory for comprehensive guides.

Run any example:

cd examples
python create_stages.py

🧪 Testing

Run integration tests:

# Set your API credentials
export PHARIA_DATA_API_BASE_URL="https://<base-url>"
export PHARIA_API_KEY="your-api-key"

# Run tests
pytest tests/

🔧 Advanced Configuration

from pharia import Client

# Override environment variables
client = Client(
    base_url="https://custom-api.example.com",
    api_key="custom-key",
    timeout=30.0
)

# Clone client with new options
new_client = client.with_options(timeout=60.0)
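
A clone-with-overrides method like `with_options` is often built on immutable settings. The sketch below shows one common way to get that behavior with a frozen dataclass; it is not necessarily how the SDK implements it. `ExampleClientOptions` and its default timeout are hypothetical.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class ExampleClientOptions:
    """Hypothetical settings object; field names mirror the arguments above."""
    base_url: str
    api_key: str
    timeout: float = 30.0

    def with_options(self, **overrides: object) -> "ExampleClientOptions":
        """Return a copy with the given fields replaced; the original is unchanged."""
        return replace(self, **overrides)


base = ExampleClientOptions(
    base_url="https://custom-api.example.com",
    api_key="custom-key",
)
slow = base.with_options(timeout=60.0)
```

Because the dataclass is frozen, `base` keeps its original timeout while `slow` carries the override, so the two can be used side by side without affecting each other.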

📖 API Reference

See pharia/models.py for all available types and their fields.


Built with ❤️ for the Pharia platform
