Skip to content

h2oai/text-2-everything-py

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Text2Everything SDK

The official Python SDK for the Text2Everything API, providing easy access to text-to-SQL conversion, project management, and data operations.

Features

  • Unified Client: Single entry point for all API operations
  • Type Safety: Full Pydantic model integration with IDE support
  • Error Handling: Comprehensive exception hierarchy with detailed error information
  • Retry Logic: Automatic retry with exponential backoff for failed requests
  • Pagination: Automatic handling of paginated responses
  • Resource Management: Organized clients for each API resource type
  • Context Manager: Proper resource cleanup with context manager support
  • Custom Tools: Upload and manage custom Python tools with directory-based creation
  • Multipart File Uploads: Native support for file uploads with proper Content-Type handling
  • Nested Validation: Comprehensive schema validation with nested field requirements
  • Environment Configuration: Support for .env files for easy local development setup

Installation

Install from PyPI:

pip install h2o-text-2-everything

# With optional dependencies
pip install h2o-text-2-everything[integrations]  # pandas, jupyter, h2o-drive
pip install h2o-text-2-everything[dev]          # development tools
pip install h2o-text-2-everything[docs]         # documentation tools

For development installation and other options, see INSTALLATION.md

Quick Start

from text2everything_sdk import Text2EverythingClient

# Initialize the client
client = Text2EverythingClient(
    base_url="https://your-api-endpoint.com",
    access_token="your-access-token",
    workspace_name="workspaces/my-workspace"
)

# Create a project
project = client.projects.create(
    name="My Project",
    description="A sample project for text-to-SQL conversion"
)

# Add context information
context = client.contexts.create(
    project_id=project.id,
    name="Business Rules",
    content="Important business context and rules...",
    is_always_displayed=True
)

# Add schema metadata
schema = client.schema_metadata.create(
    project_id=project.id,
    name="Customers Table",
    description="Customer information table",
    schema_data={
        "table": {
            "name": "customers",
            "columns": [
                {"name": "id", "type": "INTEGER"},
                {"name": "name", "type": "VARCHAR(100)"},
                {"name": "email", "type": "VARCHAR(255)"}
            ]
        }
    }
)

# Create a chat session
session = client.chat_sessions.create(project_id=project.id)

# Generate SQL for a query
response = client.chat.chat_to_sql(
    project_id=project.id,
    chat_session_id=session.id,
    query="Show me all customers from California",
)
print(f"Generated SQL: {response.sql_query}")

API Resources

The SDK provides clients for all Text2Everything API resources:

Projects

# List projects
projects = client.projects.list()

# Get project by ID
project = client.projects.get("project_id")

# Create project
project = client.projects.create(name="New Project")

# Update project
project = client.projects.update("project_id", name="Updated Name")

# Delete project
client.projects.delete("project_id")

Contexts

# Add business context
context = client.contexts.create(
    project_id="project_id",
    name="Business Rules",
    content="Context content...",
    is_always_displayed=True
)

# List contexts for a project
contexts = client.contexts.list(project_id="project_id")

Schema Metadata

# Add table schema
table = client.schema_metadata.create(
    project_id="project_id",
    name="Users Table",
    schema_data={
        "table": {
            "name": "users",
            "columns": [...]
        }
    }
)

# Add dimension
dimension = client.schema_metadata.create(
    project_id="project_id",
    name="User Status",
    schema_data={
        "table": {
            "dimension": {
                "name": "status",
                "content": {...}
            }
        }
    }
)

Golden Examples

# Add example query-SQL pairs
example = client.golden_examples.create(
    project_id="project_id",
    name="High Value Customers",
    user_query="Show me customers with orders over $1000",
    sql_query="SELECT * FROM customers WHERE total_orders > 1000",
    description="Example for high-value customer queries"
)

Chat Sessions and Chat

# Create chat session
session = client.chat_sessions.create(project_id="project_id")

# Convert natural language to SQL
resp = client.chat.chat_to_sql(
    project_id="project_id",
    chat_session_id=session.id,
    query="Your natural language query here",
)

# Or convert and execute
ans = client.chat.chat_to_answer(
    project_id="project_id",
    chat_session_id=session.id,
    query="Top 10 customers by revenue",
    connector_id="your-connector-id"
)

Custom Tools

# Create custom tool from individual files
with open("my_tool.py", "rb") as f:
    tool = client.custom_tools.create(
        name="My Custom Tool",
        description="A custom Python tool for data processing",
        files=[f]
    )

# Create custom tool from directory (uploads all Python files)
tool = client.custom_tools.create_from_directory(
    name="Data Processing Suite",
    description="Complete data processing toolkit",
    directory_path="/path/to/tool/directory"
)

# List custom tools
tools = client.custom_tools.list()

# Get custom tool details
tool = client.custom_tools.get("tool_id")

# Update custom tool
updated_tool = client.custom_tools.update(
    "tool_id",
    name="Updated Tool Name",
    description="Updated description"
)

# Delete custom tool
client.custom_tools.delete("tool_id")

Connectors

# Add database connector
connector = client.connectors.create(
    name="Production DB",
    db_type="postgres",
    host="localhost",
    port=5432,
    username="user",
    password="password",
    database="mydb"
)

# Test connection
result = client.connectors.test_connection(connector.id)

Bulk Operations

The SDK provides efficient bulk delete operations for managing multiple resources at once:

Bulk Delete Contexts

# Delete multiple contexts in one operation
context_ids = ["id1", "id2", "id3"]
result = client.contexts.bulk_delete(project_id="project_id", context_ids=context_ids)

print(f"Deleted: {result['deleted_count']}")
print(f"Failed: {result.get('failed_ids', [])}")

Bulk Delete Schema Metadata

# Bulk delete schemas (automatically handles split groups)
schema_ids = ["schema1", "schema2", "schema3"]
result = client.schema_metadata.bulk_delete(project_id="project_id", schema_ids=schema_ids)

# Returns structured response with success/failure details
print(f"Successfully deleted {result['deleted_count']} schemas")

Bulk Delete Golden Examples

# Delete multiple examples at once
example_ids = ["ex1", "ex2", "ex3"]
result = client.golden_examples.bulk_delete(project_id="project_id", example_ids=example_ids)

Bulk Delete Feedback

# Clean up multiple feedback items
feedback_ids = ["fb1", "fb2", "fb3"]
result = client.feedback.bulk_delete(project_id="project_id", feedback_ids=feedback_ids)

Chat Presets

Chat presets allow you to create reusable chat configurations with predefined settings, connectors, and prompt templates:

Creating and Managing Presets

# Create a basic chat preset with existing template
preset = client.chat_presets.create(
    project_id="project_id",
    name="Production Analytics",
    collection_name="analytics_collection",
    description="Preset for production data analysis",
    prompt_template_id="template_id",
    connector_id="connector_id",
    chat_settings={
        "llm": "gpt-4",
        "include_chat_history": "auto"
    }
)

# NOTE: Inline template creation - API limitation
# The prompt_template parameter is accepted for API parity but not currently processed.
# To use a custom template, create it first then reference by ID:
template = client.chat_presets.create_prompt_template(
    project_id="project_id",
    name="Custom Analytics Template",
    system_prompt="You are an expert data analyst specializing in...",
    description="Template for advanced analytics queries"
)

preset = client.chat_presets.create(
    project_id="project_id",
    name="Advanced Analytics",
    collection_name="advanced_collection",
    prompt_template_id=template["id"],  # Use the created template ID
    connector_id="connector_id",
    workspace_id="workspace_123"
)

# Create preset with sharing and workspace settings
preset = client.chat_presets.create(
    project_id="project_id",
    name="Shared Team Preset",
    collection_name="team_collection",
    prompt_template={
        "name": "Team Template",
        "system_prompt": "You are a helpful assistant for the team..."
    },
    share_prompt_with_usernames=["[email protected]", "[email protected]"],
    workspace_id="workspace_123",
    t2e_url="https://custom-t2e.example.com"
)

# List all presets
presets = client.chat_presets.list(project_id="project_id")

# Search for specific presets
support_presets = client.chat_presets.list(
    project_id="project_id",
    search="support"
)

# Get specific preset by collection ID
preset = client.chat_presets.get(
    project_id="project_id",
    collection_id="collection_id"
)

# Update preset
updated = client.chat_presets.update(
    project_id="project_id",
    collection_id="collection_id",
    name="Updated Analytics Preset",
    description="Updated description",
    chat_settings={
        "llm": "gpt-4-turbo",
        "include_chat_history": "true"
    }
)

# Delete preset
client.chat_presets.delete(
    project_id="project_id",
    collection_id="collection_id"
)

Managing Prompt Templates

# Add prompt template to preset
template = client.chat_presets.add_prompt_template(
    project_id="project_id",
    preset_id="preset_id",
    template_name="Analysis Template",
    template_content="Analyze the following data: {query}"
)

# List templates for a preset
templates = client.chat_presets.list_prompt_templates(
    project_id="project_id",
    preset_id="preset_id"
)

# Delete template
client.chat_presets.delete_prompt_template(
    project_id="project_id",
    preset_id="preset_id",
    template_id="template_id"
)

Using Presets in Chat Sessions

# Activate a preset for use
client.chat_presets.activate(project_id="project_id", preset_id="preset_id")

# Get currently active preset
active = client.chat_presets.get_active(project_id="project_id")

# Create chat session from preset
session = client.chat_sessions.create_from_preset(
    project_id="project_id",
    preset_id="preset_id"
)

# Or use the active preset
session = client.chat_sessions.create_from_active_preset(project_id="project_id")

Advanced Features

Project Collections

Access and manage H2OGPTE collections for your project resources:

# List all collections for a project
collections = client.projects.list_collections(project_id="project_id")

for collection in collections:
    print(f"{collection.component_type}: {collection.h2ogpte_collection_id}")

# Get collection by type
contexts_collection = client.projects.get_collection_by_type(
    project_id="project_id",
    component_type="contexts"
)

Execution Cache Lookup

Query the execution cache to find similar past queries for performance optimization:

# Look up cached executions for a query
cache_result = client.chat.execution_cache_lookup(
    project_id="project_id",
    user_query="Show me top 10 customers",
    connector_id="connector_id",
    similarity_threshold=0.8,  # 0.0 to 1.0
    top_n=5,  # Return top 5 matches
    only_positive_feedback=True  # Only include positively rated executions
)

# Check if we got a cache hit
if cache_result.cache_hit:
    print(f"Found {len(cache_result.matches)} similar executions")
    for match in cache_result.matches:
        print(f"Similarity: {match.similarity_score}")
        print(f"SQL: {match.execution.sql_query}")
        print(f"Results: {match.execution.results}")

Schema Splitting for Large Tables

Tables with more than 8 columns are automatically split into multiple parts. The create() method returns:

  • Single SchemaMetadataResponse for small schemas (≤8 columns)
  • List[SchemaMetadataResponse] for large schemas (>8 columns)
result = client.schema_metadata.create(
    project_id="project_id",
    name="My Table",
    schema_data=my_schema_data
)

# Always check the return type
if isinstance(result, list):
    print(f"Schema split into {len(result)} parts")
    # All parts share the same split_group_id
else:
    print(f"Created single schema: {result.id}")

📖 For complete documentation on working with split schemas, see:

Error Handling

The SDK provides comprehensive error handling:

from text2everything_sdk import (
    Text2EverythingClient,
    AuthenticationError,
    ValidationError,
    NotFoundError,
    RateLimitError
)

try:
    project = client.projects.get("invalid_id")
except NotFoundError as e:
    print(f"Project not found: {e.message}")
except AuthenticationError as e:
    print(f"Authentication failed: {e.message}")
except ValidationError as e:
    print(f"Validation error: {e.message}")
    print(f"Details: {e.response_data}")
except RateLimitError as e:
    print(f"Rate limit exceeded. Retry after: {e.retry_after} seconds")

Configuration

Environment Variables

You can configure the SDK using environment variables:

export TEXT2EVERYTHING_BASE_URL="https://your-api-endpoint.com"
export T2E_ACCESS_TOKEN="your-oidc-access-token"
export T2E_WORKSPACE_NAME="workspaces/my-workspace"
import os
from text2everything_sdk import Text2EverythingClient

client = Text2EverythingClient(
    base_url=os.getenv("TEXT2EVERYTHING_BASE_URL"),
    access_token=os.getenv("T2E_ACCESS_TOKEN"),
    workspace_name=os.getenv("T2E_WORKSPACE_NAME")
)

.env File Support

For local development, create a .env file in your project root:

# .env file
T2E_BASE_URL=https://your-api-endpoint.com
T2E_ACCESS_TOKEN=your-oidc-access-token
T2E_WORKSPACE_NAME=workspaces/my-workspace

The SDK will automatically load these variables when running tests:

# The SDK automatically loads .env files for testing
from text2everything_sdk import Text2EverythingClient

# These will be loaded from .env file automatically
client = Text2EverythingClient()

Advanced Configuration

client = Text2EverythingClient(
    base_url="https://your-api-endpoint.com",
    access_token="your-oidc-access-token",
    workspace_name="workspaces/my-workspace",
    timeout=60,  # Request timeout in seconds
    max_retries=5,  # Maximum retry attempts
    retry_delay=2.0  # Initial retry delay in seconds
)

Context Manager

Use the client as a context manager for proper resource cleanup:

with Text2EverythingClient(base_url="...", access_token="...", workspace_name="workspaces/dev") as client:
    projects = client.projects.list()
    # Client will be automatically closed when exiting the context

Pagination

The SDK automatically handles pagination for list operations:

# Get all projects (automatically handles pagination)
all_projects = client.projects.list()

# Manual pagination control
page1_projects = client.projects.list(page=1, per_page=10)
page2_projects = client.projects.list(page=2, per_page=10)

Schema Validation

The SDK includes comprehensive nested field validation for schema metadata:

Required Nested Fields

Different schema types require specific nested fields:

  • Tables: schema_metadata.table and schema_metadata.table.columns
  • Dimensions: schema_metadata.table, schema_metadata.table.dimension, and schema_metadata.table.dimension.content
  • Metrics: schema_metadata.table, schema_metadata.table.metric, and schema_metadata.table.metric.content
  • Relationships: schema_metadata.relationship

Validation Examples

# Valid table schema
table_schema = {
    "table": {
        "name": "customers",
        "columns": [
            {"name": "id", "type": "INTEGER"},
            {"name": "name", "type": "VARCHAR(100)"}
        ]
    }
}

# Valid dimension schema
dimension_schema = {
    "table": {
        "name": "customers",
        "dimension": {
            "name": "customer_status",
            "content": {
                "type": "categorical",
                "values": ["active", "inactive", "pending"]
            }
        }
    }
}

# Valid metric schema
metric_schema = {
    "table": {
        "name": "orders",
        "metric": {
            "name": "total_revenue",
            "content": {
                "aggregation": "sum",
                "column": "amount"
            }
        }
    }
}

Contributing

  1. Fork the repository
  2. Create a feature branch
  3. Make your changes
  4. Add tests for new functionality
  5. Run the test suite
  6. Submit a pull request

License

This project is licensed under the MIT License - see the LICENSE file for details.

Support

Changelog

v0.1.7 (Current)

  • 100% API Parity Achieved: Complete coverage of all Text2Everything API endpoints
  • Bulk Delete Operations: Added bulk delete support for contexts, schema metadata, golden examples, and feedback
  • Chat Presets: Full CRUD operations for chat presets with prompt templates and active preset management
  • Project Collections: List and retrieve project collections by type
  • Execution Cache Lookup: Query execution cache for performance optimization
  • Schema Split Groups: Automatic handling of large table schemas (>8 columns)
  • Custom Tools Support: Full CRUD operations for custom Python tools
  • Directory-based Tool Creation: Upload entire directories as custom tools
  • Multipart File Upload: Native support for file uploads with proper Content-Type handling
  • Enhanced Validation: Comprehensive nested field validation for schema metadata
  • Environment Configuration: Added .env file support for local development
  • Improved Testing: Enhanced test suite with automatic environment loading
  • Bug Fixes: Resolved Content-Type header conflicts in multipart requests

v0.1.0

  • Initial release
  • Complete API coverage for all Text2Everything endpoints
  • Type-safe Pydantic models
  • Comprehensive error handling
  • Automatic pagination and retry logic
  • Context manager support
  • Integration with existing H2O Drive SDK

About

SDK for text-2-everything

Resources

License

Stars

Watchers

Forks

Packages

No packages published

Contributors 4

  •  
  •  
  •  
  •