Skip to content

Latest commit

 

History

History
490 lines (366 loc) · 12.1 KB

File metadata and controls

490 lines (366 loc) · 12.1 KB

SQL Operations

This guide covers SQL query execution, including file-based queries, context management, async execution, and query analysis.

Commands

Execute SQL Query

Execute a SQL query and return results.

alt-dremio-cli sql execute <QUERY> [OPTIONS]
alt-dremio-cli sql execute --file <FILE> [OPTIONS]

Arguments:

  • QUERY - SQL query string (optional if using --file)

Options:

  • --file PATH - Execute SQL from file
  • --context TEXT - Query context (comma-separated path)
  • --async - Execute asynchronously (return job ID immediately). Default behavior waits for results.
  • --output-file PATH - Save results to file (supports .json, .yaml, .csv, .parquet)

Note: .csv and .parquet export require pandas and pyarrow installed.

Examples:

# Execute simple query (Waits for results)
alt-dremio-cli sql execute "SELECT * FROM customers LIMIT 10"

# Execute from file (Waits for results)
# Can contain multiple statements separated by semicolons (;)
alt-dremio-cli sql execute --file query.sql

# Execute asynchronously (Returns Job ID immediately)
# Note: --async is ignored for multi-statement files (runs sequentially)
alt-dremio-cli sql execute "SELECT * FROM large_table" --async

# Save results to file
alt-dremio-cli sql execute "SELECT * FROM table" --output-file results.json

# Combine options
alt-dremio-cli sql execute --file complex_query.sql --context "Analytics" --output-file results.json

Explain Query

Generate and display the execution plan for a query.

alt-dremio-cli sql explain <QUERY> [OPTIONS]
alt-dremio-cli sql explain --file <FILE> [OPTIONS]

Arguments:

  • QUERY - SQL query string (optional if using --file)

Options:

  • --file PATH - Explain SQL from file
  • --context TEXT - Query context

Examples:

# Explain simple query
alt-dremio-cli sql explain "SELECT * FROM customers WHERE region = 'US'"

# Explain from file
alt-dremio-cli sql explain --file query.sql

# Explain with context
alt-dremio-cli sql explain "SELECT * FROM table" --context "MySpace"

Validate Query

Validate SQL query syntax without executing.

alt-dremio-cli sql validate <QUERY> [OPTIONS]
alt-dremio-cli sql validate --file <FILE> [OPTIONS]

Arguments:

  • QUERY - SQL query string (optional if using --file)

Options:

  • --file PATH - Validate SQL from file
  • --context TEXT - Query context

Examples:

# Validate query syntax
alt-dremio-cli sql validate "SELECT * FROM customers"

# Validate from file
alt-dremio-cli sql validate --file query.sql

# Validate with context
alt-dremio-cli sql validate "SELECT * FROM table" --context "MySpace"

Scenarios

Interactive Query Development

# 1. Start with a simple query
alt-dremio-cli sql execute "SELECT * FROM customers LIMIT 5"

# 2. Validate more complex query
alt-dremio-cli sql validate "SELECT c.*, o.total FROM customers c JOIN orders o ON c.id = o.customer_id"

# 3. Explain to check performance
alt-dremio-cli sql explain "SELECT c.*, o.total FROM customers c JOIN orders o ON c.id = o.customer_id"

# 4. Execute and save results
alt-dremio-cli sql execute "SELECT c.*, o.total FROM customers c JOIN orders o ON c.id = o.customer_id" --output-file results.json

File-Based Query Management

# Create query file
cat > monthly_sales.sql <<EOF
SELECT 
  DATE_TRUNC('month', order_date) as month,
  SUM(amount) as total_sales,
  COUNT(*) as order_count
FROM orders
WHERE order_date >= '2024-01-01'
GROUP BY 1
ORDER BY 1 DESC
EOF

# Validate the query
alt-dremio-cli sql validate --file monthly_sales.sql

# Execute and save results
alt-dremio-cli sql execute --file monthly_sales.sql --output-file monthly_sales.json

# Explain for optimization
alt-dremio-cli sql explain --file monthly_sales.sql

Async Execution for Long Queries

# Submit long-running query
alt-dremio-cli sql execute "SELECT * FROM huge_table" --async
# Output: Job ID: abc-123-def-456

# Check job status
alt-dremio-cli job get abc-123-def-456

# Get results when ready
alt-dremio-cli job results abc-123-def-456 --output-file results.json

Context-Aware Queries

# Set context to avoid fully-qualified names
alt-dremio-cli sql execute "SELECT * FROM customers" --context "Sales"

# Instead of:
alt-dremio-cli sql execute "SELECT * FROM Sales.customers"

# Multi-level context
alt-dremio-cli sql execute "SELECT * FROM table" --context "Analytics,Reports"

Batch Query Execution

# Execute multiple queries
for query_file in queries/*.sql; do
  echo "Executing $query_file..."
  alt-dremio-cli sql execute --file "$query_file" --output-file "results/$(basename $query_file .sql).json"
done

Common Workflows

1. Query Development Cycle

# Step 1: Validate syntax
alt-dremio-cli sql validate "SELECT * FROM customers WHERE region = 'US'"

# Step 2: Check execution plan
alt-dremio-cli sql explain "SELECT * FROM customers WHERE region = 'US'"

# Step 3: Test with small dataset
alt-dremio-cli sql execute "SELECT * FROM customers WHERE region = 'US' LIMIT 10"

# Step 4: Execute full query
alt-dremio-cli sql execute "SELECT * FROM customers WHERE region = 'US'" --output-file us_customers.json

2. Performance Analysis

# Get execution plan
alt-dremio-cli sql explain "SELECT c.*, SUM(o.amount) FROM customers c JOIN orders o ON c.id = o.customer_id GROUP BY c.id" > plan.txt

# Execute and time
time alt-dremio-cli sql execute "SELECT c.*, SUM(o.amount) FROM customers c JOIN orders o ON c.id = o.customer_id GROUP BY c.id" --async

# Get job details for analysis
alt-dremio-cli job get <job-id>

# Download profile
alt-dremio-cli job profile <job-id> --download profile.zip

3. Data Export

# Export to JSON
alt-dremio-cli sql execute "SELECT * FROM customers" --output-file customers.json

# Export to YAML
alt-dremio-cli --output yaml sql execute "SELECT * FROM customers" --output-file customers.yaml

# Convert to CSV using jq
alt-dremio-cli --output json sql execute "SELECT * FROM customers" | jq -r '.rows[] | @csv' > customers.csv

4. Scheduled Queries

#!/bin/bash
# daily_report.sh

# Execute daily sales query
alt-dremio-cli sql execute --file daily_sales.sql --output-file "reports/sales_$(date +%Y%m%d).json"

# Execute customer metrics
alt-dremio-cli sql execute --file customer_metrics.sql --output-file "reports/customers_$(date +%Y%m%d).json"

# Send notification
echo "Daily reports generated" | mail -s "Dremio Reports" admin@company.com

SQL File Format

Basic Query File

You can execute single or multiple statements in a file. Statements must be separated by semicolons (;). execution stops if any query fails.

-- monthly_sales.sql
SELECT 
  DATE_TRUNC('month', order_date) as month,
  SUM(amount) as total_sales
FROM orders
GROUP BY 1
ORDER BY 1 DESC;

-- Second statement
SELECT COUNT(*) FROM orders;

Complex Query File

-- customer_analysis.sql
WITH customer_orders AS (
  SELECT 
    customer_id,
    COUNT(*) as order_count,
    SUM(amount) as total_spent
  FROM orders
  WHERE order_date >= '2024-01-01'
  GROUP BY customer_id
),
customer_segments AS (
  SELECT 
    customer_id,
    CASE 
      WHEN total_spent > 10000 THEN 'Premium'
      WHEN total_spent > 1000 THEN 'Standard'
      ELSE 'Basic'
    END as segment
  FROM customer_orders
)
SELECT 
  c.name,
  c.email,
  co.order_count,
  co.total_spent,
  cs.segment
FROM customers c
JOIN customer_orders co ON c.id = co.customer_id
JOIN customer_segments cs ON c.id = cs.customer_id
ORDER BY co.total_spent DESC

Output Formats

Table (Default)

alt-dremio-cli sql execute "SELECT * FROM customers LIMIT 5"

Output:

┌────┬──────────┬─────────────────┬────────┐
│ ID │ Name     │ Email           │ Region │
├────┼──────────┼─────────────────┼────────┤
│ 1  │ John Doe │ john@email.com  │ US     │
│ 2  │ Jane Doe │ jane@email.com  │ EU     │
└────┴──────────┴─────────────────┴────────┘

JSON

alt-dremio-cli --output json sql execute "SELECT * FROM customers LIMIT 2"

Output:

{
  "rows": [
    {"id": 1, "name": "John Doe", "email": "john@email.com"},
    {"id": 2, "name": "Jane Doe", "email": "jane@email.com"}
  ],
  "rowCount": 2
}

YAML

alt-dremio-cli --output yaml sql execute "SELECT * FROM customers LIMIT 2"

Output:

rows:
  - id: 1
    name: John Doe
    email: john@email.com
  - id: 2
    name: Jane Doe
    email: jane@email.com
rowCount: 2

Tips

  1. Use files for complex queries: Store reusable queries in files

    alt-dremio-cli sql execute --file queries/monthly_report.sql
  2. Validate before executing: Catch syntax errors early

    alt-dremio-cli sql validate --file query.sql && alt-dremio-cli sql execute --file query.sql
  3. Use async for long queries: Don't block on large queries

    alt-dremio-cli sql execute "SELECT * FROM huge_table" --async
  4. Set context to simplify queries: Avoid repeating paths

    alt-dremio-cli sql execute "SELECT * FROM table" --context "MySpace"
  5. Export results for analysis: Save to files for further processing

    alt-dremio-cli sql execute "SELECT * FROM data" --output-file data.json

Error Handling

Syntax Error

$ alt-dremio-cli sql execute "SELECT * FORM table"
Error: SQL syntax error: Encountered "FORM" at line 1, column 10

Solution: Fix the SQL syntax:

alt-dremio-cli sql execute "SELECT * FROM table"

Table Not Found

$ alt-dremio-cli sql execute "SELECT * FROM nonexistent"
Error: Table 'nonexistent' not found

Solution: Verify table exists:

alt-dremio-cli catalog list | grep "nonexistent"

Job Still Running

$ alt-dremio-cli sql execute "SELECT * FROM large_table"
⚠ Could not fetch results: Job may still be running

Solution: Use async mode or check job status:

alt-dremio-cli sql execute "SELECT * FROM large_table" --async
alt-dremio-cli job get <job-id>

Platform Differences

Software

  • Full SQL support via /api/v3/sql
  • Explain and validate work
  • All features available

Cloud

  • SELECT queries are fully supported via API
  • DDL/DML operations are supported but may have limitations compared to Software
  • Uses specialized generic SQL endpoint

Best Practices

  1. Validate queries before execution: Catch errors early
  2. Use explain for optimization: Understand query plans
  3. Store queries in files: Version control and reusability
  4. Use async for long queries: Better resource management
  5. Set appropriate context: Simplify query writing
  6. Export results for analysis: Enable downstream processing
  7. Monitor job status: Track query execution
  8. Use limits during development: Test with small datasets first

Advanced Usage

Parameterized Queries

# Create template
cat > query_template.sql <<EOF
SELECT * FROM customers WHERE region = '{REGION}' AND created_at >= '{DATE}'
EOF

# Replace parameters and execute
REGION="US"
DATE="2024-01-01"
sed "s/{REGION}/$REGION/g; s/{DATE}/$DATE/g" query_template.sql | alt-dremio-cli sql execute --file /dev/stdin

Query Pipeline

# Extract
alt-dremio-cli sql execute "SELECT * FROM source_table" --output-file extracted.json

# Transform (using jq)
cat extracted.json | jq '.rows[] | {id, name, email}' > transformed.json

# Load (create view with results)
alt-dremio-cli view create --path "Processed.customers" --sql "SELECT * FROM transformed_data"

Monitoring and Alerts

#!/bin/bash
# monitor_query.sh

# Execute query
RESULT=$(dremio sql execute "SELECT COUNT(*) as count FROM errors WHERE created_at > NOW() - INTERVAL '1 hour'")

# Parse result
ERROR_COUNT=$(echo $RESULT | jq -r '.rows[0].count')

# Alert if threshold exceeded
if [ $ERROR_COUNT -gt 100 ]; then
  echo "High error count: $ERROR_COUNT" | mail -s "Alert" admin@company.com
fi