Opentir is an open source toolkit built on and around Palantir's open source technologies. It analyzes, organizes, and documents Palantir's open source ecosystem of 250+ repositories, 6.2+ million lines of code, and 500,000+ functions and classes.
📊 Complete Palantir Ecosystem Guide
Palantir has built one of the most comprehensive enterprise open source ecosystems, spanning:
- 🏗️ Infrastructure & Data: Hadoop, Spark, Cassandra, AtlasDB, Parquet
- 🌐 Web Development: Blueprint, Plottable, TSLint, React Native
- ⚙️ Backend Services: Conjure, Dialogue, Witchcraft frameworks
- 🛠️ Developer Tools: gödel, Gradle plugins, Code quality tools
- 🔐 Security & Compliance: Encryption, Authentication, Policy management
- Blueprint - React UI toolkit (20,000+ stars)
- TSLint - TypeScript linter (6,000+ stars)
- Plottable - D3 charting library (2,900+ stars)
- AtlasDB - Distributed database (800+ stars)
- Java Ecosystem - 24,979 files (70.3% of codebase)
- TypeScript/JavaScript - 7,014 files (19.8% of codebase)
- Go Ecosystem - 2,517 files (7.1% of codebase)
- Python Tools - 1,006 files (2.8% of codebase)
- Automated Discovery: Fetch all 250+ Palantir repositories from GitHub
- Organized Structure: Automatically organize repos by language, category, and popularity
- Smart Cloning: Efficient cloning with rate limiting and error handling
- Package Categories: Deep categorization by function and technology
- Multi-Language Support: Analyze Python, JavaScript, TypeScript, Java, Go, Rust, and more
- Method Extraction: Extract and catalog all 500,000+ functions, classes, and methods
- Complexity Analysis: Calculate code complexity and quality metrics across all repos
- Functionality Mapping: Generate comprehensive functionality matrices
- Vast Functionality Tables: Detailed tables of all 100,000+ functions
- Interactive Documentation: Beautiful, searchable documentation
- API Reference: Comprehensive API documentation with cross-references
- Analysis Reports: Detailed reports on dependencies, metrics, and patterns
- CLI Tool: Powerful command-line interface for all operations
- Python API: Programmatic access to all functionality
- Async Support: High-performance async operations for large-scale analysis
While you could manually clone 250+ repositories, Opentir transforms raw code into actionable intelligence:
- ANOVA Testing: Statistical significance across repository metrics
- Principal Component Analysis: Identify the most influential code patterns
- Clustering Analysis: Automatically group repositories by functionality
- Confidence Intervals: Quantify uncertainty in code quality metrics
- Effect Size Calculations: Measure practical significance of differences
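As an illustration of the confidence-interval idea, here is a minimal sketch that uses only the Python standard library. The function name and the sample values are hypothetical, not part of Opentir, and the interval uses a normal approximation (a t-distribution would be more appropriate for small samples).

```python
import math
import statistics

def mean_confidence_interval(values, confidence=0.95):
    """Return (low, high) bounds for the mean, using a normal approximation."""
    n = len(values)
    mean = statistics.fmean(values)
    # Standard error of the mean from the sample standard deviation.
    sem = statistics.stdev(values) / math.sqrt(n)
    # Two-sided critical value, e.g. about 1.96 for 95% confidence.
    z = statistics.NormalDist().inv_cdf(0.5 + confidence / 2)
    return mean - z * sem, mean + z * sem

# Hypothetical per-repository complexity scores, for illustration only.
scores = [2.0, 3.0, 4.0, 3.0, 3.0]
low, high = mean_confidence_interval(scores)
print(f"mean complexity is between {low:.2f} and {high:.2f}")
```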
- Cross-Repository Dependencies: Map how 250+ repos interconnect
- Functionality Overlap Detection: Find redundant implementations across projects
- Architecture Pattern Analysis: Identify common design patterns at scale
- Code Evolution Tracking: Understand how Palantir's practices have evolved
- Technical Debt Assessment: Quantify maintenance burden across the ecosystem
- Interactive Network Graphs: Explore repository relationships dynamically
- Complexity Heatmaps: Visualize code complexity across the entire ecosystem
- Dependency Trees: Navigate intricate inter-project dependencies
- Timeline Analysis: Track project evolution and activity patterns
- Technology Stack Distribution: Understand language and framework adoption
- Technology Maturity Scoring: Evaluate stability and adoption readiness
- Maintenance Risk Analysis: Identify projects with sustainability concerns
- Integration Complexity Mapping: Plan implementation strategies
- Resource Allocation Insights: Understand development effort requirements
- Compliance & Security Analysis: Assess enterprise readiness
- Smart Package Discovery: Find the right tool for specific use cases
- Integration Examples: See how packages work together in practice
- Best Practice Extraction: Learn from Palantir's engineering excellence
- Performance Benchmarking: Compare alternatives with data-driven insights
- Documentation Quality Assessment: Evaluate learning curve and support
| Task | Manual Effort | With Opentir | Time Saved |
|---|---|---|---|
| Repository Discovery | 2-3 hours browsing GitHub | 5 minutes automated | 95% faster |
| Code Analysis | 40+ hours per repo × 250 | 2 hours total | 99.98% faster |
| Documentation Generation | 200+ hours writing docs | Automated | 100% saved |
| Dependency Mapping | 100+ hours manual tracing | Instant visualization | 100% saved |
| Statistical Analysis | Weeks of data science work | Built-in analytics | 95% faster |
| Cross-Reference Building | Months of manual linking | Automated cross-refs | 100% saved |
- ✅ 500,000+ Functions Cataloged - Searchable database of all capabilities
- ✅ Comprehensive Metrics - Complexity, quality, and performance data
- ✅ Integration Roadmaps - How to combine packages effectively
- ✅ Executive Summaries - Business-level insights for decision makers
- ✅ Technical Deep Dives - Engineer-level implementation details
- ✅ Risk Assessments - Maintenance, security, and compliance analysis
- Extract and analyze 500,000+ code elements
- Calculate complexity metrics across 6.2M+ lines of code
- Identify architectural patterns and design principles
- Generate quality scorecards for every repository
- Map inter-project dependencies and relationships
- Discover functionality overlaps and integration opportunities
- Analyze technology stack evolution and adoption patterns
- Identify key maintainers and community health metrics
- Technology Roadmap Planning: Which packages align with your architecture
- Risk Mitigation: Identify deprecated or poorly maintained projects
- Investment Prioritization: Focus on high-impact, well-supported tools
- Team Skills Development: Understand learning paths and complexity curves
- Implementation Playbooks: Step-by-step integration guides
- Performance Optimization: Benchmarks and tuning recommendations
- Security Compliance: Vulnerability assessments and best practices
- Monitoring & Maintenance: Long-term sustainability planning
- AST Parsing: Syntax-tree analysis for Python; JavaScript/TypeScript are currently parsed with regex matching
- Cyclomatic Complexity: Counts the independent paths through each function
- Documentation Coverage: Quantified maintainability metrics
- API Surface Analysis: Complete public interface cataloging
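To make the cyclomatic complexity metric concrete, the sketch below approximates it for Python source with the standard `ast` module. It is a simplified illustration, not Opentir's actual implementation: the function name and the set of node types counted as decision points are assumptions.

```python
import ast

# Node types that each add one decision point (a simplified rule set).
_BRANCH_NODES = (ast.If, ast.For, ast.While, ast.ExceptHandler,
                 ast.IfExp, ast.Assert, ast.comprehension)

def cyclomatic_complexity(source: str) -> int:
    """Approximate cyclomatic complexity as 1 + number of decision points."""
    score = 1
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, _BRANCH_NODES):
            score += 1
        elif isinstance(node, ast.BoolOp):
            # "a and b and c" adds one extra path per additional operand.
            score += len(node.values) - 1
    return score

sample = '''
def classify(n):
    if n < 0 and n % 2:
        return "negative odd"
    for i in range(n):
        if i == 3:
            return "has three"
    return "other"
'''
print(cyclomatic_complexity(sample))  # 5
```

The sample scores 5: one base path, two `if` statements, one `for` loop, and one `and` operator.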
- Regression Analysis: Predict project success and maintenance needs
- Correlation Studies: Understand relationships between metrics
- Outlier Detection: Identify exceptional projects for deeper study
- Trend Analysis: Track ecosystem evolution over time
- Dependency Graphs: Visualize the entire ecosystem's structure
- Centrality Metrics: Identify the most critical packages
- Community Detection: Find natural groupings of related projects
- Path Analysis: Understand technology migration routes
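A centrality metric can be sketched in a few lines of plain Python. The example below computes in-degree centrality (the fraction of other packages that depend on a given package). The function name and the package names in the toy graph are hypothetical and do not describe real dependencies between Palantir projects.

```python
from collections import defaultdict

def in_degree_centrality(dependencies):
    """dependencies maps each package to the packages it depends on.

    Returns, for every package, the fraction of the other packages
    that depend on it directly.
    """
    nodes = set(dependencies)
    for deps in dependencies.values():
        nodes.update(deps)
    dependents = defaultdict(set)
    for pkg, deps in dependencies.items():
        for dep in deps:
            if dep != pkg:  # ignore self-references
                dependents[dep].add(pkg)
    denom = max(len(nodes) - 1, 1)
    return {node: len(dependents[node]) / denom for node in nodes}

# Toy graph with made-up package names.
graph = {
    "service-a": ["api-codegen", "http-client"],
    "service-b": ["http-client"],
    "http-client": ["api-codegen"],
}
ranking = sorted(in_degree_centrality(graph).items(), key=lambda kv: (-kv[1], kv[0]))
for name, score in ranking:
    print(f"{name}: {score:.2f}")
```

Packages with the highest score are the ones whose breakage would affect the most direct dependents.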
- Executive Briefings: C-suite appropriate technology assessments
- Technical Specifications: Detailed integration requirements
- Risk Registers: Comprehensive risk/benefit analysis
- ROI Calculations: Quantified business impact projections
- Peer-Reviewable Methods: Reproducible analysis techniques
- Statistical Validation: Hypothesis testing and significance analysis
- Confidence Metrics: Uncertainty quantification throughout
- Methodology Documentation: Complete analytical transparency
Opentir doesn't just give you code repositories - it gives you:
- 🔬 Research-Grade Analysis - Publication-quality insights
- 📊 Data-Driven Decisions - Move beyond gut feelings
- ⚡ Instant Expertise - Years of ecosystem knowledge in hours
- 🎯 Strategic Clarity - Clear technology adoption roadmaps
- 💼 Professional Deliverables - Enterprise-ready documentation
- 🔄 Continuous Intelligence - Keep analysis current as the ecosystem evolves
Transform your approach from ad-hoc exploration to systematic intelligence.
opentir/
├── src/ # Core Opentir package
│ ├── __init__.py # Package initialization
│ ├── github_client.py # GitHub API client with rate limiting
│ ├── repo_manager.py # Repository cloning and organization
│ ├── code_analyzer.py # Multi-language code analysis
│ ├── docs_generator.py # Documentation generation
│ ├── cli.py # Command-line interface
│ ├── main.py # Main orchestrator
│ ├── config.py # Configuration management
│ ├── utils.py # Utility functions and logging
│ └── templates/ # Documentation templates
├── repos/ # Cloned repositories (created during execution)
│ ├── all_repos/ # All Palantir repositories
│ ├── by_language/ # Organized by programming language
│ ├── by_category/ # Organized by functionality
│ └── popular/ # Popular repositories (1000+ stars)
├── docs/ # Generated documentation
│ ├── index.md # Main documentation
│ ├── repositories/ # Repository-specific docs
│ ├── api_reference/ # API documentation
│ ├── analysis/ # Analysis reports
│ └── mkdocs.yml # MkDocs configuration
├── examples/ # Usage examples
├── tests/ # Test suite
├── requirements.txt # Python dependencies
├── setup.py # Package configuration
└── README.md # This file
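The `repos/` layout above implies a mapping from repository metadata to target directories. The following sketch shows one way that mapping could look. The function name, its parameters, and the lower-casing of language names are assumptions for illustration; only the directory names and the 1000-star threshold come from the tree above.

```python
from pathlib import PurePosixPath

POPULAR_THRESHOLD = 1000  # stars, per the "popular" directory comment above

def target_dirs(name, language, category, stars, root="repos"):
    """Return every directory under repos/ where a repository should appear."""
    base = PurePosixPath(root)
    paths = [
        base / "all_repos" / name,
        base / "by_language" / (language or "unknown").lower() / name,
        base / "by_category" / category / name,
    ]
    if stars >= POPULAR_THRESHOLD:
        paths.append(base / "popular" / name)
    return [str(p) for p in paths]

for path in target_dirs("blueprint", "TypeScript", "web-development", 20000):
    print(path)
```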
- Foundry Platform - Unified data platform
- Conjure Framework - API-first development
- Witchcraft Services - Microservices framework
- gödel Ecosystem - Go development lifecycle
- Largest Repositories
- Hadoop (2.18M lines), Cassandra (417K lines), AtlasDB (378K lines)
- Most Complex Packages
- Advanced algorithms, distributed systems, enterprise frameworks
- Most Popular
- Blueprint, TSLint, Plottable with thousands of stars
Palantir Architecture Deep Dive
- Microservices Patterns - Conjure + Witchcraft
- Data Pipeline Architecture - Hadoop + Spark + Parquet
- Frontend Architecture - Blueprint + TypeScript + React
- Security Architecture - Encryption + Auth + Compliance
Palantir Solutions by Use Case
-
- Components: Hadoop + Spark + AtlasDB + Foundry
- Scale: Petabyte-scale data processing
-
- Components: Blueprint + TypeScript + Conjure APIs
- Scale: Complex, data-dense interfaces
-
- Components: Witchcraft + Dialogue + Service mesh
- Scale: High-throughput distributed systems
-
- Components: gödel + Gradle plugins + Quality tools
- Scale: Large-scale software development
Comprehensive Technical Reports
-
- Average complexity: 1.04 across all repos
- Documentation coverage: 65%+ across major projects
-
- Cross-repository dependencies and integration patterns
- External dependency usage and management
-
- Benchmarks, scalability metrics, resource usage
-
- Project lifecycle, adoption patterns, maintenance status
# Quick exploration of Palantir's ecosystem
opentir build-complete
# View ecosystem overview
open docs/palantir/index.md
# Explore specific categories
open docs/palantir/categories/data-analytics.md
open docs/palantir/categories/web-development.md

# Example: Setting up Blueprint development
npm install @blueprintjs/core @blueprintjs/icons
# See: docs/palantir/flagship/blueprint.md
# Example: Using Conjure for APIs
# See: docs/palantir/enterprise/conjure.md
# Example: Setting up data pipeline with Spark
# See: docs/palantir/categories/data-analytics.md

# Clone the repository
git clone https://github.com/username/opentir.git
cd opentir
# Install dependencies
pip install -r requirements.txt
# Install the package
pip install -e .

Opentir works with or without a GitHub API token, but a token is strongly recommended for downloading all 250+ repositories:
- Without token: 60 requests/hour (will hit rate limits quickly)
- With token: 5,000 requests/hour (smooth operation)
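A quick back-of-the-envelope calculation shows why the token matters. The sketch below estimates how many hourly rate-limit windows a run would span. The figure of four API calls per repository is a made-up assumption for illustration, not a measured number.

```python
import math

# Hourly request budgets quoted above.
UNAUTHENTICATED_LIMIT = 60
AUTHENTICATED_LIMIT = 5_000

def hours_needed(total_requests: int, requests_per_hour: int) -> int:
    """Whole hourly rate-limit windows needed to issue total_requests calls."""
    return math.ceil(total_requests / requests_per_hour)

# Assumption: roughly 4 API calls per repository (hypothetical figure).
calls = 250 * 4
print(hours_needed(calls, UNAUTHENTICATED_LIMIT))  # 17
print(hours_needed(calls, AUTHENTICATED_LIMIT))    # 1
```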
Just run the command - Opentir will guide you through token setup:
opentir build-complete

The tool will:
- Detect if no token is available
- Show you exactly how to get a GitHub token
- Let you paste the token securely
- Allow you to skip and continue without a token
export GITHUB_TOKEN=your_github_token_here
opentir build-complete

opentir build-complete --token your_github_token_here

- Go to: https://github.com/settings/tokens
- Click "Generate new token" → "Generate new token (classic)"
- Give it a name like opentir-access
- Select scope: "public_repo" (read access to public repositories)
- Click "Generate token" and copy the token
- Use it with any of the options above
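The token options above suggest a lookup order. The sketch below shows one plausible way to resolve it: an explicit command-line value first, then the `GITHUB_TOKEN` environment variable, then no token. The function name and the precedence order are assumptions. The source does not state how Opentir resolves conflicts between the two.

```python
import os
from typing import Mapping, Optional

def resolve_token(cli_token: Optional[str] = None,
                  env: Mapping[str, str] = os.environ) -> Optional[str]:
    """Pick a token: explicit --token value, then GITHUB_TOKEN, else None."""
    if cli_token and cli_token.strip():
        return cli_token.strip()
    env_token = env.get("GITHUB_TOKEN", "").strip()
    # An empty or missing variable means "run unauthenticated".
    return env_token or None

token = resolve_token()
print("authenticated" if token else "unauthenticated (60 requests/hour)")
```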
Download all 250+ Palantir repositories with one command:
opentir build-complete

This single command will:
- ✅ Prompt for GitHub token (if needed)
- ✅ Download all Palantir repositories
- ✅ Organize them by language and category
- ✅ Analyze all code and extract functionality
- ✅ Generate comprehensive documentation
# Complete build (recommended - does everything)
opentir build-complete
# Check what's been downloaded
opentir status
# View generated documentation
cd docs && mkdocs serve

import asyncio
from src.main import build_complete_ecosystem

async def main():
    results = await build_complete_ecosystem(
        github_token="your_token",
        force_reclone=False
    )
    print(f"Analyzed {results['summary']['repositories_analyzed']} repositories!")

asyncio.run(main())

# Complete workflow - does everything in one command
opentir build-complete

Interactive token setup included - just run and follow prompts!
If you prefer to run individual steps:
# Fetch repository information
opentir fetch-repos
# Clone all repositories
opentir clone-all
# Analyze code and extract functionality
opentir analyze
# Generate comprehensive documentation
opentir generate-docs

All commands will prompt for a GitHub token if needed
# Show workspace status
opentir status
# Clean up repositories
opentir cleanup --keep-popular

All commands support multiple ways to provide GitHub tokens:
# Interactive prompt (easiest)
opentir build-complete
# Command line argument
opentir build-complete --token YOUR_TOKEN
# Environment variable
export GITHUB_TOKEN=YOUR_TOKEN
opentir build-complete

After running the complete workflow, you'll have:
- 250+ Cloned Repositories organized by language and category
- Comprehensive Code Analysis with extracted methods and functionality
- Interactive Documentation with searchable API reference
- Functionality Matrix showing capabilities across all repositories
- Analysis Reports with metrics, dependencies, and patterns
| Repository | Lines of Code | Files | Primary Language | Key Features |
|---|---|---|---|---|
| hadoop | 2.18M | 10,650 | Java/Python | Distributed storage & processing |
| cassandra | 417K | 1,892 | Java/Python | NoSQL distributed database |
| atlasdb | 378K | 3,226 | Java | Distributed transactional DB |
| spark | 300K | 1,538 | Java/Python | Unified analytics engine |
| react-native | 222K | 1,755 | JavaScript | Cross-platform mobile |
| blueprint | 110K | 816 | TypeScript | React UI toolkit |
| parquet-mr | 122K | 768 | Java/Python | Columnar storage format |
| conjure-java | 88K | 636 | Java | API code generation |
| plottable | 49K | 286 | TypeScript | D3-based charting |
| gradle-baseline | 41K | 307 | Java | Gradle build standards |
Complete Developer Tools Guide
- palantir-java-format - Java code formatter
- tslint - TypeScript linter with 150+ rules
- godel - Go development lifecycle tool
- gradle-consistent-versions - Dependency management
- policy-bot - GitHub workflow automation
- blueprint - 40+ React components for data-dense UIs
- documentalist - Documentation generation
- typesettable - Text layout for SVG/Canvas
- redoodle - Redux utilities
- tslint-react - React-specific linting rules
- conjure - API specification and code generation
- dialogue - HTTP client library
- witchcraft-go-server - Go web server framework
- tracing-java - Distributed tracing
- metrics-schema - Metrics collection
- encrypted-config-value - Secure configuration
- auth-tokens - Authentication token management
- safe-logging - Secure logging utilities
- bouncer - Access control and security
- godel - Build, test, and distribute Go projects
- distgo - Go application distribution
- okgo - Go code quality checks
- pkg - Common Go utilities
- go-baseapp - Base application framework
- foundry-platform-python - Foundry Python SDK
- conjure-python - Python client generation
- python-language-server - Language server protocol
- typedjsonrpc - Type-safe JSON-RPC
Choose the Right Palantir Packages
- Core: Hadoop + Spark + Parquet + AtlasDB
- Streaming: Kafka integrations + Spark Streaming
- Storage: HDFS + Cassandra + Iceberg
- Core: Blueprint + TypeScript + React
- Visualization: Plottable + D3 integrations
- Mobile: React Native components
- Core: Conjure + Dialogue + Witchcraft
- Java: Spring integrations + Gradle plugins
- Go: gödel + Service frameworks
- Build: gödel + Gradle baseline + Formatters
- Security: Encrypted config + Auth tokens
- Monitoring: Tracing + Metrics + Logging
- Fetches all Palantir repositories via GitHub API
- Organizes repositories by language, category, and popularity
- Handles rate limiting and error recovery
- Provides cleanup and update functionality
- Multi-Language Parsing: Python (AST), JavaScript/TypeScript (regex), Java, Go
- Element Extraction: Functions, classes, methods, variables
- Complexity Analysis: Cyclomatic complexity calculation
- Pattern Recognition: Common naming patterns and functionality categories
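The two parsing strategies named above (AST for Python, regex for JavaScript/TypeScript) can be sketched as follows. This is an illustration under stated assumptions, not the code in `code_analyzer.py`: the function names and regex patterns are hypothetical, and the regex approach will miss arrow functions and class methods.

```python
import ast
import re

def extract_python_elements(source):
    """Return (kind, name) pairs for functions and classes via the AST."""
    elements = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            elements.append(("function", node.name))
        elif isinstance(node, ast.ClassDef):
            elements.append(("class", node.name))
    return elements

# Simplified patterns for named declarations only.
_JS_FUNCTION = re.compile(r"\bfunction\s+([A-Za-z_$][\w$]*)\s*\(")
_JS_CLASS = re.compile(r"\bclass\s+([A-Za-z_$][\w$]*)")

def extract_js_elements(source):
    """Return (kind, name) pairs for named JS/TS functions and classes."""
    found = [("function", name) for name in _JS_FUNCTION.findall(source)]
    found += [("class", name) for name in _JS_CLASS.findall(source)]
    return found

py_src = "class A:\n    def m(self):\n        pass\n\ndef f():\n    pass\n"
js_src = "export class Button {}\nfunction render(props) {}\n"
print(sorted(extract_python_elements(py_src)))
print(extract_js_elements(js_src))
```

The AST route sees nested methods and ignores text inside strings or comments, which is the main reason to prefer it wherever a parser is available.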
- MkDocs Integration: Beautiful, searchable documentation
- Functionality Tables: Comprehensive tables of all methods and capabilities
- Cross-References: Links between repositories and functionality
- Export Formats: JSON, CSV, and Markdown outputs
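The three export formats can all be produced from the same list of records using only the standard library. The sketch below is a hypothetical illustration: the function names, column names, and the sample row are invented and do not reflect Opentir's actual output schema.

```python
import csv
import io
import json

def to_markdown(rows, columns):
    """Render rows as a pipe table."""
    header = "| " + " | ".join(columns) + " |"
    divider = "|" + "|".join("---" for _ in columns) + "|"
    body = ["| " + " | ".join(str(row[c]) for c in columns) + " |" for row in rows]
    return "\n".join([header, divider, *body])

def to_csv(rows, columns):
    """Render rows as CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=columns, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()

def to_json(rows):
    """Render rows as pretty-printed JSON."""
    return json.dumps(rows, indent=2)

# Made-up record for illustration.
columns = ["repository", "element", "kind"]
rows = [{"repository": "example-repo", "element": "parse_config", "kind": "function"}]
print(to_markdown(rows, columns))
print(to_csv(rows, columns))
print(to_json(rows))
```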
- Async Operations: Concurrent API calls and processing
- Rate Limiting: Respectful GitHub API usage
- Caching: Intelligent caching of analysis results
- Progress Tracking: Real-time progress indicators
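Concurrent processing with a cap on in-flight work is commonly built on `asyncio.Semaphore`. The sketch below shows that pattern. The helper names are hypothetical, and `fake_fetch` sleeps instead of touching the network, so it only stands in for a real API call or clone.

```python
import asyncio

async def run_limited(jobs, max_concurrent=5):
    """Run coroutine factories with at most max_concurrent in flight."""
    semaphore = asyncio.Semaphore(max_concurrent)

    async def guarded(job):
        async with semaphore:
            return await job()

    # gather() returns results in the same order as the input jobs.
    return await asyncio.gather(*(guarded(job) for job in jobs))

async def fake_fetch(name):
    # Stand-in for a real request; sleeps instead of using the network.
    await asyncio.sleep(0.01)
    return f"fetched {name}"

async def demo():
    names = ["repo-a", "repo-b", "repo-c"]
    jobs = [lambda n=n: fake_fetch(n) for n in names]
    return await run_limited(jobs, max_concurrent=2)

print(asyncio.run(demo()))
```

Capping concurrency keeps request bursts small, which complements the hourly rate limit rather than replacing it.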
Create a .env file:
GITHUB_TOKEN=your_github_token_here
LOG_LEVEL=INFO
ANALYSIS_DEPTH=comprehensive

If you encounter rate limit errors:
- Get a GitHub token (most common solution):
opentir build-complete # Will prompt for token
- Check your current rate limit:
curl -H "Authorization: token YOUR_TOKEN" https://api.github.com/rate_limit
- Wait and retry (rate limits reset hourly):
# Rate limits reset every hour
opentir build-complete --token YOUR_TOKEN
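The same rate-limit check as the curl command above can be done from Python. The sketch below queries the endpoint with `urllib` and summarizes the `core` bucket of the response. The function names are hypothetical and not part of Opentir; the summary function is separated out so it can be tried on a sample payload without network access.

```python
import json
import urllib.request
from datetime import datetime, timezone

def fetch_rate_limit(token=None):
    """Query GitHub's rate-limit endpoint and return the decoded JSON."""
    request = urllib.request.Request("https://api.github.com/rate_limit")
    if token:
        request.add_header("Authorization", f"token {token}")
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.load(response)

def summarize_rate_limit(payload):
    """Describe the remaining core API budget and its reset time."""
    core = payload["resources"]["core"]
    reset_at = datetime.fromtimestamp(core["reset"], tz=timezone.utc)
    return (f"{core['remaining']}/{core['limit']} requests left, "
            f"resets at {reset_at:%H:%M} UTC")

# Sample payload with made-up numbers; call fetch_rate_limit() for live data.
sample = {"resources": {"core": {"limit": 60, "remaining": 12, "reset": 0}}}
print(summarize_rate_limit(sample))
```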
"No repositories found"
- Run opentir status to check workspace state
- Ensure you have an internet connection
- Try with a GitHub token
"Permission denied"
- Check that your GitHub token has the public_repo scope
- Regenerate the token if it has expired
"Analysis failed"
- Check disk space (repos can be several GB)
- Run opentir cleanup to free space
- Run individual steps to isolate issues
Opentir provides deep insights into Palantir's ecosystem:
- Language Distribution: Which languages are most used
- Functionality Patterns: Common patterns across repositories
- Code Quality Metrics: Complexity, documentation coverage
- Dependency Analysis: Inter-repository dependencies
- Activity Metrics: Most active and popular projects
We welcome contributions! Please see our Contributing Guide for details.
This project is licensed under the Apache 2.0 License - see the LICENSE file for details.
- 🏠 Main Palantir Guide - Complete ecosystem overview
- 📋 All Package Categories - Organized by function
- 🏆 Flagship Projects - Most popular repositories
- 🏢 Enterprise Solutions - Large-scale platforms
- ☕ Java Ecosystem - 24,979 files, 70.3% of codebase
- 📜 TypeScript/JavaScript - 7,014 files, 19.8% of codebase
- 🐹 Go Development - 2,517 files, 7.1% of codebase
- 🐍 Python Tools - 1,006 files, 2.8% of codebase
- 📈 Repository Rankings - By size, complexity, popularity
- 🔧 Code Quality Metrics - Complexity, documentation coverage
- 🔗 Dependency Analysis - Cross-repo relationships
- ⚡ Performance Benchmarks - Speed, scalability metrics
- 📦 Package Selection Guide - Choose the right tools
- 🏗️ Architecture Patterns - Design patterns and best practices
- 💼 Use Case Scenarios - Solutions by industry/function
- 🚀 Getting Started Guides - Quick start for different roles
- 📖 API Reference - Complete API documentation
- 🔍 Function Catalog - 100,000+ functions searchable
- 🏛️ Class Hierarchy - 50,000+ classes documented
- 🔄 Cross-References - Inter-package relationships
- Palantir GitHub Organization - Official repositories
- Popular Palantir Projects - Most starred
- Palantir Developer Portal - Official developer resources
- Blueprint Design System - UI component documentation
- Conjure Documentation - API specification framework
After complete analysis of Palantir's ecosystem, you'll have access to:
- 🏛️ 250+ repositories fully analyzed and categorized
- 📊 6.2M+ lines of code across all programming languages
- 🔧 500K+ code elements (functions, classes, methods) cataloged
- 📈 Complexity metrics and quality assessments for each repository
- 🔗 Dependency mapping showing inter-repository relationships
- 📚 Interactive documentation with full-text search capabilities
- 🎯 Usage patterns and integration examples
- 📋 Functionality matrix showing capabilities across all projects
| Command | Purpose | Documentation |
|---|---|---|
| opentir build-complete | 🚀 Complete ecosystem analysis | Quick Start |
| opentir status | 📊 Check analysis progress | Status Guide |
| opentir clone-all | 📥 Download all repositories | Repository Management |
| opentir analyze | 🔍 Code analysis & extraction | Analysis Guide |
| opentir generate-docs | 📚 Generate documentation | Documentation Guide |
Token Setup Options:
- 🎯 Interactive: opentir build-complete (guided setup)
- 🔧 Environment: export GITHUB_TOKEN=your_token
- ⚡ Command line: opentir build-complete --token your_token
Start with Palantir Ecosystem Overview
- Data Engineering: Data & Analytics Guide
- Frontend Development: Web Development Guide
- Backend Services: Backend Services Guide
- DevOps/Tools: Developer Tools Guide
Explore specific packages like Blueprint, Conjure, or gödel
Follow integration examples and best practices
🌟 Built with ❤️ for the open source community and Palantir's incredible ecosystem of 250+ packages