Pale Fire is an advanced knowledge graph search system built on Graphiti, featuring:
- 5-Factor Ranking System: Semantic, connectivity, temporal, query matching, and entity-type intelligence
- Question-Type Detection: Automatically understands WHO/WHERE/WHEN/WHAT/WHY/HOW questions
- NER Enrichment: Extracts and tags entities (PER, LOC, ORG, DATE, etc.)
- Multi-Factor Search: Combines multiple relevance signals for optimal results
cd /path/to/palefire
# Install base dependencies
pip install graphiti-core python-dotenv websockets youtube-transcript-api
# Install NER dependencies (optional but recommended)
pip install -r requirements-ner.txt
python -m spacy download en_core_web_sm
# Install keyword extraction dependencies
pip install "gensim>=4.3.0"
# Optional: For better stemming support
pip install nltk

Copy the example configuration file and customize it:
cp env.example .env
# Edit .env with your settings

Key configuration options in .env:
# Neo4j Configuration (REQUIRED)
NEO4J_URI=bolt://10.147.18.253:7687
NEO4J_USER=neo4j
NEO4J_PASSWORD=password
# LLM Provider ('ollama' or 'openai')
LLM_PROVIDER=ollama
OPENAI_API_KEY=your_api_key_here
# Ollama Configuration
OLLAMA_BASE_URL=http://10.147.18.253:11434/v1
OLLAMA_MODEL=deepseek-r1:7b
OLLAMA_VERIFICATION_MODEL=gpt-oss:latest # Optional: separate model for NER verification
# Search Configuration
DEFAULT_SEARCH_METHOD=question-aware
SEARCH_RESULT_LIMIT=20
SEARCH_TOP_K=5
# Ranking Weights (the four weights below must sum to <= 1.0; the remainder is the semantic weight)
WEIGHT_CONNECTION=0.15
WEIGHT_TEMPORAL=0.20
WEIGHT_QUERY_MATCH=0.20
WEIGHT_ENTITY_TYPE=0.15

All configuration is centralized in config.py, which reads from environment variables with sensible defaults.
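A sketch of the pattern config.py follows (the `env` helper and the defaults shown here are illustrative, not the actual implementation):

```python
import os

# In the real config.py, python-dotenv loads .env first:
#   from dotenv import load_dotenv; load_dotenv()

def env(key, default, cast=str):
    """Read an environment variable, falling back to a typed default."""
    return cast(os.environ.get(key, default))

NEO4J_URI = env("NEO4J_URI", "bolt://localhost:7687")
SEARCH_RESULT_LIMIT = env("SEARCH_RESULT_LIMIT", "20", int)
WEIGHT_TEMPORAL = env("WEIGHT_TEMPORAL", "0.20", float)
```

This keeps every tunable in one module, so the CLI and search code never read `os.environ` directly.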
Check your current configuration:
python palefire-cli.py config

This will display all settings, including the Neo4j connection, LLM configuration, search parameters, and ranking weights.
Create a JSON file with your episodes (see example_episodes.json):
[
  {
    "content": "Your content here...",
    "type": "text",
    "description": "your description"
  }
]

Add more episode objects to the array as needed (note that JSON does not allow comments).

# First time: Ingest episodes with NER enrichment
# Set ADD = True in palefire-cli.py
python palefire-cli.py
# After ingestion: Run search queries
# Set ADD = False in palefire-cli.py
python palefire-cli.py

palefire/
├── palefire-cli.py # Main CLI application
├── RANKING_SYSTEM.md # Ranking system documentation
├── NER_ENRICHMENT.md # NER system documentation
├── QUESTION_TYPE_DETECTION.md # Question-type detection guide
├── QUERY_MATCH_SCORING.md # Query matching documentation
├── requirements-ner.txt # NER dependencies
└── PALEFIRE_SETUP.md # This file
- Automatic entity extraction (persons, locations, organizations, dates)
- Entity-enriched content for better graph understanding
- Visual feedback during ingestion
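When spaCy is not installed, the system falls back to pattern-based NER; a rough illustration of that idea (these regexes are examples, not the shipped patterns):

```python
import re

# Very rough fallback patterns; the real spaCy pipeline is far more accurate.
PATTERNS = {
    "DATE": re.compile(r"\b(19|20)\d{2}\b"),                      # four-digit years
    "ORG":  re.compile(r"\b[A-Z][A-Za-z]*\s(?:Inc|Corp|LLC)\b"),  # simple org suffixes
}

def tag_entities(text):
    """Return (entity_type, matched_text) pairs found by the fallback patterns."""
    return [(label, m.group(0))
            for label, rx in PATTERNS.items()
            for m in rx.finditer(text)]
```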
- Detects 8 question types (WHO, WHERE, WHEN, etc.)
- Automatically adjusts entity type weights
- Example: WHO questions boost person entities 2.0x
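The detection step can be pictured as a regex lookup over the query; a minimal sketch (the patterns and weights below are illustrative — the real QuestionTypeDetector covers all 8 types):

```python
import re

# Illustrative subset of question patterns and entity-weight boosts.
QUESTION_PATTERNS = {
    "WHO":   {"patterns": [r"\bwho\b"],   "entity_weights": {"PER": 2.0}},
    "WHERE": {"patterns": [r"\bwhere\b"], "entity_weights": {"LOC": 2.0}},
    "WHEN":  {"patterns": [r"\bwhen\b"],  "entity_weights": {"DATE": 2.0}},
}

def detect_question_type(query):
    """Return (question_type, entity_weight_boosts) for a query string."""
    q = query.lower()
    for qtype, spec in QUESTION_PATTERNS.items():
        if any(re.search(p, q) for p in spec["patterns"]):
            return qtype, spec["entity_weights"]
    return "GENERAL", {}
```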
- Semantic (30%): RRF hybrid search
- Connectivity (15%): Graph connections
- Temporal (20%): Time period matching
- Query Match (20%): Term matching
- Entity Type (15%): Question-type alignment
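Conceptually, the final ranking score is a weighted sum of these five normalized factor scores; a sketch using the default weights:

```python
def combined_score(semantic, connectivity, temporal, query_match, entity_type,
                   weights=(0.30, 0.15, 0.20, 0.20, 0.15)):
    """Weighted sum of the five factor scores (each assumed normalized to 0..1)."""
    factors = (semantic, connectivity, temporal, query_match, entity_type)
    return sum(w * f for w, f in zip(weights, factors))
```

Changing a weight in .env shifts how much that factor contributes relative to the others.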
Run all 5 search approaches side-by-side:
- Standard RRF
- Connection-based
- Temporal-aware
- Multi-factor
- Question-aware (recommended)
Edit in palefire-cli.py:
neo4j_uri = "bolt://your-server:7687"
neo4j_user = "your-username"
neo4j_password = "your-password"

Currently configured for Ollama. To use OpenAI:
llm_config = LLMConfig(
api_key=os.environ.get('OPENAI_API_KEY'),
model="gpt-4",
small_model="gpt-3.5-turbo",
base_url=None, # Use OpenAI default
)

For question-aware search:
await search_episodes_with_question_aware_ranking(
graphiti, query,
connection_weight=0.15, # Graph connectivity
temporal_weight=0.20, # Time matching
query_match_weight=0.20, # Term matching
entity_type_weight=0.15 # Entity type intelligence
# Remaining 30% = semantic relevance
)

In palefire-cli.py, extend QuestionTypeDetector.QUESTION_PATTERNS:
'CUSTOM_TYPE': {
'patterns': [r'\byour pattern\b'],
'entity_weights': {'PER': 1.5, 'LOC': 1.2},
'description': 'Your custom query type'
}

# WHO questions (boosts person entities)
"Who was the California Attorney General in 2020?"
"Who is Gavin Newsom?"
# WHERE questions (boosts location entities)
"Where did Kamala Harris work as district attorney?"
"Where is the Attorney General's office located?"
# WHEN questions (boosts date entities)
"When did Gavin Newsom become governor?"
"When was Kamala Harris Attorney General?"
# WHAT questions (boosts organization/role entities)
"What position did Harris hold in 2015?"
"What organization did she lead?"

- Use spaCy for NER: much better accuracy than the pattern-based fallback
- Batch Ingestion: Process episodes in batches for large datasets
- Index Optimization: Ensure Neo4j indices are built (done automatically)
- Adjust Weights: Tune ranking weights based on your use case
- Cache Results: Consider caching frequently accessed results
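For the caching tip, a simple in-process memoization of search results could look like this (`run_search` is a stand-in for the real search call, not part of the CLI):

```python
from functools import lru_cache

def run_search(query):
    # Placeholder for the real (expensive) graph search call.
    return [f"result for {query!r}"]

@lru_cache(maxsize=256)
def cached_search(query):
    """Memoize results for repeated, identical queries."""
    return tuple(run_search(query))  # tuples are hashable and immutable
```

For multi-process deployments, an external cache such as Redis (mentioned below) would replace the in-process `lru_cache`.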
python -m spacy download en_core_web_sm

- Verify Neo4j is running
- Check connection details in .env
- Test the connection, e.g. with cypher-shell:
  cypher-shell -a neo4j://localhost:7687 -u neo4j -p password "RETURN 1;"
- Ensure NER enrichment is enabled (spaCy installed)
- Check entity extraction during ingestion
- Adjust ranking weights
- Try question-aware search instead of standard
- Reduce node_search_config.limit (default: 20)
- Process episodes in smaller batches
- Use pattern-based NER instead of spaCy
- RANKING_SYSTEM.md: Complete ranking system guide
- NER_ENRICHMENT.md: NER system documentation
- QUESTION_TYPE_DETECTION.md: Question-type detection guide
- QUERY_MATCH_SCORING.md: Query matching details
- Add Your Data: Replace example episodes with your content
- Run Ingestion: Set ADD = True and run python palefire-cli.py
- Test Queries: Set ADD = False and test different query types
- Tune Weights: Adjust ranking weights for your use case
- Monitor Performance: Track search accuracy and speed
- Extend EntityEnricher.ENTITY_TYPES for domain-specific entities
- Add language-specific patterns to QuestionTypeDetector
- Wrap search functions in FastAPI/Flask for REST API access
- Process large datasets with async batch operations
- Implement Redis caching for frequently accessed queries
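The batch-processing idea above can be sketched with asyncio.gather over fixed-size batches (`add_episode` here is a placeholder for the real Graphiti ingestion call):

```python
import asyncio

async def ingest_batch(episodes, batch_size=10):
    """Ingest episodes in fixed-size batches to bound concurrent work."""
    async def add_episode(ep):
        await asyncio.sleep(0)  # placeholder for the real async ingestion I/O
        return ep["description"]

    done = []
    for i in range(0, len(episodes), batch_size):
        batch = episodes[i:i + batch_size]
        done += await asyncio.gather(*(add_episode(ep) for ep in batch))
    return done
```

Bounding each batch keeps memory and connection usage predictable on large datasets.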
For issues or questions:
- Check documentation files in this directory
- Review example queries in palefire-cli.py
- Examine console output for debugging information
Inherits license from parent Open WebUI project.
Pale Fire - Named after Vladimir Nabokov's novel, where a poem becomes the subject of extensive commentary and interpretation, much like how this system builds a rich knowledge graph from text and enables intelligent exploration through questions.