Bella Tracer v2 is an observability platform that uses Graph Retrieval-Augmented Generation (GraphRAG) and Neo4j to analyze traces from complex distributed systems. It generates synthetic logs, builds knowledge graphs from observability data, and answers natural-language questions about that data through AI agents.
- LangGraph-based Agent: Intelligent query processing with question optimization and answer ranking
- OpenAI Integration: Advanced LLM and embedding capabilities
- Multi-stage Processing: Query optimization, document retrieval, and semantic reranking
- Neo4j Backend: Powerful graph database for relationship mapping
- Dynamic Graph Building: Automatic creation of nodes and relationships from trace data
- Vector Search: Semantic search capabilities with OpenAI embeddings
- Synthetic Data Generation: Complex trace pattern generation for testing and validation
- Kafka Integration: Real-time data streaming and processing
- Prefect Workflows: Orchestrated data pipelines for ETL operations
- Multi-Level Trace Processing: Service, pod, and log entry correlation
- Context Extraction: Intelligent metadata parsing from observability logs
- Relationship Mapping: Automatic discovery of trace hierarchies and dependencies
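
The multi-stage processing listed above (query optimization, document retrieval, semantic reranking) can be pictured as three composable steps. The following is a simplified, dependency-free sketch only: it does not use LangGraph, OpenAI, or Neo4j, all function names are hypothetical, and bag-of-words cosine similarity stands in for embedding-based reranking.

```python
import math
import re
from collections import Counter


def optimize_question(question: str) -> str:
    """Stage 1 (stand-in): normalize whitespace and case. A real agent would likely use an LLM."""
    return re.sub(r"\s+", " ", question).strip().lower()


def _tokens(text: str) -> Counter:
    """Lowercased alphanumeric token counts, used as a toy 'embedding'."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(question: str, documents: list[str]) -> list[str]:
    """Stage 2 (stand-in): keep documents that share at least one token with the question."""
    q = _tokens(question)
    return [d for d in documents if q.keys() & _tokens(d).keys()]


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two token-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def rerank(question: str, documents: list[str]) -> list[str]:
    """Stage 3 (stand-in): order documents by similarity to the question, best first."""
    q = _tokens(question)
    return sorted(documents, key=lambda d: cosine(q, _tokens(d)), reverse=True)


def answer_context(question: str, documents: list[str], top_k: int = 2) -> list[str]:
    """Run the three stages in order and return the top_k candidate documents."""
    optimized = optimize_question(question)
    return rerank(optimized, retrieve(optimized, documents))[:top_k]
```

In the actual platform these steps are described as running inside a LangGraph agent with OpenAI embeddings; the sketch only shows how the stages feed into one another.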
┌─────────────────────────────────────────────────────────┐
│            Synthetic Data Generator Pipeline            │
│                                                         │
│  • Generates complex trace patterns                     │
│  • Creates realistic log sequences                      │
│  • Publishes to Kafka                                   │
└────────────────────┬────────────────────────────────────┘
                     │
                     ▼
            ┌─────────────────┐
            │  Kafka Broker   │
            └─────────────────┘
                     │
                     ▼
┌─────────────────────────────────────────────────────────┐
│             Knowledge Graph Parser Pipeline             │
│                                                         │
│  • Consumes trace data from Kafka                       │
│  • Parses log entries into narrative format             │
│  • Builds knowledge graph with LLM extraction           │
│  • Stores in Neo4j with vector embeddings               │
└────────────────────┬────────────────────────────────────┘
                     │
                     ▼
            ┌─────────────────┐
            │   Neo4j Graph   │
            │    + Vectors    │
            └────────┬────────┘
                     │
                     ▼
         ┌───────────────────────┐
         │   REST API Endpoint   │
         │   /query - GraphRAG   │
         │ Powered by LangGraph  │
         └───────────────────────┘
| Module | Purpose |
|---|---|
| api/app.py | FastAPI REST endpoint for GraphRAG queries |
| pipelines/synthetic_data_generator.py | Generates realistic synthetic traces and logs |
| pipelines/knowledge_graph_parser.py | Converts trace data into knowledge graphs |
| services/kafka.py | Kafka producer/consumer management |
| agent.py | LangGraph agent orchestration for query processing |
| models.py | Pydantic models for request/response validation |
- Synthetic Data Generation: Creates diverse trace patterns representing different scenarios
- Kafka Streaming: Publishes generated logs to Kafka topics
- Knowledge Graph Building: Consumes logs, extracts entities/relationships, builds Neo4j graph
- Vector Indexing: Embeds chunk data for semantic search
- Query Interface: Provides REST API for intelligent trace querying
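
The first stages of this flow (generate, publish, consume) can be illustrated with a small in-memory sketch. This is an illustration under stated assumptions, not the project's pipeline code: `queue.Queue` stands in for a Kafka topic, the function names are hypothetical, the log fields follow the raw log entry shown in this README, and every service name other than `api-gateway` is invented for the example.

```python
import json
import queue
import random

# "api-gateway" appears in this README; the other names are hypothetical.
SERVICES = ["api-gateway", "auth-service", "orders-service"]
LEVELS = ["INFO", "WARN", "ERROR"]


def generate_trace(rng: random.Random, n_entries: int = 3) -> list[dict]:
    """Create n_entries synthetic log entries that share one trace_id."""
    trace_id = f"trace-{rng.randrange(10**6):06d}"
    return [
        {
            "trace_id": trace_id,
            "service_name": rng.choice(SERVICES),
            "level": rng.choice(LEVELS),
            "message": f"step {i} completed",
            "metadata": [{"key": "pod_id", "value": f"pod-{rng.randrange(1000)}"}],
        }
        for i in range(n_entries)
    ]


def publish(topic: queue.Queue, entries: list[dict]) -> None:
    """Serialize each entry to JSON bytes, as a Kafka producer typically would."""
    for entry in entries:
        topic.put(json.dumps(entry).encode("utf-8"))


def consume(topic: queue.Queue):
    """Drain the in-memory topic and yield decoded log entries."""
    while not topic.empty():
        yield json.loads(topic.get())


if __name__ == "__main__":
    topic = queue.Queue()
    publish(topic, generate_trace(random.Random(42)))
    for entry in consume(topic):
        print(entry["trace_id"], entry["service_name"], entry["level"])
```

Seeding `random.Random` keeps generated traces reproducible, which is convenient when synthetic data is used for testing and validation.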
- Python 3.12+
- Neo4j 5.x
- Kafka 3.x (or use Docker)
- OpenAI API key
Create a .env file in the project root:
# Neo4j Configuration
NEO4J_URI=neo4j://localhost:7687
NEO4J_USERNAME=neo4j
NEO4J_PASSWORD=password
# Kafka Configuration
KAFKA_BROKER=localhost:9092
KAFKA_TOPIC=data
# OpenAI Configuration
OPENAI_API_KEY=your_api_key_here

# Install dependencies using uv
uv sync
# Or using pip
pip install -e .

# Start Neo4j and Kafka using Docker Compose
docker-compose up -d

# Create vector index for semantic search
make neo4j-index
# Or directly
uv run create_neo4j_index

Start both synthetic data generation and knowledge graph parsing pipelines:
make run-flows

Or run individually:
# Synthetic data generator pipeline
uv run synthetic_data_generator_pipeline
# Knowledge graph parser pipeline
uv run knowledge_graph_parser_pipeline

# Start the FastAPI server
uv run api
# Server will be available at http://localhost:8000

curl -X POST http://localhost:8000/query \
  -H "Content-Type: application/json" \
  -d '{
    "question": "What services failed in the last hour?"
  }'

Intelligent query endpoint powered by GraphRAG.
Request:
{
  "question": "string"
}

Response:
{
  "answer": "string",
  "original_question": "string",
  "optimized_question": "string",
  "extracted_dates": { },
  "context_sources": ["string"]
}
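
The request and response shapes above can also be exercised from Python. The following is a minimal client sketch using only the standard library. The helper names and the `QueryResponse` dataclass are hypothetical (the project itself is described as using Pydantic models in models.py); the URL and field names are taken from the curl example and the response schema above.

```python
import json
from dataclasses import dataclass, field, fields
from urllib import request

API_URL = "http://localhost:8000/query"  # from the curl example in this README


@dataclass
class QueryResponse:
    """Hypothetical client-side mirror of the documented response body."""
    answer: str
    original_question: str
    optimized_question: str
    extracted_dates: dict = field(default_factory=dict)
    context_sources: list = field(default_factory=list)


def build_query_request(question: str, url: str = API_URL) -> request.Request:
    """Build the same POST request that the curl example sends."""
    body = json.dumps({"question": question}).encode("utf-8")
    return request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def parse_query_response(raw: bytes) -> QueryResponse:
    """Decode a JSON response body, ignoring any fields not listed above."""
    payload = json.loads(raw)
    known = {f.name for f in fields(QueryResponse)}
    return QueryResponse(**{k: v for k, v in payload.items() if k in known})


def ask(question: str, url: str = API_URL, timeout: float = 30.0) -> QueryResponse:
    """Send the question to a running API server and parse the reply."""
    with request.urlopen(build_query_request(question, url), timeout=timeout) as resp:
        return parse_query_response(resp.read())
```

Calling `ask("What services failed in the last hour?")` requires the FastAPI server to be running locally; the builder and parser can be used on their own without a server.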
- Raw Log Entry (JSON)
  { "trace_id": "trace-123", "service_name": "api-gateway", "level": "ERROR", "message": "Database connection timeout", "metadata": [ {"key": "pod_id", "value": "pod-456"}, {"key": "db.statement", "value": "SELECT * FROM users"} ] }
- Narrative Extraction
  Service 'api-gateway' (running on pod 'pod-456') logged level ERROR with message: "Database connection timeout". Context: executed database query 'SELECT * FROM users'
- Knowledge Graph Nodes & Relationships
  - Nodes: Service, Trace, Pod, LogEntry, Database
  - Relationships: PART_OF_TRACE, RUNNING_ON, EXECUTED_QUERY
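
The transformation in this example can be sketched in plain Python. This is a hedged illustration rather than the parser's actual code: the function names are hypothetical, the README describes graph extraction as LLM-based whereas this sketch uses fixed rules, and which node types each relationship connects is an assumption made for the example.

```python
def metadata_to_dict(entry: dict) -> dict:
    """Flatten the list of {"key", "value"} pairs into a plain dict."""
    return {item["key"]: item["value"] for item in entry.get("metadata", [])}


def to_narrative(entry: dict) -> str:
    """Render a log entry as the narrative sentence shown in the example above."""
    meta = metadata_to_dict(entry)
    text = f"Service '{entry['service_name']}'"
    if "pod_id" in meta:
        text += f" (running on pod '{meta['pod_id']}')"
    text += f" logged level {entry['level']} with message: \"{entry['message']}\"."
    if "db.statement" in meta:
        text += f" Context: executed database query '{meta['db.statement']}'"
    return text


def to_triples(entry: dict) -> list[tuple]:
    """Derive (source, relationship, target) triples.

    Assumption: the endpoints chosen for each relationship are illustrative;
    the README lists only the node and relationship names.
    """
    meta = metadata_to_dict(entry)
    service = ("Service", entry["service_name"])
    triples = [(service, "PART_OF_TRACE", ("Trace", entry["trace_id"]))]
    if "pod_id" in meta:
        triples.append((service, "RUNNING_ON", ("Pod", meta["pod_id"])))
    if "db.statement" in meta:
        triples.append((service, "EXECUTED_QUERY", ("Database", meta["db.statement"])))
    return triples


if __name__ == "__main__":
    entry = {
        "trace_id": "trace-123",
        "service_name": "api-gateway",
        "level": "ERROR",
        "message": "Database connection timeout",
        "metadata": [
            {"key": "pod_id", "value": "pod-456"},
            {"key": "db.statement", "value": "SELECT * FROM users"},
        ],
    }
    print(to_narrative(entry))
    for triple in to_triples(entry):
        print(triple)
```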
bella-tracer-v2/
├── src/bella_tracer_v2/
│   ├── api/                # FastAPI application
│   │   └── app.py
│   ├── pipelines/          # ETL pipelines
│   │   ├── synthetic_data_generator.py
│   │   └── knowledge_graph_parser.py
│   ├── services/           # External services
│   │   └── kafka.py
│   ├── agent.py            # LangGraph agent
│   ├── models.py           # Data models
│   ├── main.py             # Entry points
│   └── synthetic_data.py   # Trace generation
├── artifacts/              # Generated datasets
├── docker-compose.yaml     # Local environment
├── Makefile                # Build commands
└── pyproject.toml          # Project metadata
- LangChain: AI framework and tool integrations
- LangGraph: Agent orchestration and workflow
- Neo4j GraphRAG: Knowledge graph RAG
- FastAPI: REST API framework
- Prefect: Workflow orchestration
- Kafka: Distributed streaming
- OpenAI: LLM and embeddings
- spaCy: NLP processing
- Pandas: Data manipulation
Contributions are welcome! Please ensure:
- Code follows PEP 8 standards
- Tests are provided for new features
- Documentation is updated accordingly
This project is licensed under the MIT License - see the LICENSE file for details.
For issues, questions, or suggestions, please open an issue on the repository.
Status: Beta - Under active development
Last Updated: December 2025