Ion is an enterprise-grade observability client for Go services. It unifies structured logging (Zap) and distributed tracing (OpenTelemetry) into a single, cohesive API designed for high-throughput, long-running infrastructure.
Status: v0.2 Release Candidate
Target: Microservices, Blockchain Nodes, Distributed Systems
Ion is built on strict operational guarantees. Operators can rely on these invariants in production:
- No Process Termination: Ion will never call
os.Exit,panic, orlog.Fatal. EvenCriticallevel logs are strictly informational (mapped to FATAL severity) and guarantee control flow returns to the caller. - Thread Safety: All public APIs on
LoggerandTracerare safe for concurrent use by multiple goroutines. - Non-Blocking Telemetry: Trace export is asynchronous and decoupled from application logic. A slow OTEL collector will never block your business logic (logs are synchronous to properly handle crash reporting, but rely on high-performance buffered writes).
- Failure Isolation: Telemetry backend failures (e.g., Collector down) are isolated. They may result in data loss (dropped spans) but will never crash the service.
To maintain focus and stability, Ion explicitly avoids:
- Metrics: Use the Prometheus or OpenTelemetry Metrics SDKs directly.
- Alerting: Ion emits signals; it does not manage thresholds or paging.
- Framework Magic: Ion does not auto-inject into HTTP handlers without explicit middleware usage.
- Logs: Emitted synchronously to the configured cores (Console/File/Memory). This ensures that if your application crashes immediately after a log statement, the log is persisted (up to OS buffering).
- Traces: Buffered and exported asynchronously. Spans are batched in memory and sent to the configured OTEL, endpoint on a timer or size threshold.
- Correlation:
trace_idandspan_idare extracted fromcontext.Contextat the moment of logging and injected as fields.
- Logs: Use for state changes, errors, and high-cardinality events (e.g., specific transaction failure reasons). Logs must be reliable and available immediately.
- Traces: Use for latency analysis, causality (who called whom), and request flows. Traces are sampled and statistically significant, but individual traces may be dropped under load.
go get github.com/JupiterMetaLabs/ionRequires Go 1.21+.
A minimal, correct example for a production service.
package main
import (
"context"
"log"
"time"
"github.com/JupiterMetaLabs/ion"
)
func main() {
ctx := context.Background()
// 1. Initialize with Service Identity
// Returns warning slice for non-fatal config issues (e.g. invalid OTEL url)
app, warnings, err := ion.New(ion.Default().WithService("payment-node"))
if err != nil {
log.Fatalf("Fatal: failed to init observability: %v", err)
}
for _, w := range warnings {
log.Printf("Ion Startup Warning: %v", w)
}
// 2. Establishing the Lifecycle Contract
// Ensure logs/traces flush before exit.
defer func() {
shutdownCtx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
defer cancel()
// Errors here mean data loss, not application failure.
if err := app.Shutdown(shutdownCtx); err != nil {
log.Printf("Shutdown data loss: %v", err)
}
}()
// 3. Application Logic
app.Info(ctx, "node started", ion.String("version", "1.0.0"))
// Simulate work
doWork(ctx, app)
}
func doWork(ctx context.Context, logger ion.Logger) {
// Context is mandatory for correlation
logger.Info(ctx, "processing block", ion.Uint64("height", 100))
}Ion uses a comprehensive configuration struct for behavior control. This maps 1:1 with ion.Config.
| Field | Type | Default | Description |
|---|---|---|---|
Level |
string |
"info" |
Minimum log level (debug, info, warn, error, fatal). |
Development |
bool |
false |
Enables development mode (pretty output, caller location, stack traces). |
ServiceName |
string |
"unknown" |
Identity of the service (vital for trace attribution). |
Version |
string |
"" |
Service version (e.g., commit hash or semver). |
Console |
ConsoleConfig |
Enabled: true |
configuration for stdout/stderr. |
File |
FileConfig |
Enabled: false |
configuration for file logging (with rotation). |
OTEL |
OTELConfig |
Enabled: false |
configuration for remote OpenTelemetry logging. |
Tracing |
TracingConfig |
Enabled: false |
configuration for Distributed Tracing. |
| Field | Type | Default | Description |
|---|---|---|---|
Enabled |
bool |
true |
If false, stdout/stderr is silenced. |
Format |
string |
"json" |
"json" (production) or "pretty" (human-readable). |
Color |
bool |
true |
Enables ANSI colors (only references pretty format). |
ErrorsToStderr |
bool |
true |
Writes warn/error/fatal to stderr, others to stdout. |
| Field | Type | Default | Description |
|---|---|---|---|
Enabled |
bool |
false |
Enables file writing. |
Path |
string |
"" |
Absolute path to the log file (e.g., /var/log/app.log). |
MaxSizeMB |
int |
100 |
Max size per file before rotation. |
MaxBackups |
int |
5 |
Number of old files to keep. |
MaxAgeDays |
int |
7 |
Max age of files to keep. |
Compress |
bool |
true |
Gzip old log files. |
Controls the OpenTelemetry Logs Exporter.
| Field | Type | Default | Description |
|---|---|---|---|
Enabled |
bool |
false |
Enables log export to Collector. |
Endpoint |
string |
"" |
host:port (e.g., localhost:4317). |
Protocol |
string |
"grpc" |
"grpc" (recommended) or "http". |
Insecure |
bool |
false |
If true, disables TLS (dev only). |
Username |
string |
"" |
Basic Auth Username. |
Password |
string |
"" |
Basic Auth Password. |
BatchSize |
int |
512 |
Max logs per export batch. |
ExportInterval |
Duration |
5s |
flush interval. |
Controls the OpenTelemetry Trace Provider.
| Field | Type | Default | Description |
|---|---|---|---|
Enabled |
bool |
false |
Enables trace generation and export. |
Endpoint |
string |
"" |
host:port. Defaults to OTEL.Endpoint if empty. |
Sampler |
string |
"always" |
"always", "never", or "ratio:0.X" (e.g., ratio:0.1 for 10%). |
Protocol |
string |
"grpc" |
"grpc" or "http". |
For full control, initialize the struct directly rather than using builders.
cfg := ion.Config{
Level: "info",
ServiceName: "payment-service",
Version: "v1.2.3",
Console: ion.ConsoleConfig{
Enabled: true,
Format: "json",
ErrorsToStderr: true,
},
// File rotation
File: ion.FileConfig{
Enabled: true,
Path: "/var/log/payment-service.log",
MaxSizeMB: 500,
MaxBackups: 3,
Compress: true,
},
// Remote Telemetry (OTEL)
OTEL: ion.OTELConfig{
Enabled: true,
Endpoint: "otel-collector.prod:4317",
Protocol: "grpc",
// Attributes added to every log
Attributes: map[string]string{
"cluster": "us-east-1",
"env": "production",
},
},
// Distributed Tracing
Tracing: ion.TracingConfig{
Enabled: true,
// Fallback to OTEL.Endpoint is automatic if this is empty
Sampler: "ratio:0.05", // Sample 5% of traces
},
}
// Initialize
app, warnings, err := ion.New(cfg)
if err != nil {
panic(err)
}
// Handle warnings (e.g., invalid sampler string fallback)
for _, w := range warnings {
log.Println("Ion config warning:", w)
}
defer app.Shutdown(context.Background())๐ Deep Dive: For enterprise patterns, advanced tracing, and blockchain-specific examples, see the Enterprise User Guide.
Use the Logger for human-readable events, state changes, and errors.
- Always pass
context.Context(even if generic). - Prefer typed fields (
ion.String) over genericion.F. - Do not log sensitive data (PII/Secrets).
// INFO: Operational state changes
app.Info(ctx, "transaction processed",
ion.String("tx_id", "0x123"),
ion.Duration("latency", 50 * time.Millisecond),
)
// ERROR: Actionable failures. Does not interrupt flow.
if err != nil {
// Automatically adds "error": err.Error() field
app.Error(ctx, "database connection failed", err, ion.String("db_host", "primary"))
}
// CRITICAL: Invariant violations (e.g. data corruption).
// Use this for "wake up the on-call" events.
// GUARANTEE: Does NOT call os.Exit(). Safe to use in libraries.
app.Critical(ctx, "memory corruption detected", nil)
### 3. Child Loggers (Scopes)
Use `With` and `Named` to create context-aware sub-loggers. This is often better than passing raw `ion` fields everywhere.
* **`Named(name)`**: Appends to the logger name. Good for components.
* `app` -> `app.http` -> `app.http.client`
* **`With(fields...)`**: Permanently attaches fields to all logs from this logger.
```go
// In your constructor
func NewPaymentService(root ion.Logger) *PaymentService {
// All logs from this service will have "service": "payment" and "component": "core"
// AND be named "main.payment"
log := root.Named("payment").With(ion.String("component", "core"))
return &PaymentService{log: log}
}
func (s *PaymentService) Process(ctx context.Context) {
// Log comes out as:
// logger="main.payment" component="core" msg="processing"
s.log.Info(ctx, "processing")
}
Use the Tracer for latency measurement and causal chains.
- Start/End: Every
StartMUST have a correspondingEnd(). - Defer: Use
defer span.End()immediately after checkingerrisn't nil (or just immediately if function is simple). - Attributes: Add attributes to spans only if they are valuable for querying latency/filtering (e.g., "http.status_code"). High cardinality data belongs in Logs, not Spans attributes (usually).
func ProcessOrder(ctx context.Context, orderID string) error {
// 1. Get Named Tracer
tracer := app.Tracer("order.processor")
// 2. Start Span
ctx, span := tracer.Start(ctx, "ProcessOrder")
// 3. Ensure End
defer span.End()
// 4. Enrich Span
span.SetAttributes(attribute.String("order.id", orderID))
// ... work ...
if err := validate(ctx); err != nil {
// 5. Record Errors in Span
span.RecordError(err)
span.SetStatus(codes.Error, "validation failed")
return err
}
return nil
}Recipes for standard deployment scenarios.
Ion is flexible. Here are the core patterns for different environments.
Best for: CLI tools, scripts, local testing.
cfg := ion.Default()
cfg.Level = "info"
// cfg.Development = true // Optional: enables callers / stacktracesBest for: Systemd services, VM-based deployments, legacy nodes.
cfg := ion.Default()
// Console for live tailing (kubectl logs)
cfg.Console.Enabled = true
// File for long-term retention
cfg.File.Enabled = true
cfg.File.Path = "/var/log/app/app.log"
cfg.File.MaxSizeMB = 500
cfg.File.MaxBackups = 5Best for: High-traffic sidecars where local IO is expensive.
cfg := ion.Default()
cfg.Console.Enabled = false // Disable local IO
cfg.OTEL.Enabled = true
cfg.OTEL.Endpoint = "localhost:4317"
cfg.OTEL.Protocol = "grpc"Best for: Kubernetes microservices.
cfg := ion.Default()
cfg.Console.Enabled = true // For pod logs
cfg.OTEL.Enabled = true // For log aggregation (Loki/Elastic)
cfg.OTEL.Endpoint = "otu-col:4317"
cfg.Tracing.Enabled = true // For distributed traces (Tempo/Jaeger)
cfg.Tracing.Sampler = "ratio:0.1" // 10% samplingDistributed tracing is powerful but requires discipline. Follow these rules to get useful traces.
- Root Spans: Created by middleware (HTTP/gRPC). You rarely start these manually unless writing a background worker.
- Child Spans: Created by
Start(ctx, name). Always inherit parent ID fromctx.
// 1. Start (creates child if ctx has parent)
ctx, span := tracer.Start(ctx, "CalculateHash")
// 2. Defer End (Critical!)
defer span.End() - Attributes: "Search Tags". Low cardinality. Use for filtering.
- Good:
user_id,http.status,region,retry_count - Bad:
error_message(too variable),payload_dump(too big)
- Good:
- Events: "Timestamped Markers". Significant moments inside a span.
- Example:
span.AddEvent("cache_miss")
- Example:
- Logs: "Detailed Context". High cardinality. Use
app.Info(ctx, ...)instead.- Why: Logs are cheaper and searchable by full text.
The Problem: By default, a span finishes as OK even if your function returns an error. This results in 0% Error Rate on your dashboard despite complete failure.
The Fix: You must explicitly "taint" the span.
if err != nil {
// 1. Record stacktrace and error type in the span
span.RecordError(err)
// 2. Flip the span status to Error (turns red in Jaeger/Tempo)
span.SetStatus(codes.Error, "failed to insert order")
// 3. Log it for humans (Logs = Detail, Traces = Signals)
app.Error(ctx, "failed to insert order", err)
return err
}
// Optional: Explicitly mark success if needed, though default is Unset/OK
span.SetStatus(codes.Ok, "success")The Problem: Spans are bound to context. If the parent request finishes, ctx is canceled and the parent span Ends. A background goroutine using that same ctx becomes a "Ghost Span" โ it might log errors, but it has no parent in the trace visualization (disconnected).
The Fix: Create a Link. A Link connects a new Root Span to the old Parent Trace, saying "This background job was caused by that request", without being killed by it.
// Fire-and-Forget Background Task
go func(parentCtx context.Context) {
// 1. Create a FRESH context (so we don't die when request finishes)
newCtx := context.Background()
tracer := app.Tracer("background_worker")
// 2. Create a Link from the parent (Preserves causality)
link := trace.LinkFromContext(parentCtx)
// 3. Start a new Root Span with the link
ctx, span := tracer.Start(newCtx, "AsyncJob", ion.WithLinks(link))
defer span.End()
// Now you have a safe, independent span correlated to the original request
app.Info(ctx, "processing in background", ion.String("job", "email_send"))
}(ctx)Ion provides specialized middleware/interceptors to automate context propagation.
import "github.com/JupiterMetaLabs/ion/middleware/ionhttp"
mux := http.NewServeMux()
handler := ionhttp.Handler(mux, "payment-api")
http.ListenAndServe(":8080", handler)import "github.com/JupiterMetaLabs/ion/middleware/iongrpc"
// Server
s := grpc.NewServer(
grpc.StatsHandler(iongrpc.ServerHandler()),
)
// Client
conn, err := grpc.Dial(addr,
grpc.WithStatsHandler(iongrpc.ClientHandler()),
)Operators must understand how Ion behaves under stress:
- OTEL Collector Down: The internal exporter will retry with exponential backoff. If buffers fill, new traces will be dropped. Application performance is preserved (failure is isolated).
- Disk Full (File Logging):
lumberjackrotation will attempt to write. If the write syscall fails, Zap internal error handling catches it. The application continues, but logs are lost (written to stderr fallback if possible). - High Load: Tracing uses a batch processor. Under extreme load, if the export rate lags generation, spans are dropped to prevent memory leaks (bounded buffer).
โ ๏ธ WARNING: Global state is strictly for migration and legacy support.
Do not use ion.SetGlobal in new libraries or microservices.
Do inject ion.Logger via struct constructors.
// CORRECT: Dependency Injection
type Server struct {
log ion.Logger
}
// INCORRECT: Hidden dependency
func (s *Server) Handle() {
ion.Info(...) // Relies on global state
}If ion.GetGlobal() is called without initialization, it returns a safe no-op logger. It will not panic, but your logs will be silently discarded to protect the runtime.
- Pass Context Everywhere:
context.Background()breaks the trace chain. Only use it inmainor background worker roots. - Shutdown is Mandatory: Failing to call
Shutdownguarantees data loss (buffered traces/logs) on deployment. - Structured Keys: Use consistent key names (e.g.,
user_id, notuserIDoruid) to make logs queryable. - No Dynamic Keys:
ion.String(userInput, "value")is a security risk and breaks indexing. Keys must be static constants.
- Public API:
ion.go,logger.go,config.go. Stable v0.2. - Internal:
internal/*. No stability guarantees. - Behavior: Log format changes or configuration defaults are considered breaking changes.
MIT ยฉ 2025 JupiterMeta Labs