CipherSwarm uses structured logging to provide comprehensive observability across all components of the system. This guide explains the logging infrastructure, formats, and best practices for debugging and monitoring.
Structured logging in CipherSwarm serves several key purposes:
- Debugging: Quickly identify and diagnose issues in development and production
- Monitoring: Track system health, performance, and agent behavior
- Auditing: Maintain records of important state changes and user actions
- Integration: Compatible with log aggregation services (ELK, Datadog, Splunk, etc.)
- Consistent Format: All logs use predictable, parseable formats with
[LogType]prefixes - Rich Context: Logs include relevant IDs, timestamps, and contextual information
- Sensitive Data Protection: Automatic filtering of passwords, tokens, and plaintext hashes
- Performance: Minimal overhead with efficient JSON serialization
- Searchability: Easy to query and filter logs by type and context
CipherSwarm uses Lograge to transform Rails request logs into structured JSON format.
- ✅ Production: Fully enabled for production monitoring
- ✅ Development: Enabled to match production behavior
- ❌ Test: Disabled to avoid test pollution
Each HTTP request generates a single-line JSON log entry:
{
"method": "GET",
"path": "/campaigns/1",
"format": "html",
"controller": "CampaignsController",
"action": "show",
"status": 200,
"duration": 45.23,
"view": 32.1,
"db": 8.45,
"allocations": 15234,
"host": "localhost:3000",
"request_id": "abc123...",
"user_agent": "Mozilla/5.0...",
"ip": "192.168.1.100",
"time": "2026-01-07T21:00:00.000Z",
"user_id": 5,
"agent_id": null,
"task_id": null,
"attack_id": null
}Lograge is configured to extract additional context from each request:
- user_id: Extracted from Devise
current_userfor web UI requests - agent_id: Extracted from
@agent,params[:agent_id], or authorization token - task_id: Extracted from
@taskorparams[:task_id] - attack_id: Extracted from
@attack,@task.attack, orparams[:attack_id]
The following paths are filtered out to reduce log noise:
- Health Checks:
Rails::HealthController#show(no value in monitoring) - Assets:
/assets/*paths (handled by CDN/web server in production)
Location: config/initializers/lograge.rb
Rails.application.configure do
config.lograge.enabled = Rails.env.production? || Rails.env.development?
config.lograge.formatter = Lograge::Formatters::Json.new
config.lograge.custom_payload do |controller|
{
host: controller.request.host,
request_id: controller.request.request_id,
user_agent: controller.request.user_agent,
ip: controller.request.remote_ip,
time: Time.current,
user_id: controller.try(:current_user)&.id,
agent_id: controller.instance_variable_get(:@agent)&.id ||
controller.params[:agent_id] ||
controller.request.headers["Authorization"]&.split&.last&.then { |token| Agent.find_by(token: token)&.id },
task_id: controller.instance_variable_get(:@task)&.id || controller.params[:task_id],
attack_id: controller.instance_variable_get(:@attack)&.id ||
controller.instance_variable_get(:@task)&.attack_id ||
controller.params[:attack_id]
}
end
endThe lograge custom_options lambda also emits exception_class, exception_message, and the first five backtrace frames when a request raises. To prevent log injection from attacker-controlled error text, the message is:
- Stripped of
\r/\nso it cannot break the single-line JSON envelope. - Coerced to a
String(so a raw exceptionClasscannot surface). - Truncated to 500 characters (
EXCEPTION_MESSAGE_MAX_LEN) to bound log volume.
The pinning is enforced by spec/initializers/lograge_spec.rb so regressions are caught at CI time.
CipherSwarm's Sidekiq workers emit single-line JSON via the formatter shipped with Sidekiq 8.x. The format mirrors lograge so the same log pipeline can ingest both layers without bespoke parsers.
Location: config/initializers/sidekiq.rb
Sidekiq.configure_server do |config|
config.logger = Sidekiq::Logger.new($stdout, level: Logger::INFO)
config.logger.formatter = Sidekiq::Logger::Formatters::JSON.new
end{
"ts": "2026-05-15T10:00:01.234Z",
"pid": 12,
"tid": "abc",
"lvl": "INFO",
"msg": "start",
"ctx": {
"class": "ProcessHashListJob",
"jid": "5f4dcc3b",
"queue": "ingest"
}
}Sidekiq 8.x's JobLogger never logs job arguments at any level — only the lifecycle markers (start, done, fail) plus the per-job context hash (jid, class, logged_job_attributes). Verified against sidekiq-8.1.5/lib/sidekiq/job_logger.rb. The :info level here is the standard production setting, not a redaction control.
Jobs that need to log argument context — for example ApplicationJob's discard_on handler — must run arguments through ActiveSupport::ParameterFilter.new(Rails.application.config.filter_parameters) before writing. See app/jobs/application_job.rb for the canonical pattern.
Sidekiq's default ERROR_HANDLER emits ex.detailed_message (class + message, no backtrace) at the standard log level. The backtrace is only included when the logger is set to :debug. If you need a backtrace for a novel exception:
- Edit
config/initializers/sidekiq.rband changelevel: Logger::INFOtolevel: Logger::DEBUG. - Restart the Sidekiq process (
docker compose restart sidekiqin dev; orchestrator-specific in prod). - Reproduce the error.
- Revert the change once you have what you need — DEBUG roughly doubles log volume.
The default stays at :info because backtraces from common, well-understood errors create more noise than signal in production.
In production, ActionMailer and ActionCable are pinned to log level WARN — DEBUG/INFO per-event chatter is dropped, but WARN/ERROR/FATAL still reach $stdout and the JSON log stream:
- ActionMailer: per-delivery INFO lines disappear. Delivery failures still surface because
raise_delivery_errorsdefaults totrue(Rails 8.x default — verify on future upgrades) — raised SMTP errors propagate through the request stack and reach lograge's exception payload. - ActionCable: per-subscribe/unsubscribe INFO chatter disappears, but cable connection rejections and channel errors at WARN+ still log. Cable subsystem WARN+ events do NOT route through controllers — they only surface via this logger, which is why we preserve WARN+ rather than dropping the logger entirely.
See config/environments/production.rb.
These are recipes for slicing the local docker compose logs JSON stream. For aggregator-side queries (Datadog/Splunk/Loki DSL) see the Common Queries section under Production Monitoring further down.
The structured JSON output is greppable, but most operator tasks compose more cleanly with jq. Examples:
# All HTTP requests that returned 5xx
docker compose logs web | jq -c 'select(.status >= 500)'
# All Sidekiq events for a specific job class
docker compose logs sidekiq | jq -c 'select(.ctx.class == "ProcessHashListJob")'
# Sidekiq errors only
docker compose logs sidekiq | jq -c 'select(.lvl == "ERROR")'
# Cross-layer correlation by agent_id (web only — Sidekiq does not carry agent_id today)
docker compose logs web | jq -c 'select(.agent_id == 7)'
# Slow requests above 1s
docker compose logs web | jq -c 'select(.duration > 1000)'CipherSwarm uses four main log type prefixes to categorize different logging scenarios.
Tracks the lifecycle of API requests from start to completion.
Logged at the beginning of each API request (before authentication).
Format:
[APIRequest] START - Agent {agent_id} - {method} {path} - {timestamp}
Example:
[APIRequest] START - Agent 42 - POST /api/v1/client/tasks/123/submit_status - 2026-01-07T21:00:00.000Z
[APIRequest] START - Agent unknown - GET /api/v1/client/configuration - 2026-01-07T21:00:05.000Z
Fields:
agent_id: Authenticated agent ID or "unknown" if authentication hasn't occurred yetmethod: HTTP method (GET, POST, PATCH, DELETE)path: Request pathtimestamp: ISO 8601 timestamp
Logged after each API request completes (after response is rendered).
Format:
[APIRequest] COMPLETE - Agent {agent_id} - {method} {path} - Status {status} - Duration {duration}ms - {timestamp}
Example:
[APIRequest] COMPLETE - Agent 42 - POST /api/v1/client/tasks/123/submit_status - Status 204 - Duration 45.23ms - 2026-01-07T21:00:00.050Z
[APIRequest] COMPLETE - Agent unknown - GET /api/v1/client/configuration - Status 401 - Duration 12.34ms - 2026-01-07T21:00:05.015Z
Fields:
agent_id: Authenticated agent ID or "unknown" for failed authenticationmethod: HTTP methodpath: Request pathstatus: HTTP response status codeduration: Request duration in milliseconds (calculated from START time)timestamp: ISO 8601 timestamp
- START: Before
authenticate_agentcallback inBaseController - COMPLETE: After
update_last_seencallback inBaseController - Frequency: One START + one COMPLETE log per API request
Location: app/controllers/api/v1/base_controller.rb
before_action :log_api_request_start
after_action :log_api_request_complete
def log_api_request_start
@api_request_start_time = Time.current
agent_id = @agent&.id || "unknown"
Rails.logger.info("[APIRequest] START - Agent #{agent_id} - #{request.method} #{request.path} - #{@api_request_start_time}")
rescue StandardError => e
Rails.logger.error("Failed to log API request start: #{e.message}")
end
def log_api_request_complete
agent_id = @agent&.id || "unknown"
duration_ms = if @api_request_start_time
((Time.current - @api_request_start_time) * 1000).round(2)
else
"N/A"
end
Rails.logger.info("[APIRequest] COMPLETE - Agent #{agent_id} - #{request.method} #{request.path} - Status #{response.status} - Duration #{duration_ms}ms - #{Time.current}")
rescue StandardError => e
Rails.logger.error("Failed to log API request completion: #{e.message}")
endTracks errors that occur during API request processing.
| Error Type | Description | HTTP Status |
|---|---|---|
JSON_PARSE_ERROR |
Invalid JSON in request body | 400 Bad Request |
VALIDATION_ERROR |
Model validation failed | 422 Unprocessable Content |
DUPLICATE_RECORD |
Unique constraint violation | 409 Conflict |
UNHANDLED_ERROR |
Unexpected exception | 500 Internal Server Error |
TASK_ABANDON_FAILED |
Failed to abandon task | 422 |
TASK_ACCEPT_FAILED |
Failed to accept task | 422 |
ATTACK_ACCEPT_FAILED |
Failed to accept attack | 422 |
TASK_EXHAUST_FAILED |
Failed to exhaust task | 422 |
ATTACK_EXHAUST_FAILED |
Failed to exhaust attack | 422 |
TASK_ALREADY_COMPLETED |
Task is already completed | 422 |
HASH_NOT_FOUND |
Hash not found in hash list | 404 |
HASH_UPDATE_FAILED |
Failed to update hash item | 422 |
TASK_ACCEPT_CRACK_FAILED |
Failed to accept crack | 422 |
GUESS_NOT_FOUND |
Guess parameters missing | 422 |
DEVICE_STATUSES_NOT_FOUND |
Device statuses missing | 422 |
STATUS_SAVE_FAILED |
Failed to save status | 422 |
TASK_ACCEPT_STATUS_FAILED |
Failed to accept status | 422 |
AGENT_UPDATE_FAILED |
Failed to update agent | 422 |
BENCHMARK_SUBMISSION_FAILED |
Failed to submit benchmarks | 422 |
ERROR_RECORD_SAVE_FAILED |
Failed to save error record | 422 |
ATTACK_NOT_FOUND |
Attack not found | 404 |
MISSING_VERSION |
Version parameter missing | 400 |
MISSING_OS |
Operating system parameter missing | 400 |
INVALID_VERSION_FORMAT |
Invalid version format | 400 |
General:
[APIError] {ERROR_TYPE} - Agent {agent_id} - {method} {path} - Error: {message} - {timestamp}
With Context:
[APIError] {ERROR_TYPE} - Agent {agent_id} - Task {task_id} - Errors: {error_messages} - {timestamp}
[APIError] {ERROR_TYPE} - Agent {agent_id} - Attack {attack_id} - Errors: {error_messages} - {timestamp}
With Backtrace (UNHANDLED_ERROR):
[APIError] UNHANDLED_ERROR - Agent {agent_id} - {method} {path} - Error: {error_class} - {error_message} - Backtrace: {first_5_lines} - {timestamp}
[APIError] JSON_PARSE_ERROR - Agent 42 - POST /api/v1/client/tasks/123/submit_status - Error: unexpected token at '{invalid json}' - 2026-01-07T21:00:00.000Z
[APIError] VALIDATION_ERROR - Agent 42 - POST /api/v1/client/agents/42 - Model: Agent - Errors: Name can't be blank - 2026-01-07T21:00:00.000Z
[APIError] TASK_ACCEPT_FAILED - Agent 42 - Task 123 - Errors: State transition failed - 2026-01-07T21:00:00.000Z
[APIError] UNHANDLED_ERROR - Agent 42 - GET /api/v1/client/tasks/new - Error: NoMethodError - undefined method `foo' for nil:NilClass - Backtrace: app/controllers/api/v1/client/tasks_controller.rb:45:in `new'\n... - 2026-01-07T21:00:00.000Z
- When
rescue_fromblocks catch exceptions inBaseController - When specific error conditions are detected in controller actions
- For all API errors that return 4xx or 5xx status codes
Errors are caught and logged in app/controllers/api/v1/base_controller.rb and individual controller actions.
For task-related API errors, the TaskErrorHandling concern provides a centralized logging helper method:
Location: app/controllers/concerns/task_error_handling.rb
# Logs an API error for a task-related operation in a consistent structured format.
# Centralizes the [APIError] log pattern used across task state transition failures.
#
# @param code [String] The error code (e.g., "TASK_ABANDON_FAILED", "ATTACK_ACCEPT_FAILED")
# @param agent [Agent] The agent involved in the operation
# @param task [Task] The task involved in the operation
# @param error_messages [Array<String>, String] Error message(s) to include
def log_task_api_error(code, agent, task, error_messages)
messages = Array(error_messages).join(", ")
Rails.logger.error(
"[APIError] #{code} - Agent #{agent.id} - Task #{task.id} - " \
"Errors: #{messages} - #{Time.current}"
)
endUsage Example:
Instead of inline logging:
# ❌ Avoid: Inline logging (duplicates format string)
Rails.logger.error("[APIError] TASK_ABANDON_FAILED - Agent #{@agent.id} - Task #{@task.id} - Errors: #{@task.errors.full_messages.join(', ')} - #{Time.current}")Use the helper method:
# ✅ Preferred: Use the helper method
log_task_api_error("TASK_ABANDON_FAILED", @agent, @task, @task.errors.full_messages)This helper is used in TasksController for all task-related error logging:
TASK_ABANDON_FAILEDTASK_ACCEPT_FAILEDATTACK_ACCEPT_FAILEDTASK_EXHAUST_FAILED
Tracks important agent state transitions and lifecycle events.
| Event | Description | State Transition |
|---|---|---|
connect |
Agent first connects and activates | pending → active |
disconnect |
Agent is manually deactivated | {any} → stopped |
shutdown |
Agent explicitly shuts down | {any} → offline |
shutdown_success |
Shutdown completed successfully | N/A |
shutdown_failed |
Shutdown failed with errors | N/A |
reconnect |
Agent comes back online after being offline | offline → active |
heartbeat_timeout |
Agent hasn't checked in within threshold | {any} → offline |
heartbeat_threshold_exceeded |
Agent's last_seen_at exceeds offline threshold | N/A (warning) |
heartbeat_failed |
Heartbeat state transition failed | N/A |
benchmark_complete |
Agent completes benchmark submission | {any} → benchmarked |
benchmark_stale |
Agent's benchmarks are too old | N/A (warning) |
Connect:
[AgentLifecycle] connect: agent_id={id} state_change={old}->{new} last_seen_at={time} ip={ip} timestamp={time}
Disconnect:
[AgentLifecycle] disconnect: agent_id={id} state_change={old}->{new} last_seen_at={time} ip={ip} timestamp={time}
Shutdown:
[AgentLifecycle] shutdown: agent_id={id} state_change={old}->{new} running_tasks_abandoned={count} timestamp={time}
[AgentLifecycle] shutdown_success: agent_id={id} state={state} timestamp={time}
[AgentLifecycle] shutdown_failed: agent_id={id} state={state} errors={error_messages} timestamp={time}
Reconnect:
[AgentLifecycle] reconnect: agent_id={id} state_change={old}->{new} last_seen_at={time} ip={ip} timestamp={time}
Heartbeat Timeout:
[AgentLifecycle] heartbeat_timeout: agent_id={id} state_change={old}->{new} last_seen_at={time} threshold={config} timestamp={time}
[AgentLifecycle] heartbeat_threshold_exceeded: agent_id={id} last_seen_at={time} threshold={config} timestamp={time}
Benchmark Complete:
[AgentLifecycle] benchmark_complete: agent_id={id} state_change={old}->{new} benchmark_count={count} timestamp={time}
[AgentLifecycle] connect: agent_id=42 state_change=pending->active last_seen_at=2026-01-07T21:00:00.000Z ip=192.168.1.100 timestamp=2026-01-07T21:00:00.000Z
[AgentLifecycle] shutdown: agent_id=42 state_change=active->offline running_tasks_abandoned=3 timestamp=2026-01-07T21:05:00.000Z
[AgentLifecycle] shutdown_success: agent_id=42 state=offline timestamp=2026-01-07T21:05:00.100Z
[AgentLifecycle] reconnect: agent_id=42 state_change=offline->active last_seen_at=2026-01-07T21:10:00.000Z ip=192.168.1.100 timestamp=2026-01-07T21:10:00.000Z
[AgentLifecycle] heartbeat_timeout: agent_id=42 state_change=active->offline last_seen_at=2026-01-07T19:00:00.000Z threshold=3600 timestamp=2026-01-07T21:00:00.000Z
[AgentLifecycle] benchmark_complete: agent_id=42 state_change=active->benchmarked benchmark_count=15 timestamp=2026-01-07T21:00:00.000Z
- When agent state transitions occur via state machine callbacks
- After important agent operations complete
- When timeout thresholds are exceeded
Location: app/models/agent.rb
Lifecycle events are logged in state machine after_transition callbacks:
state_machine :state do
after_transition on: :shutdown do |agent, transition|
# Log shutdown event
end
after_transition on: :activate do |agent, transition|
# Log connect event
end
# ... etc
endTracks failures in Turbo Stream broadcasts to prevent application crashes.
[BroadcastError] Model: {model_name} - Record ID: {record_id} - Error: {error_message}
Backtrace:
{first_5_backtrace_lines}
[BroadcastError] Model: Agent - Record ID: 42 - Error: Connection refused - connect(2) for "localhost" port 6379
Backtrace:
/path/to/redis/client.rb:123:in `connect'
/path/to/turbo/broadcastable.rb:45:in `broadcast_refresh'
app/models/agent.rb:234:in `block in <class:Agent>'
app/controllers/agents_controller.rb:67:in `update'
...
- When Turbo Stream broadcasts fail (Redis connection issues, network problems, etc.)
- During model updates that trigger
broadcasts_refreshesor similar broadcasts - Should be rare in production but critical to catch
Models that include SafeBroadcasting concern:
- Agent
- Task
- Attack
- Campaign
- User
- AgentError
- HashList
- Project
- HashcatBenchmark
- WordList
- RuleList
- MaskList
Location: app/models/concerns/safe_broadcasting.rb
module SafeBroadcasting
extend ActiveSupport::Concern
included do
after_commit :safe_broadcast_refresh, if: :previously_new_record?
after_update_commit :safe_broadcast_update
end
private
def safe_broadcast(method)
send(method)
rescue StandardError => e
Rails.logger.error("[BroadcastError] Model: #{self.class.name} - Record ID: #{id} - Error: #{e.message}")
Rails.logger.error("Backtrace:\n#{e.backtrace.first(5).join("\n")}")
end
def safe_broadcast_refresh
safe_broadcast(:broadcast_refresh)
end
def safe_broadcast_update
safe_broadcast(:broadcast_update)
end
endCipherSwarm automatically filters sensitive parameters from all logs to protect user data.
The following parameter names (or partial matches) are filtered:
| Pattern | Matches | Examples |
|---|---|---|
passw |
Any password field | password, password_confirmation, current_password |
email |
Email addresses | email, user_email, contact_email |
secret |
Secret values | secret, secret_key, oauth_secret |
token |
Authentication tokens | token, auth_token, api_token, csrf_token |
_key |
API keys and encryption keys | api_key, encryption_key, private_key |
crypt |
Encrypted values | encrypted_password, crypt |
salt |
Password salts | salt, password_salt |
certificate |
SSL certificates | certificate, ssl_certificate |
otp |
One-time passwords | otp, otp_secret |
ssn |
Social security numbers | ssn, social_security_number |
cvv |
Credit card CVV codes | cvv, cvv2, cvc |
cvc |
Credit card CVC codes | cvc |
plain_text |
Cracked hash plaintext | plain_text |
hash_value |
Full hash values | hash_value |
Rails' built-in ActiveSupport::ParameterFilter automatically replaces filtered values with [FILTERED] in logs.
Before filtering:
{
"user": {
"email": "user@example.com",
"password": "supersecret123",
"name": "John Doe"
},
"agent": {
"token": "abc123xyz789"
}
}After filtering:
{
"user": {
"email": "[FILTERED]",
"password": "[FILTERED]",
"name": "John Doe"
},
"agent": {
"token": "[FILTERED]"
}
}Location: config/initializers/filter_parameter_logging.rb
Rails.application.config.filter_parameters += %i[
passw email secret token _key crypt salt certificate otp ssn cvv cvc plain_text hash_value
]To verify sensitive data is filtered:
-
Development Testing:
just dev # Make requests with sensitive data # Search logs for actual values (should not appear) grep -r "supersecret123" log/ # Should return no results
-
Log Review:
- Review logs after authentication requests
- Review logs after crack submissions
- Verify passwords, tokens, and plaintext hashes show as
[FILTERED]
-
Automated Testing:
- RSpec tests verify filtering is configured
- Integration tests verify sensitive data doesn't appear in log output
Lograge is already enabled in development to match production behavior. No configuration changes needed.
Start the development server:
just devLogs will appear in the terminal with JSON format (single line per request).
# In terminal 1: Start server
just dev
# In terminal 2: Make API requests
curl -H "Authorization: Bearer YOUR_AGENT_TOKEN" \
http://localhost:3000/api/v1/client/configuration
# Check logs for:
# [APIRequest] START - Agent X - GET /api/v1/client/configuration
# [APIRequest] COMPLETE - Agent X - GET /api/v1/client/configuration - Status 200 - Duration XXms# Create agent and trigger state transitions
rails console
agent = Agent.create!(name: "Test Agent", ...)
agent.activate # Check for [AgentLifecycle] connect log
agent.shutdown # Check for [AgentLifecycle] shutdown log# Temporarily stop Redis
brew services stop redis
# Update a model
rails console
agent = Agent.first
agent.update(name: "New Name")
# Check for [BroadcastError] log (application should continue working)
# Restart Redis
brew services start redis# Make request with sensitive data
curl -X POST http://localhost:3000/api/v1/client/tasks/123/submit_crack \
-H "Authorization: Bearer TOKEN" \
-H "Content-Type: application/json" \
-d '{"hash": "abc123", "plain_text": "password123"}'
# Search logs for plaintext (should not appear)
grep "password123" log/development.log
# Should return no resultsCipherSwarm's structured JSON logs are compatible with:
- ELK Stack (Elasticsearch, Logstash, Kibana)
- Datadog
- Splunk
- Papertrail
- Loggly
- CloudWatch (AWS)
- Stackdriver (GCP)
For local Docker-compose stdout streams, see the Useful jq Queries section earlier in this doc. The recipes below are aggregator-side queries (Datadog/Splunk/Loki).
[APIError]
[AgentLifecycle] AND (heartbeat_timeout OR disconnect OR shutdown_failed)
[BroadcastError]
[APIRequest] COMPLETE AND Duration > 1000ms
[APIRequest] COMPLETE AND Status 401
[APIRequest] AND "Agent 42"
[AgentLifecycle] AND "agent_id=42"
- Unhandled Errors: Any
[APIError] UNHANDLED_ERRORlog - Agent Shutdown Failures: Any
[AgentLifecycle] shutdown_failedlog - High Error Rate: More than 10
[APIError]logs per minute
- Broadcast Failures: More than 5
[BroadcastError]logs per minute - Agent Timeouts: More than 5
[AgentLifecycle] heartbeat_timeoutlogs per hour - Authentication Failures: More than 20 failed auth attempts per minute (potential attack)
- Slow API Requests: Average API duration > 500ms over 5 minutes
- High API Volume: More than 1000 API requests per minute (capacity planning)
- API request volume (requests per minute)
- API response time (p50, p95, p99)
- API error rate (errors per minute)
- API status code distribution
- Active agents count
- Agent connections per hour
- Agent disconnections per hour
- Agent heartbeat timeouts per hour
- Agent state distribution
- Error rate by type
- Top 10 error messages
- Error rate by agent
- Error rate by endpoint
- Broadcast error rate
- Broadcast errors by model
- Time since last broadcast error
✅ Do Log:
- State transitions (agent, task, attack, campaign)
- API request start and completion
- Errors and exceptions
- Important business events (crack submission, task completion)
- Performance-critical operations
❌ Don't Log:
- Inside tight loops (performance impact)
- Sensitive data (passwords, tokens, plaintext)
- Excessive debug information in production
- Every database query (use tools like Bullet instead)
info: Normal operations, state transitions, API requestswarn: Degraded performance, approaching thresholds, recoverable errorserror: Errors that prevent operations, unhandled exceptionsdebug: Detailed diagnostic information (development only)
Always use the [LogType] prefix format:
Rails.logger.info("[AgentLifecycle] connect: agent_id=#{agent.id} state_change=#{from}->#{to} timestamp=#{Time.current}")Not:
Rails.logger.info("Agent #{agent.id} connected") # ❌ Unstructured, hard to parseWhen the same log format is used in multiple places, extract it to a helper method:
# ✅ Good: Centralized in a concern
def log_task_api_error(code, agent, task, error_messages)
messages = Array(error_messages).join(", ")
Rails.logger.error("[APIError] #{code} - Agent #{agent.id} - Task #{task.id} - Errors: #{messages} - #{Time.current}")
end
# Usage
log_task_api_error("TASK_ABANDON_FAILED", @agent, @task, @task.errors.full_messages)Benefits:
- Consistency: Ensures all logs of the same type use identical formatting
- Maintainability: Single place to update format if needed
- Testability: Easier to test log patterns in isolation
- DRY: Reduces code duplication across controllers
See TaskErrorHandling#log_task_api_error in the APIError Implementation section for a real example.
Always include relevant IDs and timestamps:
Rails.logger.error("[APIError] VALIDATION_ERROR - Agent #{@agent.id} - Task #{@task.id} - Errors: #{@task.errors.full_messages.join(', ')} - #{Time.current}")Always rescue logging failures to prevent application crashes:
def log_api_request_start
# ... logging code ...
rescue StandardError => e
Rails.logger.error("Failed to log API request start: #{e.message}")
endAlways test that important events are logged:
it "logs agent shutdown" do
agent = create(:agent, state: "active")
expect(Rails.logger).to receive(:info).with(/\[AgentLifecycle\] shutdown:/)
agent.shutdown
end-
Check Lograge is enabled:
# In rails console Rails.application.config.lograge.enabled # Should return true
-
Check log level:
Rails.logger.level # Should return 1 (info) or lower
-
Check log output location:
Rails.logger.instance_variable_get(:@logdev).filename # Should show log file path
-
Reduce log level in production:
config.log_level = :warn
-
Add more ignored paths to Lograge configuration
-
Implement log sampling for high-volume endpoints
- Verify custom_payload is configured in Lograge initializer
- Check instance variables are set before logging
- Add more context to log messages
If logging impacts performance:
- Use log level filtering (don't log debug in production)
- Move expensive operations out of log statements
- Use background jobs for complex logging
- Implement log sampling for high-volume events