To ensure effective monitoring and logging of the Waltti APC microservices, detect anomalies proactively, optimize performance, and ensure system health, while aligning with data anonymization and pilot specifications.
Given the diversity of the microservices, monitor these primary metrics:
- Performance Metrics: CPU, Memory, and Network I/O for each microservice.
- Application Health: Uptime, response times, error rates.
- Data Flow Metrics: Message throughput for microservices like
http-pulsar-poller
,mqtt-pulsar-forwarder
,pulsar-mqtt-forwarder
, etc. - Database Operations: For
waltti-apc-analytics-db-sink
, monitor DB write rates, errors, and latencies.
- Microservice Failures: Alerts for service unavailability or high error rates.
- Resource Constraints: Alerts for nearing resource limits in terms of CPU, memory, or storage.
- Data Flow Anomalies: Alerts for significant drops or spikes in data flow.
Given the diversity and specificity of your repositories:
- Operational Logs: Logs from the microservices detailing their operations, errors, and state changes.
- Data Processing Logs: Logs from services responsible for data manipulation, such as
waltti-apc-anonymizer
, indicating successful operations, transformations, or any errors. - Infrastructure Logs: Logs from
waltti-apc-deployment
andwaltti-apc-gcp-terraform
indicating the status of deployments and infrastructure changes.
- Focus on actionable logs; e.g., errors or warnings.
- Exclude verbose logs that don't have operational significance to manage costs.
- Convert frequently occurring error logs into metrics to easily track and set alerts.
As there's a dedicated service and plan (waltti-apc-anonymization-plan
& waltti-apc-anonymizer
) for data anonymization, it's crucial to ensure that no personally identifiable information (PII) gets logged. Regular audits and filters should be in place.
- Utilize GCP's integration capabilities to forward critical alerts to your team's Slack channel.
- Service Documentation: Each microservice should have documentation detailing its logging and monitoring specifics.
- Regular Reviews: Periodically assess the effectiveness of the monitoring and logging setup. Adjust as the system evolves.
- Feedback Loop: Encourage team members to provide feedback on the usefulness of alerts and dashboards. Adjust to reduce noise and enhance utility.