Enabling observability in microservices with traces, metrics, and logs using OpenTelemetry and Grafana
Note: This repository contains the guide documentation source. To view the guide in published form, view it on the Open Liberty website.
Learn how to enable the collection of traces, metrics, and logs from microservices by using MicroProfile Telemetry and the Grafana stack.
In a microservices architecture, it can be difficult to understand how services interact, where latency occurs, and what causes failures. Without visibility across service boundaries, diagnosing issues and tuning performance can become slow and error-prone.
Observability helps address these challenges by capturing telemetry data such as logs, metrics, and traces. OpenTelemetry is an open source framework that provides APIs, SDKs, and tools for generating and managing this data. MicroProfile Telemetry adopts OpenTelemetry to enable both automatic and manual instrumentation in MicroProfile applications. Traces and metrics, along with runtime and application logs, can be exported in a standardized format through an OpenTelemetry Collector to any compatible backend.
In this guide, you’ll use the Grafana Docker OpenTelemetry LGTM image (grafana/otel-lgtm), an open source Docker image that provides a preconfigured observability backend for OpenTelemetry, based on the Grafana stack. This setup includes:
- OpenTelemetry Collector: a gateway for receiving telemetry data from applications
- Prometheus: a time-series database for storing numerical metrics, like request rates and memory usage
- Loki: a log aggregation system for collecting and querying logs
- Tempo: a distributed tracing backend that stores traces, which represent the path and timing of a request as it flows across services
- Grafana: a dashboard tool that brings together logs, metrics, and traces for visualization and analysis
The diagram shows multiple services, but for simplicity, this guide configures only the system and inventory services to demonstrate observability in a distributed environment. In this guide, you’ll learn how to enable the automatic collection of logs, metrics, and traces.
Before you begin, ensure that Docker is installed and running on your system. For installation instructions, see the official Docker documentation.
Start a container from the grafana/otel-lgtm Docker image by running the following command:
docker run -d --name otel-lgtm -p 3000:3000 -p 4317:4317 -p 4318:4318 --rm -ti grafana/otel-lgtm
You can monitor the container startup by viewing its logs:
docker logs otel-lgtm
It may take a minute for the container to start. After you see the following message, your observability stack is ready:
The OpenTelemetry collector and the Grafana LGTM stack are up and running.
When the container is running, you can access the Grafana dashboard at the http://localhost:3000 URL.
The finish directory in the root of this guide contains the finished application. Give it a try before you proceed.
To try out the application, go to the finish directory and run the following Maven goal to build the system service and deploy it to Open Liberty:
On Windows:
mvnw.cmd -pl system liberty:run
On Mac or Linux:
./mvnw -pl system liberty:run
Next, open another command-line session in the finish directory and run the following command to start the inventory service:
On Windows:
mvnw.cmd -pl inventory liberty:run
On Mac or Linux:
./mvnw -pl inventory liberty:run
After you see the following message in both command-line sessions, both of your services are ready:
The defaultServer server is ready to run a smarter planet.
When both services are running, navigate your browser to the http://localhost:9081/inventory/systems/localhost URL.
When you visit this endpoint, it sends an HTTP GET request to the inventory service. The inventory service then makes two outbound GET requests to the system service. One request goes to the /system/properties endpoint, and the other goes to the /health endpoint.
In addition, the inventory service automatically pings the system service every 30 seconds to refresh the health status of all systems in the inventory.
To explore its telemetry data, open the Grafana dashboard at the http://localhost:3000 URL and follow these steps:
View traces with Tempo:
- Open the Explore view from the left menu.
- Select Tempo as the data source.
- Set Query type to Search.
- Click Run query at the upper-right corner to list recent traces.
- Click a trace ID with the GET /inventory/systems/{hostname} name that you generated. The trace contains five spans: one for the initial request to the inventory endpoint, two client spans from the inventory service making outbound calls to the system service, and two server spans from the system service handling those requests. Under Service & Operation, you see the spans in this trace. You can inspect each span by clicking it to reveal more detailed information, such as the times that a request was received and a response was sent.
- Expand the Node graph to see the relationship of spans between the inventory and system microservices. This graph helps visualize the request flow across services and identify any latency hotspots or bottlenecks.
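To make the shape of this trace concrete, the following sketch models the five spans as a small tree in plain Java. It does not use the OpenTelemetry API. The class name, the span kinds written as strings, and the names given to the client and system spans are assumptions chosen for illustration; only the GET /inventory/systems/{hostname} name and the two target endpoints come from this guide.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Plain-Java model of the five-span trace; this is not the OpenTelemetry API. */
final class TraceShape {

    /** A span reduced to the fields needed for this illustration. */
    record Span(String service, String kind, String name, List<Span> children) {
        Span(String service, String kind, String name, Span... children) {
            this(service, kind, name, List.of(children));
        }
    }

    /** Builds the trace: one root span, two client spans, two server spans. */
    static Span exampleTrace() {
        // Client and system span names are assumed; Grafana may label them differently.
        return new Span("inventory", "SERVER", "GET /inventory/systems/{hostname}",
            new Span("inventory", "CLIENT", "GET /system/properties",
                new Span("system", "SERVER", "GET /system/properties")),
            new Span("inventory", "CLIENT", "GET /health",
                new Span("system", "SERVER", "GET /health")));
    }

    /** Walks the tree and counts how many spans each service produced. */
    static void countByService(Span span, Map<String, Integer> counts) {
        counts.merge(span.service(), 1, Integer::sum);
        for (Span child : span.children()) {
            countByService(child, counts);
        }
    }

    /** Prints the tree with two spaces of indentation per level. */
    static void print(Span span, int depth) {
        System.out.println("  ".repeat(depth) + span.service()
            + " [" + span.kind() + "] " + span.name());
        for (Span child : span.children()) {
            print(child, depth + 1);
        }
    }

    public static void main(String[] args) {
        Span root = exampleTrace();
        print(root, 0);
        Map<String, Integer> counts = new LinkedHashMap<>();
        countByService(root, counts);
        System.out.println(counts);
    }
}
```

Running the sketch prints the indented span tree followed by {inventory=3, system=2}: three spans recorded by the inventory service and two by the system service.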
View logs with Loki:
You can click a Log icon beside a span to see the logs associated with it.
You can also use the following steps to explore the application and runtime logs from both services:
- Open the Drilldown → Logs view from the left menu. This view displays an overview of time series and log visualizations for all services that send logs to Loki.
- Click the Show logs button for a specific service to display its logs.
- Expand a log entry to view the full message along with its trace context.
View metrics with Prometheus:
- Open the Drilldown → Metrics view from the left menu. This view shows a query-less experience for browsing the available metrics that are collected by Prometheus.
- For a more detailed view of any metric, click the Select button next to its graph.
After you’re finished reviewing the application, stop the Open Liberty instances by pressing CTRL+C in the command-line sessions where you ran the system and inventory services. Alternatively, you can run the following goals from the finish directory in another command-line session:
On Windows:
mvnw.cmd -pl system liberty:stop
mvnw.cmd -pl inventory liberty:stop
On Mac or Linux:
./mvnw -pl system liberty:stop
./mvnw -pl inventory liberty:stop
MicroProfile Telemetry can automatically collect telemetry data without requiring changes to your application code. To collect and export telemetry data, you need to enable the MicroProfile Telemetry feature and configure the required OpenTelemetry properties in your application.
Navigate to the start directory to begin.
Start by adding the MicroProfile Telemetry feature to the server.xml file of each service.
Replace the server.xml file of the inventory service.
inventory/src/main/liberty/config/server.xml
inventory/server.xml
link:finish/inventory/src/main/liberty/config/server.xml[role=include]
The mpTelemetry feature enables MicroProfile Telemetry support in Open Liberty for the inventory service.
Replace the server.xml file of the system service.
system/src/main/liberty/config/server.xml
system/server.xml
link:finish/system/src/main/liberty/config/server.xml[role=include]
Similarly, the added mpTelemetry feature enables telemetry support for the system service.
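As a rough orientation, the relevant part of each server.xml file might look like the following fragment. This is a sketch, not the content of the files in the finish directory: the version suffix on the feature name is an assumption, and the real files contain additional features and configuration that are omitted here.

```xml
<!-- Fragment only: the server.xml files in the finish directory contain more configuration. -->
<featureManager>
    <!-- The version suffix is an assumption; check the finish directory for the exact value. -->
    <feature>mpTelemetry-2.0</feature>
</featureManager>
```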
By default, the OpenTelemetry SDK is disabled to reduce performance overhead. To enable it, set the otel.sdk.disabled property to false in a valid configuration source.
Create the bootstrap.properties file for the inventory service.
inventory/src/main/liberty/config/bootstrap.properties
inventory/bootstrap.properties
link:finish/inventory/src/main/liberty/config/bootstrap.properties[role=include]
Setting the otel.sdk.disabled property to false in the bootstrap.properties file enables telemetry collection at the runtime level. This allows both runtime and application telemetry to be collected. If you instead configure this property at the application level, runtime telemetry is not included. For more information, refer to the MicroProfile Telemetry configuration documentation.
The otel.service.name property sets the service name to inventory, helping identify the source of the telemetry data in monitoring tools like Grafana.
The observability backend provided by the grafana/otel-lgtm image receives telemetry data over OTLP, the OpenTelemetry Protocol, which is the default export format for OpenTelemetry. Because the container publishes the standard OTLP ports 4317 and 4318 on localhost, no extra exporter configuration is needed.
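Based on the two properties described here, the bootstrap.properties file for the inventory service plausibly contains entries like the following. Treat this as a sketch: the file in the finish directory is the authoritative version and might contain additional entries or comments.

```properties
# Enable the OpenTelemetry SDK at the runtime level (it is disabled by default)
otel.sdk.disabled=false
# Name under which this service appears in the telemetry backend
otel.service.name=inventory
```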
Create the bootstrap.properties file for the system service.
system/src/main/liberty/config/bootstrap.properties
system/bootstrap.properties
link:finish/system/src/main/liberty/config/bootstrap.properties[role=include]
The otel.* properties are configured in the bootstrap.properties file for the system service to enable telemetry collection and define service-specific settings.
For more information about these and other Telemetry properties, see the MicroProfile Config properties for MicroProfile Telemetry documentation.
Now, start the services to begin collecting telemetry data.
When you run Open Liberty in dev mode, dev mode listens for file changes and automatically recompiles and deploys your updates whenever you save a new change. Run the following command to start the system service in dev mode:
On Windows:
mvnw.cmd -pl system liberty:dev
On Mac or Linux:
./mvnw -pl system liberty:dev
Open another command-line session and run the following command to start the inventory service in dev mode:
On Windows:
mvnw.cmd -pl inventory liberty:dev
On Mac or Linux:
./mvnw -pl inventory liberty:dev
After you see the following message, your Liberty instance is ready in dev mode:
**************************************************************
*    Liberty is running in dev mode.
Dev mode holds your command-line session to listen for file changes. Open another command-line session to continue, or open the project in your editor.
When both services are running, visit the http://localhost:9081/inventory/systems/localhost URL to trigger automatic telemetry collection for the request. By default, OpenTelemetry generates trace spans for incoming HTTP requests to JAX-RS and REST endpoints, collects runtime and application metrics such as HTTP request durations and JVM performance, and captures message logs written by the application or Liberty runtime at the INFO level or higher.
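If you'd rather generate a handful of requests than refresh the browser, a small Java program along the following lines can do it. This is a sketch that assumes Java 11 or later and both services running locally. The class name, the helper method, and the count of five requests are arbitrary choices, not part of the guide's code.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

/** Sends a few GET requests to the inventory endpoint to generate telemetry. */
final class TelemetryTraffic {

    // Base URL taken from the guide; everything else in this class is illustrative.
    static final String BASE_URL = "http://localhost:9081/inventory/systems/";

    /** Builds a GET request for the given host name without sending it. */
    static HttpRequest buildRequest(String hostname) {
        return HttpRequest.newBuilder(URI.create(BASE_URL + hostname)).GET().build();
    }

    public static void main(String[] args) throws Exception {
        HttpClient client = HttpClient.newHttpClient();
        // Five requests is an arbitrary number, enough to see several traces in Tempo.
        for (int i = 0; i < 5; i++) {
            HttpResponse<String> response =
                client.send(buildRequest("localhost"), HttpResponse.BodyHandlers.ofString());
            System.out.println("Request " + (i + 1) + " returned HTTP " + response.statusCode());
        }
    }
}
```

Each request that reaches the inventory service should appear as its own trace in Tempo, so running the program a few times gives the dashboards more data to display.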
To explore the automatic telemetry data, open the Grafana dashboard at the http://localhost:3000 URL.
View the traces that were automatically created from your request in the Grafana dashboard. From the left menu, open the Explore view and select Tempo as the data source. For Query type, choose Search, then click Run query. You see that the first result is the trace for the GET /inventory/systems/{hostname} request from the inventory service. Click the trace ID to open the trace details and verify that there are three spans from the inventory service and two spans from the system service.
View the message logs to see timestamped events from the server startup in the dashboard. Navigate to Drilldown → Logs from the left menu. You can click Show logs for each service to see the detailed context for each log.
View an overview of the JVM metrics to get insights into class count, CPU usage, and heap memory utilization. Open the Dashboards view from the left menu and select the JVM Overview (OpenTelemetry) dashboard.
Open the RED Metrics (classic histogram) dashboard to get an overview of the HTTP request performance.
When MicroProfile Telemetry is enabled, OpenTelemetry automatically collects logs from the Liberty message log stream. This includes logs that are written by using the java.util.logging API at the INFO level or higher, as well as messages from the System.out standard output and System.err standard error streams.
While System.out and System.err are useful for quick debugging, they are limited in production environments. These streams lack structure, consistent severity levels, and the contextual metadata that is critical for monitoring distributed systems. In contrast, the java.util.logging API produces structured logs with fine-grained control over log levels, built-in support for exceptions, and better integration with telemetry tools like Grafana.
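The difference is easier to see in code. The following standalone sketch, which runs outside Liberty, captures java.util.logging records in memory and prints the fields that each record carries. The logger name and the in-memory handler are hypothetical and exist only for this illustration. How Liberty maps these fields to telemetry attributes is not shown here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.logging.Handler;
import java.util.logging.Level;
import java.util.logging.LogRecord;
import java.util.logging.Logger;

/** Contrasts a plain println with the fields carried by a LogRecord. */
final class StructuredLogDemo {

    /** Collects records in memory so that their fields can be inspected. */
    static final class CapturingHandler extends Handler {
        final List<LogRecord> records = new ArrayList<>();

        @Override public void publish(LogRecord record) {
            if (isLoggable(record)) {
                records.add(record);
            }
        }
        @Override public void flush() { }
        @Override public void close() { }
    }

    /** Logs three messages and returns the ones that pass the INFO threshold. */
    static List<LogRecord> capture() {
        Logger logger = Logger.getLogger("demo.StructuredLogDemo"); // hypothetical name
        logger.setUseParentHandlers(false); // keep the console free of duplicate output
        logger.setLevel(Level.INFO);
        CapturingHandler handler = new CapturingHandler();
        logger.addHandler(handler);
        try {
            logger.log(Level.FINE, "Dropped because FINE is below INFO");
            logger.log(Level.INFO, "Received response with status: 200");
            logger.log(Level.WARNING, "System service is unreachable");
        } finally {
            logger.removeHandler(handler);
        }
        return handler.records;
    }

    public static void main(String[] args) {
        // Standard output carries only text: no level and no logger name.
        System.out.println("Received response with status: 200");
        // Each LogRecord carries its level and logger name as separate fields.
        for (LogRecord record : capture()) {
            System.out.println(record.getLevel() + " | " + record.getLoggerName()
                + " | " + record.getMessage());
        }
    }
}
```

Only the INFO and WARNING records are captured, and each one exposes its severity and origin as separate values instead of leaving them to be inferred from the message text.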
SystemClient.java
link:start/inventory/src/main/java/io/openliberty/guides/inventory/client/SystemClient.java[role=include]
Currently, the SystemClient class logs messages by using System.out and System.err.
To observe a basic standard output log, open your browser and navigate to the http://localhost:9081/inventory/systems/localhost URL. Then, open the Grafana dashboard at the http://localhost:3000 URL.
In the Explore view, select the Loki data source and set a filter for service_name = inventory. Click the Run query button to run the search.
In the Logs section, select the Logs view from the upper-right corner to show the results in logs visualization. This enables log expansion.
Look for the log entry that says Received response with status: 200. When expanded, you see that both the detected_level and the io_openliberty_module fields are set to SystemOut.
Now, access the http://localhost:9081/inventory/systems/unknown URL to simulate an exception. This request targets a nonexistent host and triggers a RuntimeException.
Rerun the same query in Grafana. In the Logs section, expand the log entry:
Unexpected exception while processing system service request: RESTEASY004655: Unable to invoke request: java.net.UnknownHostException: unknown: nodename nor servname provided, or not known
This log entry shows SystemErr as the value for both the detected_level and io_openliberty_module fields, with the stack trace included directly in the log message.
Although both System.out and System.err logs are collected, they do not provide structured metadata. You cannot identify logs by severity, separate stack traces, or correlate logs with contextual information such as the originating class or error type.
To enable structured logging, update your application to use the java.util.logging API.
Replace the SystemClient class.
inventory/src/main/java/io/openliberty/guides/inventory/client/SystemClient.java
SystemClient.java
link:finish/inventory/src/main/java/io/openliberty/guides/inventory/client/SystemClient.java[role=include]
The updated SystemClient class uses the Logger.getLogger() method to retrieve a logger instance and the Logger.log() method to emit log messages at specific levels such as INFO or WARNING based on the context.
Because you are running the services in dev mode, the changes that you made are automatically picked up.
Return to the http://localhost:9081/inventory/systems/localhost URL and rerun the Loki query in Grafana. This time, the log is structured. The detected_level is set to INFO, and the io_openliberty_module field contains the logger name, making it easier to trace the origin of the log.
Access the http://localhost:9081/inventory/systems/unknown URL again to simulate an exception, then rerun the Loki query in Grafana. The resulting log shows WARNING in the detected_level field, includes a structured stack trace in the exception_stacktrace field, and identifies the exception type as jakarta.ws.rs.ProcessingException in the exception_type field.
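The separate exception fields are possible because the java.util.logging API accepts the exception as its own argument instead of as part of the message text. The following standalone sketch shows that pattern. It is not the guide's SystemClient class: the failing method, the exception type, and the capturing handler are hypothetical stand-ins, and the sketch runs outside Liberty.

```java
import java.util.logging.Handler;
import java.util.logging.Level;
import java.util.logging.LogRecord;
import java.util.logging.Logger;

/** Shows that a logged exception travels as its own field on the LogRecord. */
final class ExceptionLogDemo {

    /** Hypothetical stand-in for a failing call to the system service. */
    static void callSystemService(String hostname) {
        throw new IllegalStateException("Unable to reach host: " + hostname);
    }

    /** Logs the failure at WARNING and returns the record that was emitted. */
    static LogRecord logFailure(String hostname) {
        Logger logger = Logger.getLogger(ExceptionLogDemo.class.getName());
        logger.setUseParentHandlers(false); // suppress default console output
        LogRecord[] captured = new LogRecord[1];
        Handler handler = new Handler() {
            @Override public void publish(LogRecord record) { captured[0] = record; }
            @Override public void flush() { }
            @Override public void close() { }
        };
        logger.addHandler(handler);
        try {
            callSystemService(hostname);
        } catch (RuntimeException e) {
            // The Throwable is passed separately, not concatenated into the message.
            logger.log(Level.WARNING,
                "Unexpected exception while processing system service request", e);
        } finally {
            logger.removeHandler(handler);
        }
        return captured[0];
    }

    public static void main(String[] args) {
        LogRecord record = logFailure("unknown");
        System.out.println("level   = " + record.getLevel());
        System.out.println("message = " + record.getMessage());
        System.out.println("type    = " + record.getThrown().getClass().getName());
    }
}
```

Because the record keeps the message, the level, and the thrown exception apart, a log pipeline can report the exception type and stack trace in their own fields, which is the kind of separation you see in the exception_type and exception_stacktrace fields in Grafana.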
By default, OpenTelemetry collects only message logs. For details on how to include other sources, see Collect logs from a specified source.
Manually verify the telemetry signals by inspecting them in the Grafana dashboard. You can also run the included tests to check the basic functionality of the services. If any of the tests fail, you might have introduced a bug into the code.
Because you started Open Liberty in dev mode, you can run the tests for the system and inventory services by pressing the enter/return key from the command-line sessions where you started the services.
When you are done checking out the services, exit dev mode by pressing CTRL+C in the shell sessions where you ran the system and inventory services.
Finally, run the following command to stop the container that you started from the grafana/otel-lgtm image in the Additional prerequisites section.
docker stop otel-lgtm
You just used MicroProfile Telemetry in Open Liberty to enable traces, metrics, and logs for microservices and the Grafana stack to collect and visualize the data.
Try out one of the related MicroProfile guides. These guides demonstrate more technologies that you can learn to expand on what you built in this guide.











