OpenTelemetry and Jakarta REST Services

Wednesday, Jun 2, 2021 |

Knowing what's going on in your microservices deployment is extremely important when something goes wrong. In a distributed system, though, it can be difficult to know where things have gone wrong. That's where a tracing system such as OpenTelemetry can be immensely valuable. In this post, we'll build two simple services, one of which calls the other, and trace the execution from end to end.

The Parent POM ==

We'll start by setting up the project, and we'll do so by creating a multimodule Maven pom:

<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>

    <groupId>com.steeplesoft.otel</groupId>
    <artifactId>multiservice-demo</artifactId>
    <version>1.0-SNAPSHOT</version>
    <packaging>pom</packaging>

    <modules>
        <module>otel-integration</module>
        <module>service1</module>
        <module>service2</module>
    </modules>

    <properties>
        <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
        <maven.compiler.source>11</maven.compiler.source>
        <maven.compiler.target>11</maven.compiler.target>
    </properties>

    <dependencyManagement>
        <dependencies>
            <dependency>
                <groupId>io.opentelemetry</groupId>
                <artifactId>opentelemetry-bom</artifactId>
                <version>1.2.0</version>
                <type>pom</type>
                <scope>import</scope>
            </dependency>
        </dependencies>
    </dependencyManagement>
    <dependencies>
        <dependency>
            <groupId>jakarta.ws.rs</groupId>
            <artifactId>jakarta.ws.rs-api</artifactId>
            <version>2.1.6</version>
            <scope>provided</scope>
        </dependency>
        <dependency>
            <groupId>jakarta.enterprise</groupId>
            <artifactId>jakarta.enterprise.cdi-api</artifactId>
            <version>2.0.2</version>
            <scope>provided</scope>
        </dependency>
        <dependency>
            <groupId>io.grpc</groupId>
            <artifactId>grpc-okhttp</artifactId>
            <version>1.36.1</version>
        </dependency>
        <dependency>
            <groupId>io.opentelemetry</groupId>
            <artifactId>opentelemetry-api</artifactId>
        </dependency>
        <dependency>
            <groupId>io.opentelemetry</groupId>
            <artifactId>opentelemetry-sdk</artifactId>
        </dependency>
        <dependency>
            <groupId>io.opentelemetry</groupId>
            <artifactId>opentelemetry-sdk-trace</artifactId>
        </dependency>
        <dependency>
            <groupId>io.opentelemetry</groupId>
            <artifactId>opentelemetry-exporter-jaeger</artifactId>
        </dependency>
    </dependencies>
</project>

By setting up the parent POM this way, we'll make the three submodules much simpler. Since the two services are largely identical, we'll only look at one of those POMs.

The Basic REST service ==

<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>

    <parent>
        <artifactId>multiservice-demo</artifactId>
        <groupId>com.steeplesoft.otel</groupId>
        <version>1.0-SNAPSHOT</version>
    </parent>

    <artifactId>service1</artifactId>
    <packaging>war</packaging>

    <dependencies>
        <dependency>
            <groupId>${project.groupId}</groupId>
            <artifactId>otel-integration</artifactId>
            <version>${project.version}</version>
        </dependency>
    </dependencies>
</project>

There's really not much to this beyond a dependency on another module from our project, which we'll look at shortly. First, our REST endpoint:

@Path("/endpoint1")
public class RestEndpoint1 {
    @Inject
    private Tracer tracer;
    @Inject
    private OpenTelemetry otel;


    @GET
    public String method1() {
        final Span span = tracer.spanBuilder("Doing some work")
                .startSpan();

        span.makeCurrent();
        span.setAttribute("in.my", "application");
        span.addEvent("Test Event");
        sleep();
        doSomeMoreWork();
        span.addEvent("After work");

        String service2 = sendRequest();
        sleep();
        doEvenMoreWork();

        span.end();

        return "Hello World, from service 1! Service 2 happened to say '" + service2 + "'";
    }

    private void doSomeMoreWork() {
        final Span span = tracer.spanBuilder("Doing some more work")
                .startSpan();
        span.makeCurrent();
        sleep();
        doEvenMoreWork();
        span.end();
    }

    private void doEvenMoreWork() {
        final Span span = tracer.spanBuilder("Doing even more work")
                .startSpan();
        span.makeCurrent();
        sleep();
        span.end();
    }
    // ...
}

This is a pretty boring, if somewhat contrived, REST endpoint. The interesting part is the injection of the OpenTelemetry Tracer instance, and the creation of the spans in the REST method itself. From the OpenTelemetrey docs , we learn that

Traces track the progression of a single request, called a trace, as it is handled by services that make up an application...Each unit of work in a trace is called a span; a trace is a tree of spans." and that "[s]pans are objects that represent the work being done by individual services or components involved in a request as it flows through a system. _In the midst of our "work", then, we create a span to track the process. This will help help us mark the beginning and end of unit of work. As we monitor the system, we can compute statistics on, for example, the average time a given span takes. Should it start deviating wildly, we may have found our issue's culprit. The use of a span may look something like this, generically speaking:
Span span = tracer.spanBuilder("Doing some work")
    .startSpan();
span.makeCurrent();
// Do some work
span.end();
When creating the span, we want to give it a meaningful name so we can find it more easily when we look at the logs.

Remember, though, that a trace is a tree of spans. When we create the span here, OpenTelemetry automatically sets as its parent any span that might be current. In looking through our endpoint, we create several spans, and their parent span is automatically set for us so we can see the relationship of one span to another (e.g., "Span A" encompasses "Span B" through "Span D") to get a better picture of a request's flow through the system.

While that's all well and good, where do OpenTelemetry and Tracer come from? For that, let's look at the otel-integration module.
OpenTelemetry Integration Module ==
At the moment, unless one is using OpenTelemetry's automatic instrumentation to integrate the library, there's a small bit of setup that is required. This module offers three classes that do that: one to produce the OpenTelemetry instance, one for the Tracer, and one to set up a Jakarta REST filter to automatically trace incoming requests.

The producer methods are fairly standard CDI producers. For OpenTelemetry, we have this:
@ApplicationScoped
public class OpenTelemetryProducer {
    @javax.annotation.Resource(lookup="java:app/AppName")
    private String applicationName;

    private volatile OpenTelemetry openTelemetry;

    @Produces
    public OpenTelemetry getOpenTelemetryInstance() {
        OpenTelemetry localRef = openTelemetry;
        if (localRef == null) {
            synchronized (this) {
                localRef = openTelemetry;
                if (localRef == null) {
                    openTelemetry = localRef = localBuild();
                }
            }
        }

        return localRef;
    }

    private OpenTelemetrySdk localBuild() {
        final JaegerGrpcSpanExporterBuilder exporterBuilder = JaegerGrpcSpanExporter.builder();
        final BatchSpanProcessorBuilder spanProcessorBuilder = BatchSpanProcessor.builder(exporterBuilder.build());

        final SdkTracerProviderBuilder tracerProviderBuilder = SdkTracerProvider.builder()
                .addSpanProcessor(spanProcessorBuilder.build())
                .setResource(Resource.create(Attributes.of(
                        ResourceAttributes.SERVICE_NAME, applicationName,
                        AttributeKey.stringKey("foo"), "bar")));

        return OpenTelemetrySdk.builder()
                .setTracerProvider(tracerProviderBuilder.build())
                .setPropagators(ContextPropagators.create(W3CTraceContextPropagator.getInstance()))
                .buildAndRegisterGlobal();
    }
}
In this application-scoped bean, we have a producer method that... produces the OpenTelemetry instance, using the double-check idiom (you can read more about that here , if you're curious, and, yes, using it might be overkill, but better safe, eh? :) In the localBuild() method (so named to differentiate from the GlobalOpenTelemetry.get() approach), we define an exporter, which will export our trace information to an external system, Jaeger, for aggregation and study, and a Tracer provider, which lets us configure our Tracer. In this case, we want to use the Jakarta EE application name, injected into applicationName to set the service name (and we add a random attribute just to demonstrate where that might show up).

For the Tracer, we have something similar:
@ApplicationScoped
public class TracerProducer {
    @Inject
    private OpenTelemetry openTelemetry;
    private volatile Tracer tracer;

    @Produces
    public Tracer getTracer() {
        Tracer localRef = tracer;
        if (localRef == null) {
            synchronized (this) {
                localRef = tracer;
                if (localRef == null) {
                    tracer = localRef = openTelemetry.getTracer("com.steeplesoft.otel",
                            "1.0.0-SNAPSHOT");
                }
            }
        }
        return localRef;
    }
}
The only new thing of note here is simply the OpenTelemetry API call to get the Tracer instance.

Finally, let's look at our request/response filter:
@ApplicationScoped
public class OpenTelemetryFilter implements ContainerRequestFilter, ContainerResponseFilter {
    @Inject
    private OpenTelemetry openTelemetry;
    @Inject
    private Tracer tracer;

    @Override
    public void filter(ContainerRequestContext requestContext) {
        Context extractedContext = openTelemetry.getPropagators()
                .getTextMapPropagator()
                .extract(Context.current(), requestContext, new TextMapGetter<>() {
                    @Override
                    public String get(ContainerRequestContext requestContext, String key) {
                        if (requestContext.getHeaders().containsKey(key)) {
                            return requestContext.getHeaders().get(key).get(0);
                        }
                        return null;
                    }

                    @Override
                    public Iterable<String> keys(ContainerRequestContext requestContext) {
                        return requestContext.getHeaders().keySet();
                    }
                });
        final UriInfo uriInfo = requestContext.getUriInfo();
        final URI requestUri = uriInfo.getRequestUri();
        final String method = requestContext.getMethod();
        final String uri = uriInfo.getPath();

        Span serverSpan = tracer.spanBuilder(method + " " + uri)
                .setSpanKind(SpanKind.SERVER)
                .setParent(extractedContext)
                .startSpan();
        serverSpan.makeCurrent();
        serverSpan.setAttribute(SemanticAttributes.HTTP_METHOD, method);
        serverSpan.setAttribute(SemanticAttributes.HTTP_SCHEME, requestUri.getScheme());
        serverSpan.setAttribute(SemanticAttributes.HTTP_HOST, requestUri.getHost() + ":" + requestUri.getPort());
        serverSpan.setAttribute(SemanticAttributes.HTTP_TARGET, uri);

        requestContext.setProperty("span", serverSpan);
    }

    @Override
    public void filter(ContainerRequestContext containerRequestContext, ContainerResponseContext containerResponseContext) {
        Object serverSpan = containerRequestContext.getProperty("span");
        if (serverSpan != null && serverSpan instanceof Span) {
            ((Span) serverSpan).end();
        }
    }
}
There's quite a bit going on here:

First, we extract any context that may have been propagated to us by the calling system. OpenTelemetry is a specification with implementations in many languages, so it's entirely possible that, say, a JavaScript client has created a trace and is calling our service. In that case, we want to pick up that context and continue using it in this part of the system. We'll see this in use when we call between our two Java services in a bit.

Next, we extract some information about the request from the Jakarta REST UriInfo instance on the ContainerRequestContext.

We create a server span, adding information about the request as attributes.

Before finishing up the request part of the filter, we set the span as a property on the ContainerRequestContext for use later.

Finally, in the response portion of the filter, we retrieve the span from the ContainerRequestContext and call span.end()

If we deploy the services now, we'll certainly get traces logged, but when we call from Service 1 to Service 2, we lose some of the context, so let's see how to get the trace to propagate across services.
Context Propagation ==
Back in Service 1, we make a call to Service 2 in the method sendRequest(). We left that out earlier for brevity's sake, so let's look at that now:
private String sendRequest() {
    TextMapSetter<HttpRequest.Builder> setter =
            (requestBuilder, key, value) -> {
                requestBuilder.header (key, value);
            };

    HttpRequest.Builder builder = HttpRequest.newBuilder()
            .uri(URI.create("http://localhost:8080/service2-1.0-SNAPSHOT/api/endpoint2"))
            .timeout(Duration.ofMinutes(1))
            .header("Content-Type", "application/json")
            .GET();

    otel.getPropagators().getTextMapPropagator().inject(Context.current(), builder, setter);

    final HttpRequest request = builder.build();
    try {
        HttpResponse<String> response = HttpClient.newBuilder()
                .version(HttpClient.Version.HTTP_2)
                .build()
                .send(request, HttpResponse.BodyHandlers.ofString());
        return response.body();
    } catch (Exception e) {
        throw new RuntimeException(e);
    }
}
Similar to reading the context from an incoming request, we create an instance of TextMapSetter<T> that will write the context propagation header to the outgoing request. We start building the request, using Java 11's HTTP client, hardcoding Service 2's API because we can. :)

In the middle, there, we ask OpenTelemetry to inject itself into the request:
otel.getPropagators().getTextMapPropagator().inject(Context.current(), builder, setter);
This line of code adds a header that our request filter above will then extract to set up the trace context in service 2. Once the context headers are injected, we finish the request to service 2, and let the request to service 1 complete as well.
Looking at the traces ==
Now that we've instrumented our application, how do we view the telemetry data? For that, we're going to use Jaeger, and since this post is already long enough and a production setup of Jaeger is out of scope, we'll go super simple. Download the Jaeger all-in-one distribution from here and run it in one terminal, then start, say, WildFly in another, and deploy our apps in still another:
$ jaeger-all-in-one --collector.zipkin.host-port=:9411
// Another terminal
$ bin/standalone.sh
// Yet another
$ cd $PROJECT_DIR
$ mvn clean package
...
$ jboss-cli.sh -c "deploy --force service1/target/service1-1.0-SNAPSHOT.war" && jboss-cli.sh -c "deploy --force service2/target/service2-1.0-SNAPSHOT.war"
$ http :8080/service1-1.0-SNAPSHOT/api/endpoint1
HTTP/1.1 200 OK
Connection: keep-alive
Content-Length: 81
Content-Type: application/octet-stream
Date: Wed, 02 Jun 2021 21:44:11 GMT

Hello World, from service 1! Service 2 happened to say 'Service 2 did something!'
While this proves our services work (for some value of 'work'), where are the traces? For that, point your browser at the Jaeger UI . Once the page loads, select service1-1.0-SNAPSHOT from the Service dropdown, then click Find Traces. You should see results that look something like this:

Clicking on that first trace, should get you a screen like this:

You should see 8 spans across 2 different services. I added some random sleeps in the services to help the nested spans be more obvious.
Wrapping Up ==

I am not an expert on OpenTelemetry, so I can't guarantee this is a production-sound approach, but, Jaeger setup aside, it feels pretty solid to me at this point. At the very least, I hope it offers someone some help in getting started with OpenTelemetry. I should note that I'm doing this, at least in part, as part of the nascent efforts of adding OpenTelemetry support to WildFly, so, if all goes well, much of this will be done for you automatically if you deploy to WildFly. Until then, go forth and instrument your applications manually. :)

If you'd like to see the entire project, you can find it on GitHub , and you can find me on Twitter .