An Update on OpenTelemetry and WildFly

Friday, Jul 9, 2021 |

In a recent post, I worked through setting up OpenTelemetry support in your Jakarta EE application. Since that time, I’ve put quite a bit of work into integrating that support, as teased in the post, into WildFly. In this post, I’d like to provide an update on what that WildFly support currently looks like, and put out a request for feedback.

These changes are now part of the official WildFly builds, so this part is no longer necessary.

To get started experimenting with my changes, you need to do one of two things:

build the server from source using my branch, or

download this hopefully up-to-date binary build.__

With the current state of changes, you get the following:

CDI injection of a Tracer instance.
CDI injection of an OpenTelemetry instance should want to manually create a Tracer.
Automatic context propagation on all incoming REST requests so long as the request adheres to the OpenTelemetry context propagation spec.
Automatic context propagation on all outgoing REST Client requests. This is done via an automatically-registered ClientRequestFilter, so no additional work need be done in your application.

Along with that runtime functionality, you can configure how OpenTelemetry behaves:

service-name: The name of the service reported in the traces
exporter: Can either be jaeger or otlp. The default is jaeger using gRPC.
endpoint: The endpoint of the trace collector. The default is for Jaeger on localhost.
span-processor: Can either be batch or simple. The default is batch.
batch-delay: The time in milliseconds to delay a batch processing. This is only used if span-processor is set to batch. The default is 5000ms.
max-queue-size: The maximum size of the batch before sending. The default is 2048.
max-export-batch-size: The maximum number of samples to export at a time. The default is 512.
export-timeout: The maximum wait time while exporting traces. The default is 30000ms, or 30 seconds.
sampler: The sampler to use: on, off, or ratio
sampler-arg: The ratio to use when sampling traces.

From a WildFly configuration perspective, the configuration looks like this:

<subsystem xmlns="urn:wildfly:opentelemetry:1.0"
   exporter="jaeger"
   endpoint="http://localhost:14250"
   span-processor="batch"
   batch-delay="5000"
   max-queue-size="2048"
   max-export-batch-size="512"
   export-timeout="30000"
   />

As it stands now, it seems to work really well. In designing and implementing what I have so far, I’ve discussed things internally with other Red Hat engineers in the observability space, as well as with some in the CNCF Slack channel, but more input would be extremely helpful.

Are there features you’d like to see?

Are there any changes you’d like to see in the configuration?

Is there anything missing in the runtime support that you’d like to see?

Currently, the service name is the same for all applications deployed to a given WildFly instance. Is that acceptable? If not, if it’s technically possible, would a per-app service name be preferable?

Any and all feedback is welcome. You can always find me on Twitter or, better yet, comment on the issue in JIRA.