As the tech landscape increasingly adopts OpenTelemetry as the standard for application telemetry, I want to draw attention to an all too common issue I encounter in blog posts and documentation: the neglect of the OpenTelemetry Protocol (OTLP) as the single exporter. When you read the official OpenTelemetry documentation, or almost any article about exporters, you often find OpenTelemetry tracing configured directly with a Zipkin exporter.
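To make that concrete, such a setup typically looks something like the following minimal ASP.NET Core sketch. The service name and Zipkin endpoint are illustrative defaults, and it assumes the OpenTelemetry.Extensions.Hosting, OpenTelemetry.Instrumentation.AspNetCore and OpenTelemetry.Exporter.Zipkin packages:

```csharp
using OpenTelemetry.Resources;
using OpenTelemetry.Trace;

var builder = WebApplication.CreateBuilder(args);

// Tracing wired directly to a Zipkin exporter from inside the application.
builder.Services.AddOpenTelemetry()
    .ConfigureResource(resource => resource.AddService("producer"))
    .WithTracing(tracing => tracing
        .AddAspNetCoreInstrumentation()
        .AddZipkinExporter(options =>
        {
            // Spans go straight from the application to Zipkin's HTTP endpoint.
            options.Endpoint = new Uri("http://localhost:9411/api/v2/spans");
        }));

var app = builder.Build();
app.Run();
```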
A quick search for OpenTelemetry.Exporter on nuget.org turns up an array of exporters for your telemetry output. However, simply pulling these packages into your application is rarely the most effective approach. The software I build is typically deployed across multiple environments, each with its own requirements. During development, console output might be my telemetry destination of choice. In testing or acceptance, we could be looking at a centralized Application Performance Monitoring (APM) service that retains data for only a week. Production changes the story again: there we may need to send distinct telemetry to multiple APM SaaS providers.
If you choose to use the OpenTelemetry.Exporter.* packages directly, your application ends up with a fair amount of conditional code to meet these diverse environmental needs. I firmly believe that applications should focus on producing telemetry, not on where it ends up. This is where the OpenTelemetry Collector takes center stage, relieving your application of the exporting task. As a bonus, the Collector can fan telemetry out to multiple APM services without any change to your application.
OpenTelemetry Collector
Rather than directly employing the aforementioned exporters in our application and configuration, we shift to using OpenTelemetry.Exporter.OpenTelemetryProtocol as our exporter of choice. This necessitates modifying AddZipkinExporter() to AddOtlpExporter() in the configuration. The default configuration of the OTLP exporter sends all telemetry to http://localhost:4317 via gRPC, but it can be tailored for any destination:
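For illustration, the tracing registration from the earlier sketch would change to something like this; the explicit endpoint and protocol merely restate the defaults, and builder is the same WebApplicationBuilder as before:

```csharp
using OpenTelemetry.Exporter;
using OpenTelemetry.Resources;
using OpenTelemetry.Trace;

// builder is the WebApplicationBuilder from the earlier sketch.
builder.Services.AddOpenTelemetry()
    .ConfigureResource(resource => resource.AddService("producer"))
    .WithTracing(tracing => tracing
        .AddAspNetCoreInstrumentation()
        .AddOtlpExporter(options =>
        {
            // These are the defaults; override them to target another collector.
            options.Endpoint = new Uri("http://localhost:4317");
            options.Protocol = OtlpExportProtocol.Grpc;
        }));
```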
If you’ve implemented these changes, give yourself a pat on the back! Your application now logs into oblivion, since nothing is listening on http://localhost:4317. Let’s rectify this by launching the OpenTelemetry Collector and setting up suitable port forwarding on the container, enabling it to receive the telemetry data.
I prefer running the OpenTelemetry Collector in a Docker container, which can be done using the following docker-compose.yaml:
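A sketch of what that docker-compose.yaml could look like; the image tags, file paths and the accompanying Zipkin service are my own choices for this example:

```yaml
version: "3.8"
services:
  otel-collector:
    image: otel/opentelemetry-collector-contrib:latest
    command: ["--config=/etc/otel-collector-config.yaml"]
    volumes:
      - ./otel-collector-config.yaml:/etc/otel-collector-config.yaml
    ports:
      - "4317:4317"   # OTLP over gRPC, where the application sends its telemetry

  zipkin:
    image: openzipkin/zipkin:latest
    ports:
      - "9411:9411"   # Zipkin UI and API
```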
In the Docker Compose file, we mount a configuration file for the container. This file orchestrates the exporters, processors and receivers for the OpenTelemetry Collector. Assuming that you’re utilizing Zipkin as demonstrated earlier, we can resort to this simple configuration:
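A minimal version of that configuration might look like this; the Zipkin endpoint assumes the zipkin service from the compose file above:

```yaml
receivers:
  otlp:
    protocols:
      grpc:            # listen for OTLP over gRPC on port 4317

processors:
  batch:               # batch telemetry before exporting

exporters:
  zipkin:
    endpoint: http://zipkin:9411/api/v2/spans

service:
  pipelines:
    traces:
      receivers: [otlp]       # IN
      processors: [batch]     # TRANSFORM
      exporters: [zipkin]     # OUT
```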
Here, we’ve set up an OTLP receiver using the GRPC protocol, a batch processor, and the Zipkin exporter. Next, we set up a pipeline for the traces signal, which includes receivers (IN), processors (TRANSFORM), and exporters (OUT). In this instance, we’ve only configured the traces pipeline. For a more comprehensive configuration overview, click here. Once everything is in place and the Docker Compose file is running, your traces should be visible in Zipkin as before.
You might be thinking that nothing has really changed, apart from some extra dependencies and configuration. But this is where the power of the OpenTelemetry Collector starts to shine: we can add more exporters to the configuration file while the application code remains untouched.
You might have noticed the -contrib suffix in the Docker Compose file. This stems from the structure of the OpenTelemetry Collector project, which is part of the larger OpenTelemetry project. The project encourages community contributions, including exporters, processors, and receivers, which are grouped together in the OpenTelemetry Collector Contrib project. This community-oriented structure makes it simple to expand your telemetry exporting capabilities: all you need to do is include your preferred receiver, processor, or exporter in your configuration file. For a comprehensive list of supported receivers, processors, and exporters, refer to the respective links.
Working locally
Despite my admiration for products like Honeycomb and DataDog, I prefer not to depend on them for local development. Internet blackouts were a regular feature of my past commutes, so the ability to work offline matters a lot to me. In these circumstances, the OpenTelemetry Collector proves highly beneficial.
There exist numerous impressive Docker images that can be configured for use with the collector.
Tracing
For tracing, the solution is relatively straightforward. There are many open source systems that allow you to run an instance locally. I frequently use Jaeger, an excellent open source tracing system, and there is a Docker image that allows us to spin up a local instance of it. Let’s adjust our docker-compose.yaml file to utilize this image by adding the following:
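For example, a single all-in-one Jaeger service could be added like this (the image tag is an assumption):

```yaml
  jaeger:
    image: jaegertracing/all-in-one:latest
    ports:
      - "16686:16686"   # Jaeger UI
```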
Next, let’s modify the configuration file to use the Jaeger exporter:
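Roughly like this, using the collector's jaeger exporter; note that newer collector releases drop this exporter in favour of sending OTLP directly to Jaeger, so adjust accordingly:

```yaml
exporters:
  jaeger:
    endpoint: jaeger:14250   # Jaeger's gRPC collector port
    tls:
      insecure: true         # fine for local development only

service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [batch]
      exporters: [jaeger]
```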
After restarting your docker-compose and running the application, navigate to http://localhost:5000/trace. The Jaeger UI should be available at http://localhost:16686, where you should see the traces of the producer appearing in the UI.
Metrics
Metrics are a bit more complex to handle than tracing. Although there are several open source systems available, they tend not to be as straightforward to set up as Jaeger. For local development, I’ve found that the best solution is to use Prometheus. But, as Prometheus operates on a pull-based system, the configuration is slightly different.
In the docker-compose.yml we add the following:
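Something like the following; the config file name matches the one discussed below:

```yaml
  prometheus:
    image: prom/prometheus:latest
    volumes:
      - ./prometheus-config.yaml:/etc/prometheus/prometheus.yml
    ports:
      - "9090:9090"   # Prometheus UI
    links:
      - otel-collector   # so Prometheus can reach the collector by name
```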
It’s crucial to note that we’re adding a links component to the Prometheus container. This is because we need to be able to reach the OpenTelemetry Collector from the Prometheus container.
In the otel-collector-config.yaml we configure the collector to use Prometheus as an exporter:
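A sketch of that part of the collector configuration; port 8889 is simply a common choice for the scrape endpoint:

```yaml
exporters:
  prometheus:
    endpoint: 0.0.0.0:8889   # the collector exposes metrics here for scraping

service:
  pipelines:
    metrics:
      receivers: [otlp]
      processors: [batch]
      exporters: [prometheus]
```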
Here, the endpoint doesn’t signify the endpoint of the Prometheus instance, but rather an endpoint of the OpenTelemetry Collector. The Prometheus exporter will initiate a server on this endpoint, which Prometheus can then scrape.
Lastly, we need to configure Prometheus to scrape the OpenTelemetry Collector. This is accomplished in the prometheus-config.yaml file:
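A minimal scrape configuration, assuming the collector service name and port from above:

```yaml
global:
  scrape_interval: 5s   # a short interval is convenient for local experiments

scrape_configs:
  - job_name: otel-collector
    static_configs:
      - targets: ["otel-collector:8889"]   # the collector's Prometheus exporter endpoint
```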
As with the Jaeger exporter, we need to restart docker-compose and run the application again. This time, visit http://localhost:5000/metric. The Prometheus UI should be accessible at http://localhost:9090. If you enter random_number_count into the search field, you should see the metrics appear in the UI.
Logging
The logging signal specification was among the last in the OpenTelemetry project to stabilize, which happened on the 11th of May 2022. During most of my explorations with OpenTelemetry, many systems implementing the specification did not yet support this signal. I’ve only used the logging signal in a production environment once, and at the time I didn’t spend much effort running a local logging system. However, while preparing this article, I managed to successfully use Loki as an export target for logging.
Like the other signals, we need to include the Loki docker image in the docker-compose.yml file:
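For instance (the image tag and the loki-config.yaml file name are my own choices):

```yaml
  loki:
    image: grafana/loki:latest
    command: ["-config.file=/etc/loki/loki-config.yaml"]
    volumes:
      - ./loki-config.yaml:/etc/loki/loki-config.yaml
    ports:
      - "3100:3100"   # Loki HTTP API
```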
Ensure that Loki is linked to the OpenTelemetry Collector, so that the collector can access Loki. Loki also needs a configuration file. I found the following config online. This might contain more than required, but it works. As I’m not entirely familiar with Loki, I won’t go into detail about the configuration. You can find it in the source article, or in my repository.
Since Loki is headless, if you want to visualize your logs, you’ll need a UI. Grafana can serve this purpose. We add the following to the docker-compose.yml file:
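Roughly like this; the volume target is Grafana's standard provisioning location for datasource files:

```yaml
  grafana:
    image: grafana/grafana:latest
    volumes:
      - ./grafana-datasources.yaml:/etc/grafana/provisioning/datasources/datasources.yaml
    ports:
      - "3000:3000"   # Grafana UI
    links:
      - loki            # Grafana queries Loki directly
```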
Here, we link Grafana to Loki as it must be able to access it. We also have to configure Grafana to use Loki as a data source. This is done in the grafana-datasources.yaml file:
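A minimal datasource provisioning file, assuming the loki service name from the compose file:

```yaml
apiVersion: 1

datasources:
  - name: Loki
    type: loki
    access: proxy
    url: http://loki:3100   # resolved via the container link
    isDefault: true
```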
The final configuration step is to set the OpenTelemetry Collector to use Loki as an exporter. This is done in the otel-collector-config.yaml file:
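Something along these lines; the loki exporter comes from the contrib distribution, and the push path is Loki's standard API:

```yaml
exporters:
  loki:
    endpoint: http://loki:3100/loki/api/v1/push

service:
  pipelines:
    logs:
      receivers: [otlp]
      processors: [batch]
      exporters: [loki]
```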
And with that, everything should be ready. We just need to restart docker-compose and run the application again, this time visiting http://localhost:5000/Log. Grafana should be available at http://localhost:3000/explore. If you enter {job="producer"} into the query field, you should see the logs appear.
As previously stated, my experience with Loki is limited. Therefore, if you detect any inaccuracies or oversights, feel free to reach out to me directly.
Application Performance Monitoring
We now have all the individual signals functioning locally, a feat in itself, but let’s aim higher. In a perfect world, we would want to emulate a full APM solution akin to Honeycomb or DataDog. Regrettably, after an extensive search, I’m yet to stumble upon a solution that fits the bill.
Take Uptrace for instance, a service I managed to get up and running. If you follow the steps I used and run the docker-compose available here, you should see your telemetry data appear in the UI as well. However, the interface leaves much to be desired, being somewhat unintuitive. In fact, I found myself investing more time in wrestling with the UI to get my telemetry data than if I had just switched between the individual services.
Another option I’ve come across is Grafana Labs. This company, which offers multiple products, three of which I’ve mentioned in this post, might present a viable solution. Hypothetically, Jaeger could be substituted with Grafana Tempo, one of their offerings. However, I must admit I haven’t tested this out yet. The endgame, it seems, would involve integrating all of Grafana’s products and configuring a personalized APM solution via the Grafana UI. This approach could be worthwhile if you’re already utilizing Grafana products for your production systems, as it might enable you to replicate your production APM environment locally. But be prepared - this endeavor would require a substantial investment of time and effort, commodities I’ve not yet been able to spare.
Lastly, I’ve identified a few more APM solutions that I haven’t explored as yet:
- SigNoz, documentation for which can be found here
- Elastic APM, the quick start guide is available here
To sum it up, finding the perfect local APM solution remains a challenging goal. However, with continued investigation and experimentation, I believe we’ll get there.