In our current project we just wanted to see which requests exceeded a certain response time threshold. Additionally we wanted to know where the time was spent, i.e. which backend requests (e.g. hitting the database or search engine) contributed how much to the overall response time. So we’d like to get log entries like this:
2014-12-13 22:12:30 SLOW "GET /search?q=foo" (139 ms), segments: solrQuery (67 ms), dbLookup (69 ms)
In a synchronous/blocking application measuring subordinate requests/operations and associating them with the main request thread is fairly easy (just use ThreadLocal
). In an application that runs async requests this is not the case, because there’s not that single thread that can be used as “global” state to associate measurements with.
Here comes Kamon into play, a toolkit for monitoring of reactive applications built on top of Play, Spray or Akka. Kamon uses bytecode instrumentation to introduce metrics and tracing logic into your application, so it already solves the problem of transferring the context of a request to other, subordinate asynchronous operations (segments). In Kamon this is called the TraceContext
(see also their Trace documentation).
The awesome guys of Kamon also just introduced trace subscriptions, that allow users to receive information about recorded traces. And that’s exactly what we need to log slow requests and details about segments: Kamon maintains the TraceContext
over async operations, we record segment information, and after the request is finished Kamon passes the combined trace information back to us so that we just need to format + log the slow trace.
In the following it’s shown what needs to be done to log slow requests using Kamon’s trace subscription.
Add Kamon to your project
At first you need to add Kamon to your project’s dependencies. Because the trace subscription is still only available in a Kamon snapshot, you also need to add the Kamon snapshots repo as resolver. In your build.sbt
add the following:
resolvers += "Kamon Repository Snapshots" at "http://snapshots.kamon.io"
libraryDependencies += "io.kamon" %% "kamon-play" % "0.3.6-1b508ff95ea477387a0b27eae41ee7a76861928e"
Configure kamon to collect traces
Trace collection must be enabled by setting the kamon.trace.level
to simple-trace
(by default metrics-only
). Because we want to be notified about traces exceeding a certain threshold we also set the sampling strategy to “threshold” and the “minimum-elapsed-time” to the duration that we consider as “slow”. For this add the following to the application.conf
:
kamon {
# Disable warnings for missing weaving, would be printed in run mode + tests
metrics.disable-aspectj-weaver-missing-error = true
trace {
# trace level 'simple-trace' needed for tracing subscription (slow request logging)
level = "simple-trace"
# sampling: we use the threshold sampler that gathers tracing information for all traces but only reports
# those whose elapsed-time is equal or greated to the specified threshold.
sampling = "threshold"
threshold-sampler {
# Duration format: https://github.com/typesafehub/config/blob/master/HOCON.md#duration-format
minimum-elapsed-time = 100 milliseconds
}
}
}
Store request method + uri in the TraceContext metadata
Because Kamon supplies trace information via messages sent to an actor (our SlowRequestLogger
), in that actor we do not know anything about the request related to the trace. But of course we’d like to log the request method and uri of the request that was slow, therefore we need to make this available somehow. To achieve this we can use a filter that adds the required information to the TraceContext
metadata:
This only has to be registered e.g. via object Global extends WithFilters(StoreRequestInfoFilter)
.
Subscribe the SlowRequestLogger to traces
The most important part then is to subscribe an actor to TraceInfo
messages, in our case this is our SlowRequestLogger
. Because we already have set the threshold for reported traces, we get only messages for requests that exceeded the configured threshold. The SlowRequestLogger
shown below produces logs like e.g.
2014-12-13 22:12:30 SLOW “GET /search?q=foo” (139 ms), segments: solrQuery (67 ms, numItems: 42), dbLookup (69 ms)
which includes the previously stored request method + uri, the response time and information about segments. Segments are referred to by name, the time recorded for the segment is added and metadata that might have been added to the segment (like “numItems: 42” in our example). I’ll show later how to record segments.
Here’s the SlowRequestLogger
:
The SlowRequestLogger
companion object already allows to start
the actor, we can easily do this in our Global
:
The only missing piece now is that we have to record our information about our segments.
Record segment info
To record a segment we need to get the current TraceContext
via TraceRecorder.withTraceContextAndSystem
, start the segment before the async operation is executed, and finish the segment once the operation (Future) is completed. As an example, in our project we’re using cassandra and the datastax java-driver that returns a ResultSetFuture
that we already convert to a Scala Future[ResultSet]
, which we can now easily intercept to record a segment:
Please note that TraceRecorder.withTraceContextAndSystem
returns an Option
, because the TraceContext
might not be set (in the case that the application is not started/instrumented with Kamon), therefore we’re just returning the Future
in the else case.
It’s also possible to record additional metadata for the segment via segment.addMetadata("key", "value")
that would also be reported by our SlowRequestLogger
. We’re using this for our Solr Queries to report the query time measured/returned by Solr. To intercept Solr queries we’re using the request interception facility of our async Solr client solrs, so we just need to register this TracingRequestInterceptor
when building the AsyncSolrClient
:
In Action
To see how/that this works you can checkout this play2-slow-request-logging project and run
$ sbt stage
$ ./target/universal/stage/bin/play2-slow-request-logging -J-javaagent:`pwd`/etc/aspectjweaver-1.8.4.jar
The aspectj weaver javaagent is needed so that Kamon can instrument the relevant classes.
Once the Play application is started you can
$ curl http://localhost:9000
which should produce a log entry like
2014-12-13 22:12:30 SLOW "GET /" (139 ms), segments: op1 (67 ms), op2 (69 ms, numItems: 2)
Conclusion
Because Kamon does all the heavy work (maintains the TraceContext
) and provides the possibility to subscribe to traces it’s fairly easy to log slow requests together with details about recorded segments. Therefore I want to say a huge thanks to the Kamon guys for their awesome work!
Kommentare