How to log slow async requests with Play Framework 2

In our current project we just wanted to see which requests exceeded a certain response time threshold. Additionally we wanted to know where the time was spent, i.e. which backend requests (e.g. hitting the database or search engine) contributed how much to the overall response time. So we’d like to get log entries like this:

2014-12-13 22:12:30 SLOW "GET /search?q=foo" (139 ms), segments: solrQuery (67 ms), dbLookup (69 ms)

In a synchronous/blocking application measuring subordinate requests/operations and associating them with the main request thread is fairly easy (just use ThreadLocal). In an application that runs async requests this is not the case, because there’s not that single thread that can be used as “global” state to associate measurements with.

Here comes Kamon into play, a toolkit for monitoring of reactive applications built on top of Play, Spray or Akka. Kamon uses bytecode instrumentation to introduce metrics and tracing logic into your application, so it already solves the problem of transferring the context of a request to other, subordinate asynchronous operations (segments). In Kamon this is called the TraceContext (see also their Trace documentation).

The awesome guys of Kamon also just introduced trace subscriptions, that allow users to receive information about recorded traces. And that’s exactly what we need to log slow requests and details about segments: Kamon maintains the TraceContext over async operations, we record segment information, and after the request is finished Kamon passes the combined trace information back to us so that we just need to format + log the slow trace.

In the following it’s shown what needs to be done to log slow requests using Kamon’s trace subscription.

Add Kamon to your project

At first you need to add Kamon to your project’s dependencies. Because the trace subscription is still only available in a Kamon snapshot, you also need to add the Kamon snapshots repo as resolver. In your build.sbt add the following:

resolvers += "Kamon Repository Snapshots" at "http://snapshots.kamon.io"

libraryDependencies += "io.kamon" %% "kamon-play" % "0.3.6-1b508ff95ea477387a0b27eae41ee7a76861928e"

Configure kamon to collect traces

Trace collection must be enabled by setting the kamon.trace.level to simple-trace (by default metrics-only). Because we want to be notified about traces exceeding a certain threshold we also set the sampling strategy to “threshold” and the “minimum-elapsed-time” to the duration that we consider as “slow”. For this add the following to the application.conf:

kamon {
  # Disable warnings for missing weaving, would be printed in run mode + tests
  metrics.disable-aspectj-weaver-missing-error = true
  trace {
    # trace level 'simple-trace' needed for tracing subscription (slow request logging)
    level = "simple-trace"
    # sampling: we use the threshold sampler that gathers tracing information for all traces but only reports
    # those whose elapsed-time is equal or greated to the specified threshold.
    sampling = "threshold"
    threshold-sampler {
      # Duration format: https://github.com/typesafehub/config/blob/master/HOCON.md#duration-format
      minimum-elapsed-time = 100 milliseconds
    }
  }
}

Store request method + uri in the TraceContext metadata

Because Kamon supplies trace information via messages sent to an actor (our SlowRequestLogger), in that actor we do not know anything about the request related to the trace. But of course we’d like to log the request method and uri of the request that was slow, therefore we need to make this available somehow. To achieve this we can use a filter that adds the required information to the TraceContext metadata:

This only has to be registered e.g. via object Global extends WithFilters(StoreRequestInfoFilter).

Subscribe the SlowRequestLogger to traces

The most important part then is to subscribe an actor to TraceInfo messages, in our case this is our SlowRequestLogger. Because we already have set the threshold for reported traces, we get only messages for requests that exceeded the configured threshold. The SlowRequestLogger shown below produces logs like e.g.

2014-12-13 22:12:30 SLOW “GET /search?q=foo” (139 ms), segments: solrQuery (67 ms, numItems: 42), dbLookup (69 ms)

which includes the previously stored request method + uri, the response time and information about segments. Segments are referred to by name, the time recorded for the segment is added and metadata that might have been added to the segment (like “numItems: 42” in our example). I’ll show later how to record segments.

Here’s the SlowRequestLogger:

The SlowRequestLogger companion object already allows to start the actor, we can easily do this in our Global:

The only missing piece now is that we have to record our information about our segments.

Record segment info

To record a segment we need to get the current TraceContext via TraceRecorder.withTraceContextAndSystem, start the segment before the async operation is executed, and finish the segment once the operation (Future) is completed. As an example, in our project we’re using cassandra and the datastax java-driver that returns a ResultSetFuture that we already convert to a Scala Future[ResultSet], which we can now easily intercept to record a segment:

Please note that TraceRecorder.withTraceContextAndSystem returns an Option, because the TraceContext might not be set (in the case that the application is not started/instrumented with Kamon), therefore we’re just returning the Future in the else case.

It’s also possible to record additional metadata for the segment via segment.addMetadata("key", "value") that would also be reported by our SlowRequestLogger. We’re using this for our Solr Queries to report the query time measured/returned by Solr. To intercept Solr queries we’re using the request interception facility of our async Solr client solrs, so we just need to register this TracingRequestInterceptor when building the AsyncSolrClient:

In Action

To see how/that this works you can checkout this play2-slow-request-logging project and run

$ sbt stage
$ ./target/universal/stage/bin/play2-slow-request-logging -J-javaagent:`pwd`/etc/aspectjweaver-1.8.4.jar

The aspectj weaver javaagent is needed so that Kamon can instrument the relevant classes.

Once the Play application is started you can

$ curl http://localhost:9000

which should produce a log entry like

2014-12-13 22:12:30 SLOW "GET /" (139 ms), segments: op1 (67 ms), op2 (69 ms, numItems: 2)

Conclusion

Because Kamon does all the heavy work (maintains the TraceContext) and provides the possibility to subscribe to traces it’s fairly easy to log slow requests together with details about recorded segments. Therefore I want to say a huge thanks to the Kamon guys for their awesome work!

Kommentare