In this post I want to show, using a short example, how lazy data can be employed selectively as a design pattern to improve the readability of your code.
There are already many articles from the Haskell sphere touting the supposed advantages of non-strictly evaluated programming languages (laziness by default): composability, performance, correctness, etc. This is not one of them. Personally, I always found those articles a bit divorced from reality. Not that what they’re saying is necessarily wrong, but they do a bad job of convincing me, a programmer building CRUD web apps, why I should care.
What is it that laziness gives us? Laziness means that a value is only evaluated once it’s needed. It allows us to describe how to compute a value without necessarily doing it at that time. In that sense a lazy value is very similar to a function: it is a tool of abstraction that gives a name to a concept that is then used by someone else. What separates a lazy value from an ordinary function is that the lazy value will remember its value and not evaluate itself again for every access (Let’s call this “memoization”). Some languages with laziness-by-default also implement memoization for all functions, completing the equivalence between functions and values. However, memoization is not of importance to the example that I’m going to show.
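To make the difference between a lazy value and an ordinary function concrete, here is a minimal Scala sketch. The names `LazyDemo`, `expensiveComputation` and `evaluations` are made up for illustration; the counter only exists so that we can observe how often the computation really runs.

```scala
object LazyDemo {
  // Counts how often the (hypothetical) expensive computation actually runs.
  var evaluations = 0

  private def expensiveComputation(): Int = {
    evaluations += 1
    42
  }

  // A lazy value: nothing runs when it is defined, and the result is
  // remembered after the first access.
  lazy val lazyAnswer: Int = expensiveComputation()

  // An ordinary parameterless method: the body runs again on every access.
  def recomputedAnswer: Int = expensiveComputation()
}
```

Reading `LazyDemo.lazyAnswer` twice runs the computation once, whereas reading `LazyDemo.recomputedAnswer` twice runs it twice. Before the first access to either, the counter is still zero.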
Inversion of Control
The main thing to understand about lazy data is that it creates inversion of control. The term “inversion of control” is usually used in the context of application frameworks: instead of writing the application outside-in and calling out to libraries, we let the framework call us. We offer a set of functionalities to the framework and let it decide when it is appropriate to execute them. Lazy data works similarly: we return a large data type (perhaps infinitely large) and the caller picks only what they need, without incurring the cost of computing things that they don’t need. Control of the evaluation order is given to the consumer of the data rather than to the producer, which usually holds it in eager languages. This inversion of control through laziness can often simplify the interface of a function. For example, when working with lazy streams we don’t have to tell the stream-producing function beforehand how many elements we want. We simply take as many as we need.
Compare the eager variant:
def eagerNaturalNumbers(howMany: Int): List[Int] = {
  if (howMany == 0)
    List.empty
  else
    eagerNaturalNumbers(howMany - 1) :+ howMany
}

def eagerCountTo10(): Unit = {
  eagerNaturalNumbers(howMany = 10).foreach { (num: Int) =>
    println(num)
  }
}
with the lazy variant:
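// Doesn't need a parameter.
// LazyList replaces the deprecated Stream since Scala 2.13; we start at 1 to match the eager variant.
val lazyNaturalNumbers = LazyList.iterate[Int](start = 1) { (prev: Int) => prev + 1 }

def lazyCountTo10(): Unit = {
  lazyNaturalNumbers.take(10).foreach { (num: Int) =>
    println(num)
  }
}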
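The following sketch tries to make the “consumer is in control” point observable. It assumes Scala 2.13 or later (for `LazyList`); `LazyListDemo`, `squares`, `computed` and `firstSquareAbove` are hypothetical names, and the `computed` buffer is only there to record which elements the producer was actually asked for.

```scala
import scala.collection.mutable.ListBuffer

object LazyListDemo {
  // Records which inputs were actually evaluated by the producer.
  val computed: ListBuffer[Int] = ListBuffer.empty[Int]

  // An infinite producer: it does not know how many elements will be needed.
  val squares: LazyList[Int] = LazyList.from(1).map { n =>
    computed += n
    n * n
  }

  // The consumer decides when to stop, here at the first square above a limit.
  def firstSquareAbove(limit: Int): Option[Int] = squares.find(_ > limit)
}
```

Calling `LazyListDemo.firstSquareAbove(50)` returns `Some(64)` and evaluates only the first eight elements. A later `squares.take(3).toList` reuses the already computed elements instead of evaluating them again, because `LazyList` remembers them.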
Example
Let us now look at a real-world example to understand how inversion of control through laziness can be employed in practice. We are building an HTTP API that returns product recommendations for an online shop. This API should support content type negotiation and return its response either as JSON or HTML, depending on what content type the client has requested with an Accept header. In most HTTP frameworks we can implement this functionality in an unobtrusive way by supplying the framework with different serializers (a JSON serializer and an HTML serializer) for the Reco object. We then simply return the plain Reco object and let the framework pick the right serializer automatically:
/** Presentation layer class that deals with concerns of the HTTP API (exposing routes, parsing headers, ...) */
class RecoRoutes(val similarProductsUseCase: SimilarProductsUseCase) {
  @Get("/reco?userid={userId}")
  def getReco(userId: String): HttpResponse = {
    val resultReco: Reco = similarProductsUseCase.computeSimilarProductsReco(User.validated(userId))
    HttpResponse(200, resultReco)(using CombinedSerializer(recoJsonSerializer, recoHtmlSerializer))
  }

  given recoJsonSerializer: Serializer[Reco] = new Serializer(contentType = "application/json") {
    def serialize(reco: Reco): String = Json.toJson(reco)
  }

  given recoHtmlSerializer: Serializer[Reco] = new Serializer(contentType = "text/html") {
    def serialize(reco: Reco): String = renderHtml(reco)
  }

  def renderHtml(reco: Reco) = ...
}

/** Application layer class that orchestrates all the business logic, database loading and so on */
class SimilarProductsUseCase {
  def computeSimilarProductsReco(user: User): Reco = ...
}
This is easy enough. But now imagine that the HTML rendering needs additional application logic that the JSON rendering does not. We may have to load images from a database, decide what texts to use for this particular user, or make a network request to a React.js microservice for the HTML templating. You get the idea: this is more complicated logic that we probably don’t want to have in our RecoRoutes class. It is also expensive logic that should only run if a "text/html" content type was in fact requested, and not needlessly for JSON.
In order to improve testability and understanding, we want to keep this application logic, along with its database and network requests, out of our RecoRoutes class. It should be possible to unit test RecoRoutes independently with only an HTTP request and a fake result from SimilarProductsUseCase, without mocking databases and other external systems. To that end, I often see code like the following, where the logic for HTML and JSON rendering has moved into SimilarProductsUseCase and the caller uses an htmlOrJson parameter to change the behaviour of SimilarProductsUseCase:
enum ResultType {
  case Html, Json
}

/** Presentation layer class that deals with concerns of the HTTP API (exposing routes, parsing headers, ...) */
class RecoRoutes(val similarProductsUseCase: SimilarProductsUseCase) {
  @Get("/reco?userid={userId}")
  def getReco(userId: String, headers: Map[String, String]): HttpResponse = {
    val htmlOrJson = headers.get("Accept") match {
      case Some("text/html") => ResultType.Html
      case _ => ResultType.Json
    }
    val resultReco: RecoHtml | RecoJson = similarProductsUseCase
      .computeSimilarProductsReco(User.validated(userId), /* ---> */ htmlOrJson /* <--- */)
    resultReco match {
      case resultRecoHtml: RecoHtml => HttpResponse(200, resultRecoHtml.htmlString)
      case resultRecoJson: RecoJson => HttpResponse(200, resultRecoJson.jsonString)
    }
  }
}

/** Application layer class that orchestrates all the business logic, database loading and so on */
class SimilarProductsUseCase {
  def computeSimilarProductsReco(user: User, htmlOrJson: ResultType): RecoHtml | RecoJson = {
    ... // lots of business logic
    collaboratingClass.createHtmlOrJsonResult(..., htmlOrJson)
  }
}

class CollaboratingClass {
  def createHtmlOrJsonResult(..., htmlOrJson: ResultType): RecoHtml | RecoJson = {
    if (htmlOrJson == ResultType.Html) {
      val images = loadImagesFromDb()
      ...
      reactMicroservice.requestHtml(...)
    }
    else
      ...
  }
}
You can imagine that this htmlOrJson parameter makes the control flow harder to understand, especially if it has to be passed through multiple levels of collaborating classes: the HTML and JSON rendering is likely to happen only at the very end of the implementation, after lots of other logic to compute the result. How can we fix it? By turning the sum type RecoJson | RecoHtml into a lazy product type containing both HTML and JSON:
import scala.concurrent.{Await, Future}
import scala.concurrent.duration.Duration

trait Reco {
  // Abstract members cannot be declared `lazy val` in Scala, so the trait
  // declares them as `def`; implementations provide them as `lazy val`.
  def json: String
  def html: Future[String]
}

class SimilarProductsUseCase {
  def computeSimilarProductsReco(user: User /* no htmlOrJson parameter needed anymore! */): Reco = {
    ... // lots of business logic
    collaboratingClass.createHtmlOrJsonResult(...)
  }
}

class RecoRoutes(val similarProductsUseCase: SimilarProductsUseCase) {
  @Get("/reco?userid={userId}")
  def getReco(userId: String): HttpResponse = {
    val resultReco: Reco = similarProductsUseCase.computeSimilarProductsReco(User.validated(userId))
    HttpResponse(200, resultReco)(using CombinedSerializer(recoJsonSerializer, recoHtmlSerializer))
  }

  given recoJsonSerializer: Serializer[Reco] = new Serializer(contentType = "application/json") {
    def serialize(reco: Reco): String = reco.json
  }

  given recoHtmlSerializer: Serializer[Reco] = new Serializer(contentType = "text/html") {
    // Await.result requires a timeout; Duration.Inf blocks until the Future completes.
    def serialize(reco: Reco): String = Await.result(reco.html, Duration.Inf) // <-- NOT GOOD
  }
}

class CollaboratingClass {
  def createHtmlOrJsonResult(... /* no htmlOrJson parameter needed anymore! */): Reco = {
    ...
    new Reco {
      lazy val json = ...
      lazy val html = {
        val images = loadImagesFromDb()
        ...
        reactMicroservice.requestHtml(...)
      }
    }
  }
}
By turning Reco into a product type containing lazy properties of both the JSON and HTML result, we can get rid of all the htmlOrJson parameters that previously had to be passed through multiple levels of functions. The application logic thus becomes easier to follow as there are no longer multiple possible code paths. We no longer have to keep this parameter and its influence in mind when trying to understand CollaboratingClass.

There is, of course, a disadvantage to this design pattern, which the example also highlights: since the lazy properties Reco.json and Reco.html are only evaluated when needed, any side effects and errors that their computations may cause are delayed until a different point in the program. In this case, the evaluation is eventually forced in the Serializer framework classes, but Serializer was not designed with side effects in mind. Due to this design fault, we have to block on the Future[String] with Await.result (a big no-no for performance), and an error in the Reco.html computation may not be handled correctly by the framework. (In this case the issue could be worked around by doing the content-type negotiation manually in the getReco function instead of relying on the framework Serializer class, but it is not always this easy.)

You should keep in mind that frameworks with a more imperative design will sometimes not work well with laziness (looking at you, Akka HTTP!), whereas pure functional programming frameworks like Http4s are designed with side effects and laziness in mind.
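As a rough illustration of the workaround mentioned above, manual content-type negotiation could look something like the following sketch. It is not tied to any real framework: `HttpResponse`, `ManualNegotiation` and `respond` are hypothetical stand-ins, `Reco` mirrors the trait from the example, and the sketch assumes the framework accepts a `Future[HttpResponse]` from a route. `Future.delegate` (Scala 2.13+) is used so that an exception thrown while the lazy `html` value is being evaluated also ends up inside the Future.

```scala
import scala.concurrent.{ExecutionContext, Future}
import scala.util.control.NonFatal

// Hypothetical stand-in for the framework's response type.
final case class HttpResponse(status: Int, contentType: String, body: String)

// Same shape as the Reco trait in the example above.
trait Reco {
  def json: String
  def html: Future[String]
}

object ManualNegotiation {
  /** Chooses the representation from the Accept header and forces only that one. */
  def respond(reco: Reco, acceptHeader: Option[String])(using ExecutionContext): Future[HttpResponse] =
    acceptHeader match {
      case Some(accept) if accept.contains("text/html") =>
        // No Await: the response is built once the HTML Future completes.
        Future
          .delegate(reco.html)
          .map(html => HttpResponse(200, "text/html", html))
          // Errors from the delayed HTML computation surface here, in our own code.
          .recover { case NonFatal(_) => HttpResponse(500, "text/plain", "HTML rendering failed") }
      case _ =>
        // reco.html is never touched on this path, so its cost is never paid.
        Future.successful(HttpResponse(200, "application/json", reco.json))
    }
}
```

A nice side effect of this shape is that `respond` can be unit tested with a hand-written fake `Reco`, without a database or the React.js microservice. The matching on the Accept header is deliberately simplistic and ignores quality values and wildcards.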