This is a follow-up to the blog post Fehlerbehandlung für Kafka Consumer mit Retries, with insights gained since then.
Kafka offers two cleanup policies, which seem simple enough: “delete”, where data is deleted after a certain retention period, and “compact”, where only the most recent value is kept for each key. But what if data is not deleted or compacted as expected?
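To make the two policies concrete, here is a minimal sketch in Kotlin using Kafka’s AdminClient that creates one topic per policy (broker address, topic names, partition and replica counts, and the retention value are illustrative placeholders):

```kotlin
import org.apache.kafka.clients.admin.AdminClient
import org.apache.kafka.clients.admin.AdminClientConfig
import org.apache.kafka.clients.admin.NewTopic
import org.apache.kafka.common.config.TopicConfig

fun main() {
    val admin = AdminClient.create(
        mapOf<String, Any>(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG to "localhost:9092")
    )
    // A compacted topic: only the latest value per key is retained.
    val compacted = NewTopic("thing-state", 3, 1.toShort()).configs(
        mapOf(TopicConfig.CLEANUP_POLICY_CONFIG to TopicConfig.CLEANUP_POLICY_COMPACT)
    )
    // A topic with time-based deletion: records older than 7 days are removed.
    val deleting = NewTopic("thing-events", 3, 1.toShort()).configs(
        mapOf(
            TopicConfig.CLEANUP_POLICY_CONFIG to TopicConfig.CLEANUP_POLICY_DELETE,
            TopicConfig.RETENTION_MS_CONFIG to (7 * 24 * 60 * 60 * 1000L).toString()
        )
    )
    admin.createTopics(listOf(compacted, deleting)).all().get()
    admin.close()
}
```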
Our systems today are typically distributed, and sometimes integrated via an event bus such as Kafka. We store data in a database and publish events to inform other systems of changes. For example, the system that stores a Thing is eventually consistent with the other systems that consume the ThingCreated event. This means that at some point the other systems will reach the state they should be in once they learn about the new Thing. When systems fail to achieve this level of consistency, significant time is often needed for analysis, troubleshooting, and restoring consistency. We would like to save ourselves this time and instead develop correct systems.
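A minimal sketch of this pattern with spring-kafka might look as follows (Thing, ThingCreated, ThingRepository, and the topic name are hypothetical names for illustration, not taken from any real system):

```kotlin
import org.springframework.kafka.core.KafkaTemplate
import org.springframework.stereotype.Service

data class Thing(val id: String, val name: String)
data class ThingCreated(val id: String, val name: String)

interface ThingRepository {
    fun save(thing: Thing)
}

@Service
class ThingService(
    private val repository: ThingRepository,
    private val kafkaTemplate: KafkaTemplate<String, ThingCreated>
) {
    fun create(thing: Thing) {
        // Store the Thing locally ...
        repository.save(thing)
        // ... and publish an event so that consuming systems
        // eventually reach a consistent state.
        kafkaTemplate.send("thing-events", thing.id, ThingCreated(thing.id, thing.name))
    }
}
```

Note that the two writes are not atomic: if the event is lost after the database commit, the systems diverge, which is exactly the kind of inconsistency described above.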
Murphy’s Law says: “Anything that can go wrong will go wrong” - if there is one thing you can count on, it is things going wrong. So let’s look at how we can make event processing in Kafka consumers more robust with retries. In our project we use Kafka with Kotlin and spring-kafka, but the basic concepts can be transferred to other systems as well.
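As a first impression of what such retry handling can look like, here is a minimal sketch assuming spring-kafka 2.8+ (where DefaultErrorHandler is the standard error handler); the backoff values are arbitrary example values, not a recommendation:

```kotlin
import org.springframework.context.annotation.Bean
import org.springframework.context.annotation.Configuration
import org.springframework.kafka.config.ConcurrentKafkaListenerContainerFactory
import org.springframework.kafka.core.ConsumerFactory
import org.springframework.kafka.listener.DefaultErrorHandler
import org.springframework.util.backoff.ExponentialBackOff

@Configuration
class KafkaRetryConfig {

    @Bean
    fun kafkaListenerContainerFactory(
        consumerFactory: ConsumerFactory<String, String>
    ): ConcurrentKafkaListenerContainerFactory<String, String> {
        val factory = ConcurrentKafkaListenerContainerFactory<String, String>()
        factory.consumerFactory = consumerFactory
        // Retry failed records with exponential backoff: 1s, 2s, 4s, ...
        // giving up after one minute in total (example values).
        val backOff = ExponentialBackOff(1_000L, 2.0).apply {
            maxElapsedTime = 60_000L
        }
        factory.setCommonErrorHandler(DefaultErrorHandler(backOff))
        return factory
    }
}
```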
Because we use Apache Kafka again and again in our projects, and so far I haven’t found the “most important things” sufficiently compact in one place, I have taken the time to compile them for myself/us/you. For me, the “most important things” are the basic concepts and some configuration properties for brokers, producers, and consumers that one should know when choosing trade-offs, e.g. between consistency/durability and availability.
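To give an idea of the kind of properties meant here, a minimal sketch of a producer configuration that leans towards consistency/durability rather than availability (the broker address is a placeholder, and the chosen values are illustrative):

```kotlin
import org.apache.kafka.clients.producer.KafkaProducer
import org.apache.kafka.clients.producer.ProducerConfig
import org.apache.kafka.common.serialization.StringSerializer
import java.util.Properties

fun durableProducer(): KafkaProducer<String, String> {
    val props = Properties().apply {
        put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092")
        put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer::class.java)
        put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer::class.java)
        // Wait until all in-sync replicas have acknowledged a write ...
        put(ProducerConfig.ACKS_CONFIG, "all")
        // ... and prevent duplicates caused by producer-side retries.
        put(ProducerConfig.ENABLE_IDEMPOTENCE_CONFIG, true)
        // Note: acks=all only guarantees durability in combination with a
        // broker/topic setting such as min.insync.replicas=2.
    }
    return KafkaProducer(props)
}
```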
In our last weekly knowledge sharing session at inoio, we discussed our experiences and thoughts on how to design Kafka topics, i.e. how to decide to which topic a (new) event type should be published. In customer projects we have sometimes seen that Kafka topics were not chosen properly, so that this had to be changed later. Since such a change usually affects several systems and teams, it causes quite some effort. It is therefore better to invest some time in this decision up front. How to choose the Kafka topic for an event does not seem to be discussed much publicly, so we want to share our thoughts on that here.