In our last weekly knowledge-sharing session at inoio, we discussed our experiences and thoughts on how to design Kafka topics, i.e. how to decide which topic a (new) event type should be published to. In customer projects we have sometimes seen Kafka topics that were not chosen well and had to be changed later. Since such a change usually affects several systems and teams, it causes considerable effort, so it is worth investing some time in this decision up front. How to choose the Kafka topic for an event doesn’t seem to be discussed much in public, so we want to share our thoughts here.
First, an important thing to know: when consuming events, ordering is only guaranteed within a single topic partition. Because events are usually assigned to a partition according to the hash of their key, related events must be published with the same key to the same topic (e.g. key = $userId, topic = user) to get guaranteed ordering. Alternatively, it’s possible to assign the partition “manually”.
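The key-to-partition mapping can be illustrated with a small in-memory sketch. It is a simulation rather than real Kafka client code: the function `partition_for`, the partition count, and the event names are made up for illustration, and CRC32 stands in for the hash function (Kafka's default partitioner for keyed records uses murmur2).

```python
import zlib


def partition_for(key: str, num_partitions: int) -> int:
    """Map a key to a partition: hash(key) modulo the number of partitions."""
    # Assumption: CRC32 is only a stand-in for the hash used by a real partitioner.
    return zlib.crc32(key.encode("utf-8")) % num_partitions


NUM_PARTITIONS = 6  # hypothetical partition count of the topic "user"

# Events in the order in which they are published: (key, event type).
events = [
    ("user-42", "UserRegistered"),
    ("user-7", "UserRegistered"),
    ("user-42", "UserEmailChanged"),
    ("user-42", "UserDeleted"),
]

# partition number -> events in publish order
partitions: dict[int, list[tuple[str, str]]] = {}
for key, event_type in events:
    partition = partition_for(key, NUM_PARTITIONS)
    partitions.setdefault(partition, []).append((key, event_type))

for partition, partition_events in sorted(partitions.items()):
    print(partition, partition_events)
```

All events with the key `user-42` end up in the same partition and keep their publish order there. Events with other keys may share that partition or land in a different one, and no ordering is guaranteed across partitions.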
Regarding guidelines, let’s first have a look at two extremes:
- One topic per event type
- One topic for all event types
One topic per event type
The consequence here is clear: when there’s a happens-before relationship between two events (or event types) E1 and E2, where E1 is published to topic T1 and E2 to topic T2, the order as seen by the consumer is not guaranteed. For example, a consumer of both topics (Service 3 in the example) might consume E2 before E1, although E1 actually happened before E2.
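A minimal sketch of this effect, using two in-memory queues instead of real Kafka topics. The function `consume` and the poll orders are hypothetical. With a real consumer, the order in which records from different topics or partitions arrive depends on things like fetch timing and consumer lag, which the sketch reduces to an explicit poll order.

```python
from collections import deque

# Two in-memory queues standing in for the topics T1 and T2.
topics = {"T1": deque(), "T2": deque()}

# Producer side: E1 happens (and is published) before E2.
topics["T1"].append("E1")
topics["T2"].append("E2")


def consume(poll_order: list[str]) -> list[str]:
    """Drain the topics in the given poll order and return the events as seen."""
    # Work on copies so that several runs can be compared.
    local = {name: deque(queue) for name, queue in topics.items()}
    seen = []
    for name in poll_order:
        while local[name]:
            seen.append(local[name].popleft())
    return seen


print(consume(["T1", "T2"]))  # the consumer happens to read T1 first
print(consume(["T2", "T1"]))  # the consumer happens to read T2 first
```

In the second run the consumer sees E2 before E1, although each topic on its own is perfectly ordered.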
The advantages of that approach are that
- consumers receive only events they’re interested in
- fine-grained data protection/access policies can be applied (via ACLs)
- the topic configuration can be optimized for each event type, according to the workload
One topic for all event types
This means that every consumer receives all events, even if it’s only interested in certain event types.
Depending on the workload and throughput, this might delay event processing during peaks. To deal with such a situation, the number of partitions (and optionally the number of consumers per consumer group) could be increased to scale out and parallelize event processing.
Another consequence is that the topic configuration (e.g. number of partitions, replication factor) has to fit the requirements of possibly very different workloads, which makes optimizing these settings harder.
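What "receiving all events" means for a single consumer can be sketched as follows. The `Event` class, the event names, and `consume_filtered` are assumptions made for this illustration. A real consumer would read and deserialize records from Kafka before it can decide to skip them.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    key: str
    type: str


# Hypothetical content of the single global topic.
global_topic = [
    Event("user-42", "UserRegistered"),
    Event("order-1", "OrderPlaced"),
    Event("user-42", "UserEmailChanged"),
    Event("order-1", "OrderShipped"),
]


def consume_filtered(events, wanted_types):
    """Handle only the wanted event types, but still iterate over every event."""
    handled, skipped = [], 0
    for event in events:
        if event.type in wanted_types:
            handled.append(event)
        else:
            skipped += 1  # read and discarded, which still costs time
    return handled, skipped


handled, skipped = consume_filtered(global_topic, {"OrderPlaced", "OrderShipped"})
print(f"handled={len(handled)} skipped={skipped}")
```

The skipped events are the overhead of the global topic: the larger the share of irrelevant events, the more of the consumer’s time goes into reading and discarding them.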
The advantage of the extreme “global topic” approach is that the ordering for all event types is guaranteed (assuming that keys are properly chosen / partitions are assigned properly).
Conclusion / guidelines
Based on these considerations we’d choose something in the middle of the two extremes. Here are points that could help to find a good balance:
- A good boundary for topics could be the bounded context / domain or subdomain that an event type belongs to. I.e. if two events A and B belong to different domains, they should probably go into different topics.
- If there’s a very strong relationship between two event types, they might go into the same topic (maybe regardless of the required ordering guarantees).
- If there’s a strong requirement regarding the correct ordering of two events A and B, then they should be published to the same topic. The frequency of A and B and how close together in time they occur (i.e. the delay between them) may also be relevant and should be considered.
- Events containing very sensitive data, or events where access by services needs to be restricted severely, could be separated from other events with different access restrictions - i.e. these should be published to different topics.
- An event with an extremely high throughput should probably be published to a dedicated topic.
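As a rough illustration, some of these points can be written down as a small helper. This is only a sketch of one possible reading of the guidelines: the `EventType` fields, the function `suggest_topic`, and the topic naming scheme are all assumptions, and a real decision will involve more judgment than three attributes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EventType:
    name: str
    domain: str                    # bounded context / (sub)domain of the event
    sensitive: bool = False        # needs stricter access restrictions
    high_throughput: bool = False  # extremely high event rate


def suggest_topic(event_type: EventType) -> str:
    """Suggest a topic name (hypothetical naming scheme) for an event type."""
    if event_type.high_throughput:
        # Dedicated topic, so that its configuration can be tuned separately.
        return f"{event_type.domain}.{event_type.name}"
    if event_type.sensitive:
        # Separate topic, so that access can be restricted via ACLs.
        return f"{event_type.domain}.restricted"
    # Default: one topic per domain keeps related events ordered per key.
    return event_type.domain


print(suggest_topic(EventType("UserRegistered", "user")))
print(suggest_topic(EventType("PaymentDetailsChanged", "user", sensitive=True)))
print(suggest_topic(EventType("PageViewed", "tracking", high_throughput=True)))
```

The sketch deliberately leaves out the ordering requirement. If two event types must be consumed in order, that can outweigh the other criteria and has to be weighed case by case.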
If you have additional points, disagree with some of the above, or have other feedback, please let us know. If you want to accelerate your project, feel free to call us. We look forward to supporting your project or team and are curious about the individual challenges we could solve together.