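Kafka Streams is a stream-processing library for building applications that process and transform data in Kafka topics in real time (filtering, transforming, aggregating, and joining streams). It brings real-time data processing directly to Kafka.
What is stream processing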
STREAM PROCESSING → process data CONTINUOUSLY as it arrives (in real time), vs batch
(processing stored data periodically):
→ consume events from topics, transform/analyze them, produce results (often to other topics)
→ real-time: react to and process events as they happen (low latency)
→ for: real-time analytics, transformations, monitoring, enrichment, aggregations
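The contrast between continuous and batch processing can be sketched in plain Java. This is only an illustration under stated assumptions: no Kafka is involved, the class `RunningCountProcessor` and its methods are hypothetical names, and the `sink` callback merely stands in for an output topic.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

/** Toy event-at-a-time processor; illustrative only, not a Kafka API. */
final class RunningCountProcessor {
    private final Map<String, Long> counts = new HashMap<>(); // state kept between events
    private final Consumer<String> sink;                      // stands in for an output topic

    RunningCountProcessor(Consumer<String> sink) {
        this.sink = sink;
    }

    /** Stream style: called once per incoming event, emits an updated result immediately. */
    void onEvent(String key) {
        long updated = counts.merge(key, 1L, Long::sum);
        sink.accept(key + "=" + updated);
    }

    /** Batch style: needs the whole stored data set, then computes the result once. */
    static Map<String, Long> batchCount(List<String> storedEvents) {
        Map<String, Long> result = new HashMap<>();
        for (String key : storedEvents) {
            result.merge(key, 1L, Long::sum);
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> output = new ArrayList<>();
        RunningCountProcessor processor = new RunningCountProcessor(output::add);
        for (String event : List.of("login", "click", "login")) {
            processor.onEvent(event); // a result is available after every event
        }
        System.out.println(output);   // [login=1, click=1, login=2]
        System.out.println(batchCount(List.of("login", "click", "login")));
    }
}
```

Both paths reach the same final counts. The difference is when results become available: the streaming path emits one after each event, the batch path only after all data has been collected.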
What Kafka Streams provides
KAFKA STREAMS = a Java/Scala LIBRARY for building stream-processing apps on Kafka:
→ read from topics, process, write to topics (a processing topology)
→ OPERATIONS: map, filter, transform; aggregations (count, sum); windowing (time windows);
JOINS (join streams/tables)
→ STATEFUL processing → maintain state (e.g. running counts) with fault tolerance
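→ STREAMS vs TABLES → a stream (KStream) is a sequence of events; a table (KTable) is the latest state per key, backed by a changelog; each can be derived from the other (stream-table duality)
→ it's a library (runs inside your app), so no separate processing cluster is needed (vs Flink/Spark)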
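Two of the ideas above, stream-table duality and time windowing, can be modelled in plain Java. This is a minimal sketch and not the Kafka Streams API: `StreamTableSketch`, `Event`, `toTable`, and `countPerWindow` are hypothetical names, and state is held in ordinary in-memory maps without the fault tolerance that Kafka Streams adds. In the real library the corresponding abstractions are `KStream` and `KTable`.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Plain-Java model of two Kafka Streams ideas; no Kafka classes are used. */
final class StreamTableSketch {

    /** One keyed event with an event-time timestamp in milliseconds. */
    static final class Event {
        final String key;
        final String value;
        final long timestampMs;

        Event(String key, String value, long timestampMs) {
            this.key = key;
            this.value = value;
            this.timestampMs = timestampMs;
        }
    }

    /** Stream to table: replay events as a changelog, keeping the latest value per key. */
    static Map<String, String> toTable(List<Event> stream) {
        Map<String, String> table = new LinkedHashMap<>();
        for (Event e : stream) {
            table.put(e.key, e.value); // a later event overwrites the earlier value
        }
        return table;
    }

    /** Tumbling windows: count events per fixed, non-overlapping window, keyed by window start. */
    static Map<Long, Long> countPerWindow(List<Event> stream, long windowSizeMs) {
        Map<Long, Long> counts = new TreeMap<>();
        for (Event e : stream) {
            long windowStart = Math.floorDiv(e.timestampMs, windowSizeMs) * windowSizeMs;
            counts.merge(windowStart, 1L, Long::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        List<Event> stream = List.of(
            new Event("alice", "Berlin", 1_000L),
            new Event("bob", "Paris", 4_000L),
            new Event("alice", "Rome", 12_000L));

        System.out.println(toTable(stream));                 // {alice=Rome, bob=Paris}
        System.out.println(countPerWindow(stream, 10_000L)); // {0=2, 10000=1}
    }
}
```

The table is just the stream folded by key, and every update to the table could in turn be emitted as a new stream event. That is the duality in one direction each.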
