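Kafka Streams is a library for stream processing: you use it to build applications that process and transform the data in Kafka topics in real time (filtering, transforming, aggregating, and joining streams). This makes real-time data processing possible directly on top of Kafka.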
What is stream processing
STREAM PROCESSING → process data CONTINUOUSLY as it arrives (in real time), vs. BATCH (processing stored data periodically):
→ consume events from topics, transform/analyze them, produce results (often to other topics)
→ real-time: react to and process events as they happen (low latency)
→ for: real-time analytics, transformations, monitoring, enrichment, aggregations
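The contrast between continuous and batch processing can be sketched in plain Java. This is only an illustrative model, not Kafka code: the class and method names (`StreamVsBatch`, `RunningAverage`, `onEvent`, `batchAverage`) are hypothetical, and an in-memory list stands in for a topic.

```java
import java.util.List;

final class StreamVsBatch {

    /** Batch style: the readings are already stored; compute one result at the end. */
    static double batchAverage(List<Double> stored) {
        double sum = 0;
        for (double value : stored) {
            sum += value;
        }
        return stored.isEmpty() ? 0 : sum / stored.size();
    }

    /** Stream style: keep a little state and produce a fresh result per event. */
    static final class RunningAverage {
        private double sum;
        private long count;

        double onEvent(double value) {
            sum += value;
            count++;
            return sum / count;
        }
    }

    public static void main(String[] args) {
        // Hypothetical sensor readings standing in for events on a topic.
        List<Double> readings = List.of(10.0, 20.0, 30.0);

        RunningAverage running = new RunningAverage();
        for (double reading : readings) {
            // A result is available immediately after every event.
            System.out.println("after " + reading + " -> " + running.onEvent(reading));
        }
        // The batch result only exists once all data has been collected.
        System.out.println("batch result -> " + batchAverage(readings));
    }
}
```

Both styles arrive at the same final value; the difference is that the streaming version has an up-to-date answer after each event, which is what gives it low latency.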
What Kafka Streams provides
KAFKA STREAMS = a Java/Scala LIBRARY for building stream-processing apps on Kafka:
→ read from topics, process, write to topics (a processing topology)
→ OPERATIONS: map, filter, transform; AGGREGATIONS (count, sum); WINDOWING (time windows); JOINS (join streams/tables)
→ STATEFUL processing → maintain state (e.g. running counts) with fault tolerance
→ STREAMS vs TABLES → a stream (a sequence of events) and a table (current state per key, backed by a changelog) are two views of the same data: the stream-table DUALITY (KStream/KTable)
→ it's a library (runs inside your app): no separate PROCESSING cluster needed (unlike Flink/Spark); it still needs a Kafka cluster to read from and write to
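The ideas above (a small topology of filter, map, and a stateful count, plus the stream/table duality) can be modelled in plain Java. This sketch deliberately does not use the Kafka Streams API, so it runs without a broker or any dependency. All names here (`WordCountModel`, `Event`, `Update`, `process`, `replay`) are hypothetical; in Kafka Streams itself the corresponding abstractions are KStream and KTable, and the state is made fault tolerant by the library rather than held in a simple map.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

final class WordCountModel {

    /** A keyed event, standing in for one record of an input stream. */
    record Event(String key, String value) {}

    /** One table update, standing in for one changelog record. */
    record Update(String key, long count) {}

    /** The "table": the latest count per key, i.e. the operator's state. */
    private final Map<String, Long> table = new HashMap<>();

    /** The "output stream": every table update, in arrival order. */
    private final List<Update> changelog = new ArrayList<>();

    /** Processes one event: filter, map, then update the running count. */
    void process(Event event) {
        // filter: drop events whose value is missing or blank
        if (event.value() == null || event.value().isBlank()) {
            return;
        }
        // map: normalise the value and use it as the grouping key
        String word = event.value().trim().toLowerCase(Locale.ROOT);
        // stateful aggregation: running count per key
        long count = table.merge(word, 1L, Long::sum);
        // emit the update downstream
        changelog.add(new Update(word, count));
    }

    Map<String, Long> table() {
        return Map.copyOf(table);
    }

    List<Update> changelog() {
        return List.copyOf(changelog);
    }

    /** Duality: replaying the changelog stream rebuilds the table. */
    static Map<String, Long> replay(List<Update> updates) {
        Map<String, Long> rebuilt = new HashMap<>();
        for (Update update : updates) {
            rebuilt.put(update.key(), update.count());
        }
        return rebuilt;
    }

    public static void main(String[] args) {
        WordCountModel model = new WordCountModel();
        model.process(new Event("u1", "Kafka"));
        model.process(new Event("u2", "   "));
        model.process(new Event("u3", "kafka"));
        model.process(new Event("u4", "Streams"));

        System.out.println("table:     " + model.table());
        System.out.println("changelog: " + model.changelog());
        System.out.println("replayed:  " + replay(model.changelog()));
    }
}
```

The `replay` method is the point of the duality: the table is what you get by applying the stream of updates in order, and the stream of updates is what you get by recording every change to the table.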
