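Distributed systems (multiple computers cooperating over a network) introduce serious challenges that do not exist on a single machine, such as unreliable networks, partial failures, consistency, and coordination. Understanding them is essential for designing systems at scale.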
Why it matters
Multiple machines communicating over a NETWORK introduce fundamental challenges:
→ the NETWORK is unreliable (latency, packet loss, partitions) and not instant
→ PARTIAL FAILURES → some parts fail while others work (vs all-or-nothing on one machine)
→ no shared memory/clock → coordination is hard
→ "the network is reliable", "latency is zero", etc. are the FALLACIES of distributed computing: assumptions that hold on one machine but break across a network.
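The "no shared clock" point can be made concrete with a logical clock. The sketch below is a minimal Lamport-style counter, shown only as one common way to order events without wall-clock time; the class and method names (`LamportClock`, `tick`, `send`, `receive`) are illustrative choices, not taken from any particular library.

```python
from dataclasses import dataclass


@dataclass
class LamportClock:
    """Logical clock: a counter that orders events without a shared wall clock."""

    time: int = 0

    def tick(self) -> int:
        # Local event: advance the counter by one.
        self.time += 1
        return self.time

    def send(self) -> int:
        # Sending counts as an event; the returned timestamp travels with the message.
        return self.tick()

    def receive(self, msg_time: int) -> int:
        # Jump past the sender's timestamp so the receive is ordered after the send.
        self.time = max(self.time, msg_time) + 1
        return self.time


# Two nodes, each with its own clock and no shared memory.
a, b = LamportClock(), LamportClock()
a.tick()                   # local event on node a
stamp = a.send()           # node a sends a message carrying its timestamp
b_time = b.receive(stamp)  # node b receives it and moves its clock forward
```

After the receive, `b_time` is greater than `stamp`, so the send is ordered before the receive even though the two nodes never compared wall clocks. Events with no message path between them stay unordered by causality, which is the part that makes ordering across nodes hard.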
Key challenges
NETWORK → unreliable, variable latency, partitions (can't assume messages arrive/are fast)
PARTIAL FAILURE → handle some components failing (detect, retry, recover); a timeout alone can't tell whether a node is down or just slow
CONSISTENCY → keeping data consistent across nodes (CAP trade-offs; eventual consistency)
COORDINATION → consensus, distributed agreement, leader election (hard; e.g. Raft/Paxos)
ORDERING/time → no global clock; event ordering across nodes is hard
CONCURRENCY → many things happening at once; race conditions across nodes
IDEMPOTENCY → handle duplicate messages/retries safely
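Several of the items above (unreliable network, partial failure, idempotency) meet in one situation: a request succeeds on the server but the reply is lost, so the client retries. The sketch below simulates that case in-process. Everything in it is hypothetical and for illustration only: `PaymentServer`, `FlakyChannel`, `deposit_with_retry`, and the key `"req-42"` are invented names, and the lost reply is modeled with a plain `TimeoutError` rather than a real network.

```python
class PaymentServer:
    """Hypothetical server that deduplicates requests by idempotency key."""

    def __init__(self):
        self.balance = 0
        self.seen = {}  # idempotency key -> result already returned for that key

    def deposit(self, key, amount):
        if key in self.seen:
            # Duplicate request: return the stored result without applying it again.
            return self.seen[key]
        self.balance += amount
        self.seen[key] = self.balance
        return self.balance


class FlakyChannel:
    """Simulated network that loses the reply for the first `drops` calls."""

    def __init__(self, server, drops):
        self.server = server
        self.drops = drops

    def call(self, key, amount):
        result = self.server.deposit(key, amount)  # the request did reach the server
        if self.drops > 0:
            self.drops -= 1
            raise TimeoutError("reply lost")  # the client cannot tell down from slow
        return result


def deposit_with_retry(channel, key, amount, attempts=3):
    """Retry on timeout, reusing the same key so duplicates are harmless."""
    for _ in range(attempts):
        try:
            return channel.call(key, amount)
        except TimeoutError:
            continue
    raise TimeoutError(f"no reply after {attempts} attempts")


server = PaymentServer()
channel = FlakyChannel(server, drops=1)
final = deposit_with_retry(channel, key="req-42", amount=100)
```

The first attempt applies the deposit but its reply is dropped; the retry carries the same key, so the server returns the stored result and the balance ends at 100 rather than 200. A real client would likely also wait between attempts (for example with exponential backoff), which is left out here to keep the sketch short.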
