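Availability (the system is up and accessible) and reliability (the system behaves correctly) are key non-functional requirements. Achieving them requires redundancy, fault tolerance, elimination of single points of failure, and graceful handling of failures.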
Availability vs Reliability
AVAILABILITY → the system is UP and responsive (accessible when needed):
→ measured as uptime % ("nines": 99.9% = ~8.76 h/year down; 99.99% = ~52.6 min/year)
RELIABILITY → the system works CORRECTLY (does what it should, without failures/errors):
→ related but distinct: a system can be up but return wrong results (available but unreliable)
→ both matter: users need the system available AND working correctly.
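To sanity-check the "nines" figures, here is a minimal Python sketch that converts an uptime percentage into allowed downtime per year. It assumes a 365-day year, and the function name `downtime_per_year` is illustrative, not taken from any library.

```python
SECONDS_PER_YEAR = 365 * 24 * 3600  # assumes a 365-day year (no leap day)


def downtime_per_year(availability_pct: float) -> float:
    """Return the allowed downtime, in seconds per year, for an uptime percentage."""
    return (1 - availability_pct / 100) * SECONDS_PER_YEAR


if __name__ == "__main__":
    for pct in (99.0, 99.9, 99.99, 99.999):
        secs = downtime_per_year(pct)
        print(f"{pct}% -> {secs / 3600:.2f} h/year ({secs / 60:.1f} min/year)")
```

99.9% works out to about 8.76 h/year and 99.99% to about 52.6 min/year. Each extra nine cuts the allowed downtime by a factor of 10.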
Achieving High Availability
✓ REDUNDANCY (the core principle) → multiple instances/copies → no single point of failure (if one fails, the others keep serving)
✓ Spread across AVAILABILITY ZONES / regions → survive data center/region failures
✓ FAILOVER → automatically switch to backups when something fails
✓ LOAD BALANCING + health checks → route around failed instances
✓ Database replication; eliminate SINGLE POINTS OF FAILURE everywhere
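The usual back-of-envelope availability math shows why redundancy removes single points of failure. Components in series, where every one must be up, multiply their availabilities. A group of n redundant copies fails only if all n fail, so its availability is 1 - (1 - a)^n. The sketch below is minimal and assumes failures are independent. The tier names and per-component numbers are hypothetical.

```python
from math import prod


def series(*availabilities: float) -> float:
    """Chain of components that must ALL be up (e.g. LB -> app -> DB): multiply."""
    return prod(availabilities)


def parallel(availability: float, n: int) -> float:
    """n redundant copies; up if AT LEAST ONE copy is up (assumes independent failures)."""
    return 1 - (1 - availability) ** n


# Hypothetical per-component availabilities, chosen only for illustration
LB, APP, DB = 0.999, 0.99, 0.999

# One instance per tier: every tier is a single point of failure
single = series(LB, APP, DB)
# Redundant layout: 2 load balancers, 3 app servers, primary + replica DB
redundant = series(parallel(LB, 2), parallel(APP, 3), parallel(DB, 2))

if __name__ == "__main__":
    print(f"one of each:     {single:.5%}")
    print(f"with redundancy: {redundant:.5%}")
```

With these numbers, one instance per tier gives roughly 98.8%, while the redundant layout gives roughly 99.9997%. The independence assumption is the weak point. Copies that share a data center tend to fail together, which is why instances are also spread across availability zones and regions.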
