What are flaky tests, and how do you deal with them?

Question

What are flaky tests, and how do you deal with them?

Accepted Answer

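**Flaky tests** are tests that **produce inconsistent results while the code stays the same**: on identical code they sometimes pass and sometimes fail. They are a serious problem because they undermine trust in the test suite, so it is worth understanding both their causes and their fixes.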

## What flaky tests are and why they are harmful

```text
A FLAKY test gives INCONSISTENT results (pass sometimes, fail other times) on the SAME code:
  → harmful: ERODES TRUST — people start ignoring failures ("oh, it's just flaky") →
    real failures get missed too
  → waste time on false alarms / re-runs; break CI; reduce confidence in the whole suite
→ Flaky tests are worse than no test if they make people distrust all tests.
```

## Common causes

```text
✗ TIMING / async → race conditions; not waiting properly for async operations (a top cause
  in UI/E2E tests); arbitrary sleeps
✗ ORDER DEPENDENCE → tests depending on each other / shared mutable state
✗ EXTERNAL dependencies → real network/services (network blips, rate limits, downtime)
✗ NON-DETERMINISM → time/dates, randomness, timezone, locale
✗ Test ENVIRONMENT → leftover state, uncleaned data, concurrency/parallelism issues
```
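To make order dependence concrete, here is a minimal sketch in which two tests share a module-level list. `register`, `SHARED_REGISTRY` and the test names are hypothetical; `run_in_order` stands in for a test runner and clears the shared state before each ordering, as a fresh process would.

```python
from typing import Callable

# Shared mutable state at module level: the root cause of the order dependence.
SHARED_REGISTRY: list[str] = []


def register(name: str, registry: list[str]) -> int:
    """Add a user and return how many users are now registered."""
    registry.append(name)
    return len(registry)


# --- Order-dependent tests: both touch SHARED_REGISTRY ---
def test_first_user_gets_count_one() -> None:
    assert register("alice", SHARED_REGISTRY) == 1  # silently assumes it runs first


def test_bob_is_registered() -> None:
    register("bob", SHARED_REGISTRY)
    assert "bob" in SHARED_REGISTRY


# --- Isolated versions: each test builds its own fresh state ---
def test_first_user_gets_count_one_isolated() -> None:
    registry: list[str] = []
    assert register("alice", registry) == 1


def test_bob_is_registered_isolated() -> None:
    registry: list[str] = []
    register("bob", registry)
    assert "bob" in registry


def run_in_order(tests: list[Callable[[], None]]) -> dict[str, bool]:
    """Run tests in the given order, starting from empty shared state."""
    SHARED_REGISTRY.clear()
    results: dict[str, bool] = {}
    for test in tests:
        try:
            test()
            results[test.__name__] = True
        except AssertionError:
            results[test.__name__] = False
    return results
```

Running `run_in_order([test_first_user_gets_count_one, test_bob_is_registered])` passes both tests; reversing the order makes the first-user test fail, although no code changed. The `_isolated` versions pass in either order because each test owns its state.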

## Fixing and managing flaky tests

```text
✓ Fix the ROOT CAUSE: wait for conditions (not sleeps); make tests INDEPENDENT and clean
  up state; MOCK external dependencies; control time/randomness (inject them)
✓ Ensure proper isolation (no shared state, no order dependence)
✓ Don't just RETRY blindly (it hides the problem) — investigate and fix
✓ QUARANTINE persistently flaky tests (isolate so they don't block) WHILE fixing them
✓ Treat flakiness seriously — track and address it (it degrades the whole suite)
```
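Replacing an arbitrary sleep with a wait on the actual condition could look roughly like this. `wait_until` and `start_background_job` are hypothetical helpers, not a library API (UI/E2E frameworks typically provide their own explicit-wait utilities). The timeout is an upper bound on waiting rather than a fixed delay, so a fast run finishes quickly and a slow one still passes.

```python
import random
import threading
import time
from typing import Callable


def wait_until(condition: Callable[[], bool], timeout: float = 5.0,
               interval: float = 0.01) -> None:
    """Poll `condition` until it holds; fail only if `timeout` seconds elapse first."""
    deadline = time.monotonic() + timeout
    while not condition():
        if time.monotonic() >= deadline:
            raise AssertionError(f"condition not met within {timeout}s")
        time.sleep(interval)  # short poll interval, not a guess at how long the work takes


def start_background_job(result: dict) -> threading.Thread:
    """Hypothetical async operation whose duration varies between runs."""
    def work() -> None:
        time.sleep(random.uniform(0.01, 0.1))  # simulated, variable latency
        result["status"] = "done"

    thread = threading.Thread(target=work)
    thread.start()
    return thread


def test_job_completes() -> None:
    result: dict = {}
    job = start_background_job(result)
    # Flaky version: time.sleep(0.05); assert result.get("status") == "done"
    wait_until(lambda: result.get("status") == "done", timeout=2.0)
    assert result["status"] == "done"
    job.join()  # clean up so no thread leaks into later tests
```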

## Why this matters

Understanding flaky tests and how to handle them matters because **flaky tests are a serious, common problem that can undermine the value of an entire test suite**.

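A flaky test (one that passes or fails inconsistently on the same code) is especially harmful because it **erodes trust**: when tests fail unpredictably, people start ignoring failures ("it's just flaky"), which means **real failures get ignored too**. Flakiness therefore lowers confidence in the whole suite and can make it worse than useless.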

Understanding this harm is what motivates people to take flakiness seriously.

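Knowing the **common causes** is necessary for diagnosing flakiness: **timing/async issues** (race conditions, not waiting properly for async operations, arbitrary sleeps; a leading cause in UI/E2E tests in particular), **order dependence** (tests that depend on each other or on shared state), **external dependencies** (real networks and services, with their outages and downtime), **non-determinism** (time, randomness, timezone, locale), and **environment issues** (leftover state, concurrency).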

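Knowing how to **fix and manage** them reflects the right approach: fix the root cause (wait for conditions instead of sleeping, make tests independent and clean up after them, mock external dependencies, control time and randomness), ensure isolation, **don't retry blindly** (that hides the problem), quarantine persistently flaky tests while fixing them, and treat flakiness as a serious problem to be tracked and addressed.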
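Controlling time and randomness, and mocking external dependencies, usually comes down to passing them in rather than reaching for globals. The sketch below assumes a hypothetical `make_quote` function that depends on a currency-rate service, the clock, and a random ID; `RateSource` and `FakeRateSource` are illustrative names. In production one would pass a real client, `lambda: datetime.now(timezone.utc)` and `random.Random()` instead.

```python
import random
from datetime import datetime, timezone
from typing import Callable, Protocol


class RateSource(Protocol):
    """Hypothetical external dependency, e.g. a currency-rate HTTP service."""
    def get_rate(self, currency: str) -> float: ...


class FakeRateSource:
    """In-memory stand-in: no network, so no blips, rate limits or downtime."""
    def __init__(self, rates: dict[str, float]) -> None:
        self.rates = rates

    def get_rate(self, currency: str) -> float:
        return self.rates[currency]


def make_quote(amount: float, currency: str, source: RateSource,
               now: Callable[[], datetime], rng: random.Random) -> dict:
    """Build a price quote; the rate service, clock and RNG are all injected."""
    return {
        "total": round(amount * source.get_rate(currency), 2),
        "issued_at": now().isoformat(),
        "quote_id": rng.randrange(1_000_000),
    }


def test_make_quote_is_deterministic() -> None:
    def fixed_now() -> datetime:
        return datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)

    def quote() -> dict:
        # Fresh, identically seeded RNG per call: the same "random" ID every time.
        return make_quote(10.0, "EUR", FakeRateSource({"EUR": 0.5}),
                          fixed_now, random.Random(42))

    first, second = quote(), quote()
    assert first == second
    assert first["total"] == 5.0
    assert first["issued_at"] == "2024-01-01T12:00:00+00:00"
```

Because every source of variation is a parameter, the test gives the same result on every run, on any machine's timezone, and without a network.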

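In short, flaky tests are a common and serious problem in many real projects, eroding both trust in a test suite and its value. Understanding their causes and handling them properly (fixing root causes and isolating tests rather than retrying blindly) is necessary for keeping a suite reliable and trustworthy, and it reflects practical maturity in dealing with one of the most frustrating and damaging problems in testing.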