What causes goroutine leaks, and how do you prevent them?

Question

Accepted Answer

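A **goroutine leak** is a goroutine that can never terminate: it is blocked or runs forever, holding memory (and keeping every object it references alive) for the lifetime of the program. Because goroutines are cheap to start, they are easy to leak, and leaks accumulate silently until the service slows down or runs out of memory.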

## Cause 1: Blocking on a channel with no sender/receiver

```go
// ❌ LEAK — this goroutine blocks forever waiting to send
func leak() {
    ch := make(chan int)   // unbuffered
    go func() {
        ch <- 42           // blocks forever — NOBODY ever receives
    }()                     // the goroutine is stuck, never exits
    // function returns without reading ch → the goroutine leaks
}
```

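A goroutine blocked on a channel send or receive that never completes waits **forever**. The Go runtime does not reclaim blocked goroutines, even when nothing else can reach the channel, so the goroutine's stack and everything it references stay allocated.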
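To make the leak observable, here is a minimal sketch that calls a leaking function repeatedly and compares `runtime.NumGoroutine()` before and after. The helper name `leakedAfter` and the 50 ms pause are illustrative assumptions, not part of any library.

```go
package main

import (
	"fmt"
	"runtime"
	"time"
)

// leak starts a goroutine that blocks forever on an unbuffered send.
func leak() {
	ch := make(chan int)
	go func() {
		ch <- 42 // no receiver ever arrives
	}()
}

// leakedAfter (hypothetical helper) calls leak n times and reports how many
// extra goroutines are still alive afterwards.
func leakedAfter(n int) int {
	before := runtime.NumGoroutine()
	for i := 0; i < n; i++ {
		leak()
	}
	time.Sleep(50 * time.Millisecond) // give the goroutines time to park on the send
	runtime.GC()                      // a GC cycle does not remove blocked goroutines
	return runtime.NumGoroutine() - before
}

func main() {
	fmt.Println("goroutines still blocked:", leakedAfter(100))
}
```

The reported difference is expected to match the number of calls, because each call leaves exactly one goroutine parked on its send.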

## Cause 2: No cancellation mechanism / done signal

```go
// ❌ LEAK — a goroutine in an infinite loop with no way to stop
func worker(jobs <-chan int) {
    for {
        job := <-jobs      // blocks forever once senders stop without closing jobs
        process(job)       // (if jobs is closed instead, this spins on zero values)
    }                       // no exit condition: leaks when no longer needed
}
```

## Prevention 1: Use context for cancellation

```go
// ✅ the goroutine can be told to stop
func worker(ctx context.Context, jobs <-chan int) {
    for {
        select {
        case job, ok := <-jobs:
            if !ok {             // jobs was closed: no more work, exit
                return
            }
            process(job)
        case <-ctx.Done():       // cancellation signal: exit cleanly
            return                // the goroutine terminates, no leak
        }
    }
}
// caller: ctx, cancel := context.WithCancel(...); defer cancel()
```

Giving every long-lived goroutine a way to be cancelled (`<-ctx.Done()`) is the primary defense: it ensures the goroutine can terminate once its work is no longer needed.
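The caller side might look like the following sketch. It starts a cancellable worker, feeds it a few jobs, cancels, and uses a `sync.WaitGroup` to confirm the goroutine really exited. The `handle` callback and the `runAndCancel` function are hypothetical names introduced only for this illustration.

```go
package main

import (
	"context"
	"fmt"
	"sync"
)

// worker handles jobs until the channel is closed or ctx is cancelled.
// handle is a hypothetical callback standing in for real processing.
func worker(ctx context.Context, jobs <-chan int, handle func(int)) {
	for {
		select {
		case job, ok := <-jobs:
			if !ok {
				return // channel closed: nothing more to do
			}
			handle(job)
		case <-ctx.Done():
			return // cancelled by the caller
		}
	}
}

// runAndCancel starts one worker, sends it n jobs, cancels, and waits for
// the worker to exit. It returns how many jobs were handled.
func runAndCancel(n int) int {
	ctx, cancel := context.WithCancel(context.Background())
	defer cancel() // calling cancel twice is harmless

	jobs := make(chan int)
	var wg sync.WaitGroup
	handled := 0

	wg.Add(1)
	go func() {
		defer wg.Done()
		worker(ctx, jobs, func(int) { handled++ })
	}()

	for i := 0; i < n; i++ {
		jobs <- i // unbuffered: returns once the worker has received the job
	}
	cancel()  // jobs is never closed; cancellation is what stops the worker
	wg.Wait() // returns only after the worker goroutine has exited
	return handled
}

func main() {
	fmt.Println("jobs handled before cancel:", runAndCancel(5))
}
```

Waiting on the `WaitGroup` after `cancel()` is a design choice worth keeping in real code: it turns "the goroutine should stop" into "the goroutine has stopped", which is what shutdown paths and tests usually need.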

## Prevention 2: Use a buffered channel or make sure a receiver exists

```go
// ✅ ensure a receiver exists, or use a buffered channel so the send doesn't block
ch := make(chan int, 1)   // buffered → the send completes even if no one receives yet
go func() { ch <- 42 }()   // doesn't block
```
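A common place where this matters is a call with a timeout: if the caller gives up, nobody receives the result. The sketch below assumes a hypothetical `fetchWithTimeout` helper and shows how a buffer of size 1 lets the background goroutine finish its send and exit anyway.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// fetchWithTimeout (hypothetical helper) runs slowOp in a goroutine and
// waits at most timeout for its result.
func fetchWithTimeout(slowOp func() int, timeout time.Duration) (int, error) {
	// Capacity 1: the single send below completes even if we have already
	// returned on the timeout branch, so the goroutine can exit.
	result := make(chan int, 1)
	go func() {
		result <- slowOp()
	}()

	select {
	case v := <-result:
		return v, nil
	case <-time.After(timeout):
		return 0, errors.New("timed out")
	}
}

func main() {
	v, err := fetchWithTimeout(func() int { return 42 }, time.Second)
	fmt.Println(v, err)

	_, err = fetchWithTimeout(func() int {
		time.Sleep(200 * time.Millisecond) // slower than the timeout below
		return 1
	}, 20*time.Millisecond)
	fmt.Println(err)
}
```

With an unbuffered channel, the second call would leave its goroutine blocked on the send forever. Note that the buffer only covers as many sends as its capacity; a goroutine that sends repeatedly still needs a receiver or a cancellation path.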

## Prevention 3: Always close channels / drain them properly

```go
// ✅ close channels so range loops terminate
go func() {
    defer close(results)    // signal completion → for-range over results ends
    for _, job := range jobs { results <- process(job) }
}()
```
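A complete version of this producer/consumer pairing could look like the sketch below. The function names `square` and `sum` are made up for the example; the point is that the producer closes the channel and the consumer ranges over it until it is closed.

```go
package main

import "fmt"

// square starts a producer goroutine and returns its output channel.
func square(jobs []int) <-chan int {
	results := make(chan int)
	go func() {
		defer close(results) // runs when the goroutine returns, ending the consumer's range loop
		for _, job := range jobs {
			results <- job * job
		}
	}()
	return results
}

// sum drains the channel; the range loop ends when results is closed.
func sum(results <-chan int) int {
	total := 0
	for r := range results {
		total += r
	}
	return total
}

func main() {
	fmt.Println(sum(square([]int{1, 2, 3}))) // 1 + 4 + 9 = 14
}
```

Closing only solves half the problem. If the consumer stops reading early, the producer blocks on its next send and leaks, so a consumer that may abandon the stream should either drain it or give the producer a cancellation signal.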

## Detecting leaks

```go
runtime.NumGoroutine()         // sample the goroutine count; steady growth under stable load suggests a leak
import _ "net/http/pprof"      // registers /debug/pprof/goroutine on http.DefaultServeMux (needs an HTTP server)
// in tests: defer goleak.VerifyNone(t) from go.uber.org/goleak fails the test if goroutines are still running
```

The main detection tools are monitoring `runtime.NumGoroutine()` (a count that keeps rising indicates a leak), inspecting goroutine dumps via `pprof`, and using `goleak` in tests.
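When no HTTP endpoint is available, the same goroutine dump can be produced in-process with the standard `runtime/pprof` package. The sketch below is one way to do it; `goroutineDump`, `blockedSenders`, and `startBlockedSender` are hypothetical helpers, and matching on the `[chan send` state label assumes the text format the Go runtime currently prints.

```go
package main

import (
	"bytes"
	"fmt"
	"runtime/pprof"
	"strings"
	"time"
)

// goroutineDump returns the stacks of all current goroutines as text
// (debug level 2 uses the same layout as an unrecovered panic).
func goroutineDump() string {
	var buf bytes.Buffer
	_ = pprof.Lookup("goroutine").WriteTo(&buf, 2)
	return buf.String()
}

// blockedSenders counts goroutines whose state line says they are parked
// on a channel send. Assumption: the runtime labels that state "chan send".
func blockedSenders() int {
	return strings.Count(goroutineDump(), "[chan send")
}

// startBlockedSender starts a goroutine stuck on an unbuffered send and
// returns a function that releases it by receiving the value.
func startBlockedSender() (release func()) {
	ch := make(chan int)
	go func() { ch <- 1 }()
	time.Sleep(20 * time.Millisecond) // let the goroutine park on the send
	return func() { <-ch }
}

func main() {
	before := blockedSenders()
	release := startBlockedSender()
	fmt.Println("newly blocked senders:", blockedSenders()-before)
	release()
}
```

In a real service the dump is more useful read in full than counted: the stack under each blocked goroutine points at the exact send or receive that never completes.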

## Why this matters

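Goroutine leaks are among the most common and hardest-to-spot problems in production Go services.

Because goroutines are so cheap to start, developers create them freely. But a goroutine that is blocked forever (on a channel with no counterpart, or in an infinite loop with no cancellation mechanism) **never terminates and never releases its memory**, including every object it references.

In a long-running server these leaks accumulate silently and gradually consume memory until the service degrades or crashes. They are hard to notice because nothing fails immediately.

Understanding the causes (blocked channel operations, missing cancellation) and the preventions (**context-based cancellation** as the primary defense, closing channels correctly, buffered channels, making sure a receiver exists) is essential for writing reliable concurrent Go code.

Knowing the detection tools (`runtime.NumGoroutine`, pprof, goleak) is just as important.

This is a key, frequently tested topic that separates developers who can write production-grade concurrent Go from those whose code leaks under sustained load.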