What are generators and the yield keyword?

Question

What are generators and the `yield` keyword?

Accepted Answer

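A **generator** is a function that uses the `yield` keyword instead of `return` to produce values **lazily, one at a time**. It computes each value on demand rather than building the whole result in memory, which makes it well suited to large or infinite sequences.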

## yield vs return

```python
# a regular function builds the WHOLE list in memory
def get_squares(n):
    return [i ** 2 for i in range(n)]   # all n values created at once

# a generator yields values ONE AT A TIME, on demand
def gen_squares(n):
    for i in range(n):
        yield i ** 2                     # pause here, return one value, resume on next()

for sq in gen_squares(1_000_000):        # uses almost NO memory
    print(sq)
```

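`yield` **pauses** the function, hands one value back to the caller, and on the next iteration **resumes from that exact point**, with local variables preserved between values. Calling the function only creates a generator object; nothing is computed until a value is requested.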
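To make the pause and resume behavior visible, here is a minimal sketch that records which parts of a generator body have run. The names `traced_squares` and `log` are hypothetical and exist only for this illustration.

```python
log = []  # records which parts of the generator body have run so far

def traced_squares(n):
    # hypothetical helper: same logic as gen_squares, plus logging
    log.append("start")
    for i in range(n):
        log.append(f"compute {i}")
        yield i ** 2
    log.append("end")

gen = traced_squares(2)      # creates the generator object; no body code runs yet
log_after_call = list(log)   # []

first = next(gen)            # runs the body up to the first yield
log_after_first = list(log)  # ["start", "compute 0"]

second = next(gen)           # resumes right after the yield; the loop state was kept
remaining = list(gen)        # runs the rest of the body; no further values are produced
```

After the last line, `log` also contains `"end"`, which shows that the code after the loop runs only once the caller asks for a value the generator no longer has.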

## Memory advantage

```python
sum(i ** 2 for i in range(10_000_000))   # generator expression — constant memory
sum([i ** 2 for i in range(10_000_000)]) # list — builds a huge list first (wasteful)
```

For large data, a generator keeps memory usage flat, while a list comprehension materializes every element at once.
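One rough way to see the difference is `sys.getsizeof`. This is a sketch, not a benchmark: `getsizeof` reports only the size of the container object itself (for a list, the array of references, not the integers it points to), and the exact numbers depend on the Python implementation and version.

```python
import sys

n = 100_000
as_list = [i ** 2 for i in range(n)]   # all n values exist right now
as_gen = (i ** 2 for i in range(n))    # nothing has been computed yet

list_bytes = sys.getsizeof(as_list)    # grows with n
gen_bytes = sys.getsizeof(as_gen)      # small and independent of n

print(f"list: {list_bytes} bytes, generator: {gen_bytes} bytes")
```

Both objects produce the same values when consumed; only the moment at which those values are created differs.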

## Infinite sequences (impossible with a list)

```python
def count_up():
    n = 0
    while True:        # infinite — but lazy, so it's fine
        yield n
        n += 1

gen = count_up()
next(gen)   # 0
next(gen)   # 1  — only computes when asked
```

Because values are produced on demand, a generator can represent an *infinite* stream: you take only as many values as you need.
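Calling `next()` by hand gets tedious, so a common way to take a bounded slice of an infinite generator is `itertools.islice` from the standard library. The sketch below repeats `count_up` so that it runs on its own.

```python
from itertools import islice

def count_up():
    n = 0
    while True:
        yield n
        n += 1

# take the first five values, then stop asking
first_five = list(islice(count_up(), 5))       # [0, 1, 2, 3, 4]

# lazy stages can be chained; the filter is still infinite, islice bounds it
evens = (n for n in count_up() if n % 2 == 0)
first_three_evens = list(islice(evens, 3))     # [0, 2, 4]
```

Calling `list(count_up())` without a bound would never finish, which is why the slice has to come before anything that tries to consume the whole stream.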

## How iteration works under the hood

```python
gen = gen_squares(3)
next(gen)   # 0  — runs until the first yield
next(gen)   # 1  — resumes, runs to the next yield
next(gen)   # 4
next(gen)   # raises StopIteration: generator exhausted
```
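A `for` loop performs these same `next()` calls for you and treats `StopIteration` as the signal to stop. The following is a rough hand-written equivalent of `for value in gen_squares(3): collected.append(value)`, meant as an illustration rather than the interpreter's actual implementation.

```python
def gen_squares(n):
    for i in range(n):
        yield i ** 2

collected = []
it = iter(gen_squares(3))    # for a generator, iter() returns the generator itself
while True:
    try:
        value = next(it)     # resume the generator until its next yield
    except StopIteration:    # raised once the generator body finishes
        break
    collected.append(value)

# collected is now [0, 1, 4]
```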

## Generator expressions

```python
(x ** 2 for x in range(10))   # like a list comprehension but lazy (parentheses)
```
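Two practical details are worth a short sketch: a generator expression can be consumed only once, and the extra parentheses can be dropped when it is the sole argument of a function call.

```python
squares = (x ** 2 for x in range(10))

total = sum(squares)          # consumes the generator: 0 + 1 + 4 + ... + 81
second_pass = list(squares)   # [] because the generator is already exhausted

# as the only argument, no second pair of parentheses is needed
total_again = sum(x ** 2 for x in range(10))
```

If the values are needed more than once, either rebuild the generator expression or store the results in a list.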

## Why this matters

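Generators are essential for **memory-efficient** processing of large or streaming data (reading a huge file line by line, processing large datasets, building pipelines, or modeling infinite sequences), where building a full list would exhaust memory.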
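As a sketch of the file and pipeline case, the stages below each pull one line at a time from the previous stage, so only a single line is held in memory at any moment. The stage names (`read_lines`, `non_empty`, `parse_ints`) are hypothetical, and an `io.StringIO` object stands in for a real file opened with `open()`.

```python
import io

def read_lines(stream):
    # file-like objects iterate line by line without loading the whole file
    for line in stream:
        yield line.rstrip("\n")

def non_empty(lines):
    # drop blank lines
    for line in lines:
        if line.strip():
            yield line

def parse_ints(lines):
    # convert each remaining line to an integer
    for line in lines:
        yield int(line)

fake_file = io.StringIO("10\n\n20\n30\n")   # stand-in for a large file on disk
pipeline = parse_ints(non_empty(read_lines(fake_file)))

total = sum(pipeline)   # values flow through all three stages one at a time
```

Nothing is read from `fake_file` until `sum` starts asking for values, which is the same lazy behavior described above, applied across several stages.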

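Understanding `yield` (pause and resume with state preserved) and lazy evaluation lets you write data-processing code that scales. Generator expressions offer the same benefit in a more compact form.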

They are the key tool when data is too large to fit in memory, or when you want to process items on demand rather than all up front.