What are the common data modeling patterns in MongoDB?

Question

Accepted Answer

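Beyond the basic embed-vs-reference decision, MongoDB has a set of established **data modeling patterns**. These are proven solutions for recurring situations such as large arrays, polymorphic data, and relationships, and knowing them helps you design effective schemas for real-world cases.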

## Bucket pattern: grouping time-series/related data

```js
// instead of one document per reading (millions of tiny docs), BUCKET them:
{
  sensorId: "s1",
  date: ISODate("2024-01-15"),
  readings: [                            // a day's readings in one document
    { time: "10:00", value: 23 },
    { time: "10:01", value: 24 }
  ]
}
// → fewer documents, efficient for time-series/IoT data
```
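A minimal sketch of how application code might fold individual readings into the per-sensor, per-day buckets shown above. The `count` field and the `bucketReadings`/`bucketUpsert` helpers are hypothetical additions for illustration. The shell's `ISODate(...)` corresponds to a plain JavaScript `Date` in Node.js. The second helper only builds the arguments for an incremental upsert and does not talk to a server.

```js
// Hypothetical helper: group raw readings into one document per sensor per UTC day,
// matching the bucket shape above plus an assumed `count` field.
function bucketReadings(readings) {
  const buckets = new Map();
  for (const r of readings) {
    const iso = r.ts.toISOString(); // e.g. "2024-01-15T10:00:00.000Z"
    const day = iso.slice(0, 10);   // "2024-01-15"
    const key = `${r.sensorId}|${day}`;
    if (!buckets.has(key)) {
      buckets.set(key, {
        sensorId: r.sensorId,
        date: new Date(`${day}T00:00:00Z`),
        count: 0,
        readings: [],
      });
    }
    const bucket = buckets.get(key);
    bucket.readings.push({ time: iso.slice(11, 16), value: r.value }); // "HH:MM"
    bucket.count += 1;
  }
  return [...buckets.values()];
}

// Hypothetical write-path arguments: append to the sensor's bucket for that day
// while it holds fewer than maxPerBucket readings. Once it is full, the filter
// matches nothing and the upsert creates a fresh bucket instead.
function bucketUpsert(reading, maxPerBucket = 200) {
  const iso = reading.ts.toISOString();
  return {
    filter: {
      sensorId: reading.sensorId,
      date: new Date(`${iso.slice(0, 10)}T00:00:00Z`),
      count: { $lt: maxPerBucket },
    },
    update: {
      $push: { readings: { time: iso.slice(11, 16), value: reading.value } },
      $inc: { count: 1 },
    },
    options: { upsert: true },
  };
}

const bucketDocs = bucketReadings([
  { sensorId: "s1", ts: new Date("2024-01-15T10:00:00Z"), value: 23 },
  { sensorId: "s1", ts: new Date("2024-01-15T10:01:00Z"), value: 24 },
  { sensorId: "s1", ts: new Date("2024-01-16T09:00:00Z"), value: 22 },
]);
console.log(bucketDocs.length); // 2 bucket documents for 3 readings
```

With the Node.js driver, the three parts would be passed as `collection.updateOne(filter, update, options)`. The size cap (200 here, an arbitrary assumption) keeps a busy sensor's day from growing into one unbounded document.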

## Subset pattern: embedding the frequently used subset

```js
// embed the most-used part, reference the rest (for large related data)
{
  _id: ObjectId("..."),
  title: "Product",
  recentReviews: [ /* 5 most recent — embedded for fast display */ ],
  // full reviews live in a separate collection (referenced)
}
// → fast common reads without loading ALL related data
```
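One way to keep the embedded subset bounded is a single `$push` that uses the `$each`, `$sort` and `$slice` modifiers. All three are real MongoDB update modifiers. The helper names and the `recentReviews`/`date` field names are assumptions. In this sketch, the full review would still be inserted into the separate reviews collection, and only the embedded copy is trimmed. The in-memory function mirrors what the server does, so the behaviour can be checked without a database.

```js
// Hypothetical update document, e.g. for the Node.js driver:
//   products.updateOne({ _id: productId }, pushRecentReview(review));
function pushRecentReview(review, keep = 5) {
  return {
    $push: {
      recentReviews: {
        $each: [review],
        $sort: { date: -1 }, // newest first
        $slice: keep,        // then keep only the first `keep` elements
      },
    },
  };
}

// In-memory model of the same update, for illustration only.
function applyPushRecentReview(product, review, keep = 5) {
  const recentReviews = [...(product.recentReviews ?? []), review]
    .sort((a, b) => b.date - a.date) // Date objects subtract to milliseconds
    .slice(0, keep);
  return { ...product, recentReviews };
}
```

Sorting and slicing inside the same `$push` keeps the product document at a predictable size however many reviews arrive.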

## Computed pattern: precompute and store

```js
// store precomputed values (avoid recomputing on every read)
{ productId: "p1", totalReviews: 1500, avgRating: 4.5 }
// → update these when reviews change; reads are fast (no aggregation needed)
```
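The phrase "update these when reviews change" can be made concrete with an incremental (running) mean. This avoids recomputing the average from every review. The sketch below is under stated assumptions: `addRating` is a hypothetical in-memory helper. Because a read-modify-write of `avgRating` is not atomic on a real server, the second helper shows an alternative that stores a hypothetical `ratingSum` next to `totalReviews`, so both can be updated with one atomic `$inc`.

```js
// Hypothetical helper: fold one new rating into { totalReviews, avgRating }
// using the running-mean update avg' = avg + (x - avg) / n'.
function addRating(stats, rating) {
  const totalReviews = stats.totalReviews + 1;
  const avgRating = stats.avgRating + (rating - stats.avgRating) / totalReviews;
  return { ...stats, totalReviews, avgRating };
}

// Alternative (assumption): keep a running sum that a server-side $inc can update
// atomically; the average is then derived at read time as ratingSum / totalReviews.
function ratingIncUpdate(rating) {
  return { $inc: { totalReviews: 1, ratingSum: rating } };
}

let productStats = { productId: "p1", totalReviews: 0, avgRating: 0 };
for (const r of [4, 5]) productStats = addRating(productStats, r);
console.log(productStats.totalReviews, productStats.avgRating); // 2 4.5
```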

## Extended Reference pattern: embedding key fields of a referenced document

```js
// embed the OFTEN-NEEDED fields of a referenced document (avoid a join for common reads)
{ orderId: "o1", customer: { _id: "u1", name: "Ann", city: "NY" } }  // enough for display
// → reference customer._id for full data; embed name/city for the common read
```
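A minimal sketch of building the extended reference from a full customer document. It also shows the trade-off that comes with it: the copied fields are duplicates, so a change to them must be propagated to existing orders. The helper names and the chosen fields (`name`, `city`) are assumptions. The propagation helper only builds the arguments for the Node.js driver's `collection.updateMany(filter, update)`.

```js
// Fields copied into each order; this choice is an assumption and should match
// what the common order-list read actually displays.
const CUSTOMER_REF_FIELDS = ["_id", "name", "city"];

// Hypothetical helper: keep only the frequently needed fields of the customer.
function customerRef(customer, fields = CUSTOMER_REF_FIELDS) {
  return Object.fromEntries(
    fields.filter((f) => f in customer).map((f) => [f, customer[f]])
  );
}

function makeOrder(orderId, customer) {
  return { orderId, customer: customerRef(customer) };
}

// Embedded copies go stale when the source changes, so a rename is fanned out
// to every order that references the customer.
function propagateNameChange(customerId, newName) {
  return {
    filter: { "customer._id": customerId },
    update: { $set: { "customer.name": newName } },
  };
}

const fullCustomer = { _id: "u1", name: "Ann", city: "NY", email: "ann@example.com", phone: "555-0100" };
const order = makeOrder("o1", fullCustomer);
console.log(order.customer); // only _id, name and city are embedded
```

The pattern works best for fields that rarely change, such as a name or city. Volatile fields are better left behind the reference.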

## Other patterns

```text
Polymorphic       → different document shapes in one collection (with a "type" field)
Schema Versioning → a version field to handle evolving schemas in one collection
Outlier           → handle rare documents that don't fit the common pattern specially
Approximation     → store approximate values (e.g. view counts) to reduce write load
Tree/Hierarchy    → patterns for hierarchical data (parent refs, arrays of ancestors, etc.)
```
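The Polymorphic and Schema Versioning rows can be illustrated together, because both are about documents of different shapes living in one collection. The sketch assumes a hypothetical products collection where `type` selects the shape. A missing `schemaVersion` means version 1, which stored `price` as a bare number, while version 2 stores `{ amount, currency }`. `upgradeProduct` upgrades lazily on read, which is one common option next to a background migration. The `"USD"` default and all names are assumptions.

```js
// Schema versioning: bring any stored layout up to the current version (2) on read.
function upgradeProduct(doc) {
  const version = doc.schemaVersion ?? 1; // documents written before versioning count as v1
  if (version >= 2) return doc;
  // v1 -> v2: a bare numeric price becomes { amount, currency }; "USD" is an assumed default
  return { ...doc, schemaVersion: 2, price: { amount: doc.price, currency: "USD" } };
}

// Polymorphism: one collection, different shapes, dispatched on the `type` field.
function describeProduct(raw) {
  const doc = upgradeProduct(raw);
  const price = `${doc.price.amount} ${doc.price.currency}`;
  switch (doc.type) {
    case "book":
      return `${doc.title} by ${doc.author} (${price})`;
    case "shirt":
      return `${doc.title}, size ${doc.size} (${price})`;
    default:
      return `${doc.title} (${price})`;
  }
}

const catalog = [
  { type: "book", title: "Sample Book", author: "Ann", price: 30 }, // v1, no schemaVersion
  { type: "shirt", title: "Logo Tee", size: "M", schemaVersion: 2, price: { amount: 15, currency: "EUR" } },
];
for (const item of catalog) console.log(describeProduct(item));
// Sample Book by Ann (30 USD)
// Logo Tee, size M (15 EUR)
```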

## Why this matters

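Understanding MongoDB's common data modeling patterns is valuable for **designing effective schemas for real-world scenarios**. It is practical knowledge that builds on the basic embed-vs-reference decision.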

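Embedding vs. referencing is the core choice, but real applications run into recurring modeling problems that these established patterns address:

- **Bucket pattern**: groups time-series/IoT readings into fewer documents. This matters for the common time-series use case because it avoids millions of tiny documents.
- **Subset pattern**: embeds the frequently used subset while referencing the full data. This balances read performance against document size when the related data is large.
- **Computed pattern**: precomputes and stores values such as totals and averages, so reads avoid expensive recomputation.
- **Extended Reference pattern**: embeds the frequently needed fields of a referenced document, so common reads avoid a join (`$lookup`). It is a practical hybrid of embedding and referencing.
- **Others**: Polymorphic for documents of varied shapes, Schema Versioning for evolving schemas, and Tree patterns for hierarchies.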

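Knowing these patterns gives you **proven solutions for common scenarios** instead of reinventing them. It also helps you make informed modeling decisions for time-series data, large related collections, expensive computations, and hierarchical data.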

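Real MongoDB applications keep running into these design challenges, and the established patterns offer tested solutions that improve performance and maintainability. Knowing bucket, subset, computed, extended reference, and the other patterns therefore moves schema design beyond the basic embed/reference choice. The result is a schema that performs well for the application's specific access patterns and data characteristics.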