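Embeddings are numerical vector representations of data (text, images, etc.) that capture semantic meaning by placing similar items close together in vector space. They are a foundation of modern AI, making semantic search, recommendations, and RAG possible.
What Embeddings Are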
EMBEDDING → a VECTOR (list of numbers) representing data (a word, sentence, image, etc.):
→ captures MEANING → semantically similar items have SIMILAR vectors (close in vector space)
→ e.g. 'king' and 'queen' have similar embeddings; 'cat' and 'dog' are closer than 'cat' and 'car'
→ produced by models (embedding models) that learn meaningful representations
→ turns data into numbers that capture semantic meaning (meaning as geometry)
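
To make "meaning as geometry" concrete, here is a minimal sketch of cosine similarity in plain Python. The three-dimensional vectors in `toy_embeddings` are hand-made for illustration and are an assumption of this example; a real embedding model would produce vectors with hundreds or thousands of dimensions.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between vectors a and b (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hypothetical toy vectors, NOT the output of a real embedding model.
toy_embeddings = {
    "cat": [0.9, 0.8, 0.1],
    "dog": [0.8, 0.9, 0.2],
    "car": [0.1, 0.2, 0.9],
}

cat_dog = cosine_similarity(toy_embeddings["cat"], toy_embeddings["dog"])
cat_car = cosine_similarity(toy_embeddings["cat"], toy_embeddings["car"])

print(f"cat vs dog: {cat_dog:.3f}")
print(f"cat vs car: {cat_car:.3f}")
```

With these toy vectors the cat/dog score comes out higher than the cat/car score, mirroring the example above.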
Why Embeddings Are Useful
✓ SEMANTIC SIMILARITY → measure how similar items are (vector distance/cosine similarity):
→ find similar/related items by meaning (not just keyword matching)
✓ SEMANTIC SEARCH → search by MEANING (find relevant results even with different words)
✓ RECOMMENDATIONS → suggest items whose embeddings are close to those of items already liked
✓ RAG (retrieval-augmented generation) → embed documents + the query → retrieve the closest documents as context for an LLM
✓ CLUSTERING, classification → group/categorize by semantic similarity
→ embeddings enable working with the MEANING of data, not just exact matches
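
The retrieval step shared by semantic search and RAG can be sketched as a nearest-neighbor ranking over stored vectors. This is a minimal sketch under stated assumptions: the document ids, the `top_k` helper, and all vectors below are hypothetical and hand-made; in practice both the documents and the query would be embedded by the same embedding model, and a vector index would usually replace the linear scan.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between vectors a and b."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query_vec, doc_vecs, k=2):
    """Return the ids of the k documents most similar to the query."""
    ranked = sorted(
        doc_vecs,
        key=lambda doc_id: cosine_similarity(query_vec, doc_vecs[doc_id]),
        reverse=True,
    )
    return ranked[:k]


# Hypothetical pre-computed document embeddings (toy values).
doc_vecs = {
    "reset_password": [0.9, 0.1, 0.0],   # "How to reset your password"
    "locked_account": [0.6, 0.5, 0.1],   # "Recovering a locked account"
    "sales_report": [0.1, 0.1, 0.9],     # "Quarterly sales report"
}

# Hypothetical embedding of the query "I forgot my login credentials".
# It shares no keywords with the documents, only meaning.
query_vec = [0.85, 0.2, 0.05]

results = top_k(query_vec, doc_vecs, k=2)
print(results)
```

In a RAG pipeline, the text of the returned documents would then be placed in the LLM prompt as context.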
