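The Transformer is a neural network architecture, introduced in 2017, whose attention mechanism transformed artificial intelligence, natural language processing in particular. It lets models process sequences efficiently and is the foundation of modern LLMs (GPT, Claude, etc.).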
What Is a Transformer
TRANSFORMER → a neural network architecture for processing SEQUENCES (text, etc.):
→ introduced in the 2017 paper 'Attention Is All You Need'
→ uses an ATTENTION mechanism (instead of processing tokens strictly one at a time, as RNNs do)
→ the foundation of modern LLMs and much of modern AI
→ revolutionized NLP and enabled the LLM era
The Attention Mechanism (Key Innovation)
ATTENTION → lets the model WEIGH the importance of different parts of the input when
processing each part:
→ for each word, attend to (focus on) the RELEVANT other words → capture context/relationships
→ e.g. understanding what a pronoun refers to, long-range dependencies
→ SELF-ATTENTION → relate each element to all others in the sequence
✓ enables: capturing long-range context, PARALLEL processing (faster training than
sequential RNNs), understanding relationships
→ attention is a major reason transformers handle language so well: each token's representation is built from the context most relevant to it
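
The weighing described above can be sketched in a few lines of plain Python. This is a minimal illustration of scaled dot-product self-attention, not a full transformer layer. The function names (`softmax`, `dot`, `self_attention`) and the toy vectors are hypothetical, chosen only for this example. It also assumes the queries, keys, and values are the raw input vectors themselves, whereas a real transformer derives them from learned linear projections.

```python
import math

def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def self_attention(queries, keys, values):
    """Scaled dot-product attention over one sequence.

    queries, keys, values: lists of vectors, one per token.
    Returns (outputs, weights), where weights[i][j] is how much
    token i attends to token j.
    """
    d_k = len(keys[0])
    outputs, weights = [], []
    for q in queries:
        # Score this token against every token in the sequence.
        scores = [dot(q, k) / math.sqrt(d_k) for k in keys]
        w = softmax(scores)
        # Output is the attention-weighted average of the value vectors.
        out = [sum(w_j * v[c] for w_j, v in zip(w, values))
               for c in range(len(values[0]))]
        weights.append(w)
        outputs.append(out)
    return outputs, weights

# Toy input: 3 tokens with hand-picked 2-d vectors (illustrative values only).
# Assumption: Q = K = V = x, i.e. no learned projections.
x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
outputs, weights = self_attention(x, x, x)

for i, row in enumerate(weights):
    print(i, [round(w, 3) for w in row])
```

Each row of `weights` sums to 1 and shows how strongly one token attends to every token in the sequence, itself included. No iteration of the loop over `queries` depends on another, which is what lets real implementations compute all positions at once as matrix operations instead of stepping through the sequence like an RNN.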
