The dominant architecture for LLMs: a stack of alternating self-attention and feed-forward sublayers, each wrapped with a residual connection and layer normalization. The original design applied layer normalization after each sublayer (PostNorm); modern variants normalize before it (PreNorm). Attention Residuals modifies how the residual stream accumulates across layers.
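A minimal sketch of a single PreNorm block in NumPy may clarify the layer structure. The single-head attention, ReLU feed-forward network, and all weight names here are illustrative simplifications, not any particular model's implementation:

```python
import numpy as np

def layer_norm(x, eps=1e-5):
    # Normalize each token's features to zero mean, unit variance.
    mu = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def self_attention(x, Wq, Wk, Wv):
    # Single-head scaled dot-product attention (illustrative).
    q, k, v = x @ Wq, x @ Wk, x @ Wv
    scores = q @ k.T / np.sqrt(k.shape[-1])
    return softmax(scores) @ v

def ffn(x, W1, W2):
    # Two-layer ReLU feed-forward network.
    return np.maximum(0, x @ W1) @ W2

def prenorm_block(x, params):
    # PreNorm: normalize *before* each sublayer; the residual stream
    # itself is never normalized, only each sublayer's input.
    x = x + self_attention(layer_norm(x), *params["attn"])
    x = x + ffn(layer_norm(x), *params["ffn"])
    return x

# Demo with random weights: a block maps (seq_len, d) -> (seq_len, d).
rng = np.random.default_rng(0)
d = 8
x = rng.normal(size=(4, d))
params = {
    "attn": tuple(rng.normal(size=(d, d)) for _ in range(3)),
    "ffn": (rng.normal(size=(d, 4 * d)), rng.normal(size=(4 * d, d))),
}
y = prenorm_block(x, params)
```

A PostNorm block would instead compute `x = layer_norm(x + sublayer(x))`, normalizing the residual stream itself after each addition.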
See also: softmax_attention, multi_head_attention, prenorm, residual_connections