
Transformer

The dominant architecture for large language models (LLMs), built from alternating self-attention and feed-forward layers with residual connections and layer normalization. The original design placed layer normalization after each sublayer (PostNorm); modern variants place it before the sublayer (PreNorm), which tends to stabilize training of deep stacks. Attention Residuals modifies how the residual stream accumulates across layers.
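A minimal sketch of a single PreNorm transformer block, using NumPy with single-head attention and a ReLU feed-forward layer for brevity (real implementations use multi-head attention, learned parameters, and GeLU or similar activations; all weight names here are illustrative):

```python
import numpy as np

def layer_norm(x, eps=1e-5):
    # Normalize each token vector to zero mean, unit variance.
    mu = x.mean(-1, keepdims=True)
    var = x.var(-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)

def softmax(x):
    e = np.exp(x - x.max(-1, keepdims=True))
    return e / e.sum(-1, keepdims=True)

def self_attention(x, Wq, Wk, Wv):
    # Scaled dot-product attention over one sequence (no masking).
    q, k, v = x @ Wq, x @ Wk, x @ Wv
    scores = q @ k.T / np.sqrt(k.shape[-1])
    return softmax(scores) @ v

def pre_norm_block(x, Wq, Wk, Wv, W1, W2):
    # PreNorm: normalize *before* each sublayer; the residual
    # connection adds the raw (un-normalized) input back in.
    x = x + self_attention(layer_norm(x), Wq, Wk, Wv)
    x = x + np.maximum(0.0, layer_norm(x) @ W1) @ W2  # ReLU feed-forward
    return x

rng = np.random.default_rng(0)
d, seq = 8, 4
x = rng.normal(size=(seq, d))
Wq, Wk, Wv = (rng.normal(scale=0.1, size=(d, d)) for _ in range(3))
W1 = rng.normal(scale=0.1, size=(d, 4 * d))
W2 = rng.normal(scale=0.1, size=(4 * d, d))
y = pre_norm_block(x, Wq, Wk, Wv, W1, W2)
print(y.shape)  # (4, 8): the block preserves the (sequence, model-dim) shape
```

The PostNorm variant would instead compute `layer_norm(x + sublayer(x))`, normalizing the residual sum itself rather than the sublayer input.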

See also: softmax_attention, multi_head_attention, prenorm, residual_connections

concepts/transformer.txt · Last modified: by aethersync
