Multi-Head Attention

Running multiple attention operations in parallel, each with its own learned query, key, and value projections, then concatenating the per-head outputs and applying a final output projection. Each head can attend to different positional or semantic relationships. Standard in modern transformer-based LLMs.
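The mechanism above can be sketched in a few lines of NumPy. This is a minimal illustration, not a production implementation: the function and parameter names (`multi_head_attention`, `Wq`, `Wk`, `Wv`, `Wo`, `n_heads`) are illustrative, and batching, masking, and dropout are omitted.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax over the last axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def multi_head_attention(x, Wq, Wk, Wv, Wo, n_heads):
    # x: (seq, d_model); each W*: (d_model, d_model). Illustrative sketch.
    seq, d_model = x.shape
    d_head = d_model // n_heads

    q, k, v = x @ Wq, x @ Wk, x @ Wv

    # Split the projected vectors into heads: (n_heads, seq, d_head).
    def split(t):
        return t.reshape(seq, n_heads, d_head).transpose(1, 0, 2)
    q, k, v = split(q), split(k), split(v)

    # Scaled dot-product attention, computed independently per head.
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(d_head)
    out = softmax(scores) @ v  # (n_heads, seq, d_head)

    # Concatenate heads back to (seq, d_model), then apply the output projection.
    out = out.transpose(1, 0, 2).reshape(seq, d_model)
    return out @ Wo

rng = np.random.default_rng(0)
d_model, n_heads, seq = 8, 2, 4
Wq, Wk, Wv, Wo = (rng.standard_normal((d_model, d_model)) for _ in range(4))
x = rng.standard_normal((seq, d_model))
y = multi_head_attention(x, Wq, Wk, Wv, Wo, n_heads)
```

Note that the heads differ only because their slices of the projection matrices are trained to different values; the attention computation itself is identical across heads.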

See also: scaled_dot_product_attention, softmax_attention, attention_residuals

concepts/multi_head_attention.txt · Last modified: by aethersync