
Softmax Attention

The standard attention mechanism: computes dot-product similarity between a query and keys, applies softmax to produce a probability distribution, then takes a weighted sum of values. In Attention Residuals, softmax attention is repurposed for depth-wise aggregation across layer outputs instead of sequence positions.
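The mechanism above can be sketched in a few lines of NumPy. This is a minimal illustration for a single query, not the implementation from Attention Residuals; the function name `softmax_attention` and the toy shapes are assumptions for the example.

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the max for numerical stability before exponentiating.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def softmax_attention(q, K, V):
    """Dot-product attention for one query vector q.

    q: (d,) query; K: (n, d) keys; V: (n, dv) values.
    Returns a (dv,) weighted sum of value rows.
    """
    d = K.shape[-1]
    scores = K @ q / np.sqrt(d)   # dot-product similarity, scaled by sqrt(d)
    weights = softmax(scores)     # probability distribution over the n keys
    return weights @ V            # convex combination of value rows
```

The same computation applies whether the `n` keys index sequence positions (standard attention) or, as in Attention Residuals, the outputs of `n` layers aggregated depth-wise.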

See also: scaled_dot_product_attention, multi_head_attention, attention_residuals

concepts/softmax_attention.txt · Last modified: by aethersync
