====== Feed-Forward Network (FFN) ======

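The position-wise MLP applied after each attention layer in a [[concepts:transformer|Transformer]] block. It typically consists of two linear transformations with a nonlinearity between them, applied to each token position independently with shared weights: FFN(x) = W2 · act(W1 · x), with bias terms omitted. W1 projects the input to a wider hidden dimension and W2 projects it back to the model dimension. In [[concepts:moe|MoE]] architectures, the single FFN is replaced by multiple expert FFNs, and [[concepts:expert_routing|expert routing]] selects which experts process each token.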
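
The formula above can be sketched in a few lines of NumPy. This is a minimal illustration, not a reference implementation: the function names ''ffn'' and ''relu'', the choice of ReLU as the activation, the omission of biases, and the toy dimensions are all assumptions made for the example.

```python
import numpy as np


def relu(x):
    """Elementwise ReLU, used here as an assumed choice for act()."""
    return np.maximum(x, 0.0)


def ffn(x, w1, w2, act=relu):
    """Compute FFN(x) = W2 . act(W1 . x) for every row (token) of x.

    x:  (seq_len, d_model)  one row per token position
    w1: (d_ff, d_model)     up-projection
    w2: (d_model, d_ff)     down-projection
    Biases are omitted to match the formula in the text.
    """
    hidden = act(x @ w1.T)   # (seq_len, d_ff)
    return hidden @ w2.T     # (seq_len, d_model)


# Toy sizes chosen only for illustration.
rng = np.random.default_rng(0)
seq_len, d_model, d_ff = 3, 4, 16
x = rng.normal(size=(seq_len, d_model))
w1 = rng.normal(size=(d_ff, d_model))
w2 = rng.normal(size=(d_model, d_ff))

y = ffn(x, w1, w2)

# "Position-wise" means each token is transformed independently:
# running the first token alone gives the same result as its row in y.
y_first = ffn(x[:1], w1, w2)
```

Because the same weights are applied to every row, the output keeps the input shape ''(seq_len, d_model)'', which is what allows the result to be added back to the input through a residual connection.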

See also: [[concepts:transformer]], [[concepts:moe]], [[concepts:residual_connections]]