Feed-Forward Network (FFN)
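The position-wise MLP applied after the attention sublayer in each Transformer block. It typically consists of two linear transformations with a nonlinearity between them: FFN(x) = W2 · act(W1 · x) (bias terms, if any, omitted). "Position-wise" means the same weights are applied independently at every sequence position. In MoE architectures, the single FFN is replaced by several expert FFNs, and a router selects which experts process each token.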
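The formula above can be sketched in NumPy as follows. This is a minimal illustration, not a reference implementation: the GELU activation, the dimensions, the random weights, and the names `ffn`, `moe_ffn`, and `W_router` are all assumptions chosen for the example. The MoE variant uses hard top-1 routing without gate weighting or load balancing, which is one simple reading of "expert routing".

```python
import numpy as np

def gelu(x):
    # Tanh approximation of GELU; the choice of activation is an assumption.
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x**3)))

def ffn(x, W1, W2, act=gelu):
    # x: (seq_len, d_model), one row per position.
    # With row vectors, W1 · x is written x @ W1.T.
    return act(x @ W1.T) @ W2.T

def moe_ffn(x, experts, W_router):
    # Hypothetical simplified MoE layer: each token goes to the single
    # expert with the highest router logit (hard top-1, no gate weighting).
    logits = x @ W_router.T              # (seq_len, n_experts)
    choice = logits.argmax(axis=-1)      # expert index per token
    out = np.zeros_like(x)
    for e, (W1, W2) in enumerate(experts):
        mask = choice == e
        if mask.any():
            out[mask] = ffn(x[mask], W1, W2)
    return out

# Illustrative sizes only.
rng = np.random.default_rng(0)
d_model, d_ff, seq_len, n_experts = 8, 32, 5, 4

W1 = rng.normal(scale=0.1, size=(d_ff, d_model))
W2 = rng.normal(scale=0.1, size=(d_model, d_ff))
x = rng.normal(size=(seq_len, d_model))

y = ffn(x, W1, W2)                       # shape (seq_len, d_model)

experts = [
    (rng.normal(scale=0.1, size=(d_ff, d_model)),
     rng.normal(scale=0.1, size=(d_model, d_ff)))
    for _ in range(n_experts)
]
W_router = rng.normal(scale=0.1, size=(n_experts, d_model))
y_moe = moe_ffn(x, experts, W_router)    # shape (seq_len, d_model)
```

Because the FFN is position-wise, running `ffn` on a single row of `x` gives the same result as the corresponding row of `y`.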
See also: transformer, moe, residual_connections
concepts/feedforward_network.txt · Last modified: by aethersync
