Pipeline Communication

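The overhead of passing activations between pipeline stages in pipeline-parallel training. With attention residuals, later layers draw on the outputs of earlier layers, so each stage boundary must forward more than a single hidden state. Block AttnRes reduces this by aggregating representations at block boundaries rather than at every layer, so the volume sent between stages scales with the number of blocks instead of the number of layers.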
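
A rough way to see the saving is to count the representations that cross each stage boundary. The sketch below uses a deliberately simplified cost model, which is an assumption and not taken from any AttnRes implementation. It assumes equal layers per stage. It assumes per-layer aggregation must ship every earlier layer output downstream, while block aggregation ships one representation per completed block plus one for a partially finished block. It ignores the embedding, backward-pass traffic, and any caching across micro-batches. The function names `reps_at_boundary` and `pipeline_send_bytes` are hypothetical.

```python
import math
from typing import Optional


def reps_at_boundary(layers_before: int, block_size: Optional[int]) -> int:
    """Hypothetical model: how many representations a stage forwards
    after `layers_before` layers have run upstream."""
    if block_size is None:
        # Per-layer aggregation: every earlier layer output must travel on.
        return layers_before
    # Block aggregation: one representation per completed block, plus one
    # partial-block representation if the boundary falls mid-block.
    return math.ceil(layers_before / block_size)


def pipeline_send_bytes(num_layers: int, num_stages: int,
                        block_size: Optional[int], batch: int,
                        seq_len: int, hidden: int,
                        bytes_per_elem: int = 2) -> int:
    """Total forward-pass bytes sent across all stage boundaries for one
    micro-batch under the simplified model above."""
    assert num_layers % num_stages == 0, "assumes equal layers per stage"
    layers_per_stage = num_layers // num_stages
    # Size of one [batch, seq_len, hidden] activation tensor.
    rep_bytes = batch * seq_len * hidden * bytes_per_elem
    total = 0
    # There are num_stages - 1 boundaries between consecutive stages.
    for stage in range(1, num_stages):
        reps = reps_at_boundary(stage * layers_per_stage, block_size)
        total += reps * rep_bytes
    return total


if __name__ == "__main__":
    full = pipeline_send_bytes(32, 4, None, 1, 4096, 4096)
    block = pipeline_send_bytes(32, 4, 8, 1, 4096, 4096)
    print(full, block, full // block)
```

For 32 layers split over 4 stages with 8-layer blocks, the boundaries carry 8, 16 and 24 representations under per-layer aggregation but only 1, 2 and 3 under block aggregation, an 8x reduction in this toy model. With boundaries aligned to blocks, the ratio equals the block size. Real systems may differ depending on how partial blocks and caching are handled.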

See also: block_attnres, attention_residuals