====== Pipeline Communication ======

Pipeline communication is the overhead of passing activations between pipeline stages in pipeline-parallel training. [[concepts:block_attnres|Block AttnRes]] reduces this overhead by aggregating representations at block boundaries rather than at every layer, so communication volume falls roughly in proportion to the reduction in aggregation points.

See also: [[concepts:block_attnres]], [[papers:attention_residuals]]
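
To make the proportionality concrete, here is a rough back-of-the-envelope sketch. It rests on an assumed accounting model that this page does not spell out: each aggregation point contributes one activation tensor of shape ''(batch, seq_len, hidden)'' to the traffic between stages. The function names, the block size, and the tensor dimensions are all hypothetical and chosen only for illustration.

```python
# Illustrative estimate only. Assumption: communication volume scales
# linearly with the number of aggregated representations, each being one
# (batch, seq_len, hidden) activation tensor. Not taken from the paper.

def activation_bytes(batch, seq_len, hidden, bytes_per_elem=2):
    """Size in bytes of one activation tensor (2 bytes/elem ~ bf16/fp16)."""
    return batch * seq_len * hidden * bytes_per_elem


def block_count(num_layers, block_size):
    """Number of blocks when layers are grouped block_size at a time (ceil)."""
    return -(-num_layers // block_size)


def comm_volume_bytes(num_aggregation_points, batch, seq_len, hidden,
                      bytes_per_elem=2):
    """Bytes communicated if each aggregation point sends one tensor."""
    per_tensor = activation_bytes(batch, seq_len, hidden, bytes_per_elem)
    return num_aggregation_points * per_tensor


# Hypothetical configuration: 32 layers grouped into blocks of 8.
num_layers, block_size = 32, 8
shape = dict(batch=1, seq_len=4096, hidden=4096)

per_layer = comm_volume_bytes(num_layers, **shape)
per_block = comm_volume_bytes(block_count(num_layers, block_size), **shape)

print(f"per-layer aggregation: {per_layer / 2**20:.0f} MiB")
print(f"per-block aggregation: {per_block / 2**20:.0f} MiB")
print(f"ratio: {per_block / per_layer:.3f}")
```

Under these assumptions the ratio is simply the number of blocks divided by the number of layers, so larger blocks mean less traffic. Real savings depend on how the stages are partitioned and on what the implementation actually sends.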