The overhead of passing activations between pipeline stages in pipeline-parallel training. Block AttnRes reduces this overhead by aggregating representations at block boundaries rather than at every layer, so the communication volume falls roughly in proportion to the number of layers per block.
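As a rough illustration of that proportionality, the sketch below uses a toy cost model. It is an assumption made for this example, not a description of the actual implementation: each stage boundary is assumed to forward one tensor of shape (tokens, hidden_dim) per aggregation point. The function name and all parameters are hypothetical.

```python
def boundary_payload_elems(num_layers, layers_per_block, tokens, hidden_dim):
    """Elements sent across one stage boundary under the toy model.

    Assumption: one (tokens, hidden_dim) representation is forwarded per
    aggregation point, and there is one aggregation point per block.
    """
    # Ceiling division: a trailing partial block still counts as one block.
    num_reps = -(-num_layers // layers_per_block)
    return num_reps * tokens * hidden_dim


# layers_per_block=1 models aggregation at every layer.
per_layer = boundary_payload_elems(32, 1, tokens=4096, hidden_dim=1024)
# layers_per_block=8 models aggregation only at block boundaries.
per_block = boundary_payload_elems(32, 8, tokens=4096, hidden_dim=1024)

ratio = per_layer / per_block
print(f"reduction factor: {ratio:.1f}x")
```

Under these assumptions the reduction factor equals the number of layers per block whenever the layer count divides evenly into blocks. Real savings depend on how the pipeline schedule and the aggregation are actually implemented.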
See also: block_attnres, attention_residuals