====== Block AttnRes ======

A practical variant of [[concepts:softmax_attention|Attention Residuals]] that partitions a network's L layers into N blocks. Within each block, standard residual connections are used; at each block boundary, an AttnRes operation aggregates the stored block-level representations. This reduces the memory cost of the residual stream from O(Ld) to O(Nd) while preserving most of the gains of per-layer attention residuals.

See also: [[papers:attention_residuals]], [[concepts:residual_connections]], [[concepts:pipeline_communication]]
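The forward pass can be sketched as follows. This is a minimal illustration, not a reference implementation: the source does not specify the aggregation mechanism, so this sketch assumes dot-product attention in which the current hidden state acts as the query over the stored block-boundary representations, and assumes the attention read-out is added back as a residual.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def block_attnres_forward(x, layers, num_blocks):
    """Block AttnRes forward pass (illustrative sketch).

    x: (d,) input vector; layers: list of callables mapping (d,) -> (d,).
    Within a block: standard residual, h = h + f(h).
    At each block boundary: store the block output and add an
    attention read-out over all stored block outputs as a residual.
    Only one vector per block is kept, so memory is O(N d), not O(L d).
    """
    L = len(layers)
    assert L % num_blocks == 0, "sketch assumes equal-sized blocks"
    per_block = L // num_blocks
    h = x
    block_reps = []  # one representation per completed block: O(N d)
    for i, f in enumerate(layers):
        h = h + f(h)  # standard residual inside the block
        if (i + 1) % per_block == 0:  # block boundary
            block_reps.append(h)
            R = np.stack(block_reps)              # (n_blocks_so_far, d)
            scores = R @ h / np.sqrt(h.size)      # query = current state (assumption)
            h = h + softmax(scores) @ R           # attention residual
    return h
```

Usage with toy linear layers (small random weights keep the residual stream stable):

```python
rng = np.random.default_rng(0)
d = 4
Ws = [rng.standard_normal((d, d)) * 0.1 for _ in range(6)]
layers = [lambda h, W=W: W @ h for W in Ws]
out = block_attnres_forward(rng.standard_normal(d), layers, num_blocks=3)
```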