Block AttnRes
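Block AttnRes is a practical variant of Attention Residuals (AttnRes) that partitions a network's L layers into N blocks. Within each block, standard residual connections are used. At each block boundary, an AttnRes operation aggregates the block-level representations. Because only one representation per block is kept rather than one per layer, memory drops from O(Ld) to O(Nd), where d is the hidden dimension, while most of the gains of full AttnRes are preserved.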
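The control flow described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the reference implementation: the function name `block_attnres_forward`, the per-block query vectors in `queries`, the scaled dot-product scoring, and the choice to include the input embedding among the stored representations are all hypothetical and are not specified on this page.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def block_attnres_forward(x, layers, num_blocks, queries):
    """Run `layers` over vector `x`, grouped into `num_blocks` equal blocks.

    x       : list of d floats (the input representation)
    layers  : list of callables, each mapping a d-vector to a d-vector update
    queries : one d-vector per block (ASSUMPTION: a learned query per boundary)
    """
    if len(layers) % num_blocks != 0:
        raise ValueError("number of layers must be divisible by num_blocks")
    per_block = len(layers) // num_blocks
    d = len(x)

    # Only block-level representations are stored: N + 1 vectors, i.e. O(Nd).
    block_reps = [list(x)]
    h = list(x)
    for b in range(num_blocks):
        # Inside a block: standard residual connections, h <- h + f(h).
        for layer in layers[b * per_block:(b + 1) * per_block]:
            update = layer(h)
            h = [hi + ui for hi, ui in zip(h, update)]
        block_reps.append(h)

        # At the block boundary: attention over the stored representations.
        # ASSUMPTION: scaled dot-product scores against this block's query.
        scores = [
            sum(q * k for q, k in zip(queries[b], rep)) / math.sqrt(d)
            for rep in block_reps
        ]
        weights = softmax(scores)
        h = [
            sum(w * rep[i] for w, rep in zip(weights, block_reps))
            for i in range(d)
        ]
    return h, block_reps
```

With L = 4 layers and N = 2 blocks, the sketch keeps three vectors (the input plus one per block) instead of one per layer, which is where the O(Nd) figure comes from.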
See also: attention_residuals, residual_connections, pipeline_communication
concepts/block_attnres.txt · Last modified: by aethersync
