Papers
Notes and summaries of research papers.
Attention Residuals: a Kimi Team technique that replaces fixed residual weights with learned softmax attention over layer outputs.