====== Concepts ====== Definitions and explanations of machine learning terms, organized as an interconnected graph. ===== Architecture ====== * [[concepts:llm|LLM]] * [[concepts:softmax_attention|Softmax Attention]] * [[concepts:scaled_dot_product_attention|Scaled Dot-Product Attention]] * [[concepts:multi_head_attention|Multi-Head Attention]] * [[concepts:residual_connections|Residual Connections]] * [[concepts:gradient_highway|Gradient Highway]] * [[concepts:moe|Mixture of Experts]] * [[concepts:expert_routing|Expert Routing]] * [[concepts:linear_attention|Linear Attention]] * [[concepts:rnn|RNN]] ===== Normalization & Stability ====== * [[concepts:layer_normalization|Layer Normalization]] * [[concepts:prenorm|PreNorm]] * [[concepts:postnorm|PostNorm]] * [[concepts:hidden_state_growth|Hidden-State Growth]] * [[concepts:vanishing_gradients|Vanishing Gradients]] ===== Efficiency & Pruning ====== * [[concepts:model_pruning|Model Pruning]] * [[concepts:layer_pruning|Layer Pruning]] * [[concepts:block_attnres|Block AttnRes]] * [[concepts:pipeline_communication|Pipeline Communication]] ===== Scaling ====== * [[concepts:scaling_laws|Scaling Laws]] * [[concepts:chinchilla_scaling|Chinchilla Scaling]] * [[concepts:neural_scaling|Neural Scaling]] * [[concepts:kimi_linear|Kimi Linear]]