====== Concepts ======

Definitions and explanations of machine learning terms, organized as an interconnected graph.

===== Architecture =====

  * [[concepts:llm|LLM]]
  * [[concepts:softmax_attention|Softmax Attention]]
  * [[concepts:scaled_dot_product_attention|Scaled Dot-Product Attention]]
  * [[concepts:multi_head_attention|Multi-Head Attention]]
  * [[concepts:residual_connections|Residual Connections]]
  * [[concepts:gradient_highway|Gradient Highway]]
  * [[concepts:moe|Mixture of Experts]]
  * [[concepts:expert_routing|Expert Routing]]
  * [[concepts:linear_attention|Linear Attention]]
  * [[concepts:rnn|RNN]]

===== Normalization & Stability =====

  * [[concepts:layer_normalization|Layer Normalization]]
  * [[concepts:prenorm|PreNorm]]
  * [[concepts:postnorm|PostNorm]]
  * [[concepts:hidden_state_growth|Hidden-State Growth]]
  * [[concepts:vanishing_gradients|Vanishing Gradients]]

===== Efficiency & Pruning =====

  * [[concepts:model_pruning|Model Pruning]]
  * [[concepts:layer_pruning|Layer Pruning]]
  * [[concepts:block_attnres|Block AttnRes]]
  * [[concepts:pipeline_communication|Pipeline Communication]]

===== Scaling =====

  * [[concepts:scaling_laws|Scaling Laws]]
  * [[concepts:chinchilla_scaling|Chinchilla Scaling]]
  * [[concepts:neural_scaling|Neural Scaling]]
  * [[concepts:kimi_linear|Kimi Linear]]