Tags
Transformer (8)
- Transformer Notes (VIII): Frontier Applications 2025-12-20
- Transformer Notes (VII): Deployment Optimization 2025-12-20
- Transformer Notes (VI): Evaluation and Benchmarks 2025-12-20
- Transformer Notes (V): Training Techniques 2025-12-20
- Transformer Notes (IV): Mixture of Experts Architecture 2025-12-20
- Transformer Notes (III): Attention Mechanisms 2025-12-20
- Transformer Notes (II): Core Components 2025-12-20
- Transformer Notes (I): Fundamentals 2025-12-20
RLHF (4)
- RL Notes (6): LLM Alignment (Part 2) 2025-12-19
- RL Notes (5): LLM Alignment (Part 1) 2025-12-19
- LLM-RL Training Stability: Root Cause Analysis and Solutions 2025-12-19
- Why is LoRA Effective in RL Fine-tuning? An Information Bandwidth Perspective 2025-12-19
Inference (3)
- Speculative Decoding: A Complete Guide to Principles, Methods, and Speedup Analysis 2026-01-05
- Nondeterminism in LLM Inference: Root Cause Analysis and Batch Invariance Solutions 2025-12-24
- Transformer Notes (VII): Deployment Optimization 2025-12-20
Reasoning (2)
- Train Long, Think Short: A Survey on LLM Reasoning Length Control 2025-12-31
- Transformer Notes (VIII): Frontier Applications 2025-12-20
PPO (2)
- RL Notes (3): Policy-Based RL 2025-12-19
- LLM-RL Training Stability: Root Cause Analysis and Solutions 2025-12-19
LLM (2)
- Speculative Decoding: A Complete Guide to Principles, Methods, and Speedup Analysis 2026-01-05
- Nondeterminism in LLM Inference: Root Cause Analysis and Batch Invariance Solutions 2025-12-24
GRPO (2)
- Train Long, Think Short: A Survey on LLM Reasoning Length Control 2025-12-31
- Entropy Control in LLM-RL: A Systematic Survey from Entropy Collapse to Exploration 2025-12-23
Efficiency (2)
- Speculative Decoding: A Complete Guide to Principles, Methods, and Speedup Analysis 2026-01-05
- Train Long, Think Short: A Survey on LLM Reasoning Length Control 2025-12-31
Alignment (2)
- RL Notes (6): LLM Alignment (Part 2) 2025-12-19
- RL Notes (5): LLM Alignment (Part 1) 2025-12-19
Training (1)
- Transformer Notes (V): Training Techniques 2025-12-20
Speculative Decoding (1)
Reproducibility (1)
RLVR (1)
Negative Samples (1)
Multimodal (1)
MoE (1)
MCTS (1)
- RL Notes (4): Model-Based Methods & MARL 2025-12-19
Evaluation (1)
Entropy (1)
Determinism (1)
DQN (1)
- RL Notes (2): Value-Based RL 2025-12-19
CUDA (1)
Batch Invariance (1)
Attention (1)
AlphaZero (1)
- RL Notes (4): Model-Based Methods & MARL 2025-12-19
RL (10)
- Train Long, Think Short: A Survey on LLM Reasoning Length Control 2025-12-31
- Entropy Control in LLM-RL: A Systematic Survey from Entropy Collapse to Exploration 2025-12-23
- RL Notes (6): LLM Alignment (Part 2) 2025-12-19
- RL Notes (5): LLM Alignment (Part 1) 2025-12-19
- RL Notes (4): Model-Based Methods & MARL 2025-12-19
- RL Notes (3): Policy-Based RL 2025-12-19
- RL Notes (2): Value-Based RL 2025-12-19
- RL Notes (1): Fundamentals 2025-12-19
- LLM-RL Training Stability: Root Cause Analysis and Solutions 2025-12-19
- Why is LoRA Effective in RL Fine-tuning? An Information Bandwidth Perspective 2025-12-19