Tags
RL (9)
- Entropy Control in LLM-RL: A Systematic Survey from Entropy Collapse to Exploration 2025-12-23
- RL Notes (6): LLM Alignment (Part 2) 2025-12-19
- RL Notes (5): LLM Alignment (Part 1) 2025-12-19
- RL Notes (4): Model-Based Methods & MARL 2025-12-19
- RL Notes (3): Policy-Based RL 2025-12-19
- RL Notes (2): Value-Based RL 2025-12-19
- RL Notes (1): Fundamentals 2025-12-19
- LLM-RL Training Stability: Root Cause Analysis and Solutions 2025-12-19
- Why is LoRA Effective in RL Fine-tuning? An Information Bandwidth Perspective 2025-12-19
Transformer (8)
- Transformer Notes (VIII): Frontier Applications 2025-12-20
- Transformer Notes (VII): Deployment Optimization 2025-12-20
- Transformer Notes (VI): Evaluation and Benchmarks 2025-12-20
- Transformer Notes (V): Training Techniques 2025-12-20
- Transformer Notes (IV): Mixture of Experts Architecture 2025-12-20
- Transformer Notes (III): Attention Mechanisms 2025-12-20
- Transformer Notes (II): Core Components 2025-12-20
- Transformer Notes (I): Fundamentals 2025-12-20
RLHF (4)
- RL Notes (6): LLM Alignment (Part 2) 2025-12-19
- RL Notes (5): LLM Alignment (Part 1) 2025-12-19
- LLM-RL Training Stability: Root Cause Analysis and Solutions 2025-12-19
- Why is LoRA Effective in RL Fine-tuning? An Information Bandwidth Perspective 2025-12-19
PPO (2)
- RL Notes (3): Policy-Based RL 2025-12-19
- LLM-RL Training Stability: Root Cause Analysis and Solutions 2025-12-19
Inference (2)
- Nondeterminism in LLM Inference: Root Cause Analysis and Batch Invariance Solutions 2025-12-24
- Transformer Notes (VII): Deployment Optimization 2025-12-20
Alignment (2)
- RL Notes (6): LLM Alignment (Part 2) 2025-12-19
- RL Notes (5): LLM Alignment (Part 1) 2025-12-19
Training (1)
- Transformer Notes (V): Training Techniques 2025-12-20
Reproducibility (1)
Reasoning (1)
RLVR (1)
Negative Samples (1)
Multimodal (1)
MoE (1)
MCTS (1)
- RL Notes (4): Model-Based Methods & MARL 2025-12-19
LLM (1)
GRPO (1)
Evaluation (1)
Entropy (1)
Determinism (1)
DQN (1)
- RL Notes (2): Value-Based RL 2025-12-19
CUDA (1)
Batch Invariance (1)
Attention (1)
AlphaZero (1)
- RL Notes (4): Model-Based Methods & MARL 2025-12-19