LLM Notes

Study notes on LLMs and reinforcement learning - an in-depth look at Transformer, RLHF, PPO, DPO, and related techniques

Tags

Transformer (8)

Transformer Notes (VIII): Frontier Applications 2025-12-20
Transformer Notes (VII): Deployment Optimization 2025-12-20
Transformer Notes (VI): Evaluation and Benchmarks 2025-12-20
Transformer Notes (V): Training Techniques 2025-12-20
Transformer Notes (IV): Mixture of Experts Architecture 2025-12-20
Transformer Notes (III): Attention Mechanisms 2025-12-20
Transformer Notes (II): Core Components 2025-12-20
Transformer Notes (I): Fundamentals 2025-12-20

RLHF (4)

RL Notes (6): LLM Alignment (Part 2) 2025-12-19
RL Notes (5): LLM Alignment (Part 1) 2025-12-19
LLM-RL Training Stability: Root Cause Analysis and Solutions 2025-12-19
Why is LoRA Effective in RL Fine-tuning? An Information Bandwidth Perspective 2025-12-19

Inference (3)

Reasoning (2)

Train Long, Think Short: A Survey on LLM Reasoning Length Control 2025-12-31
Transformer Notes (VIII): Frontier Applications 2025-12-20

PPO (2)

RL Notes (3): Policy-Based RL 2025-12-19
LLM-RL Training Stability: Root Cause Analysis and Solutions 2025-12-19

LLM (2)

GRPO (2)

Efficiency (2)

Alignment (2)

RL Notes (6): LLM Alignment (Part 2) 2025-12-19
RL Notes (5): LLM Alignment (Part 1) 2025-12-19

Training (1)

Transformer Notes (V): Training Techniques 2025-12-20

Speculative Decoding (1)

Speculative Decoding: A Complete Guide to Principles, Methods, and Speedup Analysis 2026-01-05

Reproducibility (1)

Nondeterminism in LLM Inference: Root Cause Analysis and Batch Invariance Solutions 2025-12-24

RLVR (1)

Entropy Control in LLM-RL: A Systematic Survey from Entropy Collapse to Exploration 2025-12-23

Negative Samples (1)

Entropy Control in LLM-RL: A Systematic Survey from Entropy Collapse to Exploration 2025-12-23

Multimodal (1)

Transformer Notes (VIII): Frontier Applications 2025-12-20

MoE (1)

Transformer Notes (IV): Mixture of Experts Architecture 2025-12-20

MCTS (1)

RL Notes (4): Model-Based Methods & MARL 2025-12-19

Evaluation (1)

Transformer Notes (VI): Evaluation and Benchmarks 2025-12-20

Entropy (1)

Entropy Control in LLM-RL: A Systematic Survey from Entropy Collapse to Exploration 2025-12-23

Determinism (1)

Nondeterminism in LLM Inference: Root Cause Analysis and Batch Invariance Solutions 2025-12-24

DQN (1)

RL Notes (2): Value-Based RL 2025-12-19

CUDA (1)

Nondeterminism in LLM Inference: Root Cause Analysis and Batch Invariance Solutions 2025-12-24

Batch Invariance (1)

Nondeterminism in LLM Inference: Root Cause Analysis and Batch Invariance Solutions 2025-12-24

Attention (1)

Transformer Notes (III): Attention Mechanisms 2025-12-20

AlphaZero (1)

RL Notes (4): Model-Based Methods & MARL 2025-12-19

RL (10)

Train Long, Think Short: A Survey on LLM Reasoning Length Control 2025-12-31
Entropy Control in LLM-RL: A Systematic Survey from Entropy Collapse to Exploration 2025-12-23
RL Notes (6): LLM Alignment (Part 2) 2025-12-19
RL Notes (5): LLM Alignment (Part 1) 2025-12-19
RL Notes (4): Model-Based Methods & MARL 2025-12-19
RL Notes (3): Policy-Based RL 2025-12-19
RL Notes (2): Value-Based RL 2025-12-19
RL Notes (1): Fundamentals 2025-12-19
LLM-RL Training Stability: Root Cause Analysis and Solutions 2025-12-19
Why is LoRA Effective in RL Fine-tuning? An Information Bandwidth Perspective 2025-12-19