LLM Notes

Notes on LLMs and Reinforcement Learning - In-depth analysis of Transformer, RLHF, PPO, DPO, and related techniques

Posts

Speculative Decoding: A Complete Guide to Principles, Methods, and Speedup Analysis 2026-01-05
Train Long, Think Short: A Survey on LLM Reasoning Length Control 2025-12-31
Nondeterminism in LLM Inference: Root Cause Analysis and Batch Invariance Solutions 2025-12-24
Entropy Control in LLM-RL: A Systematic Survey from Entropy Collapse to Exploration 2025-12-23
Transformer Notes (VIII): Frontier Applications 2025-12-20
Transformer Notes (VII): Deployment Optimization 2025-12-20
Transformer Notes (VI): Evaluation and Benchmarks 2025-12-20
Transformer Notes (V): Training Techniques 2025-12-20
Transformer Notes (IV): Mixture of Experts Architecture 2025-12-20
Transformer Notes (III): Attention Mechanisms 2025-12-20
Transformer Notes (II): Core Components 2025-12-20
Transformer Notes (I): Fundamentals 2025-12-20
RL Notes (6): LLM Alignment (Part 2) 2025-12-19
RL Notes (5): LLM Alignment (Part 1) 2025-12-19
RL Notes (4): Model-Based Methods & MARL 2025-12-19
RL Notes (3): Policy-Based RL 2025-12-19
RL Notes (2): Value-Based RL 2025-12-19
RL Notes (1): Fundamentals 2025-12-19
LLM-RL Training Stability: Root Cause Analysis and Solutions 2025-12-19
Why is LoRA Effective in RL Fine-tuning? An Information Bandwidth Perspective 2025-12-19

Tags

RL (10) Transformer (8) RLHF (4) Inference (3) Reasoning (2) PPO (2) LLM (2) GRPO (2) Efficiency (2) Alignment (2) Training (1) Speculative Decoding (1) Reproducibility (1) RLVR (1) Negative Samples (1) Multimodal (1) MoE (1) MCTS (1) Evaluation (1) Entropy (1) Determinism (1) DQN (1) CUDA (1) Batch Invariance (1) Attention (1) AlphaZero (1)