LLM Notes

Study notes on LLMs and reinforcement learning: an in-depth look at Transformer, RLHF, PPO, DPO, and related techniques

Tags

Transformer (8)

RLHF (4)

Inference (3)

Reasoning (2)

PPO (2)

LLM (2)

GRPO (2)

Efficiency (2)

Alignment (2)

Training (1)

Speculative Decoding (1)

Reproducibility (1)

RLVR (1)

Negative Samples (1)

Multimodal (1)

MoE (1)

MCTS (1)

Evaluation (1)

Entropy (1)

Determinism (1)

DQN (1)

CUDA (1)

Batch Invariance (1)

Attention (1)

AlphaZero (1)

RL (10)