LLM Notes

Study notes on LLMs and reinforcement learning: in-depth analysis of Transformer, RLHF, PPO, DPO, and related techniques

Tags

RL (9)

Transformer (8)

RLHF (4)

PPO (2)

Inference (2)

Alignment (2)

Training (1)

Reproducibility (1)

Reasoning (1)

RLVR (1)

Negative Samples (1)

Multimodal (1)

MoE (1)

MCTS (1)

LLM (1)

GRPO (1)

Evaluation (1)

Entropy (1)

Determinism (1)

DQN (1)

CUDA (1)

Batch Invariance (1)

Attention (1)

AlphaZero (1)