# Getting Started

This guide will help you quickly get started with the genrec framework.
## Prerequisites
- Python 3.8 or higher
- CUDA 11.0+ (if using GPU)
- 8GB+ GPU memory (recommended)
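A quick way to confirm your environment meets these requirements (the `torch` import only runs if PyTorch is already installed; the GPU check is skipped otherwise):

```python
import sys

# genrec requires Python 3.8 or higher.
assert sys.version_info >= (3, 8), "Python 3.8+ is required"

# The GPU check is optional and only runs if PyTorch is present.
try:
    import torch
    print("CUDA available:", torch.cuda.is_available())
except ImportError:
    print("PyTorch not installed yet; GPU check skipped")
```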
## Installation
### 1. Clone the Repository
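A typical clone step; the repository URL below is a placeholder, not the real location:

```shell
git clone https://github.com/<org>/genrec.git
cd genrec
```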
### 2. Install Dependencies
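Assuming the repository ships a `requirements.txt` (an assumption; follow the repo's own instructions if it uses a different packaging setup):

```shell
pip install -r requirements.txt
```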
### 3. Prepare Data
Download the P5 Amazon dataset:
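As the training section below notes, download and preprocessing happen automatically on first run into the configured `dataset_folder`; you can create that folder up front if you prefer:

```shell
mkdir -p dataset/amazon
```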
## First Experiment: Training RQVAE

### 1. Check Configuration File
Key configuration parameters:
- `train.iterations=400000`: Number of training iterations
- `train.batch_size=64`: Batch size
- `train.learning_rate=0.0005`: Learning rate
- `train.dataset_folder="dataset/amazon"`: Dataset path
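Assembled into gin syntax, the relevant section of the file looks roughly like this (an illustrative excerpt, not the full file; the path `config/rqvae/p5_amazon.gin` is inferred from the TIGER config path mentioned later on this page):

```gin
# config/rqvae/p5_amazon.gin (illustrative excerpt)
train.iterations=400000
train.batch_size=64
train.learning_rate=0.0005
train.dataset_folder="dataset/amazon"
```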
### 2. Start Training
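A typical launch for a gin-configured trainer looks like this (the script name and argument style are assumptions, not confirmed by this page):

```shell
python train_rqvae.py config/rqvae/p5_amazon.gin
```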
During training you'll see:

- Automatic data download and preprocessing
- Text feature encoding progress
- Training loss and metrics
- Model checkpoint saving
### 3. Monitor Training
If Weights & Biases logging is enabled, visit wandb.ai to view training progress.
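Authenticate the wandb CLI once per machine so runs can sync:

```shell
wandb login
```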
## Second Experiment: Training TIGER

### 1. Ensure RQVAE is Trained
TIGER requires a pre-trained RQVAE model to generate semantic IDs:
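A quick sanity check that the checkpoint exists before launching TIGER (the path here is an assumption; adjust it to wherever your RQVAE run saved its checkpoints):

```python
from pathlib import Path

# The checkpoint path is an assumption; point it at your RQVAE output.
ckpt = Path("out/rqvae/checkpoint_399999.pt")
if not ckpt.exists():
    print("RQVAE checkpoint not found; train RQVAE first (see above)")
```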
### 2. Configure TIGER

Edit `config/tiger/p5_amazon.gin`:
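An illustrative excerpt; the parameter name for the RQVAE checkpoint path is an assumption, so check the actual file for the exact key:

```gin
# config/tiger/p5_amazon.gin (illustrative excerpt; key names are assumptions)
train.pretrained_rqvae_path="out/rqvae/checkpoint_399999.pt"
train.dataset_folder="dataset/amazon"
```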
### 3. Start Training
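As with RQVAE, a typical launch (the script name is an assumption):

```shell
python train_tiger.py config/tiger/p5_amazon.gin
```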
## Understanding the Framework Structure

### Data Processing Pipeline
```mermaid
graph TD
    A[Raw Data] --> B[Data Download]
    B --> C[Preprocessing]
    C --> D[Text Encoding]
    D --> E[Sequence Generation]
    E --> F[Dataset]
```
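The stages above compose sequentially; a minimal sketch of the pipeline shape (function names and bodies are illustrative stand-ins, not the framework's API):

```python
def download(source):
    # Stand-in for the download stage: fetches raw item records.
    return ["  Item: wireless mouse  ", "  Item: usb cable  "]

def preprocess(records):
    # Cleans the raw records (here: whitespace stripping).
    return [r.strip() for r in records]

def encode_text(records):
    # Stand-in for text feature encoding; the real pipeline uses a text encoder.
    return [[float(len(r))] for r in records]

def build_sequences(features):
    # Groups per-item features into interaction sequences for the dataset.
    return [features]

# The dataset is the composition of the stages, in diagram order.
dataset = build_sequences(encode_text(preprocess(download("amazon"))))
```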
### Model Training Flow
```mermaid
graph TD
    A[Config File] --> B[Dataset Loading]
    B --> C[Model Initialization]
    C --> D[Training Loop]
    D --> E[Evaluation]
    E --> F[Checkpoint Saving]
    F --> D
```
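The cycle in the diagram (train, evaluate, checkpoint, then back to training) can be sketched in plain Python; everything below is illustrative, not the framework's actual trainer:

```python
def train_step(model, step):
    # One optimization step; returns a fake, decreasing loss for illustration.
    return 1.0 / (step + 1)

def evaluate(model):
    # Periodic evaluation on held-out data (placeholder metrics).
    return {"recall@10": 0.1}

model = None          # placeholder for the initialized model
checkpoints = []
eval_every = 100

for step in range(300):
    loss = train_step(model, step)           # Training Loop
    if (step + 1) % eval_every == 0:
        metrics = evaluate(model)            # Evaluation
        checkpoints.append((step, metrics))  # Checkpoint Saving, then loop continues
```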
## Custom Configuration

### Creating Custom Configuration
```gin
# my_config.gin
import genrec.data.p5_amazon
import genrec.models.rqvae

# Custom parameters
train.batch_size=32
train.learning_rate=0.001
train.vae_hidden_dims=[256, 128, 64]

# Use custom data path
train.dataset_folder="path/to/my/data"
```
### Using Custom Configuration
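Pass the custom file to the training entry point in place of the default config (the script name `train_rqvae.py` is an assumption):

```shell
python train_rqvae.py my_config.gin
```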
## Model Evaluation

### RQVAE Evaluation
```python
from genrec.models.rqvae import RqVae
from genrec.data.p5_amazon import P5AmazonItemDataset

# Load model
model = RqVae.load_from_checkpoint("path/to/checkpoint.pt")

# Load test data
test_dataset = P5AmazonItemDataset(
    root="dataset/amazon",
    train_test_split="eval"
)

# Evaluate reconstruction quality
reconstruction_loss = model.evaluate(test_dataset)
```
### TIGER Evaluation
```python
from genrec.models.tiger import Tiger
from genrec.modules.metrics import TopKAccumulator

# Load model
model = Tiger.load_from_checkpoint("path/to/checkpoint.pt")

# Calculate Recall@K
# (test_dataloader is a DataLoader built from the eval split,
# as in the RQVAE example above)
metrics = TopKAccumulator(k=10)
recall = metrics.compute_recall(model, test_dataloader)
```
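Recall@K itself is framework-independent; a self-contained sketch of the quantity `TopKAccumulator` accumulates (this reimplementation is illustrative, not the module's actual code):

```python
def recall_at_k(ranked_items, target, k=10):
    """1.0 if the ground-truth item appears in the top-k predictions, else 0.0."""
    return 1.0 if target in ranked_items[:k] else 0.0

# Average over a batch of (ranked predictions, ground-truth item) pairs.
batch = [
    (["a", "b", "c", "d"], "b"),   # hit at rank 2
    (["x", "y", "z", "w"], "q"),   # miss
]
recall = sum(recall_at_k(preds, t, k=2) for preds, t in batch) / len(batch)
print(recall)  # 0.5
```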
## Common Issues

### Q: Out of memory?
A: Adjust these parameters:
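Typical memory levers, using the parameter names shown elsewhere on this page (exact values depend on your GPU):

```gin
train.batch_size=32                  # halve the batch size
train.vae_hidden_dims=[128, 64, 32]  # smaller encoder/decoder widths
```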
### Q: Training too slow?

A: Optimization suggestions:

- Use larger batch sizes
- Enable mixed precision training
- Use multi-GPU training
### Q: How to add new datasets?
A: Refer to the Custom Dataset Guide
## Next Steps
- Learn about Model Architectures
- Understand Dataset Processing
- Check API Documentation
- Explore Advanced Examples