Contributing Guide¶

We welcome contributions to genrec! This guide will help you get started.

Development Setup¶

Prerequisites¶

Python 3.8 or higher
PyTorch >= 1.11.0
Git

Installation¶

Fork the repository on GitHub

Clone your fork:

git clone https://github.com/YOUR_USERNAME/genrec.git
cd genrec

Create a virtual environment:

python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

Install in development mode:
```
pip install -e ".[dev]"  # quotes keep shells such as zsh from expanding the brackets
```

Development Workflow¶

Code Style¶

We follow PEP 8 style guidelines. Please run the following before submitting:

# Format code
black genrec/ tests/
isort genrec/ tests/

# Check style
flake8 genrec/ tests/
mypy genrec/

Testing¶

Run tests before submitting your changes:

# Run all tests
pytest

# Run with coverage
pytest --cov=genrec

# Run specific test
pytest tests/test_datasets.py::test_p5_amazon_dataset

Documentation¶

Update documentation when adding new features:

# Build and preview documentation locally (mkdocs serve rebuilds on change)
cd docs
mkdocs serve

Contributing Guidelines¶

Issues¶

Search existing issues before creating new ones
Use clear, descriptive titles
Provide steps to reproduce for bugs
Include system information (OS, Python version, etc.)
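
The system information requested above can be gathered with a short script. This is a minimal sketch, not part of genrec: the function name `collect_system_info` is hypothetical, and the PyTorch version is reported only when PyTorch can be imported.

```python
import platform
import sys


def collect_system_info() -> dict:
    """Gather basic environment details to paste into a bug report."""
    info = {
        "os": platform.platform(),
        "python": sys.version.split()[0],
    }
    try:
        import torch  # optional: reported only when PyTorch is installed

        info["torch"] = torch.__version__
    except ImportError:
        info["torch"] = "not installed"
    return info


if __name__ == "__main__":
    for key, value in collect_system_info().items():
        print(f"{key}: {value}")
```

Paste the printed lines into the issue body together with the steps to reproduce.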

Pull Requests¶

Create a feature branch from main:

git checkout -b feature/your-feature-name

Make your changes and commit:

git commit -m "Add: brief description of changes"

Push to your fork:

git push origin feature/your-feature-name

Create a Pull Request on GitHub

Commit Message Format¶

Use clear, descriptive commit messages:

Add: New features or functionality
Fix: Bug fixes
Update: Changes to existing functionality
Docs: Documentation changes
Test: Adding or updating tests
Refactor: Code refactoring without functional changes

Examples:

Add: TIGER model with transformer architecture
Fix: P5Amazon dataset loading for large categories
Update: configuration system to use dataclasses
Docs: API reference for dataset factory
Test: unit tests for text processors
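
The prefix convention above is easy to check automatically, for example from a local commit hook. The sketch below is an illustration only: `is_valid_commit_message` is a hypothetical helper, and genrec does not necessarily enforce the format this way.

```python
import re

# Prefixes taken from the list above.
ALLOWED_PREFIXES = ("Add", "Fix", "Update", "Docs", "Test", "Refactor")

# An allowed prefix, a colon, one space, then at least one non-space character.
_PATTERN = re.compile(r"^(?:" + "|".join(ALLOWED_PREFIXES) + r"): \S")


def is_valid_commit_message(message: str) -> bool:
    """Return True when the first line follows the '<Prefix>: description' format."""
    lines = message.splitlines()
    first_line = lines[0] if lines else ""
    return _PATTERN.match(first_line) is not None


if __name__ == "__main__":
    print(is_valid_commit_message("Add: TIGER model with transformer architecture"))
    print(is_valid_commit_message("added some stuff"))
```

Only the first line is checked, so a longer commit body can follow freely.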

Types of Contributions¶

Bug Fixes¶

Fix issues reported in GitHub Issues
Include test cases that reproduce the bug
Update documentation if needed

New Features¶

Before implementing major features:

1. Create an issue to discuss the feature
2. Get feedback from maintainers
3. Follow the existing architecture patterns

Documentation¶

Fix typos and improve clarity
Add examples and tutorials
Translate documentation (Chinese/English)
Improve API documentation

Performance Improvements¶

Profile code to identify bottlenecks
Include benchmarks showing improvements
Ensure changes don't break existing functionality
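
A benchmark for a pull request can be as small as the sketch below, which uses the standard-library `timeit` module. The two de-duplication functions are toy stand-ins for a "before" and "after" implementation, not genrec code. The equality check guards against an optimization that changes behavior.

```python
import timeit


def baseline_dedup(items):
    """Order-preserving de-duplication with a list lookup (quadratic time)."""
    seen = []
    for item in items:
        if item not in seen:
            seen.append(item)
    return seen


def optimized_dedup(items):
    """Order-preserving de-duplication via dict keys (linear time)."""
    return list(dict.fromkeys(items))


def benchmark(func, data, repeat=5, number=20):
    """Return the best per-call time in seconds over several repeats."""
    timer = timeit.Timer(lambda: func(data))
    return min(timer.repeat(repeat=repeat, number=number)) / number


if __name__ == "__main__":
    data = [i % 500 for i in range(5_000)]
    # Confirm the optimization preserves behavior before comparing speed.
    assert baseline_dedup(data) == optimized_dedup(data)
    print(f"baseline:  {benchmark(baseline_dedup, data):.6f} s/call")
    print(f"optimized: {benchmark(optimized_dedup, data):.6f} s/call")
```

Taking the minimum over repeats reduces noise from other processes. Report the input size along with the timings.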

Code Architecture¶

Adding New Datasets¶

To add a new dataset, follow these steps:

Create the base dataset class:

from genrec.data.base_dataset import BaseRecommenderDataset

class MyDataset(BaseRecommenderDataset):
    def download(self):
        # Implement download logic
        pass

    def load_raw_data(self):
        # Implement data loading
        pass

    def preprocess_data(self, raw_data):
        # Implement preprocessing
        pass

Create wrapper classes:

from genrec.data.base_dataset import ItemDataset, SequenceDataset

class MyItemDataset(ItemDataset):
    def __init__(self, **kwargs):
        # Initialize with your dataset
        pass

class MySequenceDataset(SequenceDataset):
    def __init__(self, **kwargs):
        # Initialize with your dataset
        pass

Add configuration:

from dataclasses import dataclass

from genrec.data.configs import DatasetConfig

@dataclass
class MyDatasetConfig(DatasetConfig):
    # Add dataset-specific parameters
    special_param: str = "default_value"

Register the dataset:

from genrec.data.dataset_factory import DatasetFactory

DatasetFactory.register_dataset(
    name="my_dataset",
    base_class=MyDataset,
    item_class=MyItemDataset,
    sequence_class=MySequenceDataset
)

Add tests and documentation

For more details, please refer to the API Documentation.
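
For readers unfamiliar with the registry pattern behind the registration step, the following self-contained sketch shows one way such a factory could work. It is a hypothetical stand-in: `ToyDatasetFactory`, `DatasetEntry`, and `get` are invented for illustration and do not describe genrec's actual `DatasetFactory`. Only the `register_dataset` keyword names mirror the call shown above.

```python
from dataclasses import dataclass
from typing import Dict, Type


@dataclass(frozen=True)
class DatasetEntry:
    """Groups the three classes registered under one dataset name."""

    base_class: Type
    item_class: Type
    sequence_class: Type


class ToyDatasetFactory:
    """Hypothetical stand-in for DatasetFactory; not genrec's implementation."""

    _registry: Dict[str, DatasetEntry] = {}

    @classmethod
    def register_dataset(cls, name, base_class, item_class, sequence_class):
        # Reject duplicates so two datasets cannot silently share a name.
        if name in cls._registry:
            raise ValueError(f"Dataset '{name}' is already registered")
        cls._registry[name] = DatasetEntry(base_class, item_class, sequence_class)

    @classmethod
    def get(cls, name: str) -> DatasetEntry:
        if name not in cls._registry:
            raise KeyError(f"Unknown dataset '{name}'")
        return cls._registry[name]


# Placeholder classes standing in for the dataset classes defined earlier.
class MyDataset:
    pass


class MyItemDataset:
    pass


class MySequenceDataset:
    pass


ToyDatasetFactory.register_dataset(
    name="my_dataset",
    base_class=MyDataset,
    item_class=MyItemDataset,
    sequence_class=MySequenceDataset,
)
```

Looking a dataset up by name then returns all three classes together, which is what lets configuration files refer to datasets by a string.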

Adding New Models¶

Inherit from base classes:
```
import torch
import torch.nn as nn

class MyModel(nn.Module):
    def __init__(
        self,
        input_dim: int,
        hidden_dim: int,
        output_dim: int,
        dropout: float = 0.1,
    ) -> None:
        super().__init__()
        self.input_dim = input_dim
        self.hidden_dim = hidden_dim
        self.output_dim = output_dim

        # Define layers
        self.encoder = nn.Linear(input_dim, hidden_dim)
        self.decoder = nn.Linear(hidden_dim, output_dim)
        self.dropout = nn.Dropout(dropout)

    def forward(self, x):
        # Implement forward pass
        hidden = self.dropout(torch.relu(self.encoder(x)))
        output = self.decoder(hidden)
        return output
```

Add to Gin configuration system:

import gin
import torch.nn as nn

@gin.configurable
class MyModel(nn.Module):
    # Implementation
    ...

Create training utilities:

from genrec.trainers.base_trainer import BaseTrainer

class MyModelTrainer(BaseTrainer):
    def __init__(self, model, config):
        super().__init__(model, config)

    def training_step(self, batch, batch_idx):
        # Implement training logic
        pass

Add comprehensive tests
Update documentation

Testing Guidelines¶

Unit Tests¶

Test individual functions and classes
Use pytest fixtures for setup
Mock external dependencies
Aim for >90% code coverage

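import pytest
from genrec.data import P5AmazonDataset
# Assumption: P5AmazonConfig is importable from the same package; adjust the path if it lives elsewhere.
from genrec.data import P5AmazonConfig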

def test_p5_amazon_dataset_creation():
    config = P5AmazonConfig(
        root_dir="test_data",
        category="beauty"
    )
    dataset = P5AmazonDataset(config)
    assert dataset.category == "beauty"
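
Mocking an external dependency can look like the sketch below. It uses only the standard-library `unittest.mock`, and pytest collects the test function as-is. `load_item_titles` is a hypothetical loader written for this example, not a genrec function. Passing the fetch callable as an argument is what makes it easy to replace in a test.

```python
from unittest.mock import Mock


def load_item_titles(fetch, url):
    """Hypothetical loader: 'fetch' is any callable that returns text for a URL."""
    raw = fetch(url)
    # Keep non-empty lines and trim surrounding whitespace.
    return [line.strip() for line in raw.splitlines() if line.strip()]


def test_load_item_titles_uses_injected_fetch():
    # The mock replaces the network call, so the test runs offline.
    fake_fetch = Mock(return_value="Lipstick\n\n  Shampoo  \n")
    url = "https://example.invalid/items.txt"

    titles = load_item_titles(fake_fetch, url)

    assert titles == ["Lipstick", "Shampoo"]
    fake_fetch.assert_called_once_with(url)
```

The final assertion checks how the dependency was called, which catches regressions such as a wrong URL being requested.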

Integration Tests¶

Test component interactions
Use sample datasets
Test end-to-end workflows

def test_full_training_pipeline():
    # Test complete training workflow
    pass

Performance Tests¶

Benchmark critical operations
Test with realistic data sizes
Monitor memory usage
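
For memory monitoring, the standard-library `tracemalloc` module gives a quick first measurement. The sketch below is illustrative: `measure_peak_memory` and `build_interaction_table` are hypothetical names, and the workload is a toy stand-in for a real preprocessing step. Note that `tracemalloc` does not cover GPU memory.

```python
import tracemalloc


def measure_peak_memory(func, *args, **kwargs):
    """Run func and return (result, peak_bytes) as traced by tracemalloc."""
    tracemalloc.start()
    try:
        result = func(*args, **kwargs)
        _, peak = tracemalloc.get_traced_memory()
    finally:
        # Always stop tracing, even if func raises.
        tracemalloc.stop()
    return result, peak


def build_interaction_table(num_users, items_per_user):
    """Toy workload standing in for a real preprocessing step."""
    return {user: list(range(items_per_user)) for user in range(num_users)}


if __name__ == "__main__":
    table, peak = measure_peak_memory(build_interaction_table, 1_000, 50)
    print(f"users: {len(table)}, peak traced memory: {peak / 1024:.1f} KiB")
```

Running the same measurement at two or three input sizes shows whether memory grows linearly with the data or faster.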

Documentation Standards¶

Docstring Format¶

Use Google-style docstrings:

import pandas as pd

def process_data(data: pd.DataFrame, normalize: bool = True) -> pd.DataFrame:
    """Process input data with optional normalization.

    Args:
        data: Input DataFrame to process
        normalize: Whether to normalize numerical features

    Returns:
        Processed DataFrame

    Raises:
        ValueError: If data is empty

    Example:
        >>> df = pd.DataFrame({'col1': [1, 2, 3]})
        >>> result = process_data(df, normalize=True)
    """

API Documentation¶

Document all public methods and classes
Include usage examples
Explain parameters and return values
Add type hints

Tutorials and Guides¶

Provide step-by-step instructions
Include complete working examples
Explain the reasoning behind design decisions
Keep examples up-to-date with API changes

Release Process¶

Version Numbering¶

We follow Semantic Versioning (SemVer):

- MAJOR: Breaking changes
- MINOR: New features, backward compatible
- PATCH: Bug fixes, backward compatible
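
The bump rules can be expressed in a few lines. This is a minimal sketch that handles plain `MAJOR.MINOR.PATCH` strings only (no pre-release or build suffixes). `bump_version` is a hypothetical helper, not a genrec release tool.

```python
def bump_version(version: str, part: str) -> str:
    """Return the next version after bumping 'major', 'minor', or 'patch'."""
    major, minor, patch = (int(piece) for piece in version.split("."))
    if part == "major":
        # Breaking change: lower components reset to zero.
        return f"{major + 1}.0.0"
    if part == "minor":
        # Backward-compatible feature: patch resets to zero.
        return f"{major}.{minor + 1}.0"
    if part == "patch":
        # Backward-compatible bug fix.
        return f"{major}.{minor}.{patch + 1}"
    raise ValueError(f"Unknown version part: {part!r}")


if __name__ == "__main__":
    print(bump_version("1.4.2", "major"))  # 2.0.0
    print(bump_version("1.4.2", "minor"))  # 1.5.0
    print(bump_version("1.4.2", "patch"))  # 1.4.3
```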

Changelog¶

Update CHANGELOG.md with:

- New features
- Bug fixes
- Breaking changes
- Deprecations

Getting Help¶

Join our discussions on GitHub
Ask questions in Issues
Check existing documentation
Review code examples

Code of Conduct¶

Be respectful and inclusive
Focus on constructive feedback
Help maintain a welcoming community
Follow GitHub's Community Guidelines

Thank you for contributing to genrec!