# Contributing Guide
We welcome contributions to genrec! This guide will help you get started.
## Development Setup

### Prerequisites
- Python 3.8 or higher
- PyTorch >= 1.11.0
- Git
### Installation
1. Fork the repository on GitHub
2. Clone your fork
3. Create a virtual environment
4. Install in development mode
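A typical command sequence for steps 2–4 is sketched below; the fork URL placeholder and the `dev` extra are assumptions, so adjust them to the actual repository layout:

```bash
# Clone your fork (replace <your-username> with your GitHub handle)
git clone https://github.com/<your-username>/genrec.git
cd genrec

# Create and activate a virtual environment
python -m venv .venv
source .venv/bin/activate

# Install in development (editable) mode; the "dev" extra is an assumption
pip install -e ".[dev]"
```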
## Development Workflow

### Code Style
We follow PEP 8 style guidelines. Please run the following before submitting:
```bash
# Format code
black genrec/ tests/
isort genrec/ tests/

# Check style
flake8 genrec/ tests/
mypy genrec/
```
### Testing
Run tests before submitting your changes:
```bash
# Run all tests
pytest

# Run with coverage
pytest --cov=genrec

# Run a specific test
pytest tests/test_datasets.py::test_p5_amazon_dataset
```
### Documentation
Update the documentation when adding new features.
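If the documentation site is built with MkDocs (an assumption here, not confirmed by this guide), you can preview your edits locally while writing:

```bash
# Preview the docs locally; assumes MkDocs is the site generator
pip install mkdocs
mkdocs serve   # serves at http://127.0.0.1:8000 by default
```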
## Contributing Guidelines

### Issues
- Search existing issues before creating new ones
- Use clear, descriptive titles
- Provide steps to reproduce for bugs
- Include system information (OS, Python version, etc.)
### Pull Requests
1. Create a feature branch from `main`
2. Make your changes and commit
3. Push to your fork
4. Create a Pull Request on GitHub
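A typical command sequence for steps 1–3; the branch name is a placeholder and `origin` is assumed to point at your fork:

```bash
# 1. Create a feature branch from an up-to-date main
git checkout main
git pull upstream main   # or `git pull origin main`, depending on your remote setup
git checkout -b feature/my-change

# 2. Make your changes and commit
git add .
git commit -m "Add: short description of the change"

# 3. Push the branch to your fork
git push origin feature/my-change
```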
### Commit Message Format
Use clear, descriptive commit messages:
- Add: New features or functionality
- Fix: Bug fixes
- Update: Changes to existing functionality
- Docs: Documentation changes
- Test: Adding or updating tests
- Refactor: Code refactoring without functional changes
Examples:
```text
Add: TIGER model with transformer architecture
Fix: P5Amazon dataset loading for large categories
Update: configuration system to use dataclasses
Docs: API reference for dataset factory
Test: unit tests for text processors
```
## Types of Contributions

### Bug Fixes
- Fix issues reported in GitHub Issues
- Include test cases that reproduce the bug
- Update documentation if needed
### New Features
Before implementing major features:

1. Create an issue to discuss the feature
2. Get feedback from maintainers
3. Follow the existing architecture patterns
### Documentation
- Fix typos and improve clarity
- Add examples and tutorials
- Translate documentation (Chinese/English)
- Improve API documentation
### Performance Improvements
- Profile code to identify bottlenecks
- Include benchmarks showing improvements
- Ensure changes don't break existing functionality
## Code Architecture

### Adding New Datasets
To add a new dataset, follow these steps:
1. Create the base dataset class
2. Create wrapper classes
3. Add configuration
4. Register the dataset
5. Add tests and documentation
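As a rough sketch of steps 1–4: the dataset structure, config fields, and registration hook below are illustrative assumptions, not the actual genrec API; mirror how existing datasets such as `P5AmazonDataset` are implemented.

```python
from dataclasses import dataclass

from torch.utils.data import Dataset


@dataclass
class MyDatasetConfig:
    # Step 3: configuration object; field names here are illustrative
    root_dir: str
    category: str = "all"


class MyDataset(Dataset):
    """Steps 1-2: base dataset class (split/task-specific wrappers would subclass it)."""

    def __init__(self, config: MyDatasetConfig) -> None:
        self.config = config
        self.category = config.category
        self.samples = self._load_samples()

    def _load_samples(self) -> list:
        # Load and preprocess raw files under config.root_dir
        return []

    def __len__(self) -> int:
        return len(self.samples)

    def __getitem__(self, idx: int):
        return self.samples[idx]


# Step 4: if genrec exposes a dataset registry/factory, register the new class there,
# e.g. something like `register_dataset("my_dataset", MyDataset)` (hypothetical helper).
```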
For more details, please refer to the API Documentation.
### Adding New Models
1. Inherit from base classes:

    ```python
    import torch
    import torch.nn as nn

    class MyModel(nn.Module):
        def __init__(
            self,
            input_dim: int,
            hidden_dim: int,
            output_dim: int,
            dropout: float = 0.1,
        ) -> None:
            super().__init__()
            self.input_dim = input_dim
            self.hidden_dim = hidden_dim
            self.output_dim = output_dim

            # Define layers
            self.encoder = nn.Linear(input_dim, hidden_dim)
            self.decoder = nn.Linear(hidden_dim, output_dim)
            self.dropout = nn.Dropout(dropout)

        def forward(self, x):
            # Implement forward pass
            hidden = self.dropout(torch.relu(self.encoder(x)))
            output = self.decoder(hidden)
            return output
    ```

2. Add to the Gin configuration system (see the sketch after this list)
3. Create training utilities
4. Add comprehensive tests
5. Update documentation
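For step 2, a minimal sketch assuming genrec uses the standard `gin-config` package; the binding names and the inline config string are illustrative:

```python
import gin
import torch.nn as nn


@gin.configurable  # exposes the constructor arguments as Gin bindings
class MyModel(nn.Module):
    def __init__(self, input_dim: int, hidden_dim: int = 128, output_dim: int = 10):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(input_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, output_dim),
        )

    def forward(self, x):
        return self.net(x)


# Bindings normally live in a .gin file; parsed inline here to keep the sketch self-contained.
gin.parse_config("""
MyModel.input_dim = 768
MyModel.hidden_dim = 256
MyModel.output_dim = 10
""")

model = MyModel()  # constructor arguments are filled in from the Gin bindings
```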
## Testing Guidelines

### Unit Tests
- Test individual functions and classes
- Use pytest fixtures for setup
- Mock external dependencies
- Aim for >90% code coverage
```python
import pytest

from genrec.data import P5AmazonConfig, P5AmazonDataset


def test_p5_amazon_dataset_creation():
    config = P5AmazonConfig(
        root_dir="test_data",
        category="beauty",
    )
    dataset = P5AmazonDataset(config)
    assert dataset.category == "beauty"
```
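To illustrate the fixture point above, here is a sketch using pytest's `tmp_path`; whether `P5AmazonDataset` can actually be constructed against an empty temporary directory depends on the real implementation, so treat this as a pattern rather than a working test:

```python
import pytest

from genrec.data import P5AmazonConfig, P5AmazonDataset


@pytest.fixture
def beauty_config(tmp_path):
    # Fixture: minimal config rooted at a temporary directory
    return P5AmazonConfig(root_dir=str(tmp_path), category="beauty")


def test_dataset_uses_config(beauty_config):
    dataset = P5AmazonDataset(beauty_config)
    assert dataset.category == "beauty"
```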
### Integration Tests
- Test component interactions
- Use sample datasets
- Test end-to-end workflows
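One way to exercise component interactions is to check that a dataset plugs into a standard PyTorch `DataLoader`; this assumes genrec datasets follow the map-style `Dataset` protocol and that a small sample dataset exists at the path shown, both of which are assumptions here:

```python
from torch.utils.data import DataLoader

from genrec.data import P5AmazonConfig, P5AmazonDataset


def test_dataset_feeds_dataloader():
    # "tests/sample_data" is a hypothetical location for a small checked-in sample
    config = P5AmazonConfig(root_dir="tests/sample_data", category="beauty")
    dataset = P5AmazonDataset(config)

    loader = DataLoader(dataset, batch_size=4, shuffle=False)
    batch = next(iter(loader))
    assert batch is not None
```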
### Performance Tests
- Benchmark critical operations
- Test with realistic data sizes
- Monitor memory usage
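A minimal stdlib-only pattern for benchmarking time and memory (`pytest-benchmark` is a common alternative if the project adopts it); the workload below is a stand-in for the real operation under test:

```python
import time
import tracemalloc


def run_benchmark(fn, repeat: int = 5):
    """Return (seconds per call, peak bytes) for a callable -- a simple illustrative helper."""
    tracemalloc.start()
    start = time.perf_counter()
    for _ in range(repeat):
        fn()
    elapsed = (time.perf_counter() - start) / repeat
    _, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return elapsed, peak


def test_tokenization_stays_within_budget():
    # Stand-in workload; replace with the genrec operation you are profiling
    elapsed, peak = run_benchmark(lambda: [str(i) for i in range(100_000)])
    assert elapsed < 1.0            # seconds per run; tune to a realistic budget
    assert peak < 50 * 1024 ** 2    # bytes; tune to a realistic budget
```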
## Documentation Standards

### Docstring Format
Use Google-style docstrings:
```python
import pandas as pd


def process_data(data: pd.DataFrame, normalize: bool = True) -> pd.DataFrame:
    """Process input data with optional normalization.

    Args:
        data: Input DataFrame to process
        normalize: Whether to normalize numerical features

    Returns:
        Processed DataFrame

    Raises:
        ValueError: If data is empty

    Example:
        >>> df = pd.DataFrame({'col1': [1, 2, 3]})
        >>> result = process_data(df, normalize=True)
    """
```
### API Documentation
- Document all public methods and classes
- Include usage examples
- Explain parameters and return values
- Add type hints
### Tutorials and Guides
- Provide step-by-step instructions
- Include complete working examples
- Explain the reasoning behind design decisions
- Keep examples up-to-date with API changes
## Release Process

### Version Numbering
We follow Semantic Versioning (SemVer):

- MAJOR: Breaking changes
- MINOR: New features, backward compatible
- PATCH: Bug fixes, backward compatible
### Changelog
Update CHANGELOG.md with:

- New features
- Bug fixes
- Breaking changes
- Deprecations
## Getting Help
- Join our discussions on GitHub
- Ask questions in Issues
- Check existing documentation
- Review code examples
## Code of Conduct
- Be respectful and inclusive
- Focus on constructive feedback
- Help maintain a welcoming community
- Follow GitHub's Community Guidelines
Thank you for contributing to genrec!