# Database Migrations Guide
This guide covers advanced database migration workflows for developers working on schema changes.
## Overview
Our project uses Alembic for database migrations. All schema changes must go through the migration system to ensure:
- Reproducible deployments
- Team synchronization
- Production safety
- Rollback capability
## Migration Commands Reference
### Creating Migrations
```bash
# Auto-generate migration from model changes
make migrate-create message="add_user_profile_table"
# Create empty migration template for manual changes
make migrate-create-manual message="add_custom_indexes"
```
### Applying Migrations
```bash
# Apply all pending migrations
make migrate-up
# Rollback last migration
make migrate-down
# Rollback to specific revision
make migrate-down-to revision="abc123"
```
### Migration Status
```bash
# Show current migration status
make migrate-status
# Show detailed migration history
alembic history --verbose
# Show specific migration details
make migrate-show revision="abc123"
```
### Backup and Safety
```bash
# Create database backup before major changes
make backup-db
# Verify database setup
make verify-setup
```
## Development Workflows
### Adding New Database Fields
1. **Modify your SQLAlchemy model**:
```python
# In models/database/user.py
class User(Base):
    # ... existing fields
    profile_image = Column(String, nullable=True)  # NEW FIELD
```
2. **Generate migration**:
```bash
make migrate-create message="add_profile_image_to_users"
```
3. **Review generated migration**:
```python
# Check alembic/versions/xxx_add_profile_image_to_users.py
def upgrade() -> None:
    op.add_column('users', sa.Column('profile_image', sa.String(), nullable=True))

def downgrade() -> None:
    op.drop_column('users', 'profile_image')
```
4. **Apply migration**:
```bash
make migrate-up
```
### Adding Database Indexes
1. **Create manual migration**:
```bash
make migrate-create-manual message="add_performance_indexes"
```
2. **Edit the migration file**:
```python
def upgrade() -> None:
    # Add indexes for better query performance
    op.create_index('idx_products_marketplace_shop', 'products', ['marketplace', 'shop_name'])
    op.create_index('idx_users_email_active', 'users', ['email', 'is_active'])

def downgrade() -> None:
    op.drop_index('idx_users_email_active', table_name='users')
    op.drop_index('idx_products_marketplace_shop', table_name='products')
```
3. **Apply migration**:
```bash
make migrate-up
```
### Complex Schema Changes
For complex changes that require data transformation:
1. **Create migration with data handling**:
```python
from sqlalchemy import text

def upgrade() -> None:
    # Create the new column as nullable first, so existing rows remain valid
    op.add_column('products', sa.Column('normalized_price', sa.Numeric(10, 2)))
    # Migrate data
    connection = op.get_bind()
    connection.execute(
        text("UPDATE products SET normalized_price = CAST(price AS NUMERIC) WHERE price ~ '^[0-9.]+$'")
    )
    # Make the column non-nullable once every row is backfilled.
    # Rows whose price did not match the pattern stay NULL and must be
    # handled first, or this ALTER will fail.
    op.alter_column('products', 'normalized_price', nullable=False)

def downgrade() -> None:
    op.drop_column('products', 'normalized_price')
```
## Production Deployment
### Pre-Deployment Checklist
- [ ] All migrations tested locally
- [ ] Database backup created
- [ ] Migration rollback plan prepared
- [ ] Team notified of schema changes
### Deployment Process
```bash
# 1. Pre-deployment checks
make pre-deploy-check
# 2. Backup production database
make backup-db
# 3. Deploy with migrations
make deploy-prod # This includes migrate-up
```
### Rollback Process
```bash
# If deployment fails, rollback
make rollback-prod # This includes migrate-down
```
## Best Practices
### Migration Naming
Use clear, descriptive names:
```bash
# Good examples
make migrate-create message="add_user_profile_table"
make migrate-create message="remove_deprecated_product_fields"
make migrate-create message="add_indexes_for_search_performance"
# Avoid vague names
make migrate-create message="update_database" # Too vague
make migrate-create message="fix_stuff" # Not descriptive
```
### Safe Schema Changes
**Always Safe**:
- Adding nullable columns
- Adding indexes
- Adding new tables
- Increasing column size (varchar(50) → varchar(100))
**Potentially Unsafe** (require careful planning):
- Dropping columns
- Changing column types
- Adding non-nullable columns without defaults
- Renaming tables or columns
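To see why the last category bites, the sketch below uses an in-memory SQLite database as a stand-in for the real one (table and column names are made up): adding a nullable column to a populated table succeeds, while adding a NOT NULL column without a default is rejected.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT)")
conn.execute("INSERT INTO users (email) VALUES ('a@example.com')")

# Safe: a nullable column can be added even with existing rows
conn.execute("ALTER TABLE users ADD COLUMN profile_image TEXT")

# Unsafe: a NOT NULL column without a default is rejected,
# because existing rows would have no valid value for it
try:
    conn.execute("ALTER TABLE users ADD COLUMN plan TEXT NOT NULL")
    outcome = "added"
except sqlite3.OperationalError as exc:
    outcome = f"rejected: {exc}"

print(outcome)
```

The same constraint exists on most engines, which is why the multi-step process below adds the column as nullable first.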
**Multi-Step Process for Unsafe Changes**:
```python
# Step 1: Add new column
def upgrade() -> None:
    op.add_column('users', sa.Column('email_new', sa.String(255)))

# Step 2: Migrate data (separate migration)
def upgrade() -> None:
    connection = op.get_bind()
    connection.execute(text("UPDATE users SET email_new = email"))

# Step 3: Switch columns (separate migration)
def upgrade() -> None:
    op.drop_column('users', 'email')
    op.alter_column('users', 'email_new', new_column_name='email')
```
### Testing Migrations
1. **Test on a copy of production data**: restore a recent production backup to a test database, run the pending migrations against it, and verify data integrity afterwards.
2. **Test rollback process**:
```bash
make migrate-up # Apply migration
# Test application functionality
make migrate-down # Test rollback
# Verify rollback worked correctly
```
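The round-trip test can be sketched without any Alembic machinery. In the minimal sketch below, SQLite stands in for the real database and the `upgrade`/`downgrade` functions stand in for `alembic upgrade head` / `alembic downgrade -1`; snapshotting the schema before and after proves the rollback truly restores the starting state.

```python
import sqlite3

def schema_snapshot(conn):
    # Column layout of every table, as a comparable dict
    tables = [r[0] for r in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
    return {t: conn.execute(f"PRAGMA table_info({t})").fetchall() for t in tables}

def upgrade(conn):
    # Stand-in for `alembic upgrade head`
    conn.execute("ALTER TABLE users ADD COLUMN profile_image TEXT")

def downgrade(conn):
    # Stand-in for `alembic downgrade -1`; rebuilds the table, which is
    # also how you drop a column on engines without DROP COLUMN support
    conn.execute("CREATE TABLE users_tmp (id INTEGER PRIMARY KEY, email TEXT)")
    conn.execute("INSERT INTO users_tmp (id, email) SELECT id, email FROM users")
    conn.execute("DROP TABLE users")
    conn.execute("ALTER TABLE users_tmp RENAME TO users")

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT)")

before = schema_snapshot(conn)
upgrade(conn)
assert schema_snapshot(conn) != before  # the migration changed something
downgrade(conn)
assert schema_snapshot(conn) == before  # the rollback restored it exactly
```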
## Advanced Features
### Environment-Specific Migrations
Use migration context to handle different environments:
```python
from alembic import context
def upgrade() -> None:
    # Only add sample data in development
    if context.get_x_argument(as_dictionary=True).get('dev_data', False):
        # Add development sample data
        pass
    # Always apply schema changes
    op.create_table(...)
```
Run with environment flag:
```bash
alembic upgrade head -x dev_data=true
```
### Data Migrations
For large data transformations, use batch processing:
```python
from sqlalchemy import text

def upgrade() -> None:
    connection = op.get_bind()
    # Process in batches to avoid loading the whole table into memory
    batch_size = 1000
    offset = 0
    while True:
        # ORDER BY makes the OFFSET pagination deterministic
        result = connection.execute(
            text("SELECT id, old_field FROM products ORDER BY id LIMIT :limit OFFSET :offset"),
            {"limit": batch_size, "offset": offset},
        )
        rows = result.fetchall()
        if not rows:
            break
        for row in rows:
            # Transform data
            new_value = transform_function(row.old_field)
            connection.execute(
                text("UPDATE products SET new_field = :new_val WHERE id = :id"),
                {"new_val": new_value, "id": row.id},
            )
        offset += batch_size
```
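The same batching pattern runs end to end below against an in-memory SQLite database; the table, the 2,500 sample rows, and `transform_function` are all fabricated for the demonstration.

```python
import sqlite3

def transform_function(value):
    # Hypothetical transformation: strip whitespace and uppercase
    return value.strip().upper()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (id INTEGER PRIMARY KEY, old_field TEXT, new_field TEXT)")
conn.executemany(
    "INSERT INTO products (old_field) VALUES (?)",
    [(f" sku-{i} ",) for i in range(2500)],
)

batch_size = 1000
offset = 0
while True:
    rows = conn.execute(
        "SELECT id, old_field FROM products ORDER BY id LIMIT ? OFFSET ?",
        (batch_size, offset),
    ).fetchall()
    if not rows:
        break
    conn.executemany(
        "UPDATE products SET new_field = ? WHERE id = ?",
        [(transform_function(old), id_) for id_, old in rows],
    )
    offset += batch_size

remaining = conn.execute("SELECT COUNT(*) FROM products WHERE new_field IS NULL").fetchone()[0]
print(remaining)  # 0 -> every row was transformed
```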
## Troubleshooting
### Common Issues
**Migration conflicts**:
```bash
# When multiple developers create migrations simultaneously
# Resolve by creating a merge migration
alembic merge -m "merge migrations" head1 head2
```
**Failed migration**:
```bash
# Check current state
make migrate-status
# Manually fix the database if needed, then mark the migration as applied
# (only after verifying the schema actually matches the migration's end state)
alembic stamp head
```
**Out-of-sync database**:
```bash
# Reset to a known good state
# (destructive: downgrading to base removes all managed schema objects)
make backup-db
alembic downgrade base
make migrate-up
```
### Recovery Procedures
1. **Database corruption**: Restore from backup, replay migrations
2. **Failed deployment**: Use rollback process, investigate issue
3. **Development issues**: Reset local database, pull latest migrations
## Integration with CI/CD
Our deployment pipeline automatically:
1. Runs migration checks in CI
2. Creates database backups before deployment
3. Applies migrations during deployment
4. Provides rollback capability
Migration failures will halt deployment to prevent data corruption.
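One migration check CI can run is refusing to deploy when the history has more than one head, which happens when two branches each add a migration. `alembic heads` reports this directly; the dependency-free sketch below shows the underlying idea by scanning version files for `revision`/`down_revision` pairs (the three inline migration stubs are fabricated for the demo).

```python
import re
import tempfile
from pathlib import Path

def find_heads(versions_dir):
    """Revisions that no other migration lists as its down_revision are heads."""
    revisions, parents = set(), set()
    for path in Path(versions_dir).glob("*.py"):
        source = path.read_text()
        rev = re.search(r"^revision\s*=\s*['\"](\w+)['\"]", source, re.M)
        down = re.search(r"^down_revision\s*=\s*['\"](\w+)['\"]", source, re.M)
        if rev:
            revisions.add(rev.group(1))
        if down:
            parents.add(down.group(1))
    return revisions - parents

# Demo: a linear chain plus one stray branch off the same parent
with tempfile.TemporaryDirectory() as d:
    Path(d, "a1.py").write_text("revision = 'a1'\ndown_revision = None\n")
    Path(d, "b2.py").write_text("revision = 'b2'\ndown_revision = 'a1'\n")
    Path(d, "c3.py").write_text("revision = 'c3'\ndown_revision = 'a1'\n")
    heads = find_heads(d)

print(sorted(heads))  # ['b2', 'c3'] -> two heads, so CI should fail and ask for a merge migration
```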
## Further Reading
- [Alembic Official Documentation](https://alembic.sqlalchemy.org/)
- [Database Schema Documentation](database-schema.md)
- [Deployment Guide](../deployment/production.md)