orion/docs/operations/capacity-monitoring.md

Capacity Monitoring

Detailed guide for monitoring and managing platform capacity, including growth forecasting and scaling recommendations.

Overview

The Capacity Monitoring system provides insights into resource consumption and helps plan infrastructure scaling. It includes:

  • Real-time metrics: Current resource usage and health status
  • Subscription capacity: Theoretical vs actual capacity based on vendor subscriptions
  • Growth forecasting: Historical trends and future projections
  • Scaling recommendations: Automated advice for infrastructure planning

API Endpoints

All capacity endpoints are under /api/v1/admin/platform-health:

| Endpoint | Method | Description |
|----------|--------|-------------|
| /health | GET | Full platform health report |
| /capacity | GET | Capacity-focused metrics |
| /subscription-capacity | GET | Subscription-based capacity analysis |
| /trends | GET | Growth trends over specified period |
| /recommendations | GET | Prioritized scaling recommendations |
| /snapshot | POST | Manually capture capacity snapshot |

Key Metrics

Client Metrics

| Metric | Description | Threshold Indicator |
|--------|-------------|---------------------|
| Active Clients | Vendors with activity in last 30 days | Scale planning |
| Total Products | Sum across all vendors | Storage/DB sizing |
| Products per Client | Average products per vendor | Tier compliance |
| Monthly Orders | Order volume this month | Performance impact |

Subscription Capacity

Track theoretical vs actual capacity based on all vendor subscriptions:

# GET /api/v1/admin/platform-health/subscription-capacity
{
    "total_subscriptions": 150,
    "tier_distribution": {
        "essential": 80,
        "professional": 50,
        "business": 18,
        "enterprise": 2
    },
    "products": {
        "actual": 125000,
        "theoretical_limit": 500000,
        "utilization_percent": 25.0,
        "headroom": 375000
    },
    "orders_monthly": {
        "actual": 45000,
        "theoretical_limit": 300000,
        "utilization_percent": 15.0,
        "headroom": 255000
    },
    "team_members": {
        "actual": 320,
        "theoretical_limit": 1500,
        "utilization_percent": 21.3,
        "headroom": 1180
    }
}
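
The utilization_percent and headroom fields in the response above appear to follow directly from actual and theoretical_limit. A minimal sketch under that assumption; the function name capacity_summary is hypothetical and is not part of the service:

```python
def capacity_summary(actual: int, theoretical_limit: int) -> dict:
    """Derive utilization and headroom from actual usage and a theoretical limit."""
    if theoretical_limit <= 0:
        # Assumption: with no limit defined, report zero utilization and headroom.
        return {
            "actual": actual,
            "theoretical_limit": theoretical_limit,
            "utilization_percent": 0.0,
            "headroom": 0,
        }
    return {
        "actual": actual,
        "theoretical_limit": theoretical_limit,
        # One decimal place, matching the sample response.
        "utilization_percent": round(actual / theoretical_limit * 100, 1),
        "headroom": theoretical_limit - actual,
    }


products = capacity_summary(125_000, 500_000)
print(products["utilization_percent"], products["headroom"])  # 25.0 375000
```

The same helper reproduces the orders_monthly and team_members figures shown in the sample.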

Storage Metrics

| Metric | Description | Warning | Critical |
|--------|-------------|---------|----------|
| Image Files | Total files in storage | 80% of limit | 95% of limit |
| Image Storage (GB) | Total size in gigabytes | 80% of disk | 95% of disk |
| Database Size (GB) | PostgreSQL data size | 80% of allocation | 95% of allocation |
| Backup Size (GB) | Latest backup size | Informational | N/A |

Performance Metrics

| Metric | Good | Warning | Critical |
|--------|------|---------|----------|
| Avg Response Time | < 100ms | 100-300ms | > 300ms |
| DB Query Time (p95) | < 50ms | 50-200ms | > 200ms |
| Cache Hit Rate | > 90% | 70-90% | < 70% |
| Connection Pool Usage | < 70% | 70-90% | > 90% |
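
The performance table above can be turned into a small classifier. The sketch below is illustrative only: the metric keys and the function name classify_performance are hypothetical, and the boundary handling (a value exactly on a boundary counts as "warning") is an assumption read off the table's ranges.

```python
# Thresholds transcribed from the performance table.
# "higher_is_better" marks metrics where a larger value is healthier.
PERFORMANCE_THRESHOLDS = {
    "avg_response_time_ms": {"good": 100, "critical": 300, "higher_is_better": False},
    "db_query_time_p95_ms": {"good": 50, "critical": 200, "higher_is_better": False},
    "cache_hit_rate_percent": {"good": 90, "critical": 70, "higher_is_better": True},
    "connection_pool_usage_percent": {"good": 70, "critical": 90, "higher_is_better": False},
}


def classify_performance(metric: str, value: float) -> str:
    """Return "good", "warning", or "critical" for a performance metric value."""
    t = PERFORMANCE_THRESHOLDS[metric]
    if t["higher_is_better"]:
        if value > t["good"]:
            return "good"
        return "critical" if value < t["critical"] else "warning"
    if value < t["good"]:
        return "good"
    return "critical" if value > t["critical"] else "warning"


print(classify_performance("avg_response_time_ms", 180))    # warning
print(classify_performance("cache_hit_rate_percent", 95))   # good
```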

Growth Forecasting

Capacity Snapshots

Daily snapshots are captured automatically by the capture_capacity_snapshot background task:

# Captured daily at midnight
from datetime import datetime
from decimal import Decimal

class CapacitySnapshot:
    snapshot_date: datetime

    # Vendor metrics
    total_vendors: int
    active_vendors: int
    trial_vendors: int

    # Subscription metrics
    total_subscriptions: int
    active_subscriptions: int

    # Resource metrics
    total_products: int
    total_orders_month: int
    total_team_members: int

    # Storage metrics
    storage_used_gb: Decimal
    db_size_mb: Decimal

    # Capacity metrics
    theoretical_products_limit: int
    theoretical_orders_limit: int
    theoretical_team_limit: int

    # Tier distribution
    tier_distribution: dict

Analyze growth over any period:

# GET /api/v1/admin/platform-health/trends?days=30
{
    "period_days": 30,
    "snapshots_available": 30,
    "start_date": "2025-11-26",
    "end_date": "2025-12-26",
    "trends": {
        "vendors": {
            "start_value": 140,
            "current_value": 150,
            "change": 10,
            "growth_rate_percent": 7.14,
            "daily_growth_rate": 0.238,
            "monthly_projection": 161
        },
        "products": {
            "start_value": 115000,
            "current_value": 125000,
            "change": 10000,
            "growth_rate_percent": 8.7,
            "daily_growth_rate": 0.29,
            "monthly_projection": 136000
        },
        "orders": {
            "start_value": 40000,
            "current_value": 45000,
            "change": 5000,
            "growth_rate_percent": 12.5,
            "monthly_projection": 51000
        },
        "team_members": {...},
        "storage_gb": {
            "start_value": 150.5,
            "current_value": 165.2,
            "change": 14.7
        }
    }
}
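
The per-metric trend fields can be approximated from the first and last snapshot values in the period. The sketch below is an assumption about how the figures relate, not the service's actual implementation, and growth_trend is a hypothetical name. It reproduces the vendors block of the sample exactly; the products and orders projections in the sample appear to be rounded more coarsely, so they will not match to the last digit.

```python
def growth_trend(start_value: float, current_value: float, period_days: int) -> dict:
    """Compute change, growth rate, and a 30-day projection for one metric."""
    change = current_value - start_value
    result = {
        "start_value": start_value,
        "current_value": current_value,
        "change": change,
    }
    if start_value <= 0 or period_days <= 0:
        # A growth rate is undefined without a positive baseline and period.
        return result

    growth_rate = change / start_value * 100          # percent over the whole period
    monthly_growth = growth_rate * 30 / period_days   # scaled linearly to 30 days
    result.update(
        growth_rate_percent=round(growth_rate, 2),
        daily_growth_rate=round(growth_rate / period_days, 3),
        monthly_projection=round(current_value * (1 + monthly_growth / 100)),
    )
    return result


print(growth_trend(140, 150, 30))
```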

Days Until Threshold

Calculate when a metric will reach a specific threshold:

# Service method
days = capacity_forecast_service.get_days_until_threshold(
    db,
    metric="total_products",
    threshold=500000
)
# Returns an integer such as 120 (estimated days until products reach 500K);
# a falsy result means insufficient data or no growth was detected
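
The service's internal calculation is not shown in this guide. One simple way to estimate such a figure is linear extrapolation from two snapshot values, sketched below. The function estimate_days_until_threshold and its parameters are hypothetical, and its results will not necessarily match what get_days_until_threshold returns.

```python
import math
from typing import Optional


def estimate_days_until_threshold(
    start_value: float,
    current_value: float,
    period_days: int,
    threshold: float,
) -> Optional[int]:
    """Linearly extrapolate how many days remain until `threshold` is reached."""
    if current_value >= threshold:
        return 0  # Already at or past the threshold.
    change = current_value - start_value
    if change <= 0 or period_days <= 0:
        return None  # No growth (or no usable period), so no estimate.
    remaining = threshold - current_value
    # remaining / (change / period_days), rearranged to limit rounding error.
    return math.ceil(remaining * period_days / change)


print(estimate_days_until_threshold(115_000, 125_000, 30, 500_000))  # 1125
```

Because this version returns 0 when the threshold has already been reached, callers should test the result with `is None` rather than truthiness.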

Scaling Recommendations

The system generates automated recommendations based on current capacity and growth:

# GET /api/v1/admin/platform-health/recommendations
[
    {
        "category": "capacity",
        "severity": "warning",
        "title": "Product capacity approaching limit",
        "description": "Currently at 85% of theoretical product capacity",
        "action": "Consider upgrading vendor tiers or adding capacity"
    },
    {
        "category": "infrastructure",
        "severity": "info",
        "title": "Current tier: Medium",
        "description": "Next upgrade trigger: 300 vendors",
        "action": "Monitor growth and plan for infrastructure scaling"
    },
    {
        "category": "growth",
        "severity": "info",
        "title": "High vendor growth rate",
        "description": "Vendor base growing at 15.2% over last 30 days",
        "action": "Ensure infrastructure can scale to meet demand"
    },
    {
        "category": "storage",
        "severity": "warning",
        "title": "Storage usage high",
        "description": "Image storage at 850 GB",
        "action": "Plan for storage expansion or implement cleanup policies"
    }
]

Severity Levels

| Severity | Description | Action Required |
|----------|-------------|-----------------|
| critical | Immediate action needed | Within 24 hours |
| warning | Plan action soon | Within 1-2 weeks |
| info | Informational | Monitor and plan |
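
When consuming the recommendations list, it can help to order it so the most urgent items come first. A minimal sketch, assuming only the three severity values listed above; sort_by_severity is a hypothetical helper, and unknown severities are placed last:

```python
SEVERITY_PRIORITY = {"critical": 0, "warning": 1, "info": 2}


def sort_by_severity(recommendations: list[dict]) -> list[dict]:
    """Return recommendations ordered critical, warning, info (stable within a level)."""
    return sorted(
        recommendations,
        key=lambda rec: SEVERITY_PRIORITY.get(rec.get("severity"), len(SEVERITY_PRIORITY)),
    )


sample = [
    {"severity": "info", "title": "Current tier: Medium"},
    {"severity": "warning", "title": "Storage usage high"},
    {"severity": "critical", "title": "Hypothetical critical item"},
]
for rec in sort_by_severity(sample):
    print(f"[{rec['severity']}] {rec['title']}")
```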

Threshold Configuration

Adjust thresholds in the admin settings or through the deployment's environment configuration:

# Capacity thresholds (can be configured per deployment)
CAPACITY_THRESHOLDS = {
    # Products
    "products_total": {
        "warning": 400_000,
        "critical": 475_000,
        "limit": 500_000,
    },
    # Storage (GB)
    "storage_gb": {
        "warning": 800,
        "critical": 950,
        "limit": 1000,
    },
    # Database (GB)
    "db_size_gb": {
        "warning": 20,
        "critical": 24,
        "limit": 25,
    },
    # Monthly orders
    "monthly_orders": {
        "warning": 250_000,
        "critical": 280_000,
        "limit": 300_000,
    },
}
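
A value can be checked against one entry of this configuration with a few comparisons. The sketch below is illustrative: evaluate_threshold is a hypothetical helper, and treating a value equal to a threshold as having reached it is an assumption.

```python
def evaluate_threshold(value: float, thresholds: dict) -> str:
    """Map a metric value to "ok", "warning", "critical", or "exceeded"."""
    if value >= thresholds["limit"]:
        return "exceeded"
    if value >= thresholds["critical"]:
        return "critical"
    if value >= thresholds["warning"]:
        return "warning"
    return "ok"


# Same shape as the "products_total" entry in CAPACITY_THRESHOLDS.
products_thresholds = {"warning": 400_000, "critical": 475_000, "limit": 500_000}

print(evaluate_threshold(125_000, products_thresholds))  # ok
print(evaluate_threshold(480_000, products_thresholds))  # critical
```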

Infrastructure Scaling Reference

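| Clients | vCPU | RAM | Storage | Database | Monthly Cost |
|---------|------|-----|---------|----------|--------------|
| 1-50 | 2 | 4GB | 100GB | SQLite | €30 |
| 50-100 | 4 | 8GB | 250GB | PostgreSQL | €80 |
| 100-300 | 4 | 16GB | 500GB | PostgreSQL | €150 |
| 300-500 | 8 | 32GB | 1TB | PostgreSQL + Redis | €350 |
| 500-1000 | 16 | 64GB | 2TB | PostgreSQL + Redis | €700 |
| 1000+ | 32+ | 128GB+ | 4TB+ | PostgreSQL cluster | €1,500+ |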
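
The reference table can be expressed as a lookup keyed on client count. In the sketch below, infrastructure_tier is a hypothetical helper. The table's ranges share their boundary values (50, 100, 300, ...), so the sketch assumes a boundary count belongs to the smaller tier; adjust if the platform treats boundaries differently.

```python
from bisect import bisect_left

# Upper bound of each tier except the last, which is open-ended.
TIER_UPPER_BOUNDS = [50, 100, 300, 500, 1000]

# Rows transcribed from the infrastructure scaling reference table.
TIERS = [
    {"clients": "1-50", "vcpu": "2", "ram": "4GB", "storage": "100GB",
     "database": "SQLite", "monthly_cost": "€30"},
    {"clients": "50-100", "vcpu": "4", "ram": "8GB", "storage": "250GB",
     "database": "PostgreSQL", "monthly_cost": "€80"},
    {"clients": "100-300", "vcpu": "4", "ram": "16GB", "storage": "500GB",
     "database": "PostgreSQL", "monthly_cost": "€150"},
    {"clients": "300-500", "vcpu": "8", "ram": "32GB", "storage": "1TB",
     "database": "PostgreSQL + Redis", "monthly_cost": "€350"},
    {"clients": "500-1000", "vcpu": "16", "ram": "64GB", "storage": "2TB",
     "database": "PostgreSQL + Redis", "monthly_cost": "€700"},
    {"clients": "1000+", "vcpu": "32+", "ram": "128GB+", "storage": "4TB+",
     "database": "PostgreSQL cluster", "monthly_cost": "€1,500+"},
]


def infrastructure_tier(client_count: int) -> dict:
    """Return the reference row for a client count (boundaries go to the smaller tier)."""
    return TIERS[bisect_left(TIER_UPPER_BOUNDS, client_count)]


print(infrastructure_tier(150)["monthly_cost"])  # €150
```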

Background Tasks

Capacity Snapshot Task

# app/tasks/subscription_tasks.py
# SessionLocal is the project's database session factory; its import is
# omitted here because the module path is not shown in this guide.

async def capture_capacity_snapshot():
    """
    Capture a daily snapshot of platform capacity metrics.
    Should run daily at midnight.
    """
    from app.services.capacity_forecast_service import capacity_forecast_service

    db = SessionLocal()
    try:
        snapshot = capacity_forecast_service.capture_daily_snapshot(db)
        db.commit()
        return {
            "snapshot_id": snapshot.id,
            "snapshot_date": snapshot.snapshot_date.isoformat(),
            "total_vendors": snapshot.total_vendors,
            "total_products": snapshot.total_products,
        }
    except Exception:
        # Roll back the failed transaction before the session is closed.
        db.rollback()
        raise
    finally:
        db.close()
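
How the task is scheduled is not covered here. As a rough illustration of "daily at midnight", the sketch below uses only the standard library; both function names are hypothetical, it uses local time, and a real deployment would more likely rely on whatever scheduler the project already uses.

```python
import asyncio
from datetime import datetime, timedelta


def seconds_until_next_midnight(now: datetime) -> float:
    """Seconds from `now` until the next local midnight."""
    next_midnight = (now + timedelta(days=1)).replace(
        hour=0, minute=0, second=0, microsecond=0
    )
    return (next_midnight - now).total_seconds()


async def run_daily_at_midnight(task) -> None:
    """Await `task()` once per day at midnight, forever."""
    while True:
        await asyncio.sleep(seconds_until_next_midnight(datetime.now()))
        await task()


print(seconds_until_next_midnight(datetime(2025, 12, 26, 23, 0, 0)))  # 3600.0
```

The loop would be started with something like `asyncio.create_task(run_daily_at_midnight(capture_capacity_snapshot))` from within a running event loop.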

Manual Snapshot

Capture a snapshot on demand:

# Via API ($BASE_URL is a placeholder for your deployment's base URL)
curl -X POST "$BASE_URL/api/v1/admin/platform-health/snapshot" \
  -H "Authorization: Bearer $TOKEN"

# Response
{
    "id": 42,
    "snapshot_date": "2025-12-26T00:00:00Z",
    "total_vendors": 150,
    "total_products": 125000,
    "message": "Snapshot captured successfully"
}
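
The same request can be issued from Python with the standard library. This is a sketch: build_snapshot_request is a hypothetical helper, and the base URL and token are placeholders that you would supply for your deployment.

```python
import urllib.request


def build_snapshot_request(base_url: str, token: str) -> urllib.request.Request:
    """Build the POST request for the manual snapshot endpoint."""
    url = base_url.rstrip("/") + "/api/v1/admin/platform-health/snapshot"
    return urllib.request.Request(
        url,
        data=b"",  # Empty body; the endpoint takes no payload in the curl example.
        method="POST",
        headers={"Authorization": f"Bearer {token}"},
    )


# Sending the request performs a real HTTP call, so it is left commented out:
# import json
# request = build_snapshot_request("https://orion.example.com", token)
# with urllib.request.urlopen(request) as response:
#     snapshot = json.load(response)
```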

Alerts

Capacity alerts trigger when:

  1. Warning (Yellow): usage reaches 80% of a metric's limit
  2. Critical (Red): usage reaches 95% of a metric's limit
  3. Exceeded: usage is at or above 100% of the limit (immediate action)
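
Reading each percentage as a share of a metric's limit (an interpretation, since the alerting code is not shown), the alert levels can be derived from the limit alone. Both helpers below are hypothetical. With the limits from the threshold configuration, derive_alert_levels reproduces the products_total and storage_gb entries.

```python
def derive_alert_levels(limit: int) -> dict:
    """Warning at 80% and critical at 95% of the limit (integer arithmetic)."""
    return {
        "warning": limit * 80 // 100,
        "critical": limit * 95 // 100,
        "limit": limit,
    }


def alert_level(value: float, limit: int) -> str:
    """Return "ok", "warning", "critical", or "exceeded" for a usage value."""
    if value >= limit:
        return "exceeded"
    if value * 100 >= limit * 95:
        return "critical"
    if value * 100 >= limit * 80:
        return "warning"
    return "ok"


print(derive_alert_levels(500_000))
print(alert_level(850, 1000))  # warning
```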

Historical Data

Use the /trends endpoint with different day ranges:

# Last 7 days
GET /api/v1/admin/platform-health/trends?days=7

# Last 30 days (default)
GET /api/v1/admin/platform-health/trends?days=30

# Last 90 days
GET /api/v1/admin/platform-health/trends?days=90

Data Retention

  • Snapshots are stored indefinitely by default
  • Consider implementing cleanup for snapshots older than 2 years
  • At minimum, keep monthly aggregates for long-term trending
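
One possible cleanup policy along these lines is sketched below. It is an illustration, not existing platform behavior: snapshots_to_prune is a hypothetical helper, and keeping the earliest snapshot of each old month is used here as a simple stand-in for true monthly aggregates.

```python
from datetime import date, timedelta


def snapshots_to_prune(
    snapshot_dates: list[date],
    today: date,
    keep_days: int = 730,  # Roughly two years.
) -> list[date]:
    """Return snapshot dates that can be deleted under the policy described above."""
    cutoff = today - timedelta(days=keep_days)
    kept_months: set[tuple[int, int]] = set()
    prune: list[date] = []
    for snapshot_date in sorted(snapshot_dates):
        if snapshot_date >= cutoff:
            continue  # Recent snapshots are always kept.
        month = (snapshot_date.year, snapshot_date.month)
        if month in kept_months:
            prune.append(snapshot_date)
        else:
            kept_months.add(month)  # Keep the earliest snapshot of each old month.
    return prune


dates = [date(2023, 1, 1), date(2023, 1, 2), date(2023, 2, 1), date(2025, 12, 1)]
print(snapshots_to_prune(dates, today=date(2025, 12, 26)))
```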

Export Reports

Generate capacity reports for planning:

  • Weekly summary: PDF or CSV
  • Monthly capacity report: Detailed analysis
  • Projection report: 3/6/12 month forecasts
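
The export mechanism itself is not documented here. As a sketch of what a CSV summary could look like, the snippet below flattens the "trends" object from the /trends response into rows using the standard csv module; trends_to_csv and the chosen column set are assumptions.

```python
import csv
import io


def trends_to_csv(trends: dict) -> str:
    """Render a {metric: {field: value}} trends mapping as CSV text."""
    fields = [
        "metric",
        "start_value",
        "current_value",
        "change",
        "growth_rate_percent",
        "monthly_projection",
    ]
    buffer = io.StringIO()
    # Fields missing for a metric are left empty; extra fields are ignored.
    writer = csv.DictWriter(
        buffer, fieldnames=fields, extrasaction="ignore", lineterminator="\n"
    )
    writer.writeheader()
    for metric, values in trends.items():
        writer.writerow({"metric": metric, **values})
    return buffer.getvalue()


sample_trends = {
    "vendors": {"start_value": 140, "current_value": 150, "change": 10,
                "growth_rate_percent": 7.14, "monthly_projection": 161},
    "storage_gb": {"start_value": 150.5, "current_value": 165.2, "change": 14.7},
}
print(trends_to_csv(sample_trends))
```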

Usage Examples

Check Current Capacity

from app.services.platform_health_service import platform_health_service
from app.services.capacity_forecast_service import capacity_forecast_service

# `db` is an open database session provided by the application

# Get subscription capacity
capacity = platform_health_service.get_subscription_capacity(db)
print(f"Products: {capacity['products']['actual']} / {capacity['products']['theoretical_limit']}")
print(f"Utilization: {capacity['products']['utilization_percent']}%")

# Get growth trends
trends = capacity_forecast_service.get_growth_trends(db, days=30)
print(f"Vendor growth: {trends['trends']['vendors']['growth_rate_percent']}%")

# Get recommendations
recommendations = capacity_forecast_service.get_scaling_recommendations(db)
for rec in recommendations:
    print(f"[{rec['severity']}] {rec['title']}: {rec['action']}")

Project Future Capacity

# Calculate days until product limit
days = capacity_forecast_service.get_days_until_threshold(
    db,
    metric="total_products",
    threshold=500000
)
if days:
    print(f"Products will reach 500K in approximately {days} days")
else:
    print("Insufficient data or no growth detected")