Samir Boulahtit dc7fb5ca19 feat: add capacity planning docs, image upload system, and platform health monitoring

Documentation:
- Add comprehensive capacity planning guide (docs/architecture/capacity-planning.md)
- Add operations docs: platform-health, capacity-monitoring, image-storage
- Link pricing strategy to capacity planning documentation
- Update mkdocs.yml with new Operations section

Image Upload System:
- Add ImageService with WebP conversion and sharded directory structure
- Generate multiple size variants (original, 800px, 200px)
- Add storage stats endpoint for monitoring
- Add Pillow dependency for image processing

Platform Health Monitoring:
- Add /admin/platform-health page with real-time metrics
- Show CPU, memory, disk usage with progress bars
- Display capacity thresholds with status indicators
- Generate scaling recommendations automatically
- Determine infrastructure tier based on usage
- Add psutil dependency for system metrics

Admin UI:
- Add Capacity Monitor to Platform Health section in sidebar
- Create platform-health.html template with stats cards
- Create platform-health.js for Alpine.js state management

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

2025-12-25 17:17:09 +01:00

2.6 KiB

Platform Health Monitoring

This guide covers the platform health monitoring features available in the admin dashboard.

Overview

The Platform Health page (/admin/platform-health) provides real-time visibility into system performance, resource usage, and capacity thresholds.

Accessing Platform Health

Navigate to Admin > Platform Health in the sidebar, or go directly to /admin/platform-health.

Dashboard Sections

1. System Overview

Quick glance at overall platform status:

Indicator	Green	Yellow	Red
API Response Time	< 100ms	100-500ms	> 500ms
Error Rate	< 0.1%	0.1-1%	> 1%
Database Health	Connected	Slow queries	Disconnected
Storage	< 70%	70-85%	> 85%
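The numeric rows of the table above can be read as a simple three-band classification. The following is a minimal sketch of that reading, not the platform's actual implementation: the `Band` class, the dictionary keys, and the choice to treat boundary values (exactly 100 ms, exactly 500 ms) as yellow are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Band:
    """Boundaries for one indicator (hypothetical helper type)."""
    green_below: float  # values strictly below this are green
    red_above: float    # values strictly above this are red


# Bounds copied from the System Overview table; key names are assumptions.
BANDS = {
    "api_response_time_ms": Band(green_below=100, red_above=500),
    "error_rate_percent": Band(green_below=0.1, red_above=1.0),
    "storage_percent": Band(green_below=70, red_above=85),
}


def status_color(indicator: str, value: float) -> str:
    """Map a measured value to green, yellow, or red."""
    band = BANDS[indicator]
    if value < band.green_below:
        return "green"
    if value > band.red_above:
        return "red"
    # Everything between the two bounds, including the bounds themselves.
    return "yellow"
```

For example, `status_color("storage_percent", 90)` falls in the red band, while a 42 ms API response time is green.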

2. Resource Usage

Real-time metrics:

CPU Usage: Current and 24h average
Memory Usage: Used vs available
Disk Usage: Storage consumption with trend
Network: Inbound/outbound throughput
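As a rough illustration of how a "used vs available" figure can be derived, the sketch below computes a disk usage percentage with the Python standard library only. The commit that introduced this page adds psutil for system metrics; this example deliberately avoids it so that it runs without extra dependencies, and the function names are hypothetical.

```python
import shutil


def percent_used(used: float, total: float) -> float:
    """Return used/total as a percentage, rounded to one decimal place."""
    if total <= 0:
        # Avoid division by zero on empty or pseudo filesystems.
        return 0.0
    return round(used / total * 100, 1)


def disk_usage_percent(path: str = ".") -> float:
    """Percentage of the filesystem containing `path` that is in use."""
    usage = shutil.disk_usage(path)  # named tuple: total, used, free (bytes)
    return percent_used(usage.used, usage.total)
```

The same `percent_used` helper applies to memory once used and total byte counts are available from whichever metrics library is in place.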

3. Capacity Metrics

Track growth toward scaling thresholds:

Total Products: Count across all vendors
Total Images: Files stored in image system
Database Size: Current size vs recommended max
Active Clients: Monthly active vendor accounts
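Each capacity metric compares a current value with a limit. A minimal sketch of that comparison follows; the function name, the returned keys, and any limit passed in are assumptions, since the recommended maximums are not listed on this page.

```python
def capacity_status(current: float, recommended_max: float) -> dict:
    """Report how much of a recommended maximum is already used."""
    if recommended_max <= 0:
        raise ValueError("recommended_max must be positive")
    used_percent = round(current / recommended_max * 100, 1)
    # Headroom never goes negative, even when the limit is exceeded.
    headroom = max(recommended_max - current, 0)
    return {"used_percent": used_percent, "headroom": headroom}
```

With a hypothetical 20 GB recommended maximum, a 15 GB database yields `{"used_percent": 75.0, "headroom": 5}`.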

4. Performance Trends

Historical charts over 7-day and 30-day windows:

API response times (p50, p95, p99)
Request volume by endpoint
Database query latency
Error rate over time
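The p50, p95, and p99 figures are percentiles of the observed response times. One way to compute them from raw samples is sketched below using `statistics.quantiles` from the standard library; how the platform itself aggregates latencies is not documented here, so the function name and the choice of the `inclusive` interpolation method are assumptions.

```python
import statistics


def latency_percentiles(samples_ms: list[float]) -> dict[str, float]:
    """Return p50, p95, and p99 for a list of latency samples."""
    if len(samples_ms) < 2:
        raise ValueError("need at least two samples")
    # n=100 yields 99 cut points; cut point k (1-based) is the k-th percentile.
    cuts = statistics.quantiles(samples_ms, n=100, method="inclusive")
    return {"p50": cuts[49], "p95": cuts[94], "p99": cuts[98]}
```

The p99 value is far more sensitive to a handful of slow requests than p50, which is why the chart shows all three.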

Alert Configuration

Threshold Alerts

Configure alerts for proactive monitoring:

# In app/core/config.py
HEALTH_THRESHOLDS = {
    "cpu_percent": {"warning": 70, "critical": 85},
    "memory_percent": {"warning": 75, "critical": 90},
    "disk_percent": {"warning": 70, "critical": 85},
    "response_time_ms": {"warning": 200, "critical": 500},
    "error_rate_percent": {"warning": 1.0, "critical": 5.0},
}
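To show how a threshold dictionary of this shape might be applied, here is a small sketch that maps current metric values to a severity level. It repeats the values from the configuration above so that it is self-contained. The `evaluate` function and the inclusive `>=` comparison are assumptions; the real alerting code may compare differently.

```python
HEALTH_THRESHOLDS = {
    "cpu_percent": {"warning": 70, "critical": 85},
    "memory_percent": {"warning": 75, "critical": 90},
    "disk_percent": {"warning": 70, "critical": 85},
    "response_time_ms": {"warning": 200, "critical": 500},
    "error_rate_percent": {"warning": 1.0, "critical": 5.0},
}


def evaluate(metrics: dict[str, float]) -> dict[str, str]:
    """Return 'ok', 'warning', or 'critical' for each known metric."""
    levels = {}
    for name, value in metrics.items():
        limits = HEALTH_THRESHOLDS.get(name)
        if limits is None:
            continue  # ignore metrics without configured thresholds
        if value >= limits["critical"]:
            levels[name] = "critical"
        elif value >= limits["warning"]:
            levels[name] = "warning"
        else:
            levels[name] = "ok"
    return levels
```

Checking the critical limit first ensures a value that exceeds both limits is reported once, at the higher severity.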

Notification Channels

Alerts can be sent via:

Email to admin users
Slack webhook (if configured)
Dashboard notifications
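For the Slack channel, an alert typically amounts to a JSON POST to an incoming webhook URL with a `text` field. The sketch below only builds the request and does not send it; the function name, message format, and webhook URL are placeholders, not the platform's actual notifier.

```python
import json
import urllib.request


def build_slack_alert(webhook_url: str, metric: str, value: float,
                      level: str) -> urllib.request.Request:
    """Build (but do not send) a Slack incoming-webhook request."""
    payload = {"text": f"[{level.upper()}] {metric} is at {value}"}
    return urllib.request.Request(
        webhook_url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Passing the returned object to `urllib.request.urlopen` would deliver the message; keeping construction separate makes the payload easy to inspect in tests.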

Related Documentation

Capacity Monitoring - Detailed capacity metrics
Image Storage - Image system management
Capacity Planning - Infrastructure sizing guide

API Endpoints

The platform health page uses these admin API endpoints:

Endpoint	Description
`GET /api/v1/admin/platform/health`	Overall health status
`GET /api/v1/admin/platform/metrics`	Current metrics
`GET /api/v1/admin/platform/metrics/history`	Historical data
`GET /api/v1/admin/platform/capacity`	Capacity usage
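These endpoints can also be queried from a script. The sketch below builds an authenticated GET request for any of the four paths. The base URL and the bearer-token scheme are assumptions, as this page does not describe how admin API authentication works, and the response format is not shown here either.

```python
import urllib.request

# Paths taken from the table above; the short names are hypothetical.
ENDPOINTS = {
    "health": "/api/v1/admin/platform/health",
    "metrics": "/api/v1/admin/platform/metrics",
    "history": "/api/v1/admin/platform/metrics/history",
    "capacity": "/api/v1/admin/platform/capacity",
}


def build_request(base_url: str, name: str, token: str) -> urllib.request.Request:
    """Build (but do not send) a GET request for one platform endpoint."""
    url = base_url.rstrip("/") + ENDPOINTS[name]
    return urllib.request.Request(
        url,
        headers={
            "Authorization": f"Bearer {token}",  # assumed auth scheme
            "Accept": "application/json",
        },
    )
```

The request can then be sent with `urllib.request.urlopen(build_request(...))` against a running instance.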
