feat: add capacity planning docs, image upload system, and platform health monitoring

Documentation:
- Add comprehensive capacity planning guide (docs/architecture/capacity-planning.md)
- Add operations docs: platform-health, capacity-monitoring, image-storage
- Link pricing strategy to capacity planning documentation
- Update mkdocs.yml with new Operations section

Image Upload System:
- Add ImageService with WebP conversion and sharded directory structure
- Generate multiple size variants (original, 800px, 200px)
- Add storage stats endpoint for monitoring
- Add Pillow dependency for image processing

Platform Health Monitoring:
- Add /admin/platform-health page with real-time metrics
- Show CPU, memory, disk usage with progress bars
- Display capacity thresholds with status indicators
- Generate scaling recommendations automatically
- Determine infrastructure tier based on usage
- Add psutil dependency for system metrics

Admin UI:
- Add Capacity Monitor to Platform Health section in sidebar
- Create platform-health.html template with stats cards
- Create platform-health.js for Alpine.js state management

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-25 17:17:09 +01:00
parent b25d119899
commit dc7fb5ca19
16 changed files with 2352 additions and 0 deletions
--- a/docs/architecture/capacity-planning.md
+++ b/docs/architecture/capacity-planning.md
@@ -0,0 +1,454 @@
+# Capacity Planning & Infrastructure Sizing
+
+This document provides comprehensive capacity planning guidelines for the Wizamart platform, including resource requirements, scaling thresholds, and monitoring recommendations.
+
+> **Related:** [Pricing Strategy](../marketing/pricing.md) for tier definitions and limits
+
+---
+
+## Tier Resource Allocations
+
+Based on our [pricing tiers](../marketing/pricing.md), here are the expected resource requirements per client:
+
+| Metric | Essential (€49) | Professional (€99) | Business (€199) | Enterprise (€399+) |
+|--------|-----------------|--------------------|-----------------|--------------------|
+| Products | 200 | 500 | 2,000 | Unlimited |
+| Images per product | 3 | 5 | 8 | 10+ |
+| Orders per month | 100 | 500 | 2,000 | Unlimited |
+| SKU variants | 1.2x | 1.5x | 2x | 3x |
+| Team members | 1 | 3 | 10 | Unlimited |
+| API requests/day | 1,000 | 5,000 | 20,000 | Unlimited |
+
+---
+
+## Scale Projections
+
+### Target: 1,000 Business Clients (€149/month tier)
+
+This represents our primary growth target. Here's the infrastructure impact:
+
+| Resource | Calculation | Total |
+|----------|-------------|-------|
+| **Products** | 1,000 clients × 500 products | **500,000** |
+| **Product Translations** | 500,000 × 4 languages | **2,000,000 rows** |
+| **Images (files)** | 500,000 × 5 images × 3 sizes | **7,500,000 files** |
+| **Image Storage** | 7.5M files × 200KB avg | **1.5 TB** |
+| **Database Size** | Products + translations + orders + indexes | **15-25 GB** |
+| **Monthly Orders** | 1,000 clients × 300 orders | **300,000 orders** |
+| **Order Items** | 300,000 × 2.5 avg items | **750,000 items/month** |
+| **Monthly API Requests** | 1,000 × 10,000 req/day × 30 | **300M requests** |
+
+### Multi-Tier Mix (Realistic Scenario)
+
+A more realistic distribution across tiers:
+
+| Tier | Clients | Products Each | Total Products | Monthly Orders |
+|------|---------|---------------|----------------|----------------|
+| Essential | 500 | 100 | 50,000 | 50,000 |
+| Professional | 300 | 300 | 90,000 | 150,000 |
+| Business | 150 | 1,000 | 150,000 | 300,000 |
+| Enterprise | 50 | 3,000 | 150,000 | 200,000 |
+| **Total** | **1,000** | - | **440,000** | **700,000** |
+
+---
+
+## Server Sizing Recommendations
+
+### Infrastructure Tiers
+
+| Scale | Clients | vCPU | RAM | Storage | Database | Monthly Cost |
+|-------|---------|------|-----|---------|----------|--------------|
+| **Starter** | 1-50 | 2 | 4GB | 100GB SSD | SQLite | €20-40 |
+| **Small** | 50-100 | 4 | 8GB | 250GB SSD | PostgreSQL | €60-100 |
+| **Medium** | 100-300 | 4 | 16GB | 500GB SSD | PostgreSQL | €100-180 |
+| **Large** | 300-500 | 8 | 32GB | 1TB SSD | PostgreSQL + Redis | €250-400 |
+| **Scale** | 500-1000 | 16 | 64GB | 2TB SSD + CDN | PostgreSQL + Redis | €500-900 |
+| **Enterprise** | 1000+ | 32+ | 128GB+ | 4TB+ + CDN | PostgreSQL cluster | €1,500+ |
+
+### Recommended Configurations
+
+#### Starter (1-50 clients)
+```
+Single Server Setup:
+- Hetzner CX22 or similar (2 vCPU, 4GB RAM)
+- 100GB SSD storage
+- SQLite database
+- nginx for static files + reverse proxy
+- Estimated cost: €20-40/month
+```
+
+#### Small-Medium (50-300 clients)
+```
+Two-Server Setup:
+- App Server: 4 vCPU, 8-16GB RAM
+- Database: Managed PostgreSQL (basic tier)
+- Storage: Local SSD + backup
+- Optional: Redis for sessions/caching
+- Estimated cost: €80-180/month
+```
+
+#### Large (300-1000 clients)
+```
+Multi-Component Setup:
+- Load Balancer: nginx or cloud LB
+- App Servers: 2-4 × (4 vCPU, 8GB RAM)
+- Database: Managed PostgreSQL (production tier)
+- Cache: Redis (managed or self-hosted)
+- Storage: Object storage (S3-compatible) + CDN
+- Estimated cost: €400-900/month
+```
+
+#### Enterprise (1000+ clients)
+```
+Full Production Setup:
+- CDN: Cloudflare or similar
+- Load Balancer: Cloud-native with health checks
+- App Servers: 4-8 × (4 vCPU, 16GB RAM) with auto-scaling
+- Database: PostgreSQL with read replicas
+- Cache: Redis cluster
+- Storage: S3 + CloudFront or equivalent
+- Monitoring: Prometheus + Grafana
+- Estimated cost: €1,500+/month
+```
+
+---
+
+## Image Storage Architecture
+
+### Capacity Calculations
+
+| Image Size (optimized) | Files per 25GB | Files per 100GB | Files per 1TB |
+|------------------------|----------------|-----------------|---------------|
+| 100KB (thumbnails) | 250,000 | 1,000,000 | 10,000,000 |
+| 200KB (web-ready) | 125,000 | 500,000 | 5,000,000 |
+| 300KB (high quality) | 83,000 | 333,000 | 3,330,000 |
+| 500KB (original) | 50,000 | 200,000 | 2,000,000 |
+
+### Image Sizes Generated
+
+Each uploaded image generates 3 variants:
+
+| Variant | Dimensions | Typical Size | Use Case |
+|---------|------------|--------------|----------|
+| `thumb` | 200×200 | 10-20KB | List views, grids |
+| `medium` | 800×800 | 80-150KB | Product cards, previews |
+| `original` | As uploaded | 200-500KB | Detail views, zoom |
+
+**Storage per product:** ~600KB (main image plus 2 additional images, each stored in 3 variants at ~200KB per image in total)
+
+### Directory Structure (Sharded)
+
+To prevent filesystem performance degradation, images are stored in a sharded directory structure:
+
+```
+/uploads/
+  └── products/
+      ├── 00/                    # First 2 chars of hash
+      │   ├── 1a/               # Next 2 chars
+      │   │   ├── 001a2b3c_original.webp
+      │   │   ├── 001a2b3c_800.webp
+      │   │   └── 001a2b3c_200.webp
+      │   └── 2b/
+      │       └── ...
+      ├── 01/
+      └── ...
+```
+
+This structure ensures:
+- At most 256 subdirectories per level (two hex characters each), 65,536 leaf directories in total
+- ~15 images (~46 files across the 3 variants) per leaf directory at 1M total images
+- Fast filesystem operations even at scale
+
+### Performance Thresholds
+
+| Files per Directory | Performance | Required Action |
+|---------------------|-------------|-----------------|
+| < 10,000 | Excellent | None |
+| 10,000 - 100,000 | Good | Monitor, plan sharding |
+| 100,000 - 500,000 | Degraded | **Implement sharding** |
+| > 500,000 | Poor | **Migrate to object storage** |
+
+---
+
+## Database Performance
+
+### Table Size Guidelines
+
+| Rows per Table | Query Time | Status / Action |
+|----------------|------------|-----------------|
+| < 10,000 | < 1ms | Excellent |
+| 10,000 - 100,000 | 1-10ms | Good |
+| 100,000 - 1,000,000 | 10-50ms | **Add indexes, optimize queries** |
+| 1,000,000 - 10,000,000 | 50-200ms | **Consider partitioning** |
+| > 10,000,000 | Variable | **Sharding or dedicated DB** |
+
+### Critical Indexes
+
+Ensure these indexes exist at scale:
+
+```sql
+-- Products
+CREATE INDEX idx_product_vendor_active ON products(vendor_id, is_active);
+CREATE INDEX idx_product_gtin ON products(gtin);
+CREATE INDEX idx_product_vendor_sku ON products(vendor_id, vendor_sku);
+
+-- Orders
+CREATE INDEX idx_order_vendor_status ON orders(vendor_id, status);
+CREATE INDEX idx_order_created ON orders(created_at DESC);
+CREATE INDEX idx_order_customer ON orders(customer_id);
+
+-- Inventory
+CREATE INDEX idx_inventory_product_location ON inventory(product_id, warehouse, bin_location);
+CREATE INDEX idx_inventory_vendor ON inventory(vendor_id);
+```
+
+### Database Size Estimates
+
+| Component | Size per 100K Products | Size per 1M Products |
+|-----------|------------------------|----------------------|
+| Products table | 100 MB | 1 GB |
+| Translations (4 langs) | 400 MB | 4 GB |
+| Orders (1 year) | 500 MB | 5 GB |
+| Order items | 200 MB | 2 GB |
+| Inventory | 50 MB | 500 MB |
+| Indexes | 300 MB | 3 GB |
+| **Total** | **~1.5 GB** | **~15 GB** |
+
+---
+
+## Bandwidth & Network
+
+### Monthly Bandwidth Estimates (1000 clients)
+
+| Traffic Type | Calculation | Monthly Volume |
+|--------------|-------------|----------------|
+| Image views | 500K products × 10 views/month × 500KB | **2.5 TB** |
+| API requests | 10K req/client/day × 1000 × 30 × 2KB | **600 GB** |
+| Static assets | CSS/JS cached, minimal | **50 GB** |
+| **Total Egress** | | **~3 TB/month** |
+
+### Bandwidth Costs (Approximate)
+
+| Provider | First 1TB | Additional per TB |
+|----------|-----------|-------------------|
+| Hetzner | Included | €1/TB |
+| AWS | $90 | $85/TB |
+| DigitalOcean | 1TB free | $10/TB |
+| Cloudflare | Unlimited (CDN) | Free |
+
+**Recommendation:** Use Cloudflare for image CDN to eliminate egress costs.
+
+---
+
+## Scaling Triggers & Thresholds
+
+### When to Scale Up
+
+| Metric | Warning | Critical | Action |
+|--------|---------|----------|--------|
+| CPU Usage | > 70% avg | > 85% avg | Add app server |
+| Memory Usage | > 75% | > 90% | Upgrade RAM or add server |
+| Disk Usage | > 70% | > 85% | Expand storage |
+| DB Query Time (p95) | > 100ms | > 500ms | Optimize queries, add indexes |
+| API Response Time (p95) | > 500ms | > 2000ms | Scale horizontally |
+| DB Connections | > 80% max | > 95% max | Add connection pooling |
+| Error Rate | > 1% | > 5% | Investigate and fix |
+
+### Architecture Transition Points
+
+```
+STARTER → SMALL (50 clients)
+├── Trigger: SQLite becomes bottleneck
+├── Action: Migrate to PostgreSQL
+└── Cost increase: +€40-60/month
+
+SMALL → MEDIUM (100 clients)
+├── Trigger: Single server at 70%+ CPU
+├── Action: Separate DB server
+└── Cost increase: +€50-80/month
+
+MEDIUM → LARGE (300 clients)
+├── Trigger: Need for caching, higher availability
+├── Action: Add Redis, consider multiple app servers
+└── Cost increase: +€150-200/month
+
+LARGE → SCALE (500 clients)
+├── Trigger: Storage >500GB, high traffic
+├── Action: Object storage + CDN, load balancing
+└── Cost increase: +€200-400/month
+
+SCALE → ENTERPRISE (1000+ clients)
+├── Trigger: High availability requirements, SLA
+├── Action: Full redundancy, read replicas, auto-scaling
+└── Cost increase: +€600-1000/month
+```
+
+---
+
+## Monitoring Requirements
+
+### Essential Metrics
+
+Track these metrics for capacity planning:
+
+#### Infrastructure
+- CPU utilization (per server)
+- Memory utilization
+- Disk I/O and usage
+- Network throughput
+
+#### Application
+- Request latency (p50, p95, p99)
+- Request rate (per endpoint)
+- Error rate by type
+- Active sessions
+
+#### Database
+- Query execution time
+- Connection pool usage
+- Table sizes
+- Index usage
+
+#### Business
+- Active clients
+- Products per client
+- Orders per day
+- API calls per client
+
+### Monitoring Dashboard
+
+The admin platform includes a **Capacity Monitoring** page at `/admin/platform-health` with:
+
+1. **Current Usage** - Real-time resource utilization
+2. **Growth Trends** - Historical charts for planning
+3. **Threshold Alerts** - Warning and critical indicators
+4. **Scaling Recommendations** - Automated suggestions
+
+See the [Platform Health Monitoring](#platform-health-monitoring) section below.
+
+---
+
+## Cost Analysis
+
+### Infrastructure Cost per Client
+
+| Scale | Clients | Monthly Infra | Cost/Client |
+|-------|---------|---------------|-------------|
+| Starter | 25 | €30 | €1.20 |
+| Small | 75 | €80 | €1.07 |
+| Medium | 200 | €150 | €0.75 |
+| Large | 400 | €350 | €0.88 |
+| Scale | 800 | €700 | €0.88 |
+| Enterprise | 1500 | €1,800 | €1.20 |
+
+### Revenue vs Infrastructure Cost
+
+At 1,000 Business tier clients (€149/month):
+
+| Item | Monthly |
+|------|---------|
+| **Revenue** | €149,000 |
+| Infrastructure | €700-900 |
+| Support (est.) | €3,000 |
+| Development (est.) | €5,000 |
+| **Gross Margin** | **~94%** |
+
+---
+
+## Disaster Recovery
+
+### Backup Strategy by Scale
+
+| Scale | Database Backup | File Backup | RTO | RPO |
+|-------|----------------|-------------|-----|-----|
+| Starter | Daily SQLite copy | Daily rsync | 4h | 24h |
+| Small | Daily pg_dump | Daily sync | 2h | 12h |
+| Medium | Managed backups | S3 versioning | 1h | 6h |
+| Large | Point-in-time | S3 + cross-region | 30m | 1h |
+| Enterprise | Streaming replicas | Multi-region | 5m | 5m |
+
+---
+
+## Platform Health Monitoring
+
+The admin dashboard includes a dedicated capacity monitoring page that tracks:
+
+### Metrics Displayed
+
+1. **Client Growth**
+   - Total active clients
+   - New clients this month
+   - Churn rate
+
+2. **Resource Usage**
+   - Total products across all vendors
+   - Total images stored
+   - Database size
+   - Storage usage
+
+3. **Performance Indicators**
+   - Average API response time
+   - Database query latency
+   - Error rate
+
+4. **Threshold Status**
+   - Current infrastructure tier
+   - Distance to next threshold
+   - Recommended actions
+
+### Alert Configuration
+
+Configure alerts for proactive scaling:
+
+```python
+CAPACITY_THRESHOLDS = {
+    "products_total": {
+        "warning": 400_000,   # 80% of 500K
+        "critical": 475_000,  # 95% of 500K
+    },
+    "storage_gb": {
+        "warning": 800,       # 80% of 1TB
+        "critical": 950,
+    },
+    "db_size_gb": {
+        "warning": 20,
+        "critical": 24,
+    },
+    "avg_response_ms": {
+        "warning": 200,
+        "critical": 500,
+    },
+}
+```
+
+---
+
+## Quick Reference
+
+### TL;DR Sizing Guide
+
+| Clients | Server | RAM | Storage | Database | Monthly Cost |
+|---------|--------|-----|---------|----------|--------------|
+| 1-50 | 2 vCPU | 4GB | 100GB | SQLite | €30 |
+| 50-100 | 4 vCPU | 8GB | 250GB | PostgreSQL | €80 |
+| 100-300 | 4 vCPU | 16GB | 500GB | PostgreSQL | €150 |
+| 300-500 | 8 vCPU | 32GB | 1TB | PostgreSQL + Redis | €350 |
+| 500-1000 | 16 vCPU | 64GB | 2TB + CDN | PostgreSQL + Redis | €700 |
+| 1000+ | 32+ vCPU | 128GB+ | 4TB+ + CDN | PostgreSQL cluster | €1,500+ |
+
+### Key Formulas
+
+```
+Storage (GB) = (Products × Images × 3 sizes × 200KB) / 1,000,000
+DB Size (GB) = Products × 0.00003 + Orders × 0.00002
+Bandwidth (TB/mo) = Products × Daily Views × 500KB × 30 / 1,000,000,000
+```
+
+---
+
+## See Also
+
+- [Pricing Strategy](../marketing/pricing.md) - Tier definitions and limits
+- [Multi-Tenant Architecture](multi-tenant.md) - How client isolation works
+- [Background Tasks](background-tasks.md) - Task queue scaling
+- [Production Deployment](../deployment/production.md) - Deployment guidelines
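The "Key Formulas" block in `docs/architecture/capacity-planning.md` can be sanity-checked against the scale projections in the same file. The following is a minimal sketch, not code from this commit: the function and parameter names are hypothetical, and the defaults (3 variants, 200KB per file, 500KB per view) are the averages quoted in the document.

```python
"""Sketch of the Key Formulas from capacity-planning.md (all names are hypothetical)."""


def storage_gb(products: int, images_per_product: int,
               variants: int = 3, avg_file_kb: int = 200) -> float:
    """Image storage in GB: number of files times average file size (KB -> GB)."""
    return products * images_per_product * variants * avg_file_kb / 1_000_000


def db_size_gb(products: int, orders: int) -> float:
    """Database size in GB using the document's per-row coefficients."""
    return products * 0.00003 + orders * 0.00002


def bandwidth_tb_per_month(products: int, daily_views_per_product: float,
                           avg_view_kb: int = 500) -> float:
    """Monthly image egress in TB: views per day times 30 days (KB -> TB)."""
    return products * daily_views_per_product * avg_view_kb * 30 / 1_000_000_000


if __name__ == "__main__":
    # 500,000 products with 5 images each, as in the 1,000-client projection
    print(f"storage: {storage_gb(500_000, 5):,.0f} GB")
    print(f"bandwidth at 1 view/product/day: {bandwidth_tb_per_month(500_000, 1):.1f} TB/month")
```

With 500,000 products and 5 images each, `storage_gb` returns 1500 GB, which agrees with the 1.5 TB figure in the Scale Projections table. Keeping the averages as parameters makes it easy to re-run the estimate once real file sizes are measured.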
--- a/docs/marketing/pricing.md
+++ b/docs/marketing/pricing.md
@@ -6,6 +6,8 @@

 A focused Order Management System built specifically for Luxembourg e-commerce. Works alongside Letzshop, not instead of it. Provides the operational tools Letzshop lacks: real inventory, correct invoicing, customer ownership.

+> **Infrastructure Planning:** See [Capacity Planning](../architecture/capacity-planning.md) for resource requirements, server sizing, and scaling guidelines per tier.
+
 ---

 ## Market Context
--- a/docs/operations/capacity-monitoring.md
+++ b/docs/operations/capacity-monitoring.md
@@ -0,0 +1,121 @@
+# Capacity Monitoring
+
+Detailed guide for monitoring and managing platform capacity.
+
+## Overview
+
+The Capacity Monitoring page (`/admin/platform-health/capacity`) provides insights into resource consumption and helps plan infrastructure scaling.
+
+## Key Metrics
+
+### Client Metrics
+
+| Metric | Description | Threshold Indicator |
+|--------|-------------|---------------------|
+| Active Clients | Vendors with activity in last 30 days | Scale planning |
+| Total Products | Sum across all vendors | Storage/DB sizing |
+| Products per Client | Average products per vendor | Tier compliance |
+| Monthly Orders | Order volume this month | Performance impact |
+
+### Storage Metrics
+
+| Metric | Description | Warning | Critical |
+|--------|-------------|---------|----------|
+| Image Files | Total files in storage | 80% of limit | 95% of limit |
+| Image Storage (GB) | Total size in gigabytes | 80% of disk | 95% of disk |
+| Database Size (GB) | PostgreSQL data size | 80% of allocation | 95% of allocation |
+| Backup Size (GB) | Latest backup size | Informational | N/A |
+
+### Performance Metrics
+
+| Metric | Good | Warning | Critical |
+|--------|------|---------|----------|
+| Avg Response Time | < 100ms | 100-300ms | > 300ms |
+| DB Query Time (p95) | < 50ms | 50-200ms | > 200ms |
+| Cache Hit Rate | > 90% | 70-90% | < 70% |
+| Connection Pool Usage | < 70% | 70-90% | > 90% |
+
+## Scaling Recommendations
+
+The system provides automatic scaling recommendations based on current usage:
+
+### Example Recommendations
+
+```
+Current Infrastructure: MEDIUM (100-300 clients)
+Current Usage: 85% of capacity
+
+Recommendations:
+1. [WARNING] Approaching product limit (420K of 500K)
+   → Consider upgrading to LARGE tier
+
+2. [INFO] Database size growing 5GB/month
+   → Plan storage expansion in 3 months
+
+3. [OK] API response times within normal range
+   → No action needed
+```
+
+## Threshold Configuration
+
+Edit thresholds in the admin settings or via environment:
+
+```python
+# Capacity thresholds (can be configured per deployment)
+CAPACITY_THRESHOLDS = {
+    # Products
+    "products_total": {
+        "warning": 400_000,
+        "critical": 475_000,
+        "limit": 500_000,
+    },
+    # Storage (GB)
+    "storage_gb": {
+        "warning": 800,
+        "critical": 950,
+        "limit": 1000,
+    },
+    # Database (GB)
+    "db_size_gb": {
+        "warning": 20,
+        "critical": 24,
+        "limit": 25,
+    },
+    # Monthly orders
+    "monthly_orders": {
+        "warning": 250_000,
+        "critical": 280_000,
+        "limit": 300_000,
+    },
+}
+```
+
+## Historical Trends
+
+View growth trends to plan ahead:
+
+- **30-day growth rate**: Products, storage, clients
+- **Projected capacity date**: When limits will be reached
+- **Seasonal patterns**: Order volume fluctuations
+
+## Alerts
+
+Capacity alerts trigger when:
+
+1. **Warning (Yellow)**: usage reaches about 80% of any limit
+2. **Critical (Red)**: usage reaches about 95% of any limit
+3. **Exceeded**: usage is at or above 100% of the limit (immediate action)
+
+## Export Reports
+
+Generate capacity reports for planning:
+
+- **Weekly summary**: PDF or CSV
+- **Monthly capacity report**: Detailed analysis
+- **Projection report**: 3/6/12 month forecasts
+
+## Related Documentation
+
+- [Capacity Planning](../architecture/capacity-planning.md) - Full sizing guide
+- [Platform Health](platform-health.md) - Real-time health monitoring
+- [Image Storage](image-storage.md) - Image system details
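The alert levels and the "projected capacity date" described in `docs/operations/capacity-monitoring.md` can be illustrated with a short sketch. It assumes the `warning` / `critical` / `limit` layout of `CAPACITY_THRESHOLDS` shown in that file; the helper names `classify` and `months_until_limit` are hypothetical, and the linear growth projection is an assumption rather than the method the platform necessarily uses.

```python
from typing import Optional

# Subset of the thresholds from capacity-monitoring.md
CAPACITY_THRESHOLDS = {
    "products_total": {"warning": 400_000, "critical": 475_000, "limit": 500_000},
    "db_size_gb": {"warning": 20, "critical": 24, "limit": 25},
}


def classify(value: float, levels: dict) -> str:
    """Map a measured value to ok / warning / critical / exceeded."""
    if value >= levels["limit"]:
        return "exceeded"
    if value >= levels["critical"]:
        return "critical"
    if value >= levels["warning"]:
        return "warning"
    return "ok"


def months_until_limit(current: float, limit: float,
                       growth_per_month: float) -> Optional[float]:
    """Linear projection of the time left before the limit is reached.

    Returns None when usage is flat or shrinking, since no date can be projected.
    """
    if growth_per_month <= 0:
        return None
    return max(limit - current, 0) / growth_per_month


if __name__ == "__main__":
    levels = CAPACITY_THRESHOLDS["products_total"]
    print(classify(420_000, levels))         # the 420K-of-500K example
    print(months_until_limit(10, 25, 5))     # a 10 GB database growing 5 GB/month
```

The 420K-of-500K case from the example recommendations classifies as `warning`, and a 10 GB database growing 5 GB per month reaches the 25 GB limit in 3 months under the linear assumption.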
--- a/docs/operations/image-storage.md
+++ b/docs/operations/image-storage.md
@@ -0,0 +1,246 @@
+# Image Storage System
+
+Documentation for the platform's image storage and management system.
+
+## Overview
+
+The Wizamart platform uses a self-hosted image storage system with:
+
+- **Sharded directory structure** for filesystem performance
+- **Automatic WebP conversion** for optimization
+- **Multiple size variants** for different use cases
+- **CDN-ready architecture** for scaling
+
+## Storage Architecture
+
+### Directory Structure
+
+Images are stored in a sharded directory structure to prevent filesystem performance degradation:
+
+```
+/static/uploads/
+  └── products/
+      ├── 00/                    # First 2 chars of hash
+      │   ├── 1a/               # Next 2 chars
+      │   │   ├── 001a2b3c_original.webp
+      │   │   ├── 001a2b3c_800.webp
+      │   │   └── 001a2b3c_200.webp
+      │   └── 2b/
+      │       └── ...
+      ├── 01/
+      │   └── ...
+      └── ff/
+          └── ...
+```
+
+### Hash Generation
+
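+The file hash is generated from:
+```python
+import hashlib
+file_hash = hashlib.md5(f"{vendor_id}:{product_id}:{timestamp}:{original_filename}".encode()).hexdigest()[:8]
+```
+
+This ensures:
+- File paths that are unique in practice (an 8-character prefix carries only 32 bits, so check for an existing file before writing)
+- Even distribution across directories
+- File locations that can be derived from the hash alone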
+
+## Image Variants
+
+Each uploaded image generates multiple variants:
+
+| Variant | Max Dimensions | Format | Use Case |
+|---------|---------------|--------|----------|
+| `original` | As uploaded (max 2000px) | WebP | Detail view, zoom |
+| `800` | 800×800 | WebP | Product cards |
+| `200` | 200×200 | WebP | Thumbnails, grids |
+
+### Size Estimates
+
+| Original Size | After Processing | Storage per Image |
+|---------------|------------------|-------------------|
+| 2MB JPEG | ~200KB (original) + 80KB (800) + 15KB (200) | ~295KB |
+| 500KB JPEG | ~150KB (original) + 60KB (800) + 12KB (200) | ~222KB |
+| 100KB JPEG | ~80KB (original) + 40KB (800) + 10KB (200) | ~130KB |
+
+**Average: ~200KB per image (all variants)**
+
+## Upload Process
+
+### API Endpoint
+
+```http
+POST /api/v1/admin/images/upload
+Content-Type: multipart/form-data
+
+file: <binary>
+vendor_id: 123
+product_id: 456 (optional, for product images)
+type: product|category|banner
+```
+
+### Response
+
+```json
+{
+  "success": true,
+  "image": {
+    "id": "001a2b3c",
+    "urls": {
+      "original": "/uploads/products/00/1a/001a2b3c_original.webp",
+      "medium": "/uploads/products/00/1a/001a2b3c_800.webp",
+      "thumb": "/uploads/products/00/1a/001a2b3c_200.webp"
+    },
+    "size_bytes": 295000,
+    "dimensions": {
+      "width": 1200,
+      "height": 1200
+    }
+  }
+}
+```
+
+## Configuration
+
+### Environment Variables
+
+```bash
+# Image storage configuration
+IMAGE_UPLOAD_DIR=/var/www/uploads
+IMAGE_MAX_SIZE_MB=10
+IMAGE_ALLOWED_TYPES=jpg,jpeg,png,gif,webp
+IMAGE_QUALITY=85
+IMAGE_MAX_DIMENSION=2000
+```
+
+### Python Configuration
+
+```python
+# app/core/config.py
+class ImageSettings:
+    UPLOAD_DIR: str = "/static/uploads"
+    MAX_SIZE_MB: int = 10
+    ALLOWED_TYPES: list = ["jpg", "jpeg", "png", "gif", "webp"]
+    QUALITY: int = 85
+    MAX_DIMENSION: int = 2000
+
+    # Generated sizes
+    SIZES: dict = {
+        "original": None,  # No resize, just optimize
+        "medium": 800,
+        "thumb": 200,
+    }
+```
+
+## Performance Guidelines
+
+### Filesystem Limits
+
+| Files per Directory | Status | Action |
+|---------------------|--------|--------|
+| < 10,000 | OK | None needed |
+| 10,000 - 50,000 | Monitor | Plan migration |
+| 50,000 - 100,000 | Warning | Increase sharding depth |
+| > 100,000 | Critical | Migrate to object storage |
+
+### Capacity Planning
+
+| Products | Images (5/product) | Total Files (3 sizes) | Storage |
+|----------|--------------------|-----------------------|---------|
+| 10,000 | 50,000 | 150,000 | 30 GB |
+| 50,000 | 250,000 | 750,000 | 150 GB |
+| 100,000 | 500,000 | 1,500,000 | 300 GB |
+| 500,000 | 2,500,000 | 7,500,000 | 1.5 TB |
+
+## CDN Integration
+
+For production deployments, configure a CDN for image delivery:
+
+### Cloudflare (Recommended)
+
+1. Set up Cloudflare for your domain
+2. Configure page rules for `/uploads/*`:
+   - Cache Level: Cache Everything
+   - Edge Cache TTL: 1 month
+   - Browser Cache TTL: 1 week
+
+### nginx Configuration
+
+```nginx
+location /uploads/ {
+    alias /var/www/uploads/;
+    expires 30d;
+    add_header Cache-Control "public, immutable";
+    add_header X-Content-Type-Options nosniff;
+
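+    # Serve a .webp sibling when the client accepts WebP, else the original file.
+    # $webp_suffix must be defined elsewhere, e.g. by a `map` on $http_accept in the http context.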
+    location ~ \.(jpg|jpeg|png)$ {
+        try_files $uri$webp_suffix $uri =404;
+    }
+}
+```
+
+## Maintenance
+
+### Cleanup Orphaned Images
+
+Remove images not referenced by any product:
+
+```bash
+# Run via admin CLI
+python -m scripts.cleanup_orphaned_images --dry-run
+python -m scripts.cleanup_orphaned_images --execute
+```
+
+### Regenerate Variants
+
+If image quality settings change:
+
+```bash
+# Regenerate all variants for a vendor
+python -m scripts.regenerate_images --vendor-id 123
+
+# Regenerate all variants (use with caution)
+python -m scripts.regenerate_images --all
+```
+
+## Monitoring
+
+### Metrics to Track
+
+- Total file count
+- Storage used (GB)
+- Files per directory (max)
+- Upload success rate
+- Average processing time
+
+### Health Checks
+
+The platform health page includes image storage metrics:
+
+- Current file count
+- Storage usage
+- Directory distribution
+- Processing queue status
+
+## Troubleshooting
+
+### Common Issues
+
+**Upload fails with "File too large"**
+- Check `IMAGE_MAX_SIZE_MB` setting
+- Verify nginx `client_max_body_size`
+
+**Images not displaying**
+- Check file permissions (should be readable by web server)
+- Verify URL paths match actual file locations
+
+**Slow uploads**
+- Check disk I/O performance
+- Consider async processing queue
+
+## Related Documentation
+
+- [Capacity Planning](../architecture/capacity-planning.md)
+- [Platform Health](platform-health.md)
+- [Capacity Monitoring](capacity-monitoring.md)
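The hash generation and sharded directory layout described in `docs/operations/image-storage.md` can be sketched with the standard library alone. This is an illustration under stated assumptions, not the `ImageService` implementation from the commit: the function names `image_hash` and `sharded_path` are hypothetical, and the `products` root and `_{variant}.webp` naming are taken from the directory tree shown in the document.

```python
import hashlib
from pathlib import PurePosixPath


def image_hash(vendor_id: int, product_id: int, timestamp: float,
               original_filename: str) -> str:
    """First 8 hex characters of the MD5 digest of the identifying fields."""
    key = f"{vendor_id}:{product_id}:{timestamp}:{original_filename}"
    return hashlib.md5(key.encode("utf-8")).hexdigest()[:8]


def sharded_path(file_hash: str, variant: str,
                 root: str = "products") -> PurePosixPath:
    """Build <root>/<chars 0-1>/<chars 2-3>/<hash>_<variant>.webp."""
    return (PurePosixPath(root) / file_hash[:2] / file_hash[2:4]
            / f"{file_hash}_{variant}.webp")


if __name__ == "__main__":
    # The example hash from the directory tree in the document
    for variant in ("original", "800", "200"):
        print(sharded_path("001a2b3c", variant))

    # A freshly generated hash lands in one of 256 * 256 leaf directories
    print(sharded_path(image_hash(123, 456, 1_700_000_000.0, "shoe.jpg"), "800"))
```

Because the shard directories are derived from the hash itself, any component that knows the 8-character id can locate all three variants without a database lookup.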
--- a/docs/operations/platform-health.md
+++ b/docs/operations/platform-health.md
@@ -0,0 +1,92 @@
+# Platform Health Monitoring
+
+This guide covers the platform health monitoring features available in the admin dashboard.
+
+## Overview
+
+The Platform Health page (`/admin/platform-health`) provides real-time visibility into system performance, resource usage, and capacity thresholds.
+
+## Accessing Platform Health
+
+Navigate to **Admin > Platform Health** in the sidebar, or go directly to `/admin/platform-health`.
+
+## Dashboard Sections
+
+### 1. System Overview
+
+Quick glance at overall platform status:
+
+| Indicator | Green | Yellow | Red |
+|-----------|-------|--------|-----|
+| API Response Time | < 100ms | 100-500ms | > 500ms |
+| Error Rate | < 0.1% | 0.1-1% | > 1% |
+| Database Health | Connected | Slow queries | Disconnected |
+| Storage | < 70% | 70-85% | > 85% |
+
+### 2. Resource Usage
+
+Real-time metrics:
+
+- **CPU Usage**: Current and 24h average
+- **Memory Usage**: Used vs available
+- **Disk Usage**: Storage consumption with trend
+- **Network**: Inbound/outbound throughput
+
+### 3. Capacity Metrics
+
+Track growth toward scaling thresholds:
+
+- **Total Products**: Count across all vendors
+- **Total Images**: Files stored in image system
+- **Database Size**: Current size vs recommended max
+- **Active Clients**: Monthly active vendor accounts
+
+### 4. Performance Trends
+
+Historical charts (7-day, 30-day):
+
+- API response times (p50, p95, p99)
+- Request volume by endpoint
+- Database query latency
+- Error rate over time
+
+## Alert Configuration
+
+### Threshold Alerts
+
+Configure alerts for proactive monitoring:
+
+```python
+# In app/core/config.py
+HEALTH_THRESHOLDS = {
+    "cpu_percent": {"warning": 70, "critical": 85},
+    "memory_percent": {"warning": 75, "critical": 90},
+    "disk_percent": {"warning": 70, "critical": 85},
+    "response_time_ms": {"warning": 200, "critical": 500},
+    "error_rate_percent": {"warning": 1.0, "critical": 5.0},
+}
+```
+
+### Notification Channels
+
+Alerts can be sent via:
+- Email to admin users
+- Slack webhook (if configured)
+- Dashboard notifications
+
+## Related Pages
+
+- [Capacity Monitoring](capacity-monitoring.md) - Detailed capacity metrics
+- [Image Storage](image-storage.md) - Image system management
+- [Capacity Planning](../architecture/capacity-planning.md) - Infrastructure sizing guide
+
+## API Endpoints
+
+The platform health page uses these admin API endpoints:
+
+| Endpoint | Description |
+|----------|-------------|
+| `GET /api/v1/admin/platform/health` | Overall health status |
+| `GET /api/v1/admin/platform/metrics` | Current metrics |
+| `GET /api/v1/admin/platform/metrics/history` | Historical data |
+| `GET /api/v1/admin/platform/capacity` | Capacity usage |
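The `HEALTH_THRESHOLDS` dictionary in `docs/operations/platform-health.md` maps naturally onto the green / yellow / red indicators of the dashboard. The sketch below shows one way to evaluate a set of readings against it; the `evaluate` and `overall` helpers are hypothetical names, the strict `>` comparison is an assumption, and the readings are passed in as a plain dictionary so the example runs without the psutil dependency the commit adds.

```python
# Thresholds as documented in platform-health.md
HEALTH_THRESHOLDS = {
    "cpu_percent": {"warning": 70, "critical": 85},
    "memory_percent": {"warning": 75, "critical": 90},
    "disk_percent": {"warning": 70, "critical": 85},
    "response_time_ms": {"warning": 200, "critical": 500},
    "error_rate_percent": {"warning": 1.0, "critical": 5.0},
}

SEVERITY = ["ok", "warning", "critical"]  # ordered from least to most severe


def evaluate(metrics: dict, thresholds: dict = HEALTH_THRESHOLDS) -> dict:
    """Return a status per metric; metrics without a threshold are skipped."""
    statuses = {}
    for name, value in metrics.items():
        levels = thresholds.get(name)
        if levels is None:
            continue
        if value > levels["critical"]:
            statuses[name] = "critical"
        elif value > levels["warning"]:
            statuses[name] = "warning"
        else:
            statuses[name] = "ok"
    return statuses


def overall(statuses: dict) -> str:
    """The worst individual status, or 'ok' when nothing was evaluated."""
    return max(statuses.values(), key=SEVERITY.index, default="ok")


if __name__ == "__main__":
    # Sample readings; in a deployment these might come from psutil
    readings = {"cpu_percent": 72.0, "memory_percent": 91.0, "disk_percent": 40.0}
    statuses = evaluate(readings)
    print(statuses, "->", overall(statuses))
```

Reducing the per-metric statuses to a single worst-case value gives a simple rule for the headline indicator, while the per-metric dictionary still shows which resource triggered it.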