feat: add capacity planning docs, image upload system, and platform health monitoring

Documentation:
- Add comprehensive capacity planning guide (docs/architecture/capacity-planning.md)
- Add operations docs: platform-health, capacity-monitoring, image-storage
- Link pricing strategy to capacity planning documentation
- Update mkdocs.yml with new Operations section

Image Upload System:
- Add ImageService with WebP conversion and sharded directory structure
- Generate multiple size variants (original, 800px, 200px)
- Add storage stats endpoint for monitoring
- Add Pillow dependency for image processing

Platform Health Monitoring:
- Add /admin/platform-health page with real-time metrics
- Show CPU, memory, disk usage with progress bars
- Display capacity thresholds with status indicators
- Generate scaling recommendations automatically
- Determine infrastructure tier based on usage
- Add psutil dependency for system metrics

Admin UI:
- Add Capacity Monitor to Platform Health section in sidebar
- Create platform-health.html template with stats cards
- Create platform-health.js for Alpine.js state management

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-25 17:17:09 +01:00
parent b25d119899
commit dc7fb5ca19
16 changed files with 2352 additions and 0 deletions
--- a/docs/operations/capacity-monitoring.md
+++ b/docs/operations/capacity-monitoring.md
@@ -0,0 +1,121 @@
+# Capacity Monitoring
+
+Detailed guide for monitoring and managing platform capacity.
+
+## Overview
+
+The Capacity Monitoring page (`/admin/platform-health/capacity`) provides insights into resource consumption and helps plan infrastructure scaling.
+
+## Key Metrics
+
+### Client Metrics
+
+| Metric | Description | Used For |
+|--------|-------------|----------|
+| Active Clients | Vendors with activity in last 30 days | Scale planning |
+| Total Products | Sum across all vendors | Storage/DB sizing |
+| Products per Client | Average products per vendor | Tier compliance |
+| Monthly Orders | Order volume this month | Performance impact |
+
+### Storage Metrics
+
+| Metric | Description | Warning | Critical |
+|--------|-------------|---------|----------|
+| Image Files | Total files in storage | 80% of limit | 95% of limit |
+| Image Storage (GB) | Total size in gigabytes | 80% of disk | 95% of disk |
+| Database Size (GB) | PostgreSQL data size | 80% of allocation | 95% of allocation |
+| Backup Size (GB) | Latest backup size | Informational | N/A |
+
+### Performance Metrics
+
+| Metric | Good | Warning | Critical |
+|--------|------|---------|----------|
+| Avg Response Time | < 100ms | 100-300ms | > 300ms |
+| DB Query Time (p95) | < 50ms | 50-200ms | > 200ms |
+| Cache Hit Rate | > 90% | 70-90% | < 70% |
+| Connection Pool Usage | < 70% | 70-90% | > 90% |
+
+## Scaling Recommendations
+
+The system provides automatic scaling recommendations based on current usage:
+
+### Example Recommendations
+
+```
+Current Infrastructure: MEDIUM (100-300 clients)
+Current Usage: 85% of capacity
+
+Recommendations:
+1. [WARNING] Approaching product limit (420K of 500K)
+   → Consider upgrading to LARGE tier
+
+2. [INFO] Database size growing 5GB/month
+   → Plan storage expansion in 3 months
+
+3. [OK] API response times within normal range
+   → No action needed
+```
+
+## Threshold Configuration
+
+Edit thresholds in the admin settings or via environment:
+
+```python
+# Capacity thresholds (can be configured per deployment)
+CAPACITY_THRESHOLDS = {
+    # Products
+    "products_total": {
+        "warning": 400_000,
+        "critical": 475_000,
+        "limit": 500_000,
+    },
+    # Storage (GB)
+    "storage_gb": {
+        "warning": 800,
+        "critical": 950,
+        "limit": 1000,
+    },
+    # Database (GB)
+    "db_size_gb": {
+        "warning": 20,
+        "critical": 24,
+        "limit": 25,
+    },
+    # Monthly orders
+    "monthly_orders": {
+        "warning": 250_000,
+        "critical": 280_000,
+        "limit": 300_000,
+    },
+}
+```
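+
+As a minimal sketch of how such thresholds might be evaluated, the helper below maps a current value to a status level. The `capacity_status` name is hypothetical and not part of the platform code, and treating each boundary as inclusive is an assumption.
+
+```python
+# Hypothetical helper for illustration; not part of the platform code.
+def capacity_status(value: float, thresholds: dict) -> str:
+    """Return the status for `value` given warning/critical/limit thresholds."""
+    if value >= thresholds["limit"]:
+        return "exceeded"
+    if value >= thresholds["critical"]:
+        return "critical"
+    if value >= thresholds["warning"]:
+        return "warning"
+    return "ok"
+
+
+# Same shape as the "products_total" entry above
+products = {"warning": 400_000, "critical": 475_000, "limit": 500_000}
+print(capacity_status(420_000, products))  # warning
+```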
+
+## Historical Trends
+
+View growth trends to plan ahead:
+
+- **30-day growth rate**: Products, storage, clients
+- **Projected capacity date**: When limits will be reached
+- **Seasonal patterns**: Order volume fluctuations
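+
+A projected capacity date can be approximated with a linear extrapolation. The sketch below assumes a constant monthly growth rate; `months_until_limit` is a hypothetical name, the 10 GB starting size is an illustrative value, and the projection method actually used by the dashboard is not documented here.
+
+```python
+import math
+
+
+def months_until_limit(current: float, limit: float, monthly_growth: float) -> float:
+    """Months until `limit` is reached at a constant growth rate (linear model)."""
+    if current >= limit:
+        return 0.0  # limit already reached
+    if monthly_growth <= 0:
+        return math.inf  # flat or shrinking usage never reaches the limit
+    return (limit - current) / monthly_growth
+
+
+# Illustrative values: a 10 GB database growing 5 GB/month against a 25 GB limit
+print(months_until_limit(10, 25, 5))  # 3.0
+```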
+
+## Alerts
+
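+Capacity alerts trigger when usage crosses a configured level. With the thresholds shown under Threshold Configuration, these levels sit at roughly:
+
+1. **Warning (Yellow)**: 80% of a limit
+2. **Critical (Red)**: 95% of a limit
+3. **Exceeded**: 100% or more of a limit (immediate action required)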
+
+## Export Reports
+
+Generate capacity reports for planning:
+
+- **Weekly summary**: PDF or CSV
+- **Monthly capacity report**: Detailed analysis
+- **Projection report**: 3/6/12 month forecasts
+
+## Related Documentation
+
+- [Capacity Planning](../architecture/capacity-planning.md) - Full sizing guide
+- [Platform Health](platform-health.md) - Real-time health monitoring
+- [Image Storage](image-storage.md) - Image system details
--- a/docs/operations/image-storage.md
+++ b/docs/operations/image-storage.md
@@ -0,0 +1,246 @@
+# Image Storage System
+
+Documentation for the platform's image storage and management system.
+
+## Overview
+
+The Wizamart platform uses a self-hosted image storage system with:
+
+- **Sharded directory structure** for filesystem performance
+- **Automatic WebP conversion** for optimization
+- **Multiple size variants** for different use cases
+- **CDN-ready architecture** for scaling
+
+## Storage Architecture
+
+### Directory Structure
+
+Images are stored in a sharded directory structure to prevent filesystem performance degradation:
+
+```
+/static/uploads/
+  └── products/
+      ├── 00/                    # First 2 chars of hash
+      │   ├── 1a/               # Next 2 chars
+      │   │   ├── 001a2b3c_original.webp
+      │   │   ├── 001a2b3c_800.webp
+      │   │   └── 001a2b3c_200.webp
+      │   └── 2b/
+      │       └── ...
+      ├── 01/
+      │   └── ...
+      └── ff/
+          └── ...
+```
+
+### Hash Generation
+
+The file hash is generated from:
+```python
+import hashlib
+file_hash = hashlib.md5(f"{vendor_id}:{product_id}:{timestamp}:{original_filename}".encode()).hexdigest()[:8]
+```
+
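+This provides:
+
+- File paths that are unique in practice (the 8-character prefix carries only 32 bits, so collisions become possible as the file count grows)
+- Even distribution across directories
+- File locations that can be derived from the hash alone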
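+
+The mapping from hash to storage path can be sketched as follows. The function names and the default `base` directory are assumptions chosen to match the directory layout shown earlier; the actual `ImageService` code is not shown in this document.
+
+```python
+import hashlib
+from pathlib import PurePosixPath
+
+
+def image_hash(vendor_id: int, product_id: int, timestamp: float, original_filename: str) -> str:
+    """First 8 hex characters of the MD5 digest of the identifying fields."""
+    key = f"{vendor_id}:{product_id}:{timestamp}:{original_filename}"
+    return hashlib.md5(key.encode("utf-8")).hexdigest()[:8]
+
+
+def sharded_path(file_hash: str, variant: str,
+                 base: str = "/static/uploads/products") -> PurePosixPath:
+    """Build <base>/<chars 0-1>/<chars 2-3>/<hash>_<variant>.webp."""
+    return PurePosixPath(base) / file_hash[:2] / file_hash[2:4] / f"{file_hash}_{variant}.webp"
+
+
+print(sharded_path("001a2b3c", "800"))
+# /static/uploads/products/00/1a/001a2b3c_800.webp
+```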
+
+## Image Variants
+
+Each uploaded image generates multiple variants:
+
+| Variant | Max Dimensions | Format | Use Case |
+|---------|---------------|--------|----------|
+| `original` | As uploaded (max 2000px) | WebP | Detail view, zoom |
+| `800` | 800×800 | WebP | Product cards |
+| `200` | 200×200 | WebP | Thumbnails, grids |
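+
+The "Max Dimensions" column can be read as a bounding box. A minimal sketch of the size calculation, assuming the aspect ratio is preserved and images are never upscaled (both are assumptions; `fit_within` is a hypothetical helper, not the `ImageService` API):
+
+```python
+def fit_within(width: int, height: int, max_dim: int) -> tuple[int, int]:
+    """Scale (width, height) to fit a max_dim x max_dim box, keeping aspect ratio."""
+    longest = max(width, height)
+    if longest <= max_dim:
+        return width, height  # already small enough: no upscaling
+    scale = max_dim / longest
+    return max(1, round(width * scale)), max(1, round(height * scale))
+
+
+print(fit_within(1200, 900, 800))  # (800, 600)
+print(fit_within(150, 100, 200))   # (150, 100)
+```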
+
+### Size Estimates
+
+| Original Size | After Processing | Storage per Image |
+|---------------|------------------|-------------------|
+| 2MB JPEG | ~200KB (original) + 80KB (800) + 15KB (200) | ~295KB |
+| 500KB JPEG | ~150KB (original) + 60KB (800) + 12KB (200) | ~222KB |
+| 100KB JPEG | ~80KB (original) + 40KB (800) + 10KB (200) | ~130KB |
+
+**Average: ~200KB per image (all variants)**
+
+## Upload Process
+
+### API Endpoint
+
+```http
+POST /api/v1/admin/images/upload
+Content-Type: multipart/form-data
+
+file: <binary>
+vendor_id: 123
+product_id: 456 (optional, for product images)
+type: product|category|banner
+```
+
+### Response
+
+```json
+{
+  "success": true,
+  "image": {
+    "id": "001a2b3c",
+    "urls": {
+      "original": "/uploads/products/00/1a/001a2b3c_original.webp",
+      "medium": "/uploads/products/00/1a/001a2b3c_800.webp",
+      "thumb": "/uploads/products/00/1a/001a2b3c_200.webp"
+    },
+    "size_bytes": 295000,
+    "dimensions": {
+      "width": 1200,
+      "height": 1200
+    }
+  }
+}
+```
+
+## Configuration
+
+### Environment Variables
+
+```bash
+# Image storage configuration
+IMAGE_UPLOAD_DIR=/var/www/uploads
+IMAGE_MAX_SIZE_MB=10
+IMAGE_ALLOWED_TYPES=jpg,jpeg,png,gif,webp
+IMAGE_QUALITY=85
+IMAGE_MAX_DIMENSION=2000
+```
+
+### Python Configuration
+
+```python
+# app/core/config.py
+class ImageSettings:
+    UPLOAD_DIR: str = "/static/uploads"
+    MAX_SIZE_MB: int = 10
+    ALLOWED_TYPES: list = ["jpg", "jpeg", "png", "gif", "webp"]
+    QUALITY: int = 85
+    MAX_DIMENSION: int = 2000
+
+    # Generated sizes
+    SIZES: dict = {
+        "original": None,  # No resize, just optimize
+        "medium": 800,
+        "thumb": 200,
+    }
+```
+
+## Performance Guidelines
+
+### Filesystem Limits
+
+| Files per Directory | Status | Action |
+|---------------------|--------|--------|
+| < 10,000 | OK | None needed |
+| 10,000 - 50,000 | Monitor | Plan migration |
+| 50,000 - 100,000 | Warning | Increase sharding depth |
+| > 100,000 | Critical | Migrate to object storage |
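+
+To relate these bands to the sharded layout, the sketch below estimates the average number of files per leaf directory. It assumes two directory levels of two hexadecimal characters each (256 × 256 = 65,536 leaf directories) and an even spread of hashes; both function names are hypothetical.
+
+```python
+def files_per_leaf_dir(total_files: int, levels: int = 2, chars_per_level: int = 2) -> float:
+    """Average files per leaf directory, assuming hashes spread files evenly."""
+    leaf_dirs = (16 ** chars_per_level) ** levels  # 256 ** 2 = 65,536 by default
+    return total_files / leaf_dirs
+
+
+def directory_status(file_count: float) -> str:
+    """Map a per-directory file count to the status bands in the table above."""
+    if file_count < 10_000:
+        return "OK"
+    if file_count <= 50_000:
+        return "Monitor"
+    if file_count <= 100_000:
+        return "Warning"
+    return "Critical"
+
+
+avg = files_per_leaf_dir(7_500_000)
+print(round(avg, 1), directory_status(avg))  # 114.4 OK
+```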
+
+### Capacity Planning
+
+| Products | Images (5/product) | Total Files (3 sizes) | Storage (~200 KB/image) |
+|----------|--------------------|-----------------------|-------------------------|
+| 10,000 | 50,000 | 150,000 | 10 GB |
+| 50,000 | 250,000 | 750,000 | 50 GB |
+| 100,000 | 500,000 | 1,500,000 | 100 GB |
+| 500,000 | 2,500,000 | 7,500,000 | 500 GB |
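+
+The file and storage columns follow from simple multiplication. A minimal sketch (the `estimate_storage` helper is hypothetical), with the per-image size passed in explicitly; the example uses the ~200 KB all-variants average from Size Estimates and decimal units:
+
+```python
+def estimate_storage(products: int, kb_per_image: float,
+                     images_per_product: int = 5, variants: int = 3) -> tuple[int, float]:
+    """Return (total_files, storage_gb) for a catalogue of `products`."""
+    images = products * images_per_product
+    total_files = images * variants
+    # kb_per_image already covers all variants of one image; 1 GB = 1,000,000 KB
+    storage_gb = images * kb_per_image / 1_000_000
+    return total_files, storage_gb
+
+
+print(estimate_storage(10_000, kb_per_image=200))  # (150000, 10.0)
+```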
+
+## CDN Integration
+
+For production deployments, configure a CDN for image delivery:
+
+### Cloudflare (Recommended)
+
+1. Set up Cloudflare for your domain
+2. Configure page rules for `/uploads/*`:
+   - Cache Level: Cache Everything
+   - Edge Cache TTL: 1 month
+   - Browser Cache TTL: 1 week
+
+### nginx Configuration
+
+```nginx
+location /uploads/ {
+    alias /var/www/uploads/;
+    expires 30d;
+    add_header Cache-Control "public, immutable";
+    add_header X-Content-Type-Options nosniff;
+
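+    # Serve <file>.webp when the client accepts WebP, else fall back to the original file.
+    # $webp_suffix is not built in; it must be set by a `map` on $http_accept in the http context.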
+    location ~ \.(jpg|jpeg|png)$ {
+        try_files $uri$webp_suffix $uri =404;
+    }
+}
+```
+
+## Maintenance
+
+### Cleanup Orphaned Images
+
+Remove images not referenced by any product:
+
+```bash
+# Run via admin CLI
+python -m scripts.cleanup_orphaned_images --dry-run
+python -m scripts.cleanup_orphaned_images --execute
+```
+
+### Regenerate Variants
+
+If image quality settings change:
+
+```bash
+# Regenerate all variants for a vendor
+python -m scripts.regenerate_images --vendor-id 123
+
+# Regenerate all variants (use with caution)
+python -m scripts.regenerate_images --all
+```
+
+## Monitoring
+
+### Metrics to Track
+
+- Total file count
+- Storage used (GB)
+- Files per directory (max)
+- Upload success rate
+- Average processing time
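+
+The first three metrics can be gathered with a directory walk. This is a minimal sketch using only the standard library; `storage_stats` is a hypothetical name and is not the storage stats endpoint itself.
+
+```python
+import os
+
+
+def storage_stats(root: str) -> dict:
+    """Walk `root` and return file count, total size in GB, and the largest directory."""
+    total_files = 0
+    total_bytes = 0
+    max_files_in_dir = 0
+    for dirpath, _dirnames, filenames in os.walk(root):
+        max_files_in_dir = max(max_files_in_dir, len(filenames))
+        for name in filenames:
+            total_files += 1
+            total_bytes += os.path.getsize(os.path.join(dirpath, name))
+    return {
+        "total_files": total_files,
+        "total_gb": total_bytes / 1e9,  # decimal gigabytes
+        "max_files_per_dir": max_files_in_dir,
+    }
+```
+
+On a large tree a full walk is slow, so a result like this is probably better cached or computed by a scheduled job than on every page load.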
+
+### Health Checks
+
+The platform health page includes image storage metrics:
+
+- Current file count
+- Storage usage
+- Directory distribution
+- Processing queue status
+
+## Troubleshooting
+
+### Common Issues
+
+**Upload fails with "File too large"**
+
+- Check `IMAGE_MAX_SIZE_MB` setting
+- Verify nginx `client_max_body_size`
+
+**Images not displaying**
+
+- Check file permissions (should be readable by web server)
+- Verify URL paths match actual file locations
+
+**Slow uploads**
+
+- Check disk I/O performance
+- Consider async processing queue
+
+## Related Documentation
+
+- [Capacity Planning](../architecture/capacity-planning.md)
+- [Platform Health](platform-health.md)
+- [Capacity Monitoring](capacity-monitoring.md)
--- a/docs/operations/platform-health.md
+++ b/docs/operations/platform-health.md
@@ -0,0 +1,92 @@
+# Platform Health Monitoring
+
+This guide covers the platform health monitoring features available in the admin dashboard.
+
+## Overview
+
+The Platform Health page (`/admin/platform-health`) provides real-time visibility into system performance, resource usage, and capacity thresholds.
+
+## Accessing Platform Health
+
+Navigate to **Admin > Platform Health** in the sidebar, or go directly to `/admin/platform-health`.
+
+## Dashboard Sections
+
+### 1. System Overview
+
+Quick glance at overall platform status:
+
+| Indicator | Green | Yellow | Red |
+|-----------|-------|--------|-----|
+| API Response Time | < 100ms | 100-500ms | > 500ms |
+| Error Rate | < 0.1% | 0.1-1% | > 1% |
+| Database Health | Connected | Slow queries | Disconnected |
+| Storage | < 70% | 70-85% | > 85% |
+
+### 2. Resource Usage
+
+Real-time metrics:
+
+- **CPU Usage**: Current and 24h average
+- **Memory Usage**: Used vs available
+- **Disk Usage**: Storage consumption with trend
+- **Network**: Inbound/outbound throughput
+
+### 3. Capacity Metrics
+
+Track growth toward scaling thresholds:
+
+- **Total Products**: Count across all vendors
+- **Total Images**: Files stored in image system
+- **Database Size**: Current size vs recommended max
+- **Active Clients**: Monthly active vendor accounts
+
+### 4. Performance Trends
+
+Historical charts (7-day, 30-day):
+
+- API response times (p50, p95, p99)
+- Request volume by endpoint
+- Database query latency
+- Error rate over time
+
+## Alert Configuration
+
+### Threshold Alerts
+
+Configure alerts for proactive monitoring:
+
+```python
+# In app/core/config.py
+HEALTH_THRESHOLDS = {
+    "cpu_percent": {"warning": 70, "critical": 85},
+    "memory_percent": {"warning": 75, "critical": 90},
+    "disk_percent": {"warning": 70, "critical": 85},
+    "response_time_ms": {"warning": 200, "critical": 500},
+    "error_rate_percent": {"warning": 1.0, "critical": 5.0},
+}
+```
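+
+A minimal sketch of how metrics could be checked against a mapping of this shape. The `evaluate_health` function is hypothetical, and treating each boundary as inclusive is an assumption.
+
+```python
+def evaluate_health(metrics: dict, thresholds: dict) -> dict:
+    """Return 'ok', 'warning', or 'critical' for each metric that has a threshold."""
+    result = {}
+    for name, value in metrics.items():
+        limits = thresholds.get(name)
+        if limits is None:
+            continue  # no threshold configured for this metric
+        if value >= limits["critical"]:
+            result[name] = "critical"
+        elif value >= limits["warning"]:
+            result[name] = "warning"
+        else:
+            result[name] = "ok"
+    return result
+
+
+thresholds = {
+    "cpu_percent": {"warning": 70, "critical": 85},
+    "disk_percent": {"warning": 70, "critical": 85},
+}
+print(evaluate_health({"cpu_percent": 72.5, "disk_percent": 91.0}, thresholds))
+# {'cpu_percent': 'warning', 'disk_percent': 'critical'}
+```
+
+The input values could come from psutil, for example `psutil.cpu_percent()`, `psutil.virtual_memory().percent`, and `psutil.disk_usage("/").percent`.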
+
+### Notification Channels
+
+Alerts can be sent via:
+
+- Email to admin users
+- Slack webhook (if configured)
+- Dashboard notifications
+
+## Related Pages
+
+- [Capacity Monitoring](capacity-monitoring.md) - Detailed capacity metrics
+- [Image Storage](image-storage.md) - Image system management
+- [Capacity Planning](../architecture/capacity-planning.md) - Infrastructure sizing guide
+
+## API Endpoints
+
+The platform health page uses these admin API endpoints:
+
+| Endpoint | Description |
+|----------|-------------|
+| `GET /api/v1/admin/platform/health` | Overall health status |
+| `GET /api/v1/admin/platform/metrics` | Current metrics |
+| `GET /api/v1/admin/platform/metrics/history` | Historical data |
+| `GET /api/v1/admin/platform/capacity` | Capacity usage |