feat: add capacity planning docs, image upload system, and platform health monitoring

Documentation:
- Add comprehensive capacity planning guide (docs/architecture/capacity-planning.md)
- Add operations docs: platform-health, capacity-monitoring, image-storage
- Link pricing strategy to capacity planning documentation
- Update mkdocs.yml with new Operations section

Image Upload System:
- Add ImageService with WebP conversion and sharded directory structure
- Generate multiple size variants (original, 800px, 200px)
- Add storage stats endpoint for monitoring
- Add Pillow dependency for image processing

Platform Health Monitoring:
- Add /admin/platform-health page with real-time metrics
- Show CPU, memory, disk usage with progress bars
- Display capacity thresholds with status indicators
- Generate scaling recommendations automatically
- Determine infrastructure tier based on usage
- Add psutil dependency for system metrics

Admin UI:
- Add Capacity Monitor to Platform Health section in sidebar
- Create platform-health.html template with stats cards
- Create platform-health.js for Alpine.js state management

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-25 17:17:09 +01:00
parent b25d119899
commit dc7fb5ca19
16 changed files with 2352 additions and 0 deletions


@@ -0,0 +1,121 @@
# Capacity Monitoring
Detailed guide for monitoring and managing platform capacity.
## Overview
The Capacity Monitoring page (`/admin/platform-health/capacity`) provides insights into resource consumption and helps plan infrastructure scaling.
## Key Metrics
### Client Metrics
| Metric | Description | Threshold Indicator |
|--------|-------------|---------------------|
| Active Clients | Vendors with activity in last 30 days | Scale planning |
| Total Products | Sum across all vendors | Storage/DB sizing |
| Products per Client | Average products per vendor | Tier compliance |
| Monthly Orders | Order volume this month | Performance impact |
### Storage Metrics
| Metric | Description | Warning | Critical |
|--------|-------------|---------|----------|
| Image Files | Total files in storage | 80% of limit | 95% of limit |
| Image Storage (GB) | Total size in gigabytes | 80% of disk | 95% of disk |
| Database Size (GB) | PostgreSQL data size | 80% of allocation | 95% of allocation |
| Backup Size (GB) | Latest backup size | Informational | N/A |
### Performance Metrics
| Metric | Good | Warning | Critical |
|--------|------|---------|----------|
| Avg Response Time | < 100ms | 100-300ms | > 300ms |
| DB Query Time (p95) | < 50ms | 50-200ms | > 200ms |
| Cache Hit Rate | > 90% | 70-90% | < 70% |
| Connection Pool Usage | < 70% | 70-90% | > 90% |
## Scaling Recommendations
The system provides automatic scaling recommendations based on current usage:
### Example Recommendations
```
Current Infrastructure: MEDIUM (100-300 clients)
Current Usage: 85% of capacity
Recommendations:
1. [WARNING] Approaching product limit (420K of 500K)
→ Consider upgrading to LARGE tier
2. [INFO] Database size growing 5GB/month
→ Plan storage expansion in 3 months
3. [OK] API response times within normal range
→ No action needed
```
## Threshold Configuration
Edit thresholds in the admin settings or via environment:
```python
# Capacity thresholds (can be configured per deployment)
CAPACITY_THRESHOLDS = {
    # Products
    "products_total": {
        "warning": 400_000,
        "critical": 475_000,
        "limit": 500_000,
    },
    # Storage (GB)
    "storage_gb": {
        "warning": 800,
        "critical": 950,
        "limit": 1000,
    },
    # Database (GB)
    "db_size_gb": {
        "warning": 20,
        "critical": 24,
        "limit": 25,
    },
    # Monthly orders
    "monthly_orders": {
        "warning": 250_000,
        "critical": 280_000,
        "limit": 300_000,
    },
}
```
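As an illustration of how these thresholds could be applied, the sketch below maps a metric value to a status. The `capacity_status` helper and the `>=` comparison are assumptions made for this example, not the platform's actual implementation; the threshold values are copied from the configuration above.

```python
# Threshold values copied from the configuration above.
CAPACITY_THRESHOLDS = {
    "products_total": {"warning": 400_000, "critical": 475_000, "limit": 500_000},
    "storage_gb": {"warning": 800, "critical": 950, "limit": 1000},
    "db_size_gb": {"warning": 20, "critical": 24, "limit": 25},
    "monthly_orders": {"warning": 250_000, "critical": 280_000, "limit": 300_000},
}


def capacity_status(metric: str, value: float) -> str:
    """Hypothetical helper: return "ok", "warning", "critical", or "exceeded"."""
    levels = CAPACITY_THRESHOLDS[metric]
    # Check the most severe level first so the highest matching status wins.
    if value >= levels["limit"]:
        return "exceeded"
    if value >= levels["critical"]:
        return "critical"
    if value >= levels["warning"]:
        return "warning"
    return "ok"


print(capacity_status("products_total", 420_000))  # warning
```

With the example figures from the recommendations section (420K of 500K products), this sketch reports a warning.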
## Historical Trends
View growth trends to plan ahead:
- **30-day growth rate**: Products, storage, clients
- **Projected capacity date**: When limits will be reached
- **Seasonal patterns**: Order volume fluctuations
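
A projected capacity date can be approximated with a simple linear projection. The sketch below assumes constant monthly growth, which ignores the seasonal patterns mentioned above; `months_until_limit` is a hypothetical helper, not part of the platform.

```python
from typing import Optional


def months_until_limit(current: float, limit: float, growth_per_month: float) -> Optional[float]:
    """Linear projection of the months remaining until `current` reaches `limit`."""
    if current >= limit:
        return 0.0  # already at or above the limit
    if growth_per_month <= 0:
        return None  # flat or shrinking usage never reaches the limit
    return (limit - current) / growth_per_month


# Example: a 10 GB database growing 5 GB/month against a 25 GB limit.
print(months_until_limit(current=10, limit=25, growth_per_month=5))  # 3.0
```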
## Alerts
Capacity alerts trigger when usage reaches a given share of a metric's limit:
1. **Warning (Yellow)**: 80% of the limit
2. **Critical (Red)**: 95% of the limit
3. **Exceeded**: 100% or more of the limit (immediate action required)
## Export Reports
Generate capacity reports for planning:
- **Weekly summary**: PDF or CSV
- **Monthly capacity report**: Detailed analysis
- **Projection report**: 3/6/12 month forecasts
## Related Documentation
- [Capacity Planning](../architecture/capacity-planning.md) - Full sizing guide
- [Platform Health](platform-health.md) - Real-time health monitoring
- [Image Storage](image-storage.md) - Image system details


@@ -0,0 +1,246 @@
# Image Storage System
Documentation for the platform's image storage and management system.
## Overview
The Wizamart platform uses a self-hosted image storage system with:
- **Sharded directory structure** for filesystem performance
- **Automatic WebP conversion** for optimization
- **Multiple size variants** for different use cases
- **CDN-ready architecture** for scaling
## Storage Architecture
### Directory Structure
Images are stored in a sharded directory structure to prevent filesystem performance degradation:
```
/static/uploads/
└── products/
    ├── 00/                  # First 2 chars of hash
    │   ├── 1a/              # Next 2 chars
    │   │   ├── 001a2b3c_original.webp
    │   │   ├── 001a2b3c_800.webp
    │   │   └── 001a2b3c_200.webp
    │   └── 2b/
    │       └── ...
    ├── 01/
    │   └── ...
    └── ff/
        └── ...
```
### Hash Generation
The file hash is generated from:
```python
import hashlib
file_hash = hashlib.md5(f"{vendor_id}:{product_id}:{timestamp}:{original_filename}".encode()).hexdigest()[:8]
```
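This gives:
- File paths that are usually unique (the 8-character prefix carries only 32 bits, so a collision becomes likely once the store holds tens of thousands of images; check for an existing file on write)
- Even distribution across directories
- File locations that can be derived from the hash alone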
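
A minimal sketch of how the hash could map to a sharded path, following the directory layout shown above. The function names and the `root` default are assumptions made for this example.

```python
import hashlib
from pathlib import PurePosixPath


def image_hash(vendor_id: int, product_id: int, timestamp: float, original_filename: str) -> str:
    """First 8 hex characters of the MD5 digest of the upload's identifying fields."""
    key = f"{vendor_id}:{product_id}:{timestamp}:{original_filename}"
    return hashlib.md5(key.encode("utf-8")).hexdigest()[:8]


def sharded_path(file_hash: str, variant: str,
                 root: str = "/static/uploads/products") -> PurePosixPath:
    """Build <root>/<chars 1-2>/<chars 3-4>/<hash>_<variant>.webp."""
    return PurePosixPath(root) / file_hash[:2] / file_hash[2:4] / f"{file_hash}_{variant}.webp"


print(sharded_path("001a2b3c", "800"))
# /static/uploads/products/00/1a/001a2b3c_800.webp
```

Because the path depends only on the hash and the variant name, any service that knows the image ID can locate all variants without a database lookup.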
## Image Variants
Each uploaded image generates multiple variants:
| Variant | Max Dimensions | Format | Use Case |
|---------|---------------|--------|----------|
| `original` | As uploaded (max 2000px) | WebP | Detail view, zoom |
| `800` | 800×800 | WebP | Product cards |
| `200` | 200×200 | WebP | Thumbnails, grids |
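
The "Max Dimensions" column describes a bounding box, not a fixed output size. The sketch below shows one way to compute the target dimensions while keeping the aspect ratio; `fit_within` is a hypothetical helper, and never upscaling smaller images is an assumption (it matches how Pillow's `Image.thumbnail` behaves).

```python
def fit_within(width: int, height: int, max_side: int) -> tuple[int, int]:
    """Scale (width, height) to fit a max_side x max_side box, keeping aspect ratio."""
    if width <= max_side and height <= max_side:
        return width, height  # assumption: small images are not upscaled
    scale = max_side / max(width, height)
    # Guard against a zero-pixel side for extremely narrow images.
    return max(1, round(width * scale)), max(1, round(height * scale))


print(fit_within(2000, 1000, 800))  # (800, 400)
print(fit_within(150, 100, 200))    # (150, 100)
```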
### Size Estimates
| Original Size | After Processing | Storage per Image |
|---------------|------------------|-------------------|
| 2MB JPEG | ~200KB (original) + 80KB (800) + 15KB (200) | ~295KB |
| 500KB JPEG | ~150KB (original) + 60KB (800) + 12KB (200) | ~222KB |
| 100KB JPEG | ~80KB (original) + 40KB (800) + 10KB (200) | ~130KB |
**Average: ~200KB per image (all variants)**
## Upload Process
### API Endpoint
```http
POST /api/v1/admin/images/upload
Content-Type: multipart/form-data
file: <binary>
vendor_id: 123
product_id: 456 (optional, for product images)
type: product|category|banner
```
### Response
```json
{
  "success": true,
  "image": {
    "id": "001a2b3c",
    "urls": {
      "original": "/uploads/products/00/1a/001a2b3c_original.webp",
      "medium": "/uploads/products/00/1a/001a2b3c_800.webp",
      "thumb": "/uploads/products/00/1a/001a2b3c_200.webp"
    },
    "size_bytes": 295000,
    "dimensions": {
      "width": 1200,
      "height": 1200
    }
  }
}
```
## Configuration
### Environment Variables
```bash
# Image storage configuration
IMAGE_UPLOAD_DIR=/var/www/uploads
IMAGE_MAX_SIZE_MB=10
IMAGE_ALLOWED_TYPES=jpg,jpeg,png,gif,webp
IMAGE_QUALITY=85
IMAGE_MAX_DIMENSION=2000
```
### Python Configuration
```python
# app/core/config.py
class ImageSettings:
    UPLOAD_DIR: str = "/static/uploads"
    MAX_SIZE_MB: int = 10
    ALLOWED_TYPES: list = ["jpg", "jpeg", "png", "gif", "webp"]
    QUALITY: int = 85
    MAX_DIMENSION: int = 2000
    # Generated sizes
    SIZES: dict = {
        "original": None,  # No resize, just optimize
        "medium": 800,
        "thumb": 200,
    }
```
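To show how these settings might be enforced before any image processing starts, here is a hedged sketch of an upload check. `validate_upload` is a hypothetical function; the constants mirror `MAX_SIZE_MB` and `ALLOWED_TYPES` above.

```python
from pathlib import PurePath

# Values mirror the ImageSettings defaults above.
MAX_SIZE_MB = 10
ALLOWED_TYPES = {"jpg", "jpeg", "png", "gif", "webp"}


def validate_upload(filename: str, size_bytes: int) -> list[str]:
    """Return a list of validation errors; an empty list means the upload passes."""
    errors = []
    extension = PurePath(filename).suffix.lower().lstrip(".")
    if extension not in ALLOWED_TYPES:
        errors.append(f"File type '{extension}' is not allowed")
    if size_bytes > MAX_SIZE_MB * 1024 * 1024:
        errors.append("File too large")
    return errors


print(validate_upload("photo.JPG", 2_000_000))  # []
```

An extension check alone does not prove the file is an image; a real service would also verify the decoded content.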
## Performance Guidelines
### Filesystem Limits
| Files per Directory | Status | Action |
|---------------------|--------|--------|
| < 10,000 | OK | None needed |
| 10,000 - 50,000 | Monitor | Plan migration |
| 50,000 - 100,000 | Warning | Increase sharding depth |
| > 100,000 | Critical | Migrate to object storage |
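
For context on these limits: two shard levels of two hex characters each give 16^4 = 65,536 leaf directories. The sketch below estimates the average load per directory, assuming files spread evenly, and classifies a directory against the table. Both helpers are hypothetical, and since the table's ranges overlap at their boundaries, treating 50,000 as "Monitor" and 100,000 as "Warning" is a choice made here.

```python
LEAF_DIRECTORIES = 16 ** 4  # two levels of two hex characters each = 65,536


def average_files_per_directory(total_files: int) -> float:
    """Average files per leaf directory, assuming an even hash distribution."""
    return total_files / LEAF_DIRECTORIES


def directory_status(files_in_directory: int) -> str:
    """Classify a single directory's file count against the table above."""
    if files_in_directory < 10_000:
        return "OK"
    if files_in_directory <= 50_000:
        return "Monitor"
    if files_in_directory <= 100_000:
        return "Warning"
    return "Critical"


average = average_files_per_directory(7_500_000)
print(f"{average:.0f} files per directory -> {directory_status(round(average))}")
```

Even at 7.5 million files the average stays well below the first threshold, which is the point of sharding.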
### Capacity Planning
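| Products | Images (5/product) | Total Files (3 sizes) | Storage (~200 KB/file) |
|----------|--------------------|-----------------------|------------------------|
| 10,000 | 50,000 | 150,000 | 30 GB |
| 50,000 | 250,000 | 750,000 | 150 GB |
| 100,000 | 500,000 | 1,500,000 | 300 GB |
| 500,000 | 2,500,000 | 7,500,000 | 1.5 TB |

Storage here assumes about 200 KB per stored file. Using the ~200 KB per image (all variants) average from Size Estimates instead gives roughly one third of these figures (for example, about 10 GB for 10,000 products).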
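
The arithmetic behind the table can be reproduced with a short sketch. The figure of 200 KB per stored file is inferred from the table's numbers and is not a stated constant; `estimate_storage` is a hypothetical helper using decimal units.

```python
def estimate_storage(products: int, images_per_product: int = 5,
                     variants: int = 3, kb_per_file: float = 200.0) -> dict:
    """Estimate image count, file count, and storage (GB) for a product catalogue."""
    images = products * images_per_product
    files = images * variants
    storage_gb = files * kb_per_file / 1_000_000  # decimal units: 1 GB = 1,000,000 KB
    return {"images": images, "files": files, "storage_gb": storage_gb}


print(estimate_storage(10_000))
# {'images': 50000, 'files': 150000, 'storage_gb': 30.0}
```

Adjusting `kb_per_file` or `images_per_product` to measured values gives a deployment-specific estimate.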
## CDN Integration
For production deployments, configure a CDN for image delivery:
### Cloudflare (Recommended)
1. Set up Cloudflare for your domain
2. Configure page rules for `/uploads/*`:
- Cache Level: Cache Everything
- Edge Cache TTL: 1 month
- Browser Cache TTL: 1 week
### nginx Configuration
```nginx
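location /uploads/ {
    alias /var/www/uploads/;
    expires 30d;
    add_header Cache-Control "public, immutable";
    add_header X-Content-Type-Options nosniff;
    # Serve a .webp copy ($uri plus suffix) when the client accepts WebP, else the original file.
    # $webp_suffix is not built in; define it with a map on $http_accept in the http block.
    location ~ \.(jpg|jpeg|png)$ {
        try_files $uri$webp_suffix $uri =404;
    }
}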
```
## Maintenance
### Cleanup Orphaned Images
Remove images not referenced by any product:
```bash
# Run via admin CLI
python -m scripts.cleanup_orphaned_images --dry-run
python -m scripts.cleanup_orphaned_images --execute
```
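The core of a dry run can be sketched as follows. This is not the actual `scripts.cleanup_orphaned_images` implementation; `find_orphaned_images` is a hypothetical function that assumes the `<hash>_<variant>.webp` naming shown earlier and a set of image IDs still referenced by products.

```python
from pathlib import Path


def find_orphaned_images(upload_root: Path, referenced_ids: set[str]) -> list[Path]:
    """List variant files whose image ID is not in `referenced_ids` (nothing is deleted)."""
    orphans = []
    for path in sorted(upload_root.rglob("*.webp")):
        # "001a2b3c_800.webp" -> stem "001a2b3c_800" -> image ID "001a2b3c"
        image_id = path.stem.split("_", 1)[0]
        if image_id not in referenced_ids:
            orphans.append(path)
    return orphans
```

Returning the list without deleting anything mirrors the `--dry-run` step; deletion would be a separate, explicit action.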
### Regenerate Variants
If image quality settings change:
```bash
# Regenerate all variants for a vendor
python -m scripts.regenerate_images --vendor-id 123
# Regenerate all variants (use with caution)
python -m scripts.regenerate_images --all
```
## Monitoring
### Metrics to Track
- Total file count
- Storage used (GB)
- Files per directory (max)
- Upload success rate
- Average processing time
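
The first three metrics can be gathered by walking the upload directory. The sketch below is a hypothetical `storage_stats` function using only the standard library; it is not the platform's storage stats endpoint, and a full walk may be slow on very large trees.

```python
import os
from collections import Counter


def storage_stats(upload_root: str) -> dict:
    """Walk `upload_root` and report file count, total size, and the fullest directory."""
    total_files = 0
    total_bytes = 0
    per_directory = Counter()
    for directory, _subdirs, filenames in os.walk(upload_root):
        for name in filenames:
            total_files += 1
            total_bytes += os.path.getsize(os.path.join(directory, name))
            per_directory[directory] += 1
    return {
        "total_files": total_files,
        "total_size_gb": total_bytes / 1_000_000_000,  # decimal gigabytes
        "max_files_per_directory": max(per_directory.values(), default=0),
    }
```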
### Health Checks
The platform health page includes image storage metrics:
- Current file count
- Storage usage
- Directory distribution
- Processing queue status
## Troubleshooting
### Common Issues
**Upload fails with "File too large"**
- Check `IMAGE_MAX_SIZE_MB` setting
- Verify nginx `client_max_body_size`
**Images not displaying**
- Check file permissions (should be readable by web server)
- Verify URL paths match actual file locations
**Slow uploads**
- Check disk I/O performance
- Consider async processing queue
## Related Documentation
- [Capacity Planning](../architecture/capacity-planning.md)
- [Platform Health](platform-health.md)
- [Capacity Monitoring](capacity-monitoring.md)


@@ -0,0 +1,92 @@
# Platform Health Monitoring
This guide covers the platform health monitoring features available in the admin dashboard.
## Overview
The Platform Health page (`/admin/platform-health`) provides real-time visibility into system performance, resource usage, and capacity thresholds.
## Accessing Platform Health
Navigate to **Admin > Platform Health** in the sidebar, or go directly to `/admin/platform-health`.
## Dashboard Sections
### 1. System Overview
Quick glance at overall platform status:
| Indicator | Green | Yellow | Red |
|-----------|-------|--------|-----|
| API Response Time | < 100ms | 100-500ms | > 500ms |
| Error Rate | < 0.1% | 0.1-1% | > 1% |
| Database Health | Connected | Slow queries | Disconnected |
| Storage | < 70% | 70-85% | > 85% |
### 2. Resource Usage
Real-time metrics:
- **CPU Usage**: Current and 24h average
- **Memory Usage**: Used vs available
- **Disk Usage**: Storage consumption with trend
- **Network**: Inbound/outbound throughput
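
As a rough illustration of the disk usage figure, the sketch below uses the standard library's `shutil.disk_usage` rather than psutil, which the platform depends on for system metrics. `disk_usage_percent` is a hypothetical helper, and the result may differ slightly from psutil's percentage because of how reserved blocks are counted.

```python
import shutil


def disk_usage_percent(path: str = ".") -> float:
    """Percentage of the filesystem containing `path` that is in use."""
    usage = shutil.disk_usage(path)
    return usage.used / usage.total * 100


print(f"Disk usage: {disk_usage_percent():.1f}%")
```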
### 3. Capacity Metrics
Track growth toward scaling thresholds:
- **Total Products**: Count across all vendors
- **Total Images**: Files stored in image system
- **Database Size**: Current size vs recommended max
- **Active Clients**: Monthly active vendor accounts
### 4. Performance Trends
Historical charts (7-day, 30-day):
- API response times (p50, p95, p99)
- Request volume by endpoint
- Database query latency
- Error rate over time
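
The p50, p95, and p99 values can be computed in several ways. The sketch below uses the nearest-rank method, which is an assumption; the dashboard may interpolate instead. `percentile` is a hypothetical helper.

```python
import math


def percentile(samples: list[float], pct: float) -> float:
    """Nearest-rank percentile: the smallest sample with at least pct% of samples at or below it."""
    if not samples:
        raise ValueError("samples must not be empty")
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


response_times_ms = [120, 80, 100, 300, 90]
print(percentile(response_times_ms, 50))  # 100
print(percentile(response_times_ms, 95))  # 300
```

The example shows why p95 and p99 matter: a single slow request dominates the tail even when the median looks healthy.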
## Alert Configuration
### Threshold Alerts
Configure alerts for proactive monitoring:
```python
# In app/core/config.py
HEALTH_THRESHOLDS = {
    "cpu_percent": {"warning": 70, "critical": 85},
    "memory_percent": {"warning": 75, "critical": 90},
    "disk_percent": {"warning": 70, "critical": 85},
    "response_time_ms": {"warning": 200, "critical": 500},
    "error_rate_percent": {"warning": 1.0, "critical": 5.0},
}
```
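One plausible way to turn these thresholds into a single dashboard status is to take the worst status across all metrics. The sketch below assumes that rule and a `>=` comparison; `evaluate_health` is hypothetical and the threshold values are copied from the configuration above.

```python
# Threshold values copied from the configuration above.
HEALTH_THRESHOLDS = {
    "cpu_percent": {"warning": 70, "critical": 85},
    "memory_percent": {"warning": 75, "critical": 90},
    "disk_percent": {"warning": 70, "critical": 85},
    "response_time_ms": {"warning": 200, "critical": 500},
    "error_rate_percent": {"warning": 1.0, "critical": 5.0},
}
SEVERITY = {"ok": 0, "warning": 1, "critical": 2}


def evaluate_health(metrics: dict[str, float]) -> dict:
    """Classify each metric and report the worst status as the overall status."""
    statuses = {}
    for name, value in metrics.items():
        levels = HEALTH_THRESHOLDS[name]
        if value >= levels["critical"]:
            statuses[name] = "critical"
        elif value >= levels["warning"]:
            statuses[name] = "warning"
        else:
            statuses[name] = "ok"
    overall = max(statuses.values(), key=SEVERITY.__getitem__, default="ok")
    return {"overall": overall, "metrics": statuses}


print(evaluate_health({"cpu_percent": 72, "memory_percent": 50, "disk_percent": 90}))
# {'overall': 'critical', 'metrics': {'cpu_percent': 'warning', 'memory_percent': 'ok', 'disk_percent': 'critical'}}
```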
### Notification Channels
Alerts can be sent via:
- Email to admin users
- Slack webhook (if configured)
- Dashboard notifications
## Related Pages
- [Capacity Monitoring](capacity-monitoring.md) - Detailed capacity metrics
- [Image Storage](image-storage.md) - Image system management
- [Capacity Planning](../architecture/capacity-planning.md) - Infrastructure sizing guide
## API Endpoints
The platform health page uses these admin API endpoints:
| Endpoint | Description |
|----------|-------------|
| `GET /api/v1/admin/platform/health` | Overall health status |
| `GET /api/v1/admin/platform/metrics` | Current metrics |
| `GET /api/v1/admin/platform/metrics/history` | Historical data |
| `GET /api/v1/admin/platform/capacity` | Capacity usage |