# Platform Health Monitoring

This guide covers the platform health monitoring features available in the admin dashboard.
## Overview

The Platform Health page (`/admin/platform-health`) provides real-time visibility into system performance, resource usage, and capacity thresholds.
## Accessing Platform Health

Navigate to **Admin > Platform Health** in the sidebar, or go directly to `/admin/platform-health`.
## Dashboard Sections

### 1. System Overview

Quick glance at overall platform status:

| Indicator | Green | Yellow | Red |
|-----------|-------|--------|-----|
| API Response Time | < 100ms | 100-500ms | > 500ms |
| Error Rate | < 0.1% | 0.1-1% | > 1% |
| Database Health | Connected | Slow queries | Disconnected |
| Storage | < 70% | 70-85% | > 85% |
### 2. Resource Usage

Real-time metrics:

- **CPU Usage**: Current and 24h average
- **Memory Usage**: Used vs available
- **Disk Usage**: Storage consumption with trend
- **Network**: Inbound/outbound throughput
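
The disk usage figure listed above is a percentage of used space on a filesystem. As a minimal sketch, assuming only the Python standard library, it could be computed as follows. The helper name `disk_usage_percent` is hypothetical, and the dashboard itself may collect its metrics differently.

```python
import shutil


def disk_usage_percent(path: str = ".") -> float:
    """Return used space on the filesystem containing `path`, as a percentage."""
    usage = shutil.disk_usage(path)  # named tuple: total, used, free (in bytes)
    if usage.total == 0:
        return 0.0
    return round(usage.used / usage.total * 100, 1)


if __name__ == "__main__":
    print(f"Disk usage: {disk_usage_percent()}%")
```

The result can be compared directly against the storage thresholds in the System Overview table.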
### 3. Capacity Metrics

Track growth toward scaling thresholds:

- **Total Products**: Count across all vendors
- **Total Images**: Files stored in image system
- **Database Size**: Current size vs recommended max
- **Active Clients**: Monthly active vendor accounts
### 4. Performance Trends

Historical charts (7-day, 30-day):

- API response times (p50, p95, p99)
- Request volume by endpoint
- Database query latency
- Error rate over time
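
The p50, p95, and p99 values in the response-time chart are percentiles of the recorded latencies. The sketch below shows one common way to compute them, the nearest-rank method. This is an assumption for illustration: the source does not say which percentile method the charts use, and the function names are hypothetical.

```python
def percentile(samples: list[float], pct: int) -> float:
    """Nearest-rank percentile: the smallest sample with at least pct% of samples at or below it."""
    if not samples:
        raise ValueError("samples must not be empty")
    if not 1 <= pct <= 100:
        raise ValueError("pct must be an integer between 1 and 100")
    ordered = sorted(samples)
    # Integer ceiling of pct/100 * n, which avoids floating-point rounding surprises.
    rank = (pct * len(ordered) + 99) // 100
    return ordered[rank - 1]


def latency_summary(samples_ms: list[float]) -> dict[str, float]:
    """Summarize response times (in milliseconds) as p50, p95, and p99."""
    return {f"p{p}": percentile(samples_ms, p) for p in (50, 95, 99)}
```

With few samples the high percentiles collapse onto the maximum, so p99 only becomes informative once a time window holds a few hundred requests.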
## Alert Configuration

### Threshold Alerts

Configure alerts for proactive monitoring:

```python
# In app/core/config.py
HEALTH_THRESHOLDS = {
    "cpu_percent": {"warning": 70, "critical": 85},
    "memory_percent": {"warning": 75, "critical": 90},
    "disk_percent": {"warning": 70, "critical": 85},
    "response_time_ms": {"warning": 200, "critical": 500},
    "error_rate_percent": {"warning": 1.0, "critical": 5.0},
}
```
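
To illustrate how these thresholds might be applied, the sketch below classifies a metrics snapshot as `ok`, `warning`, or `critical`. The functions `classify` and `evaluate` are hypothetical, and treating a value equal to a threshold as having crossed it is an assumption; the source does not specify either.

```python
HEALTH_THRESHOLDS = {
    "cpu_percent": {"warning": 70, "critical": 85},
    "memory_percent": {"warning": 75, "critical": 90},
    "disk_percent": {"warning": 70, "critical": 85},
    "response_time_ms": {"warning": 200, "critical": 500},
    "error_rate_percent": {"warning": 1.0, "critical": 5.0},
}


def classify(metric: str, value: float) -> str:
    """Return 'ok', 'warning', or 'critical' for a single metric value."""
    limits = HEALTH_THRESHOLDS[metric]
    # Assumption: thresholds are inclusive, so value == limit counts as crossed.
    if value >= limits["critical"]:
        return "critical"
    if value >= limits["warning"]:
        return "warning"
    return "ok"


def evaluate(snapshot: dict[str, float]) -> dict[str, str]:
    """Classify every metric in the snapshot that has a configured threshold."""
    return {
        name: classify(name, value)
        for name, value in snapshot.items()
        if name in HEALTH_THRESHOLDS
    }
```

For example, `evaluate({"cpu_percent": 72.5, "disk_percent": 91.0})` returns `{"cpu_percent": "warning", "disk_percent": "critical"}`.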
### Notification Channels

Alerts can be sent via:

- Email to admin users
- Slack webhook (if configured)
- Dashboard notifications
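
For the Slack channel, an alert would typically be delivered as a JSON `POST` to an incoming webhook URL, whose payload carries the message in a `text` field. The following is a sketch under stated assumptions, not the platform's actual notifier: the function names and the message format are hypothetical.

```python
import json
import urllib.request


def build_slack_request(
    webhook_url: str, metric: str, value: float, level: str
) -> urllib.request.Request:
    """Build (but do not send) a Slack incoming-webhook request for one alert."""
    message = f"[{level.upper()}] {metric} is at {value}"  # hypothetical message format
    body = json.dumps({"text": message}).encode("utf-8")
    return urllib.request.Request(
        webhook_url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_alert(request: urllib.request.Request, timeout: float = 5.0) -> int:
    """Send the prepared request and return the HTTP status code."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status
```

Keeping request construction separate from sending makes the payload easy to test without network access.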
## Related Pages

- [Capacity Monitoring](capacity-monitoring.md) - Detailed capacity metrics
- [Image Storage](image-storage.md) - Image system management
- [Capacity Planning](../architecture/capacity-planning.md) - Infrastructure sizing guide
## API Endpoints

The platform health page uses these admin API endpoints:

| Endpoint | Description |
|----------|-------------|
| `GET /api/v1/admin/platform/health` | Overall health status |
| `GET /api/v1/admin/platform/metrics` | Current metrics |
| `GET /api/v1/admin/platform/metrics/history` | Historical data |
| `GET /api/v1/admin/platform/capacity` | Capacity usage |
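
These endpoints can also be queried from a script. The sketch below builds the requests with the standard library only. The host name, the bearer-token authentication scheme, and the assumption that responses are JSON are all hypothetical, since the source documents only the paths.

```python
import json
import urllib.request

BASE_URL = "https://platform.example.com"  # hypothetical host

ENDPOINTS = {
    "health": "/api/v1/admin/platform/health",
    "metrics": "/api/v1/admin/platform/metrics",
    "history": "/api/v1/admin/platform/metrics/history",
    "capacity": "/api/v1/admin/platform/capacity",
}


def build_request(name: str, token: str, base_url: str = BASE_URL) -> urllib.request.Request:
    """Build a GET request for one of the platform health endpoints."""
    return urllib.request.Request(
        base_url.rstrip("/") + ENDPOINTS[name],
        # Assumption: the admin API accepts a bearer token; adjust to the real auth scheme.
        headers={"Authorization": f"Bearer {token}", "Accept": "application/json"},
        method="GET",
    )


def fetch(name: str, token: str, base_url: str = BASE_URL):
    """Send the request and decode the response body, assuming it is JSON."""
    with urllib.request.urlopen(build_request(name, token, base_url), timeout=10) as response:
        return json.load(response)
```

A call such as `fetch("capacity", token)` would then return the decoded capacity data, provided the assumptions above hold for the deployment.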