feat: add capacity planning docs, image upload system, and platform health monitoring

Documentation:
- Add comprehensive capacity planning guide (docs/architecture/capacity-planning.md)
- Add operations docs: platform-health, capacity-monitoring, image-storage
- Link pricing strategy to capacity planning documentation
- Update mkdocs.yml with new Operations section

Image Upload System:
- Add ImageService with WebP conversion and sharded directory structure
- Generate multiple size variants (original, 800px, 200px)
- Add storage stats endpoint for monitoring
- Add Pillow dependency for image processing

Platform Health Monitoring:
- Add /admin/platform-health page with real-time metrics
- Show CPU, memory, disk usage with progress bars
- Display capacity thresholds with status indicators
- Generate scaling recommendations automatically
- Determine infrastructure tier based on usage
- Add psutil dependency for system metrics

Admin UI:
- Add Capacity Monitor to Platform Health section in sidebar
- Create platform-health.html template with stats cards
- Create platform-health.js for Alpine.js state management

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-25 17:17:09 +01:00
parent b25d119899
commit dc7fb5ca19
16 changed files with 2352 additions and 0 deletions


@@ -0,0 +1,454 @@
# Capacity Planning & Infrastructure Sizing
This document provides comprehensive capacity planning guidelines for the Wizamart platform, including resource requirements, scaling thresholds, and monitoring recommendations.
> **Related:** [Pricing Strategy](../marketing/pricing.md) for tier definitions and limits
---
## Tier Resource Allocations
Based on our [pricing tiers](../marketing/pricing.md), here are the expected resource requirements per client:
| Metric | Essential (€49) | Professional (€99) | Business (€199) | Enterprise (€399+) |
|--------|-----------------|--------------------|-----------------|--------------------|
| Products | 200 | 500 | 2,000 | Unlimited |
| Images per product | 3 | 5 | 8 | 10+ |
| Orders per month | 100 | 500 | 2,000 | Unlimited |
| SKU variants | 1.2x | 1.5x | 2x | 3x |
| Team members | 1 | 3 | 10 | Unlimited |
| API requests/day | 1,000 | 5,000 | 20,000 | Unlimited |
---
## Scale Projections
### Target: 1,000 Business Clients (€149/month tier)
This represents our primary growth target. Here's the infrastructure impact:
| Resource | Calculation | Total |
|----------|-------------|-------|
| **Products** | 1,000 clients × 500 products | **500,000** |
| **Product Translations** | 500,000 × 4 languages | **2,000,000 rows** |
| **Images (files)** | 500,000 × 5 images × 3 sizes | **7,500,000 files** |
| **Image Storage** | 7.5M files × 200KB avg | **1.5 TB** |
| **Database Size** | Products + translations + orders + indexes | **15-25 GB** |
| **Monthly Orders** | 1,000 clients × 300 orders | **300,000 orders** |
| **Order Items** | 300,000 × 2.5 avg items | **750,000 items/month** |
| **Monthly API Requests** | 1,000 × 10,000 req/day × 30 | **300M requests** |
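The arithmetic in the table above can be reproduced with a small script. This is a minimal sketch: the function and parameter names are illustrative (they are not part of the Wizamart codebase), the defaults are the inputs from the table, and decimal units (1 TB = 1,000,000,000 KB) are assumed.

```python
def project_scale(
    clients: int = 1_000,
    products_per_client: int = 500,
    languages: int = 4,
    images_per_product: int = 5,
    sizes_per_image: int = 3,
    avg_file_kb: int = 200,
    orders_per_client: int = 300,
    items_per_order: float = 2.5,
    api_requests_per_day: int = 10_000,
) -> dict:
    """Recompute the projection table; defaults are the table's own inputs."""
    products = clients * products_per_client
    image_files = products * images_per_product * sizes_per_image
    monthly_orders = clients * orders_per_client
    return {
        "products": products,
        "translation_rows": products * languages,
        "image_files": image_files,
        # Decimal units: 1 TB = 1,000,000,000 KB
        "image_storage_tb": image_files * avg_file_kb / 1_000_000_000,
        "monthly_orders": monthly_orders,
        "monthly_order_items": round(monthly_orders * items_per_order),
        "monthly_api_requests": clients * api_requests_per_day * 30,
    }


if __name__ == "__main__":
    for name, value in project_scale().items():
        print(f"{name}: {value:,}")
```

Changing a single argument, for example `project_scale(images_per_product=8)`, shows how sensitive the storage figure is to per-product image counts.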
### Multi-Tier Mix (Realistic Scenario)
More realistic distribution across tiers:
| Tier | Clients | Products Each | Total Products | Monthly Orders |
|------|---------|---------------|----------------|----------------|
| Essential | 500 | 100 | 50,000 | 50,000 |
| Professional | 300 | 300 | 90,000 | 150,000 |
| Business | 150 | 1,000 | 150,000 | 300,000 |
| Enterprise | 50 | 3,000 | 150,000 | 200,000 |
| **Total** | **1,000** | - | **440,000** | **700,000** |
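The totals row can be checked, and the implied per-client order volume derived, with a short sketch. The `TierMix` class and helper names below are hypothetical and exist only for this illustration; the numbers are copied from the table.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TierMix:
    name: str
    clients: int
    products_each: int
    monthly_orders: int  # total for the whole tier, as in the table


MIX = [
    TierMix("Essential", 500, 100, 50_000),
    TierMix("Professional", 300, 300, 150_000),
    TierMix("Business", 150, 1_000, 300_000),
    TierMix("Enterprise", 50, 3_000, 200_000),
]


def mix_totals(mix: list) -> dict:
    """Sum clients, products, and monthly orders across all tiers."""
    return {
        "clients": sum(t.clients for t in mix),
        "products": sum(t.clients * t.products_each for t in mix),
        "monthly_orders": sum(t.monthly_orders for t in mix),
    }


def orders_per_client(mix: list) -> dict:
    """Implied monthly orders per client in each tier."""
    return {t.name: t.monthly_orders // t.clients for t in mix}


if __name__ == "__main__":
    print(mix_totals(MIX))
    print(orders_per_client(MIX))
```

The per-client figures for Essential, Professional, and Business come out at 100, 500, and 2,000 orders per month, which equal the tier order limits, so the order column is best read as an upper bound rather than a typical load.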
---
## Server Sizing Recommendations
### Infrastructure Tiers
| Scale | Clients | vCPU | RAM | Storage | Database | Monthly Cost |
|-------|---------|------|-----|---------|----------|--------------|
| **Starter** | 1-50 | 2 | 4GB | 100GB SSD | SQLite | €20-40 |
| **Small** | 50-100 | 4 | 8GB | 250GB SSD | PostgreSQL | €60-100 |
| **Medium** | 100-300 | 4 | 16GB | 500GB SSD | PostgreSQL | €100-180 |
| **Large** | 300-500 | 8 | 32GB | 1TB SSD | PostgreSQL + Redis | €250-400 |
| **Scale** | 500-1000 | 16 | 64GB | 2TB SSD + CDN | PostgreSQL + Redis | €500-900 |
| **Enterprise** | 1000+ | 32+ | 128GB+ | 4TB+ + CDN | PostgreSQL cluster | €1,500+ |
### Recommended Configurations
#### Starter (1-50 clients)
```
Single Server Setup:
- Hetzner CX22 or similar (2 vCPU, 4GB RAM)
- 100GB SSD storage
- SQLite database
- nginx for static files + reverse proxy
- Estimated cost: €20-40/month
```
#### Small-Medium (50-300 clients)
```
Two-Server Setup:
- App Server: 4 vCPU, 8-16GB RAM
- Database: Managed PostgreSQL (basic tier)
- Storage: Local SSD + backup
- Optional: Redis for sessions/caching
- Estimated cost: €80-180/month
```
#### Large (300-1000 clients)
```
Multi-Component Setup:
- Load Balancer: nginx or cloud LB
- App Servers: 2-4 × (4 vCPU, 8GB RAM)
- Database: Managed PostgreSQL (production tier)
- Cache: Redis (managed or self-hosted)
- Storage: Object storage (S3-compatible) + CDN
- Estimated cost: €400-900/month
```
#### Enterprise (1000+ clients)
```
Full Production Setup:
- CDN: Cloudflare or similar
- Load Balancer: Cloud-native with health checks
- App Servers: 4-8 × (4 vCPU, 16GB RAM) with auto-scaling
- Database: PostgreSQL with read replicas
- Cache: Redis cluster
- Storage: S3 + CloudFront or equivalent
- Monitoring: Prometheus + Grafana
- Estimated cost: €1,500+/month
```
---
## Image Storage Architecture
### Capacity Calculations
| Image Size (optimized) | Files per 25GB | Files per 100GB | Files per 1TB |
|------------------------|----------------|-----------------|---------------|
| 100KB (thumbnails) | 250,000 | 1,000,000 | 10,000,000 |
| 200KB (web-ready) | 125,000 | 500,000 | 5,000,000 |
| 300KB (high quality) | 83,000 | 333,000 | 3,330,000 |
| 500KB (original) | 50,000 | 200,000 | 2,000,000 |
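The table follows from a single division. A minimal sketch, assuming decimal units (1 GB = 1,000,000 KB) and ignoring filesystem overhead; the function name is illustrative:

```python
def files_per_capacity(capacity_gb: int, avg_file_kb: int) -> int:
    """Number of files of a given average size that fit in capacity_gb."""
    return capacity_gb * 1_000_000 // avg_file_kb


if __name__ == "__main__":
    for size_kb in (100, 200, 300, 500):
        row = [files_per_capacity(gb, size_kb) for gb in (25, 100, 1_000)]
        print(size_kb, row)
```

The 300KB row in the table is rounded; the exact values are 83,333, 333,333, and 3,333,333 files.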
### Image Sizes Generated
Each uploaded image generates 3 variants:
| Variant | Dimensions | Typical Size | Use Case |
|---------|------------|--------------|----------|
| `thumb` | 200×200 | 10-20KB | List views, grids |
| `medium` | 800×800 | 80-150KB | Product cards, previews |
| `original` | As uploaded | 200-500KB | Detail views, zoom |
**Storage per product:** ~600KB (with 3 sizes for main image + 2 additional images)
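The target dimensions for each variant can be worked out before any pixels are touched. The sketch below is an assumption about how the variants might be sized, not a description of the actual `ImageService`: it assumes each variant is fitted inside a square box with the aspect ratio preserved and is never upscaled. The function names are hypothetical, and the resize itself (for example with Pillow) is left out so the example stays dependency-free.

```python
# Longest-edge limits taken from the variants table (assumed to be bounding boxes).
VARIANT_MAX_PX = {"medium": 800, "thumb": 200}


def variant_size(width: int, height: int, max_px: int) -> tuple:
    """Fit (width, height) inside a max_px square, keeping aspect ratio.

    Images already smaller than the box are returned unchanged (no upscaling).
    """
    scale = min(max_px / width, max_px / height, 1.0)
    return max(1, round(width * scale)), max(1, round(height * scale))


def plan_variants(width: int, height: int) -> dict:
    """Return the pixel dimensions of all three variants for one upload."""
    sizes = {"original": (width, height)}
    for name, max_px in VARIANT_MAX_PX.items():
        sizes[name] = variant_size(width, height, max_px)
    return sizes


if __name__ == "__main__":
    print(plan_variants(1600, 1200))
```

If the real service crops to an exact square instead of fitting inside one, only `variant_size` would need to change.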
### Directory Structure (Sharded)
To prevent filesystem performance degradation, images are stored in a sharded directory structure:
```
/uploads/
└── products/
    ├── 00/                          # First 2 chars of hash
    │   ├── 1a/                      # Next 2 chars
    │   │   ├── 001a2b3c_original.webp
    │   │   ├── 001a2b3c_800.webp
    │   │   └── 001a2b3c_200.webp
    │   └── 2b/
    │       └── ...
    ├── 01/
    └── ...
```
This structure ensures:
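- At most 256 subdirectories per level (two hex characters)
- About 15 images per leaf directory at 1M total images (roughly 46 files with three variants each), assuming hashes distribute evenly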
- Fast filesystem operations even at scale
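The mapping from an image to its sharded location could look like the sketch below. Several details are assumptions made for illustration: the hash is taken to be SHA-256 over the file contents, the file name is taken to be the first 8 hex characters of the digest (mirroring the `001a2b3c` example), and `sharded_path` is a hypothetical helper, not the real `ImageService` API.

```python
import hashlib
from pathlib import PurePosixPath


def sharded_path(
    image_bytes: bytes,
    variant: str,
    root: str = "/uploads/products",
) -> PurePosixPath:
    """Build <root>/<hash[0:2]>/<hash[2:4]>/<hash[0:8]>_<variant>.webp."""
    digest = hashlib.sha256(image_bytes).hexdigest()  # assumed hash function
    name = digest[:8]  # assumed name length, as in the tree above
    return PurePosixPath(root) / digest[:2] / digest[2:4] / f"{name}_{variant}.webp"


if __name__ == "__main__":
    data = b"example image bytes"
    for variant in ("original", "800", "200"):
        print(sharded_path(data, variant))
    # Two hex characters per level gives 256 * 256 leaf directories.
    print(256 * 256)
```

An 8-character name keeps the example close to the tree above; a production service would likely keep more of the digest, or add a unique ID, to avoid name collisions.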
### Performance Thresholds
| Files per Directory | Performance | Required Action |
|---------------------|-------------|-----------------|
| < 10,000 | Excellent | None |
| 10,000 - 100,000 | Good | Monitor, plan sharding |
| 100,000 - 500,000 | Degraded | **Implement sharding** |
| > 500,000 | Poor | **Migrate to object storage** |
---
## Database Performance
### Table Size Guidelines
| Rows per Table | Query Time | Status / Action |
|----------------|------------|-----------------|
| < 10,000 | < 1ms | Excellent |
| 10,000 - 100,000 | 1-10ms | Good |
| 100,000 - 1,000,000 | 10-50ms | **Add indexes, optimize queries** |
| 1,000,000 - 10,000,000 | 50-200ms | **Consider partitioning** |
| > 10,000,000 | Variable | **Sharding or dedicated DB** |
### Critical Indexes
Ensure these indexes exist at scale:
```sql
-- Products
CREATE INDEX idx_product_vendor_active ON products(vendor_id, is_active);
CREATE INDEX idx_product_gtin ON products(gtin);
CREATE INDEX idx_product_vendor_sku ON products(vendor_id, vendor_sku);
-- Orders
CREATE INDEX idx_order_vendor_status ON orders(vendor_id, status);
CREATE INDEX idx_order_created ON orders(created_at DESC);
CREATE INDEX idx_order_customer ON orders(customer_id);
-- Inventory
CREATE INDEX idx_inventory_product_location ON inventory(product_id, warehouse, bin_location);
CREATE INDEX idx_inventory_vendor ON inventory(vendor_id);
```
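Creating an index does not guarantee the planner uses it, so it can help to check the query plan. The sketch below uses an in-memory SQLite database (the Starter tier engine) and a deliberately reduced, hypothetical `products` schema; `uses_index` is an illustrative helper. On PostgreSQL the equivalent check would be done with `EXPLAIN`.

```python
import sqlite3


def uses_index(conn: sqlite3.Connection, sql: str, params: tuple, index_name: str) -> bool:
    """Return True if SQLite's query plan for `sql` mentions `index_name`."""
    plan = conn.execute(f"EXPLAIN QUERY PLAN {sql}", params).fetchall()
    # The last column of each plan row is a human-readable step description.
    return any(index_name in row[-1] for row in plan)


conn = sqlite3.connect(":memory:")
# Reduced schema for illustration only; the real table has more columns.
conn.execute(
    "CREATE TABLE products ("
    "id INTEGER PRIMARY KEY, vendor_id INTEGER, is_active INTEGER, gtin TEXT)"
)
conn.execute(
    "CREATE INDEX idx_product_vendor_active ON products(vendor_id, is_active)"
)

indexed_query = "SELECT id FROM products WHERE vendor_id = ? AND is_active = 1"
unindexed_query = "SELECT id FROM products WHERE gtin = ?"

if __name__ == "__main__":
    print(uses_index(conn, indexed_query, (42,), "idx_product_vendor_active"))
    print(uses_index(conn, unindexed_query, ("0123456789012",), "idx_product_gtin"))
```

The second query falls back to a table scan because `idx_product_gtin` has not been created in this reduced schema, which is the situation the index list above is meant to prevent.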
### Database Size Estimates
| Component | Size per 100K Products | Size per 1M Products |
|-----------|------------------------|----------------------|
| Products table | 100 MB | 1 GB |
| Translations (4 langs) | 400 MB | 4 GB |
| Orders (1 year) | 500 MB | 5 GB |
| Order items | 200 MB | 2 GB |
| Inventory | 50 MB | 500 MB |
| Indexes | 300 MB | 3 GB |
| **Total** | **~1.5 GB** | **~15 GB** |
---
## Bandwidth & Network
### Monthly Bandwidth Estimates (1000 clients)
| Traffic Type | Calculation | Monthly Volume |
|--------------|-------------|----------------|
| Image views | 500K products × 10 views/month × 500KB | **2.5 TB** |
| API requests | 10K req/client/day × 1000 × 30 × 2KB | **600 GB** |
| Static assets | CSS/JS cached, minimal | **50 GB** |
| **Total Egress** | | **~3 TB/month** |
### Bandwidth Costs (Approximate)
| Provider | First 1TB | Additional per TB |
|----------|-----------|-------------------|
| Hetzner | Included | €1/TB |
| AWS | $90 | $85/TB |
| DigitalOcean | 1TB free | $10/TB |
| Cloudflare | Unlimited (CDN) | Free |
**Recommendation:** Serve images through a CDN such as Cloudflare so that cached image requests do not incur origin egress costs.
---
## Scaling Triggers & Thresholds
### When to Scale Up
| Metric | Warning | Critical | Action |
|--------|---------|----------|--------|
| CPU Usage | > 70% avg | > 85% avg | Add app server |
| Memory Usage | > 75% | > 90% | Upgrade RAM or add server |
| Disk Usage | > 70% | > 85% | Expand storage |
| DB Query Time (p95) | > 100ms | > 500ms | Optimize queries, add indexes |
| API Response Time (p95) | > 500ms | > 2000ms | Scale horizontally |
| DB Connections | > 80% max | > 95% max | Add connection pooling |
| Error Rate | > 1% | > 5% | Investigate and fix |
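The table above translates directly into a rule lookup. This is a minimal sketch: the metric keys and the `evaluate` function are hypothetical names, and thresholds are treated as strict "greater than" comparisons, as written in the table.

```python
# metric key -> (warning threshold, critical threshold, action)
SCALE_UP_RULES = {
    "cpu_percent": (70, 85, "Add app server"),
    "memory_percent": (75, 90, "Upgrade RAM or add server"),
    "disk_percent": (70, 85, "Expand storage"),
    "db_query_p95_ms": (100, 500, "Optimize queries, add indexes"),
    "api_response_p95_ms": (500, 2000, "Scale horizontally"),
    "db_connections_percent": (80, 95, "Add connection pooling"),
    "error_rate_percent": (1, 5, "Investigate and fix"),
}


def evaluate(metrics: dict) -> list:
    """Return (metric, severity, action) for every metric over a threshold."""
    findings = []
    for name, value in metrics.items():
        warning, critical, action = SCALE_UP_RULES[name]
        if value > critical:
            findings.append((name, "critical", action))
        elif value > warning:
            findings.append((name, "warning", action))
    return findings


if __name__ == "__main__":
    sample = {"cpu_percent": 72, "memory_percent": 40, "disk_percent": 90}
    for finding in evaluate(sample):
        print(finding)
```

In practice the input values would come from a metrics source such as psutil or the database; that collection step is omitted here.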
### Architecture Transition Points
```
STARTER → SMALL (50 clients)
├── Trigger: SQLite becomes bottleneck
├── Action: Migrate to PostgreSQL
└── Cost increase: +€40-60/month
SMALL → MEDIUM (100 clients)
├── Trigger: Single server at 70%+ CPU
├── Action: Separate DB server
└── Cost increase: +€50-80/month
MEDIUM → LARGE (300 clients)
├── Trigger: Need for caching, higher availability
├── Action: Add Redis, consider multiple app servers
└── Cost increase: +€150-200/month
LARGE → SCALE (500 clients)
├── Trigger: Storage >500GB, high traffic
├── Action: Object storage + CDN, load balancing
└── Cost increase: +€200-400/month
SCALE → ENTERPRISE (1000+ clients)
├── Trigger: High availability requirements, SLA
├── Action: Full redundancy, read replicas, auto-scaling
└── Cost increase: +€600-1000/month
```
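The client-count boundaries above can be turned into a tier lookup. A minimal sketch, assuming a client count exactly on a boundary (for example 50) already belongs to the higher tier; `infrastructure_tier` and `clients_until_next_tier` are illustrative names, and the real tier detection may also weigh resource usage.

```python
from bisect import bisect_right

# Client counts at which the next tier begins (from the transition list).
TRANSITIONS = [50, 100, 300, 500, 1000]
TIERS = ["Starter", "Small", "Medium", "Large", "Scale", "Enterprise"]


def infrastructure_tier(clients: int) -> str:
    """Map a client count to an infrastructure tier name."""
    return TIERS[bisect_right(TRANSITIONS, clients)]


def clients_until_next_tier(clients: int):
    """Clients remaining before the next transition, or None at the top tier."""
    index = bisect_right(TRANSITIONS, clients)
    if index == len(TRANSITIONS):
        return None
    return TRANSITIONS[index] - clients


if __name__ == "__main__":
    for count in (10, 50, 299, 750, 1000):
        print(count, infrastructure_tier(count), clients_until_next_tier(count))
```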
---
## Monitoring Requirements
### Essential Metrics
Track these metrics for capacity planning:
#### Infrastructure
- CPU utilization (per server)
- Memory utilization
- Disk I/O and usage
- Network throughput
#### Application
- Request latency (p50, p95, p99)
- Request rate (per endpoint)
- Error rate by type
- Active sessions
#### Database
- Query execution time
- Connection pool usage
- Table sizes
- Index usage
#### Business
- Active clients
- Products per client
- Orders per day
- API calls per client
### Monitoring Dashboard
The admin platform includes a **Capacity Monitoring** page at `/admin/platform-health` with:
1. **Current Usage** - Real-time resource utilization
2. **Growth Trends** - Historical charts for planning
3. **Threshold Alerts** - Warning and critical indicators
4. **Scaling Recommendations** - Automated suggestions
See the [Platform Health Monitoring](#platform-health-monitoring) section below.
---
## Cost Analysis
### Infrastructure Cost per Client
| Scale | Clients | Monthly Infra | Cost/Client |
|-------|---------|---------------|-------------|
| Starter | 25 | €30 | €1.20 |
| Small | 75 | €80 | €1.07 |
| Medium | 200 | €150 | €0.75 |
| Large | 400 | €350 | €0.88 |
| Scale | 800 | €700 | €0.88 |
| Enterprise | 1500 | €1,800 | €1.20 |
### Revenue vs Infrastructure Cost
At 1,000 Business tier clients (€149/month):
| Item | Monthly |
|------|---------|
| **Revenue** | €149,000 |
| Infrastructure | €700-900 |
| Support (est.) | €3,000 |
| Development (est.) | €5,000 |
| **Gross Margin** | **~94%** |
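The margin can be recomputed from the listed figures. A minimal sketch, assuming all three cost lines are subtracted from revenue and using both ends of the infrastructure range; `margin_after_costs` is an illustrative name.

```python
def margin_after_costs(revenue: float, costs: dict) -> float:
    """Fraction of revenue left after subtracting every listed cost."""
    return (revenue - sum(costs.values())) / revenue


REVENUE = 1_000 * 149  # 1,000 clients at EUR 149/month

COSTS_LOW = {"infrastructure": 700, "support": 3_000, "development": 5_000}
COSTS_HIGH = {"infrastructure": 900, "support": 3_000, "development": 5_000}

if __name__ == "__main__":
    print(f"best case:  {margin_after_costs(REVENUE, COSTS_LOW):.1%}")
    print(f"worst case: {margin_after_costs(REVENUE, COSTS_HIGH):.1%}")
```

Because infrastructure is a small share of total cost, moving between the low and high infrastructure estimates shifts the margin by well under one percentage point.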
---
## Disaster Recovery
### Backup Strategy by Scale
| Scale | Database Backup | File Backup | RTO | RPO |
|-------|----------------|-------------|-----|-----|
| Starter | Daily SQLite copy | Daily rsync | 4h | 24h |
| Small | Daily pg_dump | Daily sync | 2h | 12h |
| Medium | Managed backups | S3 versioning | 1h | 6h |
| Large | Point-in-time | S3 + cross-region | 30m | 1h |
| Enterprise | Streaming replicas | Multi-region | 5m | 5m |
---
## Platform Health Monitoring
The admin dashboard includes a dedicated capacity monitoring page that tracks:
### Metrics Displayed
1. **Client Growth**
- Total active clients
- New clients this month
- Churn rate
2. **Resource Usage**
- Total products across all vendors
- Total images stored
- Database size
- Storage usage
3. **Performance Indicators**
- Average API response time
- Database query latency
- Error rate
4. **Threshold Status**
- Current infrastructure tier
- Distance to next threshold
- Recommended actions
### Alert Configuration
Configure alerts for proactive scaling:
```python
CAPACITY_THRESHOLDS = {
    "products_total": {
        "warning": 400_000,   # 80% of 500K
        "critical": 475_000,  # 95% of 500K
    },
    "storage_gb": {
        "warning": 800,   # 80% of 1TB
        "critical": 950,  # 95% of 1TB
    },
    "db_size_gb": {
        "warning": 20,
        "critical": 24,
    },
    "avg_response_ms": {
        "warning": 200,
        "critical": 500,
    },
}
```
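A configuration like this could drive both the status indicator and the "distance to next threshold" figure. The sketch below repeats the thresholds so it runs on its own; `threshold_status` is a hypothetical helper, and treating a value equal to a threshold as having crossed it is an assumption.

```python
CAPACITY_THRESHOLDS = {
    "products_total": {"warning": 400_000, "critical": 475_000},
    "storage_gb": {"warning": 800, "critical": 950},
    "db_size_gb": {"warning": 20, "critical": 24},
    "avg_response_ms": {"warning": 200, "critical": 500},
}


def threshold_status(metric: str, value: float, thresholds: dict = CAPACITY_THRESHOLDS) -> tuple:
    """Return (status, distance to the next threshold) for one metric.

    Assumes a value equal to a threshold has already crossed it.
    """
    limits = thresholds[metric]
    if value >= limits["critical"]:
        return "critical", 0
    if value >= limits["warning"]:
        return "warning", limits["critical"] - value
    return "ok", limits["warning"] - value


if __name__ == "__main__":
    current = {"products_total": 440_000, "storage_gb": 500, "db_size_gb": 25}
    for metric, value in current.items():
        print(metric, threshold_status(metric, value))
```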
---
## Quick Reference
### TL;DR Sizing Guide
| Clients | Server | RAM | Storage | Database | Monthly Cost |
|---------|--------|-----|---------|----------|--------------|
| 1-50 | 2 vCPU | 4GB | 100GB | SQLite | €30 |
| 50-100 | 4 vCPU | 8GB | 250GB | PostgreSQL | €80 |
| 100-300 | 4 vCPU | 16GB | 500GB | PostgreSQL | €150 |
| 300-500 | 8 vCPU | 32GB | 1TB | PostgreSQL + Redis | €350 |
| 500-1000 | 16 vCPU | 64GB | 2TB + CDN | PostgreSQL + Redis | €700 |
| 1000+ | 32+ vCPU | 128GB+ | 4TB+ + CDN | PostgreSQL cluster | €1,500+ |
### Key Formulas
```
Storage (GB) = (Products × Images × 3 sizes × 200KB) / 1,000,000
DB Size (GB) = Products × 0.00003 + Orders × 0.00002
Bandwidth (TB/mo) = Products × Daily Views × 500KB × 30 / 1,000,000,000
```
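The same formulas written as Python functions, as a minimal sketch. The function and parameter names are illustrative; the constants and decimal unit conversions are the ones in the formulas above.

```python
def storage_gb(products: int, images_per_product: int,
               sizes: int = 3, avg_file_kb: int = 200) -> float:
    """Image storage in GB (1 GB = 1,000,000 KB)."""
    return products * images_per_product * sizes * avg_file_kb / 1_000_000


def db_size_gb(products: int, orders: int) -> float:
    """Database size in GB using the per-row coefficients from the formula."""
    return products * 0.00003 + orders * 0.00002


def bandwidth_tb_per_month(products: int, daily_views: float,
                           avg_image_kb: int = 500, days: int = 30) -> float:
    """Monthly image egress in TB (1 TB = 1,000,000,000 KB)."""
    return products * daily_views * avg_image_kb * days / 1_000_000_000


if __name__ == "__main__":
    print(storage_gb(500_000, 5))
    print(db_size_gb(500_000, 300_000))
    print(bandwidth_tb_per_month(500_000, 1))
```

For example, `storage_gb(500_000, 5)` gives 1,500 GB, which matches the 1.5 TB image storage figure in the scale projection. Note that `daily_views` is a per-product daily average, so a product viewed 10 times per month corresponds to `daily_views=10 / 30`.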
---
## See Also
- [Pricing Strategy](../marketing/pricing.md) - Tier definitions and limits
- [Multi-Tenant Architecture](multi-tenant.md) - How client isolation works
- [Background Tasks](background-tasks.md) - Task queue scaling
- [Production Deployment](../deployment/production.md) - Deployment guidelines