docs(hetzner): record 25/465 egress block + mail1 SMTP setup (5h debug payback)

Hetzner Cloud silently blocks outbound TCP 25 and 465 on every Cloud
Server. The block sits upstream of the VM — UFW and iptables look
completely clean — so it presents as a generic "connection times out"
that's easy to misdiagnose as a credential or DNS issue. Spent ~5 hours
on 2026-05-30 working through swaks/tcpdump/auth-backend hypotheses
before finding Hetzner's own docs that mention the policy.

Two doc additions:

- Step 4 (Firewall Configuration) gets a warning admonition right after
  the UFW status check. Explains the upstream nature of the block,
  gives the symptom signature (nc to 587 succeeds, nc to 465 silently
  times out), and includes the auto-approved unblock ticket template
  with sample text.

- Step 19.5 (Alertmanager SMTP) gets a "live prod uses
  mail1.myservices.hosting:465" callout reflecting the reality that
  the SendGrid setup documented in that section is no longer how this
  prod env is wired. The callout captures the actual smarthost config
  (with smtp_auth_password kept gitignored, only .example ships in
  repo), the two prerequisites (Hetzner unblock + implicit-TLS-aware
  smarthost port), and the redacted swaks verification command. The
  rest of §19.5 stays as a reference for greenfield deploys that
  prefer SendGrid.

Saves the next person from repeating the same hours-long detour.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-30 19:54:10 +02:00
parent e44f5c0458
commit 1227567d08

@@ -264,6 +264,41 @@ OpenSSH ALLOW Anywhere
443/tcp ALLOW Anywhere
```
!!! warning "Hetzner Cloud blocks outbound TCP 25 and 465 by default"
Hetzner Cloud applies an **upstream egress block on TCP ports 25 and 465** to every Cloud Server under its published anti-spam policy. The block sits *upstream* of UFW/iptables on the VM: `ufw status` won't show it, and `iptables -L OUTPUT` looks completely clean. The symptom is that SYN packets to those ports go unanswered and the connection times out, while every other port (including 587) works.
If your monitoring stack (Step 19) or any other service needs to send via port 465 (SMTPS / implicit TLS), you must request the unblock from Hetzner support:
1. **Test first** — confirm it's actually the Hetzner block, not something on your VM:
```bash
timeout 5 nc -4 -zv <mail-host> 465 # silent timeout → likely Hetzner upstream
timeout 5 nc -4 -zv <mail-host> 587 # succeeds → general egress is fine, only 465 is blocked
```
2. **Submit unblock request** via [console.hetzner.cloud](https://console.hetzner.cloud) → Support → New ticket. Hetzner's docs invite this explicitly: *"Outgoing traffic to ports 25 and 465 are blocked by default on all Cloud Servers. Send us a request to unblock these ports."*
Sample ticket text:
```
Hi,
Please unblock outbound TCP port 465 for my Cloud server:
Project: <project name>
Server: <server name>
IPv4: <server IPv4>
Reason: legitimate SMTP submission via my mail provider's documented
SMTPS endpoint. Confirmed via UFW, iptables, nftables, and Hetzner
Cloud Firewall that no rule on my side blocks the port; the block
is upstream.
Volume: monitoring alert emails, ~10/day.
Thanks.
```
Hetzner usually auto-approves within minutes for legitimate use cases.
Real prod incident this caused: 5 hours of "is my SMTP password wrong?" debugging on 2026-05-30 before discovering the egress block. Don't repeat that — if you see a port-465 connection time out from a Cloud Server, suspect the upstream block first.
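The manual check in item 1 can be wrapped in a small helper that tells a silent drop apart from an active refusal. This is a minimal sketch, not part of the runbook: the function names `probe_port`, `classify_probe`, and `report_egress` are hypothetical, and it assumes bash (for `/dev/tcp`) and GNU `timeout`, which exits with status 124 when the time limit expires.

```bash
#!/usr/bin/env bash
# Sketch: distinguish "silently dropped" from "refused" for outbound TCP ports.
# Assumes bash (/dev/tcp redirection) and GNU coreutils timeout (exit 124 on expiry).

# probe_port HOST PORT [SECONDS]
# Exit status: 0 = connected, 124 = timed out, anything else = refused or failed fast.
probe_port() {
  local host=$1 port=$2 secs=${3:-5}
  timeout "$secs" bash -c 'exec 3<>"/dev/tcp/$0/$1"' "$host" "$port" 2>/dev/null
}

# classify_probe STATUS -> human-readable verdict for one probe exit status
classify_probe() {
  case $1 in
    0)   echo "open" ;;
    124) echo "timeout (packets dropped; consistent with an upstream block)" ;;
    *)   echo "refused or failed fast (not a silent drop; check DNS and the remote service)" ;;
  esac
}

# report_egress HOST -> probe 25, 465 and 587 and print one verdict per port
report_egress() {
  local host=$1 port status
  for port in 25 465 587; do
    probe_port "$host" "$port"
    status=$?
    printf '%s:%s %s\n' "$host" "$port" "$(classify_probe "$status")"
  done
}
```

Calling `report_egress <mail-host>` from the Cloud Server prints one line per port. A `timeout` verdict on 25 and 465 alongside `open` on 587 matches the symptom signature described above; a `refused` verdict points at the remote host or DNS rather than at Hetzner.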
## Step 5: Harden SSH
!!! warning "Before doing this step"
@@ -1631,6 +1666,40 @@ Alertmanager needs SMTP to send email notifications. SendGrid handles both trans
**Free trial**: 100 emails/day for 60 days. Covers alerting + transactional emails through launch. After 60 days, upgrade to a paid plan (Essentials starts at ~$20/mo for 50K emails/mo).
!!! info "Live prod uses mail1.myservices.hosting:465, not SendGrid"
The current prod env migrated away from SendGrid to the mailbox-hosting provider's SMTP relay (`mail1.myservices.hosting`) earlier in 2026. Both the app's `/admin/settings` SMTP block and `monitoring/alertmanager/alertmanager.yml` point at it. The SendGrid steps in this section are kept as a working reference for greenfield deploys; if you're rehydrating the existing prod, use the mailbox-hosting setup instead.
Quick summary of the live alertmanager SMTP block (don't commit the real password — `alertmanager.yml` is gitignored, only `.example` ships in repo):
```yaml
global:
  smtp_smarthost: 'mail1.myservices.hosting:465' # implicit TLS, not 587
  smtp_from: 'alerts@wizard.lu'
  smtp_auth_username: 'support@wizard.lu'
  smtp_auth_password: '<from /admin/settings SMTP block>'
  smtp_require_tls: true
```
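Since only the `.example` file ships in the repo, a quick guard can confirm that it still carries a placeholder rather than a real secret. The following is a sketch under assumptions: the helper name `has_placeholder_password` is hypothetical, and "placeholder" is taken to mean a value wrapped in angle brackets, as in the block above.

```bash
# has_placeholder_password FILE
# Succeeds only if every smtp_auth_password value in FILE looks like <...>.
# Returns 2 if FILE cannot be read.
has_placeholder_password() {
  local file=$1
  [ -r "$file" ] || return 2
  # Select the password lines, then look for any that are NOT a <...> placeholder.
  # The leading "!" inverts the pipeline: finding such a line means failure.
  ! grep -E '^[[:space:]]*smtp_auth_password:' "$file" \
    | grep -Evq "^[[:space:]]*smtp_auth_password:[[:space:]]*['\"]?<[^>]*>['\"]?[[:space:]]*(#.*)?$"
}
```

It could be run against the shipped example before committing, for instance `has_placeholder_password path/to/the.example || echo "real password in example file"`, where the path is whatever the repo actually uses.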
Two prerequisites for this to work:
1. **Hetzner outbound TCP 465 must be unblocked** (see warning in Step 4 — Cloud Servers block 25 and 465 by default; submit a one-paragraph ticket to lift it, auto-approved in minutes).
2. **Port 465 = implicit TLS** (TLS-on-connect, not STARTTLS). Alertmanager's email integration handles this natively when the smarthost port is `465`; you only need `smtp_require_tls: true`, no extra `smtp_tls_config` block.
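To see the implicit-TLS versus STARTTLS distinction from prerequisite 2 on the wire, `openssl s_client` can open either kind of session. A hedged sketch: the helper name `smtp_tls_cmd` is hypothetical, the port-to-mode mapping is the conventional one (465 implicit, 25/587 STARTTLS) rather than something taken from the provider's documentation, and the function only prints the command so it can be reviewed before it is run.

```bash
# smtp_tls_cmd HOST PORT -> print the openssl invocation matching the port's TLS mode
smtp_tls_cmd() {
  local host=$1 port=$2
  case $port in
    465)    # implicit TLS: the handshake starts immediately after TCP connect
      echo "openssl s_client -connect ${host}:${port} -servername ${host}" ;;
    25|587) # STARTTLS: plaintext greeting first, then an in-band upgrade
      echo "openssl s_client -starttls smtp -connect ${host}:${port} -servername ${host}" ;;
    *)
      echo "unknown SMTP port: ${port}" >&2
      return 1 ;;
  esac
}
```

For example, `$(smtp_tls_cmd mail1.myservices.hosting 465) </dev/null` should complete a handshake and print the certificate chain once the Hetzner unblock is in place; if the connection hangs instead, the egress block is the first thing to rule out.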
Verification with swaks (no `--auth-password` is passed, so swaks prompts for it; the `sed` filter redacts base64 credential lines from the transcript):
```bash
swaks --to admin@wizard.lu \
--from alerts@wizard.lu \
--server mail1.myservices.hosting:465 \
--auth PLAIN \
--auth-user support@wizard.lu \
--tls-on-connect \
--header "Subject: smoke test" \
  2>&1 | sed -E 's#^( ~> (AUTH PLAIN )?)[A-Za-z0-9+/=]{12,}$#\1[REDACTED]#'  # also covers the single-line "AUTH PLAIN <base64>" form
```
Expected: `235 Authentication successful` followed by `250 2.0.0 Ok: queued`. If you instead see `535 Authentication failed: The provided authorization grant is invalid, expired, or revoked` on port **587**, the provider's port-587 listener appears to check PLAIN credentials against an OAuth backend. Switch to port 465, which routes through the password backend.
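The expected and failing replies above can be checked mechanically when the smoke test runs unattended. This is a sketch only: `diagnose_swaks` is a hypothetical helper, and it assumes server replies in the transcript are prefixed with `<-` (plain) or `<~` (TLS), which is how swaks normally marks received lines.

```bash
# diagnose_swaks: read a swaks transcript on stdin and print a one-line verdict.
# Assumption: received lines start with "<-" or "<~" followed by the reply code.
diagnose_swaks() {
  local transcript
  transcript=$(cat)
  if grep -Eq '^<[-~] +535[ -]' <<<"$transcript"; then
    echo "auth-failed: if this was port 587, retry on 465"
  elif grep -Eqi '^<[-~] +250[ -].*queued' <<<"$transcript"; then
    echo "ok: message accepted"
  elif grep -Eq '^<[-~] +235[ -]' <<<"$transcript"; then
    echo "auth-ok-but-not-queued: check the replies after DATA"
  else
    echo "inconclusive: no 235/250/535 reply found (the connection may have timed out)"
  fi
}
```

Piping the swaks command into it, for instance `swaks ... 2>&1 | tee /dev/stderr | diagnose_swaks`, keeps the full transcript visible while adding the verdict as the last line.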
**1. Create SendGrid account:**
1. Sign up at [sendgrid.com](https://sendgrid.com/) (free plan)