HydraPipeline

HydraGuard Runbook

WireGuard mesh connecting venues, air units, and cloud infrastructure to the Hydra platform.

Infrastructure

Resource               Value
Hub server             141.227.136.199 (OVHcloud b3-16, Brussels EU-WEST-LZ-BRU-A)
Hub DNS                hydraguard.experiencenet.com
Hub WG address         10.10.0.1/24
Hub public key         VGA6ETZB2XFVRRb5KmcFvQ+Ybfh9KKfcWuXfP1IuvQE=
WireGuard port         51820/udp
Mesh file              /root/.hydraguard/mesh.yaml
Hub private key        /etc/wireguard/hub.key
WG config (generated)  /etc/wireguard/wg0.conf
API server             http://hydraguard.experiencenet.com:8081
API config             /root/.hydraguard/api.yaml
Requests store         /root/.hydraguard/requests.yaml
Audit log              /root/.hydraguard/audit.log
Service (API)          systemctl status hydraguard
Service (WireGuard)    systemctl status wg-quick@wg0
Logs                   journalctl -u hydraguard -f
SSH                    ubuntu@141.227.136.12
Shared with            hydraneckwebrtc controller + worker on the same instance

API warning: Keep the HydraGuard API service stopped (systemctl stop hydraguard) when not actively enrolling new peers. The auto_apply: true setting causes /api/v1/air/provision to regenerate peer keypairs on every call, breaking existing connections. See issue #64.

Previous hub (pending decommission)

Resource  Value
Server    89.167.57.232 (Hetzner cx23, Falkenstein)
Status    WireGuard stopped, server still running
Action    Decommission after all peers validated on Brussels

Current Mesh

Peer                  Type   WG Address      LAN          Guard  Notes
AD6                   venue  10.10.1.1/32    10.0.0.0/24  omada  Overijse
air-001               air    10.10.100.1/32  --           --
air-tvl-one           air    10.10.100.2/32  --           --
air-cederiks24        air    10.10.100.3/32  --           --
air-hydraneckwebrtc   air    10.10.100.4/32  --           --     Old Hetzner neckwebrtc (retired)
air-hydra-0000        air    10.10.100.5/32  --           --
air-sneaky-squid-86   air    10.10.100.6/32  --           --     bxl1 body
air-boom-pickle-38    air    10.10.100.7/32  --           --     bxl1 body
air-wobbly-llama-92   air    10.10.100.8/32  --           --     bxl1 body

Address Scheme

Type       WG tunnel range     LAN range                     Capacity
Hub        10.10.0.1/24        --                            1
Venues     10.10.1-49.1/32     10.0.X.0/24 (auto or custom)  49
Neck Air   10.10.50-99.1/32    10.0.X.0/24                   50
Hydra Air  10.10.100.1-254/32  -- (no LAN)                   254
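
The ranges above can be read as a simple mapping from peer type and slot number to tunnel address. The sketch below is only an illustration of that mapping: the function name wg_address and the 1-based slot numbering are assumptions, and it is not HydraGuard's actual allocator (which auto-assigns the next free address).

```shell
# Illustrative only: maps a peer type and a 1-based slot number to a tunnel
# address following the ranges in the address scheme table.
# wg_address and the slot numbering are assumptions, not HydraGuard's allocator.
wg_address() {
  local type="$1" slot="$2"
  case "$type" in
    venue)   # 10.10.1-49.1/32, slots 1..49
      [ "$slot" -ge 1 ] && [ "$slot" -le 49 ] || return 1
      echo "10.10.${slot}.1/32" ;;
    neckair) # 10.10.50-99.1/32, slots 1..50
      [ "$slot" -ge 1 ] && [ "$slot" -le 50 ] || return 1
      echo "10.10.$((slot + 49)).1/32" ;;
    air)     # 10.10.100.1-254/32, slots 1..254
      [ "$slot" -ge 1 ] && [ "$slot" -le 254 ] || return 1
      echo "10.10.100.${slot}/32" ;;
    *) return 1 ;;
  esac
}
```

For example, wg_address air 7 prints 10.10.100.7/32, and a slot outside the type's capacity makes the function return non-zero.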

SSH Access

ssh ubuntu@141.227.136.12

Health Check

curl -s http://hydraguard.experiencenet.com:8081/api/v1/health

Operations

All commands run on the hub server unless stated otherwise.

Check status

hydraguard status

Example output:

PEER               TYPE             ADDRESS         HANDSHAKE       TRANSFER
AD6                venue/omada      10.10.1.1       12s ago         4.86 KiB / 3.78 KiB
air-hydraneckwebrtc air              10.10.100.4     53s ago         4.41 KiB / 2.48 KiB
air-001            air              10.10.100.1     -- (offline)    0 / 0
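
To pull just the offline peers out of that listing, a small filter can help. This is a sketch that assumes the status output keeps the layout of the example above (a header line, peer name in the first column, and the literal text "(offline)" for peers without a handshake); offline_peers is a made-up helper name.

```shell
# Reads `hydraguard status` output on stdin and prints the names of peers
# marked "(offline)". Assumes the column layout shown in the example output;
# offline_peers is an illustrative helper, not part of HydraGuard.
offline_peers() {
  # NR > 1 skips the header line; $1 is the PEER column.
  awk 'NR > 1 && /\(offline\)/ { print $1 }'
}
```

Usage: hydraguard status | offline_peers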

Raw WireGuard status

wg show wg0

Shows endpoints, allowed IPs, transfer bytes, and last handshake per peer.

View logs

journalctl -u hydraguard -f              # Follow live
journalctl -u hydraguard -n 100 --no-pager  # Last 100 lines

Restart

systemctl restart hydraguard

Update

hydraguard check-update    # Check if a new version is available
hydraguard update          # Download and install the latest version

Never manually deploy. Always use the release pipeline (tag + push to trigger CI).


Adding Peers

Every add command:

  1. Generates a WireGuard keypair
  2. Stores the public key in mesh.yaml
  3. Prints the private key to stdout (save it; it is shown only once)
  4. Auto-assigns the next available address

After adding any peer, always run hydraguard apply.

Add a Venue

hydraguard venue add <name> --location <city> --guard <omada|citymesh|linuxvm|gateway> [--lan <cidr>]
hydraguard apply

Guard types:

Guard type  Use case                                Notes                                                         Full procedure
omada       TP-Link Omada ER605/ER7212              Configured via Omada SDN Controller API                       omada-venue.md
citymesh    Citymesh Guard (Mikrotik)               Bare WireGuard config                                         citymesh-venue.md
linuxvm     Linux VM gateway (Azure/GCP/AWS)        Adds PostUp for IP forwarding and masquerade                  --
gateway     On-prem LAN gateway (behind FortiGate)  iptables -I FORWARD 1 (priority), masquerade, MSS clamping    --

Add a Hydra Air Unit

Standalone render nodes with WireGuard running directly on Windows.

hydraguard air add <id>
hydraguard apply
hydraguard air config <id>    # Get Windows .conf

Add a Neck Air Unit

Mobile venue-in-a-box setups with a Mikrotik router.

hydraguard neckair add <id>
hydraguard apply
hydraguard neckair config <id>    # Get Mikrotik .conf

Get a peer config

hydraguard venue config <name>
hydraguard air config <id>
hydraguard neckair config <id>

Removing peers

hydraguard venue remove <name> && hydraguard apply
hydraguard air remove <id> && hydraguard apply
hydraguard neckair remove <id> && hydraguard apply

The peer is instantly unreachable after apply.


Applying Changes

hydraguard apply

This regenerates /etc/wireguard/wg0.conf and runs wg syncconf to hot-reload. Existing connections are not disrupted. If wg0 is not up, it runs wg-quick up wg0 instead.

Full restart (when syncconf is not enough)

wg-quick down wg0
wg-quick up wg0

After a full restart, peers behind NAT can take up to 25 seconds (one PersistentKeepalive interval) to re-establish their handshake.


Self-Registration API

Peers can register themselves via the HTTP API instead of requiring SSH access.

Workflow

  1. Client generates a WireGuard keypair locally
  2. Client submits public key via POST /api/v1/register (requires API bearer token)
  3. Request appears as "pending" in requests.yaml
  4. Admin reviews and approves via CLI
  5. Client polls for approval, then fetches its WireGuard config

Managing requests

hydraguard requests list              # Show pending
hydraguard requests list --all        # Show all (including approved/denied)
hydraguard requests approve <id>      # Approve, adds peer to mesh
hydraguard requests deny <id>
hydraguard requests delete <id>

When --auto-apply is enabled, the hub config is automatically updated after approval.


Automatic backups of mesh.yaml

Every mutation that writes mesh.yaml (CLI commands like venue add, air add, neckair add, requests approve, and any other path that calls Mesh.Save) snapshots the previous file into /root/.hydraguard/backups/mesh-YYYYMMDD-HHMMSS.uuuuuu.yaml first. The most recent 50 backups are kept; older entries are pruned automatically.

This makes a botched edit (manual or otherwise) recoverable with a one-line cp:

ls -lt /root/.hydraguard/backups/ | head
cp /root/.hydraguard/backups/mesh-<timestamp>.yaml /root/.hydraguard/mesh.yaml
hydraguard apply

The backup is taken before the new file is written, so even if Save fails partway, the prior state is already preserved. Failure to write the backup aborts the save — better to refuse the mutation than overwrite without a rollback path.
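
Because the backup filenames embed a zero-padded timestamp, the newest one can be picked by name alone. A minimal sketch, assuming the mesh-YYYYMMDD-HHMMSS.uuuuuu.yaml naming described above; latest_mesh_backup is a hypothetical helper name.

```shell
# Prints the newest backup in the given directory (default: the backup path
# from this runbook). Relies on the timestamped filenames sorting
# chronologically as plain text. latest_mesh_backup is an illustrative name.
latest_mesh_backup() {
  local dir="${1:-/root/.hydraguard/backups}"
  local newest=""
  local f
  for f in "$dir"/mesh-*.yaml; do
    [ -e "$f" ] || continue   # the glob matched nothing
    newest="$f"               # glob results are sorted, so the last one wins
  done
  # Returns non-zero when no backup was found.
  [ -n "$newest" ] && printf '%s\n' "$newest"
}
```

Usage: cp "$(latest_mesh_backup)" /root/.hydraguard/mesh.yaml && hydraguard apply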

Backup

The most critical file is mesh.yaml. Back it up:

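cp ~/.hydraguard/mesh.yaml ~/.hydraguard/mesh.yaml.bak    # On the hub, as root
# From a workstation: the file lives under /root, so read it through sudo (assumes the ubuntu user has passwordless sudo)
ssh ubuntu@141.227.136.12 'sudo cat /root/.hydraguard/mesh.yaml' > ./mesh-backup-$(date +%Y%m%d).yaml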

The private key at /etc/wireguard/hub.key should also be backed up securely. If it is lost, you must generate a new key and update every peer config with the new hub public key.

hydrabackup also backs up mesh.yaml to hydramirror automatically.


Troubleshooting

Peer shows "offline" / no handshake

  1. Check firewall on hub: ufw status — port 51820/udp must be open
  2. Check the peer's uplink: confirm the peer can reach the internet at all
  3. Verify keys match: The peer's config must have the hub's public key, and the hub's mesh.yaml must have the peer's public key
  4. Check PersistentKeepalive: Must be 25 in peer configs (HydraGuard sets this automatically)
  5. Check endpoint: Peer config should have Endpoint = hydraguard.experiencenet.com:51820
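
To see at a glance which peers have no recent handshake, the output of wg show wg0 latest-handshakes (public key, tab, Unix timestamp, with 0 meaning no handshake yet) can be filtered. The sketch below is an assumption-laden helper: stale_peers is a made-up name, and the 180-second default threshold is an arbitrary choice, not a HydraGuard setting.

```shell
# Reads `wg show wg0 latest-handshakes` output on stdin and prints peers whose
# last handshake is missing or older than max_age seconds.
# Arguments: max_age (default 180, arbitrary) and now (default: current time).
# stale_peers is an illustrative helper, not part of HydraGuard.
stale_peers() {
  local max_age="${1:-180}" now="${2:-$(date +%s)}"
  awk -v max="$max_age" -v now="$now" '
    $2 == 0        { print $1 "\tnever"; next }          # no handshake yet
    now - $2 > max { print $1 "\t" (now - $2) "s ago" }  # handshake too old
  '
}
```

Usage: wg show wg0 latest-handshakes | stale_peers 180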

Body not connecting after hub IP change

After migrating the hub to a new server/IP, bodies may fail to connect even though DNS is updated. Three common causes:

1. Cached WG endpoint. WireGuard resolves the endpoint hostname once at interface startup and caches the IP. After a hub IP change, bodies keep trying the old IP.

# Check current endpoint on body (via hydracluster exec):
wg show hydraguard-air endpoints
# Update to new IP:
wg set hydraguard-air peer <HUB_PUBLIC_KEY> endpoint <NEW_IP>:51820
# Trigger handshake:
ping -n 3 10.10.0.1

2. Public key mismatch. HydraGuard Air may regenerate its keypair (e.g., after reinstall or update). The hub silently rejects handshakes from unknown keys — no error in logs.

# Check body's actual public key (via hydracluster exec):
wg show hydraguard-air
# Compare with the key the hub holds for that address (prints "<public key> <allowed IPs>"):
wg show wg0 allowed-ips | grep "<BODY_WG_IP>"
# If mismatch, update hub:
sudo wg set wg0 peer <OLD_KEY> remove
sudo wg set wg0 peer <BODY_ACTUAL_KEY> allowed-ips <BODY_WG_IP>/32
# Persist the change. wg0.conf is regenerated from mesh.yaml on every hydraguard apply, so the key in mesh.yaml must be updated as well

3. DNS not propagated. Check with nslookup hydraguard.experiencenet.com on the body. DNS is managed via Hetzner DNS API:

export HCLOUD_CONTEXT=hydraexperiencenet
hcloud zone rrset list 788422 --type A | grep hydraguard
# Update: delete + create with new IP
hcloud zone rrset delete 788422 hydraguard A
hcloud zone rrset create --name hydraguard --type A --record <NEW_IP> --ttl 300 788422
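
For the cached-endpoint case (cause 1), the comparison between the endpoint WireGuard is using and the hub's current IP can be scripted. This is a sketch under stated assumptions: it expects the output of wg show <interface> endpoints (public key, tab, ip:port) on stdin, handles IPv4 only, and endpoint_drift is a hypothetical helper name. Since bodies run Windows, the function would be run wherever a POSIX shell is available, fed with output collected from the body.

```shell
# Reads `wg show <interface> endpoints` output on stdin and prints any peer
# whose endpoint IP differs from the expected hub IP (first argument).
# IPv4 only; endpoint_drift is an illustrative helper, not part of HydraGuard.
endpoint_drift() {
  local expected_ip="$1"
  awk -F'\t' -v want="$expected_ip" '
    {
      ip = $2
      sub(/:[0-9]+$/, "", ip)            # drop the :port suffix
      if (ip != want) print $1 "\t" ip   # report key and the stale IP
    }
  '
}
```

Empty output means every listed endpoint already points at the expected IP; any printed line is a peer that still needs wg set ... endpoint.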

Handshake works but no data flows

This happens when the WireGuard tunnel negotiates successfully but actual traffic (pings, connections) does not pass through. Common causes:

  1. UFW blocking FORWARD chain on hub. The wg-quick PostUp rule must insert (not append) the FORWARD rule before UFW's default DROP:

    # Check current FORWARD chain
    iptables -L FORWARD -n | head -5
    # If the wg0 ACCEPT rule is after ufw-reject-forward, fix it:
    iptables -I FORWARD 1 -i wg0 -o wg0 -j ACCEPT
    

    The generated wg0.conf uses iptables -I FORWARD 1 to avoid this. If you see -A FORWARD in the conf, update it.

  2. Peer behind NAT has not re-handshaked yet. After a hub wg-quick down/up, peers behind NAT must re-initiate the handshake, which can take up to 25 seconds (one keepalive interval). Check:

    wg show wg0 | grep -A5 "endpoint"
    

    If the peer has an endpoint but "latest handshake" is blank, the peer hasn't sent a keepalive yet.

  3. Routing table missing. After wg-quick down/up, verify routes exist:

    ip route show dev wg0
    

    Should show routes for each peer's AllowedIPs. hydraguard apply automatically syncs kernel routes after wg syncconf, but if routes are still missing, run:

    wg-quick down wg0 && wg-quick up wg0
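
The rule-order check from cause 1 can also be automated. The sketch below reads iptables -S FORWARD output, which lists rules in evaluation order, and succeeds only if an ACCEPT rule for wg0 comes before the first blocking rule. The patterns it matches (DROP, REJECT, ufw-reject-forward) are assumptions based on the chain name mentioned above, and forward_order_ok is a made-up helper name.

```shell
# Reads `iptables -S FORWARD` output on stdin. Exits 0 if an ACCEPT rule for
# the wg0 interface appears before the first DROP/REJECT-style rule (or if no
# such blocking rule exists), and non-zero otherwise.
# forward_order_ok is an illustrative helper, not part of HydraGuard.
forward_order_ok() {
  awk '
    /^-A FORWARD/ {
      n++
      if (!accept && /-i wg0/ && /-j ACCEPT/) accept = n
      if (!block && /-j (DROP|REJECT|ufw-reject-forward)/) block = n
    }
    END { exit !(accept && (!block || accept < block)) }
  '
}
```

Usage: iptables -S FORWARD | forward_order_ok || echo "wg0 ACCEPT rule missing or too late in FORWARD"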
    

Can't reach a venue's LAN devices

Test connectivity step by step:

ping 10.10.X.1    # 1. VPN box tunnel address (WG layer)
ping 10.0.X.1     # 2. VPN box LAN gateway (routing through VPN box)
ping 10.0.X.100   # 3. A device on the LAN

If step 1 works but step 2/3 fails:

If step 1 fails:

Bodies (Windows render nodes) unreachable via ping but online

Windows Firewall blocks ICMP by default. The bodies may be online and functional even if ping fails. Verify by:

Inter-peer traffic not forwarding (e.g., hydraneckwebrtc cannot reach venue LAN)

Traffic between two WG peers (e.g., hydraneckwebrtc at 10.10.100.4 reaching AD6 LAN at 10.0.0.0/24) must be forwarded by the hub. Check:

  1. IP forwarding enabled: cat /proc/sys/net/ipv4/ip_forward (must be 1)
  2. iptables FORWARD rule: iptables -L FORWARD -n | head -3 (the ACCEPT rule for wg0 must come before any DROP/REJECT)
  3. Both peers connected: Both the source peer and the destination venue must have active handshakes

WireGuard interface won't come up

ip link show wg0           # Check if interface exists
wg-quick strip wg0         # Check config syntax
journalctl -u wg-quick@wg0 # Check logs

DNS not resolving

dig +short hydraguard.experiencenet.com @8.8.8.8

If DNS doesn't resolve, check the A record in Hetzner DNS (zone 788422).

mesh.yaml out of sync with wg0.conf

hydraguard apply    # Regenerates wg0.conf from mesh.yaml and syncs

Full reset

wg-quick down wg0
rm /etc/wireguard/wg0.conf
hydraguard apply

Releasing

git tag v1.1.0
git push origin v1.1.0

This triggers CI, which builds binaries for linux and darwin on both amd64 and arm64 and publishes them as a GitHub Release. The hub picks up new versions via hydraguard update.