Webhook Receiver Design

Status: Draft
Date: 2026-02-15
Scope: Internet-facing webhook ingress for TriOnyx agents via Cloudflare Tunnel

Problem

TriOnyx agents need to receive events from external services (GitHub, Slack, monitoring tools, CI systems, etc.) via webhooks. The current POST /webhooks/:agent_name endpoint has no authentication — anyone who knows the agent name can trigger it. We need a secure webhook receiver that:

Is safe to expose directly to the internet (behind Cloudflare Tunnel)
Uniquely identifies each webhook entry point
Binds each entry point to one or more agents
Authenticates senders without requiring complex integrations
Maintains TriOnyx's taint-by-default posture for untrusted input

Architecture Overview

External Service (GitHub, Slack, etc.)
    │
    │  POST https://<tunnel>.cfargotunnel.com/hooks/<endpoint_id>
    │  X-Webhook-Signature: sha256=<hmac_hex>
    │  X-Webhook-Timestamp: <unix_epoch>
    │
    ▼
┌──────────────────┐
│  Cloudflare      │  TLS termination, DDoS protection, IP filtering
│  Tunnel          │  (transport layer — NOT an auth layer)
└──────────────────┘
    │
    ▼
┌──────────────────────────────────────────────────────────────┐
│  WebhookReceiver Plug Pipeline                               │
│                                                              │
│  1. Rate limiter (per endpoint_id, per source IP)            │
│  2. Path lookup: endpoint_id → WebhookEndpoint config        │
│  3. HMAC signature verification (X-Webhook-Signature)        │
│  4. Timestamp validation (replay window)                     │
│  5. Payload size + JSON validation                           │
│  6. Dispatch to bound agent(s) via TriggerRouter             │
└──────────────────────────────────────────────────────────────┘
    │
    ▼
┌──────────────────┐
│  TriggerRouter   │  Existing dispatch — spawns/routes to AgentSession
└──────────────────┘
    │
    ▼
┌──────────────────┐
│  AgentSession    │  Tainted immediately (webhook = untrusted)
└──────────────────┘

Security Model: Defense in Depth

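The webhook receiver uses four layers of defense. With signature verification enabled, no single layer is the sole gatekeeper: an attacker who defeats one layer (for example, by learning an endpoint URL from a log) still has to defeat the others.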

Layer 1: Cloudflare Tunnel (Transport)

TLS termination — payloads encrypted in transit
DDoS mitigation and bot filtering
The gateway never binds to a public IP; Cloudflare Tunnel dials out
Optional: Cloudflare WAF rules to block non-POST, wrong content-type, etc.
Not an authentication layer — provides transport security only

Layer 2: Unguessable Endpoint ID (Path Token)

Each webhook endpoint gets a random, unguessable identifier:

POST /hooks/whk_7f3a9b2c4e1d8f6a5b0c3e7d9f1a2b4c

128-bit random hex (32 chars) prefixed with whk_ for identification
Acts as a first-line filter: scanners, bots, and accidental requests are rejected before any crypto is performed
NOT sufficient as sole authentication (URLs appear in logs, monitoring, error messages, Cloudflare analytics, Referer headers, etc.)
Cheap to validate: O(1) ETS/map lookup
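
A minimal sketch of how such an identifier could be generated and format-checked, assuming a hypothetical EndpointIdSketch module (the name and the valid_format?/1 helper are illustrative, not part of the design). Checking the format before the ETS lookup lets malformed paths be rejected without touching the registry.

```elixir
defmodule EndpointIdSketch do
  @prefix "whk_"

  # 16 random bytes = 128 bits, rendered as 32 lowercase hex characters.
  def generate do
    @prefix <> Base.encode16(:crypto.strong_rand_bytes(16), case: :lower)
  end

  # Cheap structural check; it says nothing about whether the ID is registered.
  def valid_format?(id) when is_binary(id) do
    Regex.match?(~r/\Awhk_[0-9a-f]{32}\z/, id)
  end

  def valid_format?(_other), do: false
end
```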

Layer 3: HMAC Signature Verification (Authentication)

The real authentication layer. Each endpoint has a signing secret; the sender must include a signature header computed over the request body:

X-Webhook-Signature: sha256=<hex(HMAC-SHA256(signing_secret, body))>

Why HMAC over path-only auth:

Concern	Path token only	HMAC signature
Secret in logs?	Yes (URL logged everywhere)	No (secret never sent)
Payload integrity?	No	Yes (body is signed)
Replay protection?	No	Yes (with timestamp)
Rotation?	Downtime required	Dual-secret window
Sender impersonation?	Easy (copy URL)	Requires secret

Verification algorithm:

1. Extract X-Webhook-Signature header → "sha256=<received_hex>"
2. Extract X-Webhook-Timestamp header → timestamp_str
3. Reject if timestamp is outside ±5 minute window (replay protection)
4. Compute: expected = hex(HMAC-SHA256(signing_secret, timestamp_str <> "." <> raw_body))
5. Constant-time compare received_hex vs expected
6. Reject if mismatch

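The timestamp is included in the signed material to limit replay attacks: a captured request cannot be replayed after the window expires, and the timestamp cannot be altered without invalidating the signature. A replay inside the window is still possible unless the receiver also deduplicates deliveries.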
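
The verification steps above can be sketched as follows. This is an illustration under assumptions, not the final module: DefaultSignatureSketch, its function names, and the error atoms are hypothetical, and the current time is passed in as `now` so the window check can be tested. :crypto.hash_equals/2 raises on inputs of different sizes, so the lengths are compared first.

```elixir
defmodule DefaultSignatureSketch do
  # Replay window of 5 minutes on either side of the receiver's clock.
  @window_seconds 300

  # hex(HMAC-SHA256(secret, timestamp <> "." <> body))
  def sign(secret, timestamp_str, raw_body) do
    :crypto.mac(:hmac, :sha256, secret, timestamp_str <> "." <> raw_body)
    |> Base.encode16(case: :lower)
  end

  def verify(secret, raw_body, signature_header, timestamp_str, now \\ System.system_time(:second))

  def verify(secret, raw_body, "sha256=" <> received_hex, timestamp_str, now)
      when is_binary(timestamp_str) do
    case Integer.parse(timestamp_str) do
      {ts, ""} when abs(now - ts) <= @window_seconds ->
        expected = sign(secret, timestamp_str, raw_body)

        if byte_size(received_hex) == byte_size(expected) and
             :crypto.hash_equals(received_hex, expected) do
          :ok
        else
          {:error, :invalid_signature}
        end

      {_ts, ""} ->
        {:error, :stale_timestamp}

      _not_an_integer ->
        {:error, :bad_timestamp}
    end
  end

  # Missing or malformed headers never reach the HMAC step.
  def verify(_secret, _raw_body, _signature_header, _timestamp_str, _now) do
    {:error, :missing_headers}
  end
end
```

A sender would compute the same `sign/3` value over the exact bytes it transmits; any re-serialization of the JSON between signing and sending changes the signature.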

Compatibility note: Many webhook providers (GitHub, Stripe, Slack) send their own signature headers. The receiver should support provider-specific verification modes alongside the default TriOnyx scheme:

Provider	Header	Algorithm
Default	X-Webhook-Signature	HMAC-SHA256 + timestamp
GitHub	X-Hub-Signature-256	HMAC-SHA256 of body
Stripe	Stripe-Signature	HMAC-SHA256 + timestamp
Slack	X-Slack-Signature	HMAC-SHA256 + timestamp
None	(skip verification)	Path token only

The None mode exists for providers that don't support signing. In this mode, the path token becomes the sole authentication — the endpoint should be flagged as reduced-security in the audit log and the operator should be warned at registration time.
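
As an illustration of a provider-specific mode, the GitHub row of the table could look roughly like this. GithubSignatureSketch is a hypothetical name; the sketch assumes the header value has already been extracted from X-Hub-Signature-256 and that the endpoint's signing secret is the one configured on the GitHub side.

```elixir
defmodule GithubSignatureSketch do
  # GitHub signs the raw body only; there is no timestamp in the signed material.
  def verify(secret, raw_body, "sha256=" <> received_hex) do
    expected =
      :crypto.mac(:hmac, :sha256, secret, raw_body)
      |> Base.encode16(case: :lower)

    received = String.downcase(received_hex)

    # hash_equals/2 raises on differing sizes, so compare lengths first.
    if byte_size(received) == byte_size(expected) and
         :crypto.hash_equals(received, expected) do
      :ok
    else
      {:error, :invalid_signature}
    end
  end

  def verify(_secret, _raw_body, _header), do: {:error, :missing_signature}
end
```

Because this mode carries no timestamp, it offers no replay window of its own.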

Layer 4: Rate Limiting

Per-endpoint, per-source-IP rate limiting to bound abuse even with valid credentials:

Default: 60 requests/minute per endpoint per source IP
Configurable per endpoint
Uses a token bucket algorithm (GenServer or ETS-based)
Returns 429 Too Many Requests with Retry-After header
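
One possible shape for the ETS-based token bucket, as a sketch only. TokenBucketSketch, the table name, and the return tuples are assumptions. Tokens are tracked in integer units (one request costs 60,000 units, and the bucket refills `limit` units per millisecond) to avoid floating-point rounding in the Retry-After value. The read-modify-write below is not atomic, so a real implementation would presumably serialize updates through the GenServer. Behind Cloudflare Tunnel the TCP peer is cloudflared itself, so the source IP in the key would likely have to come from the CF-Connecting-IP header.

```elixir
defmodule TokenBucketSketch do
  @table :webhook_rate_buckets_sketch
  @units_per_token 60_000

  def init do
    :ets.new(@table, [:named_table, :public, :set])
    :ok
  end

  # key is {endpoint_id, source_ip}; limit is requests per minute.
  def check(key, limit, now_ms \\ System.monotonic_time(:millisecond)) do
    capacity = limit * @units_per_token

    {units, last_ms} =
      case :ets.lookup(@table, key) do
        [{^key, stored_units, stored_ms}] -> {stored_units, stored_ms}
        [] -> {capacity, now_ms}
      end

    # Refill proportionally to elapsed time, capped at the bucket capacity.
    available = min(capacity, units + (now_ms - last_ms) * limit)

    if available >= @units_per_token do
      :ets.insert(@table, {key, available - @units_per_token, now_ms})
      :ok
    else
      :ets.insert(@table, {key, available, now_ms})
      wait_ms = div(@units_per_token - available + limit - 1, limit)
      {:error, {:rate_limited, div(wait_ms + 999, 1000)}}
    end
  end
end
```

The second element of the error tuple is the number of whole seconds to put in the Retry-After header.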

Data Model

WebhookEndpoint

A webhook endpoint is a persistent configuration object stored in the gateway.

defmodule TriOnyx.WebhookEndpoint do
  @type signing_mode :: :default | :github | :stripe | :slack | :none

  # The struct must be defined for %__MODULE__{} to compile.
  # Fields without a default have to be supplied when the struct is built.
  @enforce_keys [:id, :label, :agents, :signing_secret, :signing_mode, :created_at]
  defstruct [:id, :label, :agents, :signing_secret, :signing_mode, :created_at,
             enabled: true, rate_limit: 60, allowed_ips: nil,
             rotated_at: nil, previous_secret: nil]

  @type t :: %__MODULE__{
    id: String.t(),                    # "whk_<32 hex chars>"
    label: String.t(),                 # Human-readable name, e.g. "github-push"
    agents: [String.t()],             # Bound agent names (fan-out)
    signing_secret: String.t(),       # HMAC signing secret (generated)
    signing_mode: signing_mode(),     # :default | :github | :stripe | :slack | :none
    enabled: boolean(),               # Soft disable without deleting
    rate_limit: pos_integer(),        # Requests per minute per source IP
    allowed_ips: [String.t()] | nil,  # Optional IP allowlist (nil = any)
    created_at: DateTime.t(),
    rotated_at: DateTime.t() | nil,   # Last secret rotation
    previous_secret: String.t() | nil # Old secret during rotation window
  }
end
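
The allowed_ips field holds CIDR strings, so the receiver needs a membership check somewhere. A minimal sketch, assuming IPv4 only and well-formed input; IpAllowlistSketch and allowed?/2 are hypothetical names, and a malformed entry would raise here rather than return an error.

```elixir
defmodule IpAllowlistSketch do
  import Bitwise

  # nil means no allowlist is configured, so any source is accepted.
  def allowed?(_ip, nil), do: true
  def allowed?(ip, cidrs) when is_list(cidrs), do: Enum.any?(cidrs, &in_cidr?(ip, &1))

  defp in_cidr?(ip, cidr) do
    [base, bits] = String.split(cidr, "/")
    prefix = String.to_integer(bits)
    # Network mask with the top `prefix` bits set, truncated to 32 bits.
    mask = (0xFFFFFFFF <<< (32 - prefix)) &&& 0xFFFFFFFF
    (to_int(ip) &&& mask) == (to_int(base) &&& mask)
  end

  defp to_int(ip) do
    {:ok, {a, b, c, d}} = :inet.parse_ipv4_address(String.to_charlist(ip))
    (a <<< 24) + (b <<< 16) + (c <<< 8) + d
  end
end
```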

Storage

Webhook endpoints are stored in a JSON file at ~/.tri-onyx/webhooks.json, loaded into an ETS table at startup by a WebhookRegistry GenServer. This mirrors the pattern used by AuditLog for file-based persistence.

[
  {
    "id": "whk_7f3a9b2c4e1d8f6a5b0c3e7d9f1a2b4c",
    "label": "github-push",
    "agents": ["code-reviewer"],
    "signing_secret": "<encrypted or raw — see Key Management>",
    "signing_mode": "github",
    "enabled": true,
    "rate_limit": 60,
    "allowed_ips": null,
    "created_at": "2026-02-15T12:00:00Z"
  }
]

Key Management

Signing secrets should be generated with :crypto.strong_rand_bytes(32) and stored as hex. For the initial implementation, secrets are stored in plaintext in the webhooks.json file (the file should be permission-restricted to 0600). A future iteration can encrypt at rest using a master key derived from an environment variable.
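
A sketch of the two operations described here: generating a hex secret and writing a file with 0600 permissions. SecretStoreSketch and write_restricted/2 are hypothetical names. The file is created empty and restricted before the contents are written, so the secret is never on disk under looser permissions (assuming a POSIX filesystem).

```elixir
defmodule SecretStoreSketch do
  # 32 random bytes rendered as 64 lowercase hex characters.
  def generate_secret do
    :crypto.strong_rand_bytes(32) |> Base.encode16(case: :lower)
  end

  def write_restricted(path, contents) do
    File.write!(path, "")
    File.chmod!(path, 0o600)
    File.write!(path, contents)
  end
end
```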

API Endpoints

Webhook Ingress (Internet-Facing)

POST /hooks/:endpoint_id

This is the only endpoint exposed through the Cloudflare Tunnel. All management endpoints remain reachable only locally on port 4000.

Request:

POST /hooks/whk_7f3a9b2c4e1d8f6a5b0c3e7d9f1a2b4c HTTP/1.1
Content-Type: application/json
X-Webhook-Signature: sha256=a1b2c3d4...
X-Webhook-Timestamp: 1739577600

{"event": "push", "ref": "refs/heads/main", ...}

Responses:

Status	Meaning
202	Accepted — dispatched to agent(s)
400	Invalid JSON or missing required headers
401	Invalid or missing signature
404	Unknown endpoint ID (constant-time response, no timing leak)
408	Timestamp outside replay window
413	Payload too large (>1 MB)
429	Rate limit exceeded

Important: The 404 response for unknown endpoint IDs must use constant-time behavior — always perform the same amount of work regardless of whether the ID exists, to prevent endpoint enumeration via timing side-channels. In practice: look up the endpoint, if not found, still compute a dummy HMAC before returning.
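
The dummy-HMAC idea can be sketched like this. UniformLookupSketch, the plain map standing in for the registry, and the fixed dummy secret are all assumptions made for illustration. The point is only that the HMAC and the comparison run on every request, whether or not the ID was found.

```elixir
defmodule UniformLookupSketch do
  # Placeholder secret used only to burn the same HMAC work for unknown IDs.
  @dummy_secret String.duplicate("0", 64)

  # endpoints is a map of endpoint_id => signing_secret (stand-in for the ETS table).
  def check(endpoints, endpoint_id, raw_body, received_hex) when is_binary(received_hex) do
    {secret, found?} =
      case Map.fetch(endpoints, endpoint_id) do
        {:ok, secret} -> {secret, true}
        :error -> {@dummy_secret, false}
      end

    expected =
      :crypto.mac(:hmac, :sha256, secret, raw_body)
      |> Base.encode16(case: :lower)

    sig_ok? =
      byte_size(received_hex) == byte_size(expected) and
        :crypto.hash_equals(received_hex, expected)

    cond do
      not found? -> {:error, :not_found}
      sig_ok? -> :ok
      true -> {:error, :invalid_signature}
    end
  end
end
```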

Management Endpoints (Local Only)

These endpoints are served on the existing port 4000 (not exposed through the tunnel). They allow the operator to manage webhook endpoints.

GET    /webhook-endpoints                    # List all endpoints
POST   /webhook-endpoints                    # Create new endpoint
GET    /webhook-endpoints/:id                # Get endpoint details
PUT    /webhook-endpoints/:id                # Update endpoint
DELETE /webhook-endpoints/:id                # Delete endpoint
POST   /webhook-endpoints/:id/rotate-secret  # Rotate signing secret
GET    /webhook-endpoints/:id/deliveries     # Recent delivery log

Create Endpoint

POST /webhook-endpoints
Content-Type: application/json

{
  "label": "github-push",
  "agents": ["code-reviewer"],
  "signing_mode": "github",
  "rate_limit": 60,
  "allowed_ips": ["140.82.112.0/20"]
}

Response (201):

{
  "id": "whk_7f3a9b2c4e1d8f6a5b0c3e7d9f1a2b4c",
  "label": "github-push",
  "agents": ["code-reviewer"],
  "signing_secret": "e3b0c44298fc1c149afbf4c8996fb924...",
  "signing_mode": "github",
  "enabled": true,
  "rate_limit": 60,
  "webhook_url": "https://<tunnel>/hooks/whk_7f3a9b2c4e1d8f6a5b0c3e7d9f1a2b4c",
  "created_at": "2026-02-15T12:00:00Z"
}

The signing_secret is returned ONLY on creation and rotation. It is never returned in GET responses (write-once, display-once pattern).

Rotate Secret

POST /webhook-endpoints/whk_7f3a9b2c4e1d8f6a5b0c3e7d9f1a2b4c/rotate-secret

Response (200):

{
  "new_secret": "d7a8fbb307d7809469ca9abcb0082e4f...",
  "previous_secret_valid_until": "2026-02-15T13:00:00Z",
  "message": "Both old and new secrets will be accepted for 1 hour"
}

During the rotation window, both the old and new secrets are accepted. This allows the sender to be updated without downtime.
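
A sketch of the dual-secret window, assuming the window is measured from rotated_at and lasts one hour as in the response above. RotationSketch and its functions are hypothetical, and plain maps stand in for the WebhookEndpoint struct. The verifier would try each secret returned by accepted_secrets/2 and accept the request if any of them matches.

```elixir
defmodule RotationSketch do
  @window_seconds 3600

  # Moves the current secret to previous_secret and generates a new one.
  def rotate(endpoint, now \\ DateTime.utc_now()) do
    new_secret = :crypto.strong_rand_bytes(32) |> Base.encode16(case: :lower)

    %{
      endpoint
      | previous_secret: endpoint.signing_secret,
        signing_secret: new_secret,
        rotated_at: now
    }
  end

  # Secrets that should currently be accepted, newest first.
  def accepted_secrets(endpoint, now \\ DateTime.utc_now()) do
    in_window? =
      endpoint.previous_secret != nil and endpoint.rotated_at != nil and
        DateTime.diff(now, endpoint.rotated_at) <= @window_seconds

    if in_window? do
      [endpoint.signing_secret, endpoint.previous_secret]
    else
      [endpoint.signing_secret]
    end
  end
end
```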

Elixir Module Structure

lib/tri_onyx/
├── webhook_endpoint.ex          # Struct + validation
├── webhook_registry.ex          # GenServer — ETS-backed endpoint store
├── webhook_receiver.ex          # Plug pipeline for /hooks/:id
├── webhook_signature.ex         # HMAC verification (multi-provider)
├── webhook_rate_limiter.ex      # Token bucket rate limiter
└── triggers/
    └── webhook.ex               # (existing — updated to accept endpoint metadata)

WebhookRegistry (GenServer)

Owns an ETS table (:webhook_endpoints, :set, read_concurrency: true)
Loads from ~/.tri-onyx/webhooks.json on init
Persists on every mutation (create/update/delete/rotate)
Added to the supervision tree after AuditLog, before TriggerRouter
Public API: lookup/1, create/1, update/2, delete/1, rotate_secret/1, list/0
lookup/1 is a direct ETS read (no GenServer call) for hot-path performance
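
The split between direct ETS reads and serialized writes could look roughly like this. WebhookRegistrySketch, the table name, and put/1 are hypothetical, and JSON loading and persistence are left out. The table is :protected, so any process can read it while only the owning GenServer can write.

```elixir
defmodule WebhookRegistrySketch do
  use GenServer

  @table :webhook_endpoints_sketch

  def start_link(opts \\ []) do
    GenServer.start_link(__MODULE__, opts, name: __MODULE__)
  end

  # Hot path: reads ETS directly, no message to the GenServer.
  def lookup(id) do
    case :ets.lookup(@table, id) do
      [{^id, endpoint}] -> {:ok, endpoint}
      [] -> :error
    end
  end

  # Mutations go through the owner process so writes are serialized.
  def put(endpoint), do: GenServer.call(__MODULE__, {:put, endpoint})

  @impl true
  def init(_opts) do
    :ets.new(@table, [:named_table, :set, :protected, read_concurrency: true])
    {:ok, %{}}
  end

  @impl true
  def handle_call({:put, endpoint}, _from, state) do
    :ets.insert(@table, {endpoint.id, endpoint})
    # A real registry would persist to webhooks.json at this point.
    {:reply, :ok, state}
  end
end
```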

WebhookReceiver (Plug)

The ingress pipeline, mounted in the Router at /hooks/:endpoint_id:

post "/hooks/:endpoint_id" do
  # 1. Lookup endpoint (ETS — no GenServer bottleneck)
  # 2. Check enabled
  # 3. Check rate limit
  # 4. Check IP allowlist (if configured)
  # 5. Verify signature (provider-specific)
  # 6. Validate payload (size, JSON)
  # 7. Fan-out dispatch to all bound agents via TriggerRouter
  # 8. Audit log the delivery
  # 9. Return 202
end
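
Outside of Plug, the control flow of this pipeline can be sketched as a `with` chain that stops at the first failing step and maps the reason to a status code from the response table. Everything here is an assumption made for illustration: the module name, the verify_fun callback, the error atoms, and the choice to answer 404 for a disabled endpoint. Rate limiting, the IP allowlist, JSON parsing, and audit logging are omitted.

```elixir
defmodule ReceiverPipelineSketch do
  @max_body_bytes 1_000_000

  # verify_fun is a hypothetical callback: (endpoint, raw_body) -> :ok | {:error, reason}
  # Returns {status, agents_to_dispatch_to}.
  def handle(endpoint, raw_body, verify_fun) do
    with :ok <- check_found(endpoint),
         :ok <- check_enabled(endpoint),
         :ok <- verify_fun.(endpoint, raw_body),
         :ok <- check_size(raw_body) do
      {202, endpoint.agents}
    else
      {:error, reason} -> {status(reason), []}
    end
  end

  defp check_found(nil), do: {:error, :not_found}
  defp check_found(_endpoint), do: :ok

  # Assumption: a disabled endpoint is indistinguishable from a missing one.
  defp check_enabled(%{enabled: true}), do: :ok
  defp check_enabled(_endpoint), do: {:error, :not_found}

  defp check_size(body) when byte_size(body) <= @max_body_bytes, do: :ok
  defp check_size(_body), do: {:error, :payload_too_large}

  defp status(:not_found), do: 404
  defp status(:invalid_signature), do: 401
  defp status(:stale_timestamp), do: 408
  defp status(:payload_too_large), do: 413
  defp status(:rate_limited), do: 429
  defp status(_other), do: 400
end
```

In a real Plug the body size would also need to be bounded while reading the request, since the raw body has to be in memory before the signature can be checked.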

WebhookSignature

Pure module with verification functions per provider:

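defmodule TriOnyx.WebhookSignature do
  @type signing_mode :: :default | :github | :stripe | :slack | :none
  @type headers :: [{String.t(), String.t()}]

  # Headers consulted per mode:
  # :default - X-Webhook-Signature + X-Webhook-Timestamp
  # :github  - X-Hub-Signature-256 (HMAC-SHA256 of body)
  # :stripe  - Stripe-Signature (HMAC-SHA256 + timestamp)
  # :slack   - X-Slack-Signature (HMAC-SHA256 + timestamp)
  # :none    - always passes (path token is sole auth)
  @spec verify(signing_mode(), String.t(), binary(), headers()) :: :ok | {:error, atom()}
  def verify(:none, _secret, _raw_body, _headers), do: :ok
  # Clauses for the signed modes go here, one per provider.
end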

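All comparisons use :crypto.hash_equals/2 (constant-time, available since OTP 25). It raises if the two binaries differ in size, so lengths must be checked before calling it.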

Integration with Existing Systems

TriggerRouter

No changes needed to TriggerRouter.dispatch/2. The webhook receiver constructs the same trigger event shape that the current Webhook.handle/3 produces:

%{
  type: :webhook,
  agent_name: agent_name,
  payload: body,
  metadata: %{
    endpoint_id: endpoint.id,
    endpoint_label: endpoint.label,
    signing_mode: endpoint.signing_mode,
    source_ip: source_ip,
    received_at: DateTime.utc_now() |> DateTime.to_iso8601(),
    content_type: "application/json"
  }
}

For fan-out (one endpoint bound to multiple agents), the receiver dispatches one event per agent. Each agent gets its own session and its own taint status.
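
Building the per-agent events for fan-out might look like this. FanOutSketch and build_events/4 are hypothetical names; the event shape is the one shown above, and the actual hand-off to TriggerRouter.dispatch/2 is left out because its arguments are not specified here.

```elixir
defmodule FanOutSketch do
  # Returns one trigger event per bound agent, all sharing the same metadata.
  def build_events(endpoint, body, source_ip, now \\ DateTime.utc_now()) do
    metadata = %{
      endpoint_id: endpoint.id,
      endpoint_label: endpoint.label,
      signing_mode: endpoint.signing_mode,
      source_ip: source_ip,
      received_at: DateTime.to_iso8601(now),
      content_type: "application/json"
    }

    Enum.map(endpoint.agents, fn agent_name ->
      %{type: :webhook, agent_name: agent_name, payload: body, metadata: metadata}
    end)
  end
end
```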

InformationClassifier

No changes needed. Webhook triggers already classify as high taint. The metadata now carries richer context (endpoint_id, source_ip) for audit purposes, but the taint classification is unchanged.

AuditLog

Webhook deliveries are logged as existing trigger audit events. The additional metadata (endpoint_id, source_ip, signature_valid) is included in the event payload for forensic analysis.

Supervision Tree

Updated startup order in application.ex:

1. AuditLog
2. EventBus.Registry
3. WebhookRegistry          ← NEW (must start before Router)
4. WebhookRateLimiter       ← NEW
5. AgentSupervisor
6. TriggerRouter
7. Scheduler
8. ConnectorRegistry
9. Bandit HTTP Server

Cloudflare Tunnel Configuration

The Cloudflare Tunnel should be configured to forward ONLY the webhook ingress path to the gateway. All management endpoints stay local-only.

# cloudflared config.yml
tunnel: tri-onyx-webhooks
credentials-file: /etc/cloudflared/credentials.json

ingress:
  # Only expose the webhook ingress path. cloudflared matches `path` as a
  # regular expression, so anchor it to the start of the request path.
  - hostname: hooks.example.com
    path: ^/hooks/
    service: http://localhost:4000
    # No originRequest.noTLSVerify here: it only applies to https:// origins.
  # Block everything else
  - service: http_status:404

This ensures that even if the tunnel hostname is known, only /hooks/* is reachable. The management API, SSE streams, WebSocket connectors, and all other endpoints remain accessible only from localhost.

Migration from Current Webhook Endpoint

The existing POST /webhooks/:agent_name endpoint should be deprecated but kept during migration:

Add a deprecation warning log on each call
Document the new /hooks/:endpoint_id path as the replacement
Remove POST /webhooks/:agent_name in a future release

The old endpoint uses agent name as the identifier with no auth — it should never be exposed through the tunnel.

What This Design Does NOT Cover (Future Work)

Webhook delivery retries (outbound): This is an inbound-only receiver. If TriOnyx needs to send webhooks, that's a separate design.
Payload schema validation per endpoint: The receiver validates JSON structure but does not enforce provider-specific schemas. Agent prompts can extract what they need.
Encryption at rest for secrets: The initial implementation stores secrets in plaintext in a permission-restricted file. A future iteration can use envelope encryption.
Webhook delivery log with response/retry tracking: The audit log captures delivery events, but there's no dedicated UI for browsing them. The GET /webhook-endpoints/:id/deliveries endpoint is a wrapper around the audit log filtered by endpoint ID.

Implementation Order

WebhookEndpoint struct + validation
WebhookRegistry GenServer + ETS + JSON persistence
WebhookSignature verification module (default + GitHub modes first)
WebhookRateLimiter (ETS-based token bucket)
WebhookReceiver plug + Router integration
Management API endpoints in Router
Supervision tree wiring
Tests (unit for signature verification, integration for full pipeline)
Cloudflare Tunnel configuration documentation