Six items from claude's incremental review (`12:48:43Z`):
- monitoring/dashboard.json: Group the outcomes widget by both
`metric.label.outcome` and `metric.label.reason`. Previously all
failures collapsed into a single `turnstile_failed` series, which
contradicted the README claim that the `reason` label drives the
breakdown.
- monitoring/metrics/*.yaml: Narrow the metric filter to
`jsonPayload.event=("turnstile_siteverify" OR "pack_completed")`.
Without this anchor, any future code path attaching
`siteverifyDurationMs` to an unrelated log silently joins the
distribution and creates new metric label values.
- usePackRequest.ts: Mirror `progressMessage.value = null` alongside
the `progressStage.value = null` clear on token-acquisition aborted /
error branches. Prevents a future edit setting a verifying message
from leaking prior-run state.
- turnstile.test.ts: Add a focused `describe` block with five tests
asserting `siteverifyDurationMs` is attached to every post-siteverify
log (one success path + four reject branches). The metric YAML
filters on field presence, so a refactor that drops the field on any
branch would silently break the metric without other tests failing.
Uses the existing `vi.spyOn(logger, ...)` pattern; no clock injection
needed.
- monitoring/README.md: Note that the metric filter pins
`service_name="repomix-server-us"`, so future regions (`-eu`,
`-asia`) silently drop out until the filter is broadened or
per-region counterparts applied.
- monitoring/README.md: Add a `gcloud logging metrics describe` snippet
for verifying a YAML edit was actually applied (gcloud update is
silent on no-op vs effective change).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Six items from gemini, claude initial review, and claude follow-up:
- turnstile.ts: Update misleading comment that claimed the metric filters
on `event=turnstile_siteverify` and `outcome=success`. The actual
Cloud Monitoring metrics in `monitoring/metrics/` filter on
`siteverifyDurationMs` field presence, which uniformly captures both
the parallel success log (event=turnstile_siteverify) and the four
rejectAndLog failure paths (event=pack_completed). The comment
contradicted README and YAML and would mislead future readers.
- turnstile.ts: Wrap rejectAndLog in a local `rejectWithDuration` helper
so every post-siteverify branch automatically carries
`siteverifyDurationMs`. Prevents drift if a fifth reject reason gets
added later.
- client.ts: Split the wire-protocol `PackProgressStage` (server-emitted
SSE values) from the display-only `DisplayProgressStage` superset that
adds `verifying`. Keeping the synthetic stage out of the wire type
prevents silent divergence with the server's `PackProgressStage`.
- usePackRequest.ts, TryItLoading.vue, TryItResult.vue: Switch the
display-side type to `DisplayProgressStage`. `onProgress` callbacks
still take the wire `PackProgressStage`.
- usePackRequest.ts: Clear `progressStage` on token-acquisition failure
branches (aborted / error). Functionally invisible since loading=false
hides the loading UI, but prevents the next submit's verifying flash
from briefly showing the previous run's stale state.
- monitoring/metrics/turnstile_siteverify_duration.yaml: Retune the
exponential bucket layout for the 100ms-1s SLO band where decisions
get made. Doubling buckets only placed ~3 boundaries between 100ms
and 1s; growthFactor=1.5 with scale=10 places ~8 boundaries there.
18 finite buckets cover 10ms to ~9.85s, comfortably above the 5s
siteverify timeout so timeouts don't land in overflow.
- monitoring/README.md: Document that pre-network rejections
(secret_missing, missing_token, token_too_long) intentionally don't
carry siteverifyDurationMs, so they're excluded from both metrics
but still appear in the existing pack_requests metric.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Move the metric setup from prose-only README instructions to checked-in
YAML files under `monitoring/metrics/` so the dashboard, the metrics it
depends on, and the apply commands all live next to each other.
- `turnstile_siteverify_duration.yaml`: distribution metric on
`jsonPayload.siteverifyDurationMs`, exponential buckets 1ms-32s.
- `turnstile_siteverify_outcomes.yaml`: counter metric with `outcome`
and `reason` labels for the success-vs-failure breakdown widget.
README updated with the gcloud commands.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Wire up two new tiles to surface the `siteverifyDurationMs` field added
in the previous commit:
- "Turnstile siteverify latency P50 / P95 / P99" — line chart with a
1s threshold marker so a steady regression jumps off the chart.
- "Turnstile siteverify outcomes (by outcome)" — stacked area
breaking down success vs turnstile_failed counts over time.
Both depend on log-based metrics `turnstile_siteverify_duration`
(distribution) and `turnstile_siteverify_outcomes` (counter) that need
to be created once in the GCP Console — README documents the filter,
field, and label extractors so the setup is reproducible.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
intent(scope-reduction): this repo operates a single GCP project — IaC
for log-based metrics adds maintenance cost (Console drift, stale
YAMLs) without payoff. Metrics are recreated ad-hoc from the Console
when needed
decision(keep-dashboard): the dashboard definition is still worth
committing because it's the user-facing layout, easy to damage by
accidental Console edit, and trivial to restore via
`gcloud monitoring dashboards create --config-from-file=dashboard.json`
decision(readme-slim): strip README to just the restore command —
longer prose belonged to the earlier IaC-lite layout that's now gone
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
intent(observability): make the new pack_completed / source / durationMs
log fields queryable as time series in Cloud Monitoring without
re-running ad-hoc gcloud logging read pipelines during incidents
decision(iac-lite): store metric definitions as standalone YAMLs under
website/server/monitoring/metrics/ + an idempotent bash apply script,
rather than Terraform. The project has no existing IaC for monitoring
and adding a full Terraform setup would dwarf the actual change
decision(labels): constrain pack_requests labels to
outcome/source/input_type/cached/format (max 96 time series). Omitted
repoHost because its cardinality is unbounded — raw value remains in
the log jsonPayload for ad-hoc queries
decision(oom-split): separate oom_terminations and container_killed
counters. They fire on distinct textPayloads from Cloud Run's memory
enforcement paths; keeping them separate makes alert tuning easier
rejected(repoHost-as-label): too many unique hostnames — would create a
time series per repo host and cost scale with traffic diversity
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>