Commit Graph

9 Commits

Author SHA1 Message Date
Kazuki Yamada 54c6a3d238 fix(website): Address claude third-pass review on siteverify metric
Six items from claude's incremental review (`12:48:43Z`):

- monitoring/dashboard.json: Group the outcomes widget by both
  `metric.label.outcome` and `metric.label.reason`. Previously all
  failures collapsed into a single `turnstile_failed` series, which
  contradicted the README claim that the `reason` label drives the
  breakdown.
- monitoring/metrics/*.yaml: Narrow the metric filter to
  `jsonPayload.event=("turnstile_siteverify" OR "pack_completed")`.
  Without this anchor, any future code path attaching
  `siteverifyDurationMs` to an unrelated log silently joins the
  distribution and creates new metric label values.
- usePackRequest.ts: Mirror `progressMessage.value = null` alongside
  the `progressStage.value = null` clear on token-acquisition aborted /
  error branches. Prevents a future edit setting a verifying message
  from leaking prior-run state.
- turnstile.test.ts: Add a focused `describe` block with five tests
  asserting `siteverifyDurationMs` is attached to every post-siteverify
  log (one success path + four reject branches). The metric YAML
  filters on field presence, so a refactor that drops the field on any
  branch would silently break the metric without other tests failing.
  Uses the existing `vi.spyOn(logger, ...)` pattern; no clock injection
  needed.
- monitoring/README.md: Note that the metric filter pins
  `service_name="repomix-server-us"`, so future regions (`-eu`,
  `-asia`) silently drop out until the filter is broadened or
  per-region counterparts applied.
- monitoring/README.md: Add a `gcloud logging metrics describe` snippet
  for verifying a YAML edit was actually applied (gcloud update is
  silent on no-op vs effective change).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-06 21:59:53 +09:00
Kazuki Yamada fa06e5059c fix(website): Address PR review feedback on siteverify metric
Six items from gemini, claude initial review, and claude follow-up:

- turnstile.ts: Update misleading comment that claimed the metric filters
  on `event=turnstile_siteverify` and `outcome=success`. The actual
  Cloud Monitoring metrics in `monitoring/metrics/` filter on
  `siteverifyDurationMs` field presence, which uniformly captures both
  the parallel success log (event=turnstile_siteverify) and the four
  rejectAndLog failure paths (event=pack_completed). The comment
  contradicted README and YAML and would mislead future readers.
- turnstile.ts: Wrap rejectAndLog in a local `rejectWithDuration` helper
  so every post-siteverify branch automatically carries
  `siteverifyDurationMs`. Prevents drift if a fifth reject reason gets
  added later.
- client.ts: Split the wire-protocol `PackProgressStage` (server-emitted
  SSE values) from the display-only `DisplayProgressStage` superset that
  adds `verifying`. Keeping the synthetic stage out of the wire type
  prevents silent divergence with the server's `PackProgressStage`.
- usePackRequest.ts, TryItLoading.vue, TryItResult.vue: Switch the
  display-side type to `DisplayProgressStage`. `onProgress` callbacks
  still take the wire `PackProgressStage`.
- usePackRequest.ts: Clear `progressStage` on token-acquisition failure
  branches (aborted / error). Functionally invisible since loading=false
  hides the loading UI, but prevents the next submit's verifying flash
  from briefly showing the previous run's stale state.
- monitoring/metrics/turnstile_siteverify_duration.yaml: Retune the
  exponential bucket layout for the 100ms-1s SLO band where decisions
  get made. Doubling buckets only placed ~3 boundaries between 100ms
  and 1s; growthFactor=1.5 with scale=10 places ~8 boundaries there.
  18 finite buckets cover 10ms to ~9.85s, comfortably above the 5s
  siteverify timeout so timeouts don't land in overflow.
- monitoring/README.md: Document that pre-network rejections
  (secret_missing, missing_token, token_too_long) intentionally don't
  carry siteverifyDurationMs, so they're excluded from both metrics
  but still appear in the existing pack_requests metric.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-06 21:42:21 +09:00
Kazuki Yamada 76522fc13e chore(monitoring): Check in Turnstile siteverify metric definitions
Move the metric setup from prose-only README instructions to checked-in
YAML files under `monitoring/metrics/` so the dashboard, the metrics it
depends on, and the apply commands all live next to each other.

- `turnstile_siteverify_duration.yaml`: distribution metric on
  `jsonPayload.siteverifyDurationMs`, exponential buckets 1ms-32s.
- `turnstile_siteverify_outcomes.yaml`: counter metric with `outcome`
  and `reason` labels for the success-vs-failure breakdown widget.

README updated with the gcloud commands.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-06 21:24:48 +09:00
Kazuki Yamada aea2401bfc chore(monitoring): Add Turnstile siteverify dashboard widgets
Wire up two new tiles to surface the `siteverifyDurationMs` field added
in the previous commit:

- "Turnstile siteverify latency P50 / P95 / P99" — line chart with a
  1s threshold marker so a steady regression jumps off the chart.
- "Turnstile siteverify outcomes (by outcome)" — stacked area
  breaking down success vs turnstile_failed counts over time.

Both depend on log-based metrics `turnstile_siteverify_duration`
(distribution) and `turnstile_siteverify_outcomes` (counter) that need
to be created once in the GCP Console — README documents the filter,
field, and label extractors so the setup is reproducible.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-06 21:21:47 +09:00
Kazuki Yamada f327590555 fix(monitoring): Correct dashboard legend template variable syntax
intent(readability): dashboard panels for options usage and cache hit
  ratio were rendering with empty legend values (e.g. "compress=",
  "removeComments=") making it impossible to distinguish true vs false
  series. Every other signal on the dashboard was fine — this was a
  subtle API mismatch
decision(labels-plural): GCP Cloud Monitoring widget legend templates
  use `${metric.labels.X}` (plural). The dashboard had
  `${metric.label.X}` (singular), which silently resolves to the empty
  string instead of erroring out. Replaced all 5 occurrences

Applied to the live GCP dashboard via `gcloud monitoring dashboards
update` before committing — legends now render correctly.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-19 16:39:43 +09:00
Kazuki Yamada a85a63e183 fix(server): Address PR review feedback
Addresses PR #1493 review comments from gemini-code-assist,
devin-ai-integration, coderabbitai, and claude.

intent(classifier-fix): fix classifyRejectReason returning 'unknown'
  for every validation_error — gemini and devin both flagged that
  validateRequest wraps ZodError in AppError, which strips the
  .issues array the classifier needs. This made the new "Validation
  rejections (by reason)" dashboard panel effectively useless, all
  rejections collapsing into one 'unknown' bucket
decision(cause-chain): use the native `Error.cause` option
  (ES2022) to attach the original ZodError when wrapping in AppError.
  Less invasive than adding a custom field; classifyRejectReason now
  checks both `.issues` directly on the error AND on `.cause`, so
  callers don't need to know which layer wrapped the error
decision(json-catch-preserve): change the JSON.parse bare `catch {}`
  to `catch (jsonError)` and pass the original SyntaxError to
  logError. Without this the `invalid_json` bucket shows only a
  count, leaving an operator no breadcrumbs when it spikes. Also
  drops the classifyRejectReason round-trip — rejectReason is
  statically known at this call site
rejected(packOptions-snake-case): gemini suggested renaming the
  packOptions log fields to snake_case so they literally match the
  GCP metric label names. Keeping camelCase in TS code (project
  convention) + snake_case in GCP labels (platform convention) is
  deliberate — the `EXTRACT(...)` path on the metric bridges them
  explicitly. Forcing one to match the other breaks one of the two
  ecosystems' idioms
improve(options-panels): coderabbit noted only `compress` had a
  dedicated panel even though pack_options_usage exposes 4 labels.
  Split into four quarter-width tiles covering compress,
  removeComments, outputParsable, and hasIncludePatterns. Also
  dropped the "(Tree-sitter)" parenthetical from the compress title
  — meaningful to the author, confusing to an on-caller

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-18 17:17:01 +09:00
Kazuki Yamada 628227cdde feat(server): Log pack options and validation reject reasons
intent(observability): answer three questions that the initial
  observability PR couldn't:
    (1) what drives OOM risk — size distribution heatmap needs the
        data out of pack_completed's `metrics.totalTokens` as a
        distribution metric, not just a scalar log field
    (2) which differentiating features users actually adopt
        (compress / removeComments / outputParsable / custom
        include-patterns)
    (3) what inputs validation rejects and why — surfaces feature
        requests (non-github hosts, oversized files) and abuse patterns
decision(packOptions-in-log): flatten option booleans under a
  `packOptions` nested object on pack_completed success logs so a
  log-based metric with option labels can extract them without
  exploding the existing pack_requests cardinality. Patterns are
  logged as `hasIncludePatterns` / `hasIgnorePatterns` booleans only —
  raw user input never enters metric labels
decision(rejectReason-enum): classifyRejectReason is a switch on zod
  error message strings since the messages are stable (defined in
  packRequestSchema.ts in this repo). Non-zod errors (notably the
  pre-zod `Invalid JSON in options` path) get an explicit branch so
  they don't fall into the generic `unknown` bucket
decision(invalid-json-logging): the JSON-parse catch previously
  returned 400 without logging, so `rejectReason=invalid_json` would
  have been invisible. Added an explicit logError call on that path
rejected(per-option-metrics): creating separate metrics like
  `pack_with_compress` was considered and rejected — 1 metric with
  4 boolean labels gives 16 series total (manageable) and lets the
  dashboard group by any option without schema sprawl

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-18 17:06:55 +09:00
Kazuki Yamada c95034bafc chore(monitoring): Keep dashboard only, drop metric YAMLs
intent(scope-reduction): this repo operates a single GCP project — IaC
  for log-based metrics adds maintenance cost (Console drift, stale
  YAMLs) without payoff. Metrics are recreated ad-hoc from the Console
  when needed
decision(keep-dashboard): the dashboard definition is still worth
  committing because it's the user-facing layout, easy to damage by
  accidental Console edit, and trivial to restore via
  `gcloud monitoring dashboards create --config-from-file=dashboard.json`
decision(readme-slim): strip README to just the restore command —
  longer prose belonged to the earlier IaC-lite layout that's now gone

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-18 13:50:32 +09:00
Kazuki Yamada bd8e069440 chore(monitoring): Add log-based metric definitions and apply script
intent(observability): make the new pack_completed / source / durationMs
  log fields queryable as time series in Cloud Monitoring without
  re-running ad-hoc gcloud logging read pipelines during incidents
decision(iac-lite): store metric definitions as standalone YAMLs under
  website/server/monitoring/metrics/ + an idempotent bash apply script,
  rather than Terraform. The project has no existing IaC for monitoring
  and adding a full Terraform setup would dwarf the actual change
decision(labels): constrain pack_requests labels to
  outcome/source/input_type/cached/format (max 96 time series). Omitted
  repoHost because its cardinality is unbounded — raw value remains in
  the log jsonPayload for ad-hoc queries
decision(oom-split): separate oom_terminations and container_killed
  counters. They fire on distinct textPayloads from Cloud Run's memory
  enforcement paths; keeping them separate makes alert tuning easier
rejected(repoHost-as-label): too many unique hostnames — would create a
  time series per repo host and cost scale with traffic diversity

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-18 12:38:52 +09:00