Parky Operations

Operations SSoT: strategy and runbooks

The single home for Parky operations design: notifications, SLOs, incident response, observability, deployment, and security. The strategy docs (the SSoT) sit at the top level, and every runbook follows them. Any new alert or runbook must conform to the rules in this chapter.

Operational design (SSoT)

These are strategy-level docs, and every individual runbook conforms to them. When adding a new notification, alert, or SLO, update the strategy doc first, then implement the individual runbook.

Observability: wiring up metrics and notifications

The wiring that makes the P0/P1/P2/P3 levels of the notification strategy (SSoT) actually fire alerts.

🛡

Sentry setup

Initial setup procedure for the Sentry org and project. The dev Worker DSN is already wired in, and 12 alert rules are prepared.

🔔

Sentry alert rules

Definitions of the 12 alert rules (5xx burst, error budget burn, etc.). See notification-strategy.md for how each rule is routed by severity.

🔌

Observability hookup

Consolidated procedure for the remaining Sentry channel DSNs, Honeycomb signup, and the Discord native integration.

💓

Synthetic healthcheck

Probes /healthz on a 5-minute cron. Runs as a GitHub Actions cron as an interim measure until Cloudflare Health Checks is fully adopted.
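As an illustration of what one probe run might do, here is a minimal TypeScript sketch. Everything in it is an assumption made for the example: the names probeHealthz and evaluateProbe, the 5-second timeout, and the 3-second latency ceiling are hypothetical and not taken from the actual workflow. The fetcher is injected so the logic can be exercised without a network.

```typescript
// Hypothetical sketch of a /healthz probe; names and thresholds are illustrative only.
type ProbeOutcome = { healthy: boolean; reason: string };

// Minimal fetch-like signature so the probe can run against a stub in tests.
type Fetcher = (url: string, init?: { signal?: AbortSignal }) => Promise<{ status: number }>;

// Pure decision: a probe passes only on a 2xx status within the latency ceiling.
// The 3000 ms default is an assumption, not a documented Parky value.
function evaluateProbe(status: number | null, latencyMs: number, maxLatencyMs = 3000): ProbeOutcome {
  if (status === null) return { healthy: false, reason: "no response" };
  if (status < 200 || status >= 300) return { healthy: false, reason: `HTTP ${status}` };
  if (latencyMs > maxLatencyMs) return { healthy: false, reason: `slow: ${latencyMs}ms` };
  return { healthy: true, reason: "ok" };
}

// One probe run: request /healthz, abort after timeoutMs, then classify the result.
async function probeHealthz(baseUrl: string, fetcher: Fetcher, timeoutMs = 5000): Promise<ProbeOutcome> {
  const started = Date.now();
  const controller = new AbortController();
  const timer = setTimeout(() => controller.abort(), timeoutMs);
  try {
    const res = await fetcher(new URL("/healthz", baseUrl).toString(), { signal: controller.signal });
    return evaluateProbe(res.status, Date.now() - started);
  } catch {
    // Network errors and timeouts both count as "no response".
    return evaluateProbe(null, Date.now() - started);
  } finally {
    clearTimeout(timer);
  }
}
```

In a cron job, probeHealthz could be called with the runtime's global fetch as the fetcher, and an unhealthy outcome turned into a non-zero exit code so the scheduled run is marked as failed.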

📡

OpenTelemetry

OTel collector wiring (e.g., Honeycomb). lib/otel.ts is complete; the endpoint has not been configured yet.

📜

Logging

Log level design, scope/resource structure, and createLogger usage patterns.

Logpush

Configuration for pushing Workers Logs to R2 (jobs 1606777 / 1606779).

🚀

Sentry/Logpush rollout

Phased rollout plan for Sentry and Logpush.

📈

Analytics Engine

Usage policy for Cloudflare Analytics Engine (CAE), the aggregation backbone for SLO measurement and cost reports.

Deployment & infra

🛂

GitHub Environments

Setup of the prod approval gate. Also covers the Free plan's limitations and the considerations for deciding on a Team plan upgrade.

Deploy rollback

Procedures for wrangler rollback, Pages rollback, and DB migration revert.

🤖

Auto rollback (SLO burn)

A 5-minute cron monitors the SLO burn rate (error × 3 / p99 × 2). In dev it automatically fires scripts/deploy/rollback.sh api dev. The prod variant lives in auto-rollback-prod.yml with the same structure but is gated by vars.AUTO_ROLLBACK_PROD_ENABLED (initially disabled). Enabling prod will be considered only after dev has run for 2 weeks with zero false positives.
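The decision made on each cron tick could look roughly like the TypeScript sketch below. It rests on one reading of "error × 3 / p99 × 2", namely that a rollback triggers when the observed error rate reaches 3 times the SLO target or the observed p99 latency reaches 2 times its target. That reading, the type names, and the decideRollback function are all assumptions for illustration. The actual workflow may define burn rate differently.

```typescript
// Hypothetical shapes: the field names are illustrative, not from the Parky repo.
interface WindowMetrics {
  errorRate: number; // fraction of failed requests in the window, 0..1
  p99Ms: number; // observed p99 latency in milliseconds
}

interface SloTargets {
  errorRate: number; // error rate allowed by the SLO, 0..1
  p99Ms: number; // p99 latency target in milliseconds
}

interface RollbackDecision {
  rollback: boolean;
  reasons: string[];
}

// Multipliers from the "error × 3 / p99 × 2" wording (interpretation assumed).
const ERROR_MULTIPLIER = 3;
const P99_MULTIPLIER = 2;

function decideRollback(observed: WindowMetrics, slo: SloTargets, enabled: boolean): RollbackDecision {
  const reasons: string[] = [];
  if (observed.errorRate >= slo.errorRate * ERROR_MULTIPLIER) {
    reasons.push(`error rate ${observed.errorRate} >= ${ERROR_MULTIPLIER}x target ${slo.errorRate}`);
  }
  if (observed.p99Ms >= slo.p99Ms * P99_MULTIPLIER) {
    reasons.push(`p99 ${observed.p99Ms}ms >= ${P99_MULTIPLIER}x target ${slo.p99Ms}ms`);
  }
  // `enabled` plays the role of a gate such as vars.AUTO_ROLLBACK_PROD_ENABLED:
  // when it is off, breaches are still reported but no rollback is requested.
  return { rollback: enabled && reasons.length > 0, reasons };
}
```

Returning the reasons even when the gate is off makes it possible to count would-be rollbacks during a trial period, which is one way a zero-false-positive track record could be collected before enabling prod.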

🌀

Chaos engineering

Runbook for the quarterly game day: 6 scenarios (including Supabase pause, Hyperdrive disconnect, R2 failure, external API 5xx, and Auth outage), a recording template, and the tooling still to be built (MSW / circuit breaker). chaos-fault-inject.yml posts the drill signal to Discord.

🐤

Canary deploy

Gradual rollout in stages of 1 → 10 → 50 → 100% using the Cloudflare Workers Versions API. Enabled on dev/stg; prod is held back with if: false until the behavior is validated, after which the main thread turns the flag on.
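The stage progression can be sketched as a small pure function in TypeScript. This is only a model of the 1 → 10 → 50 → 100% ladder: the name nextCanaryStep, the health flag, and the abort-to-0% behavior are assumptions for illustration, and the sketch deliberately makes no Cloudflare API calls.

```typescript
// Traffic percentages for each canary stage, as listed in the runbook summary.
const CANARY_STAGES: readonly number[] = [1, 10, 50, 100];

// Hypothetical decision shape; not taken from the actual workflow.
interface CanaryStep {
  action: "promote" | "abort" | "done";
  percent: number; // traffic share the new version should receive next
}

// Given the current traffic share and a health verdict, pick the next move.
function nextCanaryStep(currentPercent: number, healthy: boolean): CanaryStep {
  // Assumption: any unhealthy signal sends the new version back to 0%.
  if (!healthy) return { action: "abort", percent: 0 };
  const next = CANARY_STAGES.find((stage) => stage > currentPercent);
  // No higher stage left means the rollout is complete at 100%.
  if (next === undefined) return { action: "done", percent: 100 };
  return { action: "promote", percent: next };
}
```

A workflow would call this once per stage, apply the returned percentage through whatever deployment mechanism is in place, wait for metrics, and call it again with the new health verdict.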

🧩

Split worker deploy

Production cutover procedure for the 4 workers (public / admin / marketing / store-sync) decided in ADR-0010: the deploy order (store-sync → public → admin → marketing), secret rollout (secret-keys-1p-map.json), smoke tests, and rollback. The legacy monolith [env.prod] was commented out on 2026-05-01, so npm run deploy:prod fails intentionally.
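To make the ordering concrete, the following TypeScript sketch walks the workers in the stated order and stops at the first failure. The runCutover function, its callback, and the choice to roll back in reverse order (starting with the worker that failed) are assumptions for illustration. The runbook itself defines the real rollback procedure.

```typescript
// Deploy order as stated in the runbook summary.
const CUTOVER_ORDER = ["store-sync", "public", "admin", "marketing"] as const;
type WorkerName = (typeof CUTOVER_ORDER)[number];

// Hypothetical result shape for one cutover attempt.
interface CutoverResult {
  deployed: WorkerName[]; // workers that deployed and passed their smoke test
  failedAt: WorkerName | null; // first worker whose deploy or smoke test failed
  rollbackOrder: WorkerName[]; // assumed order: failed worker first, then reverse deploy order
}

// deployAndSmoke is a caller-supplied step that returns true when a worker
// was deployed and its smoke test passed.
function runCutover(deployAndSmoke: (worker: WorkerName) => boolean): CutoverResult {
  const deployed: WorkerName[] = [];
  for (const worker of CUTOVER_ORDER) {
    if (!deployAndSmoke(worker)) {
      return { deployed, failedAt: worker, rollbackOrder: [worker, ...[...deployed].reverse()] };
    }
    deployed.push(worker);
  }
  return { deployed, failedAt: null, rollbackOrder: [] };
}
```

Stopping at the first failure keeps the later workers on their previous versions, so only the workers named in rollbackOrder need attention.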

📈

DORA Weekly Metrics

Every Monday at 09:00 JST, the four DORA metrics (Deployment Frequency, Lead Time, MTTR, Change Failure Rate) are aggregated from the GitHub API and posted to #p2-deploys as a Discord embed. A threshold table and an interpretation for Parky's current phase are included.
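For readers unfamiliar with how the four metrics are derived, here is a minimal TypeScript sketch of the arithmetic over one week of deployment records. The DeploymentRecord shape and the specific choices (median for lead time, mean for time to restore) are assumptions for illustration. The real job reads its inputs from the GitHub API and may define each metric differently.

```typescript
// Hypothetical input record; field names are illustrative only.
interface DeploymentRecord {
  committedAt: string; // ISO 8601 timestamp of the change
  deployedAt: string; // ISO 8601 timestamp of the deployment
  failed: boolean; // whether the deployment caused a failure
  restoredAt?: string; // ISO 8601 timestamp when service was restored
}

interface DoraSummary {
  deploymentFrequency: number; // deployments in the window
  leadTimeHours: number | null; // median commit-to-deploy time
  mttrHours: number | null; // mean time to restore after a failed deployment
  changeFailureRate: number | null; // failed deployments / all deployments
}

const HOUR_MS = 3_600_000;

function hoursBetween(from: string, to: string): number {
  return (Date.parse(to) - Date.parse(from)) / HOUR_MS;
}

function median(values: number[]): number | null {
  if (values.length === 0) return null;
  const sorted = [...values].sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  return sorted.length % 2 === 1 ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2;
}

function summarizeDora(deploys: DeploymentRecord[]): DoraSummary {
  const leadTimes = deploys.map((d) => hoursBetween(d.committedAt, d.deployedAt));
  // Only failed deployments with a recorded restore time contribute to MTTR.
  const restoreTimes = deploys
    .filter((d) => d.failed && d.restoredAt !== undefined)
    .map((d) => hoursBetween(d.deployedAt, d.restoredAt as string));
  const failures = deploys.filter((d) => d.failed).length;
  return {
    deploymentFrequency: deploys.length,
    leadTimeHours: median(leadTimes),
    mttrHours: restoreTimes.length > 0 ? restoreTimes.reduce((a, b) => a + b, 0) / restoreTimes.length : null,
    changeFailureRate: deploys.length > 0 ? failures / deploys.length : null,
  };
}
```

Metrics that cannot be computed for a week (for example MTTR when nothing failed) come back as null rather than 0, so a quiet week is not mistaken for an instant recovery.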

🔐

Supply-chain security

Multi-layer defense via SBOM (CycloneDX), SLSA L3 provenance attestation, Dependency Review, and Sigstore. Also documents the SLSA level achieved and the verification steps (gh attestation verify / cosign).

🌏

Regional rollout

Future phased rollout strategy (Tokyo → Kanto → nationwide).

🏗

Terraform backend

Plan for migrating from local state to an R2 backend, which lays the groundwork for drift detection.

🌳

Supabase branching

Plan for adopting Supabase database branching, giving per-PR DB previews.

Build time scaling

CI time optimization (turbo cache, job parallelization, matrix consolidation).

Security

Storage & performance

Related

Operations-related audit results are kept in .work/parky/2026-04-29_001_parky_comprehensive_evaluation_v2.html (Operations axis: 62/100) and .work/parky/2026-04-29_002_parky_notification_strategy.html (the HTML reference version of this chapter).