18 changes: 18 additions & 0 deletions .claude/commands/dashboard-dev.md
@@ -0,0 +1,18 @@
---
description: Guide for contributing to and deploying the Delivery Dashboard
---

# Dashboard Dev Command

Load the dashboard-dev skill and assist with the user's request.

## Execution

Load and follow the dashboard-dev skill from `.claude/skills/dashboard-dev/SKILL.md`.

Use it to help with:
- Forking and setting up the repo for development
- Running the dashboard locally
- Deploying to an OpenShift cluster
- Adding new pages, queries, or features
- Debugging a running deployment on cluster
128 changes: 128 additions & 0 deletions .claude/skills/dashboard-dev/SKILL.md
@@ -0,0 +1,128 @@
---
name: dashboard-dev
description: Guide for contributing to and deploying the Delivery Dashboard
allowed-tools: [Bash, Read, Grep, Glob, Write, Edit, TodoWrite]
---

# Delivery Dashboard Development Skill

## Purpose

Help developers contribute to, run locally, and deploy the Delivery Dashboard — a web UI showing operator pipeline status across stage and integration environments, backed by SQLite, SQS, and S3.

---

## Codebase Layout

```
pkg/dashboard/
  models/types.go           # data models (PipelineRun, FailureGroup, etc.)
  store/store.go            # SQLite queries
  collectors/               # SQS consumer, S3 backfill, OCM collectors
  config/                   # config keys and defaults
  server/server.go          # HTTP handlers and routes
  server/templates/         # Go HTML templates
    base.html               # nav, layout
    operators.html          # deliverables/pipelines page
    pipeline-detail.html    # per-operator history
    analysis.html           # failure grouping by AI root cause
    usage.html              # infra/clusters page
cmd/osde2e/dashboard/       # CLI entry point (flags, wiring)
scripts/dashboard/
  deploy.sh                 # local dev deploy to OpenShift cluster
  verify-build.sh           # sanity check binary + templates
configs/local/
  dashboard-build/          # podman build context (Dockerfile committed, binary gitignored)
```

Manifests live in the adjacent **hp-delivery-apps** repo:
```
delivery-dashboard/
  base/                     # Deployment + Service
  overlays/
    local/                  # personal dev cluster (gitignored, manually provisioned secrets)
    stage/                  # vault ExternalSecrets
    prod/                   # vault ExternalSecrets
```

---

## Local Development (native, no container)

```bash
make dashboard
```

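Builds the binary and runs it with `--db=./dashboard.db --backfill --port=8080`; the UI is served at http://localhost:8080/dashboard/deliverables. Because the target passes `--backfill`, the database is truncated and repopulated from S3 on every run (the backfill is skipped with a warning if `LOG_BUCKET` is not set).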

---

## Deploying to Your Own OpenShift Cluster

### Prerequisites

- `podman login quay.io`
- `oc login <cluster-url>`
- hp-delivery-apps repo cloned adjacent to this repo
- Secrets pre-created in the target namespace (see hp-delivery-apps/delivery-dashboard/README.md)

### Create secrets (local overlay — vault handles stage/prod automatically)

```bash
oc create secret generic osde2e-ocm-credentials \
--from-literal=ocm-client-id=<id> \
--from-literal=ocm-client-secret=<secret> \
-n <namespace>

oc create secret generic osde2e-aws-credentials \
--from-literal=aws-access-key-id=<key> \
--from-literal=aws-secret-access-key=<secret> \
-n <namespace>
```

### Set SQS_QUEUE_URL

Edit `hp-delivery-apps/delivery-dashboard/overlays/local/configmap.yaml` directly — it is gitignored.

### Deploy

```bash
DASHBOARD_QUAY_IMAGE=quay.io/<your-username>/delivery-dashboard:latest \
QUAY_EXPIRE=26w \
./scripts/dashboard/deploy.sh
```

The script:
1. Checks required secrets exist (fails fast if not)
2. Compiles linux/amd64 binary → `configs/local/dashboard-build/osde2e`
3. Builds slim image via podman and pushes to quay
4. Applies the local overlay: `kustomize build overlays/local | oc apply -f -`
5. Waits for rollout, prints URL

Route URL: `https://live-<namespace>.apps.<cluster-domain>/dashboard/deliverables`

### When to rebuild vs re-apply

| Change type | Action |
|-------------|--------|
| Go source / templates | Re-run `deploy.sh` |
| ConfigMap / env vars | Edit overlay configmap, `kustomize build \| oc apply -f -` |
| Route / Service | Same as above, no restart needed |

---

## Common Development Tasks

- **Add a new page**: template in `server/templates/`, handler in `server.go`, route in `setupRoutes()`, nav link in `base.html`
- **Add a data query**: method in `store/store.go`, model in `models/types.go`
- **Check logs**: `oc logs -f deployment/delivery-dashboard -n <namespace>`
- **Check pod status**: `oc get pods -n <namespace>`
- **Rolling restart**: `oc rollout restart deployment/delivery-dashboard -n <namespace>`

---

## Architecture

- **Pipeline data**: SQS listener polls for S3 event notifications; each event points to a test result JSON, downloaded and parsed into `pipeline_runs` SQLite table
- **Pipeline backfill**: on startup with `--backfill`, truncates the DB, then scans the S3 bucket directly for historical results
- **Pipeline LLM analysis**: stored in `llm_analysis` column as JSON; parsed to extract `root_cause` and `recommendations`
- **OCM data**: collectors query OCM API for cluster reserves, usage metrics, and environment status (stage/int/prod)
- **Local Storage**: single SQLite file at `/data/dashboard.db`, mounted via `emptyDir` (repopulated from S3 + OCM on each start)
- **UI Templates**: standard Go `html/template`, server-side rendered, no JS framework
5 changes: 4 additions & 1 deletion AGENTS.md
@@ -57,13 +57,16 @@ Osde2e is End-to-end testing framework for Managed services for OSD/ROSA.
3. Integration test failures? Check credentials/env vars
4. Always use `gofumpt`, not `gofmt`
5. Check git status before committing
6. Dashboard work? Use the `/dashboard-dev` skill — it has deploy steps, architecture, and local dev instructions

## Architecture
```
osde2e
-├── cmd/osde2e/ # CLI commands (provision, test, cleanup, krknai)
+├── cmd/osde2e/ # CLI commands (provision, test, cleanup, krknai, dashboard)
├── pkg/common/ # Core logic (config, providers, helpers)
├── pkg/dashboard/ # Delivery Dashboard (server, store, collectors, models)
├── internal/ # LLM analysis (llm, sanitizer, prompts)
├── .claude/skills/ # Claude Code skills (use /dashboard-dev for dashboard work)
└── test/ # Standalone Ginkgo test suites
```

5 changes: 4 additions & 1 deletion Makefile
@@ -1,4 +1,4 @@
-.PHONY: check generate test
+.PHONY: check generate test dashboard

PKG := github.com/openshift/osde2e
DOC_PKG := $(PKG)/cmd/osde2e-docs
@@ -37,6 +37,9 @@ build:
mkdir -p "$(OUT_DIR)"
go build -o "$(OUT_DIR)" "$(DIR)cmd/..."

dashboard: build
"$(OUT_DIR)/osde2e" dashboard --db="$(DIR)dashboard.db" --backfill --port=8080

diffproviders.txt:
"$(DIR)scripts/generate-providers-import.sh" > diffproviders.txt

190 changes: 190 additions & 0 deletions cmd/osde2e/dashboard/cmd.go
@@ -0,0 +1,190 @@
package dashboard

import (
"context"
"fmt"
"log"
"os"
"os/signal"
"syscall"

"github.com/openshift/osde2e/cmd/osde2e/common"
"github.com/openshift/osde2e/cmd/osde2e/helpers"
viper "github.com/openshift/osde2e/pkg/common/concurrentviper"
"github.com/openshift/osde2e/pkg/common/providers/ocmprovider"
"github.com/openshift/osde2e/pkg/dashboard/collectors"
"github.com/openshift/osde2e/pkg/dashboard/config"
"github.com/openshift/osde2e/pkg/dashboard/server"
"github.com/openshift/osde2e/pkg/dashboard/store"
"github.com/spf13/cobra"
)

var Cmd = &cobra.Command{
Use: "dashboard",
Short: "Start osde2e dashboard web server",
Long: "Start a web dashboard that aggregates cluster reserves, usage metrics, and test results from OCM and S3.",
Args: cobra.NoArgs,
Run: run,
}

var args struct {
configString string
secretLocations string
environment string
port int
maxResults int
sqsQueueURL string
dbPath string
backfill bool
}

func init() {
pfs := Cmd.PersistentFlags()

pfs.StringVar(&args.configString, "configs", "", "A comma separated list of built in configs to use")
_ = Cmd.RegisterFlagCompletionFunc("configs", helpers.ConfigComplete)

pfs.StringVar(&args.secretLocations, "secret-locations", "",
"A comma separated list of possible secret directory locations for loading secret configs.")

pfs.StringVarP(&args.environment, "environment", "e", "",
"Filter clusters by environment (stage, prod, integration, all). Defaults to 'all'.")

pfs.IntVarP(&args.port, "port", "p", config.DefaultPort, "HTTP port for the dashboard server")

pfs.IntVar(&args.maxResults, "max-results", config.DefaultMaxTestResults,
"Maximum number of test results to display")

pfs.StringVar(&args.sqsQueueURL, "sqs-queue-url", "",
"SQS queue URL receiving S3 ObjectCreated notifications. When set, enables event-driven DB updates.")

pfs.StringVar(&args.dbPath, "db", "dashboard.db",
"Path to the SQLite database file. Use ':memory:' for an ephemeral in-memory DB.")

pfs.BoolVar(&args.backfill, "backfill", false,
"Scan all historical S3 objects and populate the DB before starting the server.")

// Bind flags to viper
_ = viper.BindPFlag(config.Port, pfs.Lookup("port"))
_ = viper.BindPFlag(config.Environment, pfs.Lookup("environment"))
_ = viper.BindPFlag(config.MaxTestResults, pfs.Lookup("max-results"))
_ = viper.BindPFlag(ocmprovider.Env, pfs.Lookup("environment"))
_ = viper.BindPFlag(config.SQSQueueURL, pfs.Lookup("sqs-queue-url"))
_ = viper.BindPFlag(config.DBPath, pfs.Lookup("db"))
}

func run(cmd *cobra.Command, argv []string) {
log.Println("==== Starting osde2e Dashboard ====")

// Unset personal OCM token so the dashboard authenticates via OCM_CLIENT_ID/SECRET only.
os.Unsetenv("OCM_TOKEN")

// Load configurations
if err := common.LoadConfigs(args.configString, "", args.secretLocations); err != nil {
log.Printf("Error loading initial configuration: %v", err)
os.Exit(1)
}

// Set dashboard defaults
config.SetDefaults()

// Override with CLI flags if explicitly set
if cmd.PersistentFlags().Changed("port") {
viper.Set(config.Port, args.port)
}
if cmd.PersistentFlags().Changed("environment") {
viper.Set(config.Environment, args.environment)
viper.Set(ocmprovider.Env, args.environment)
}
if cmd.PersistentFlags().Changed("max-results") {
viper.Set(config.MaxTestResults, args.maxResults)
}
if cmd.PersistentFlags().Changed("sqs-queue-url") {
viper.Set(config.SQSQueueURL, args.sqsQueueURL)
}
if cmd.PersistentFlags().Changed("db") {
viper.Set(config.DBPath, args.dbPath)
}

// Load dashboard configuration
dashboardConfig := config.LoadConfig()

// Validate configuration
if dashboardConfig.OCMConfigPath == "" {
log.Println("Warning: OCM_CONFIG not set. OCM features may not work.")
}
if dashboardConfig.S3Bucket == "" {
log.Println("Warning: LOG_BUCKET not set. S3 test results will not be available.")
}

log.Printf("Dashboard Configuration:")
log.Printf(" Port: %d", dashboardConfig.Port)
log.Printf(" S3 Bucket: %s", dashboardConfig.S3Bucket)
log.Printf(" S3 Region: %s", dashboardConfig.S3Region)
log.Printf(" Environment: %s", dashboardConfig.Environment)
log.Printf(" DB Path: %s", dashboardConfig.DBPath)
log.Printf(" SQS Queue URL: %s", dashboardConfig.SQSQueueURL)

// Open the SQLite store
st, err := store.Open(dashboardConfig.DBPath)
if err != nil {
log.Printf("Failed to open store at %s: %v", dashboardConfig.DBPath, err)
os.Exit(1)
}
defer st.Close()

// Top-level context — cancelled on Ctrl+C or SIGTERM, shuts down everything.
ctx, cancel := signal.NotifyContext(context.Background(), os.Interrupt, syscall.SIGTERM)
defer cancel()

// Backfill historical S3 data and/or start the SQS consumer. Both need the S3 bucket.
if args.backfill || dashboardConfig.SQSQueueURL != "" {
if dashboardConfig.S3Bucket == "" {
log.Println("Warning: --backfill or --sqs-queue-url requested but LOG_BUCKET is not set; skipping both.")
} else {
consumer, err := collectors.NewSQSConsumer(
dashboardConfig.SQSQueueURL,
dashboardConfig.S3Bucket,
dashboardConfig.S3Region,
st,
)
if err != nil {
log.Printf("Warning: failed to create SQS consumer: %v", err)
} else {
if args.backfill {
log.Println("Truncating DB before backfill...")
if err := st.Truncate(); err != nil {
log.Printf("Warning: truncate failed: %v", err)
}
log.Println("Running backfill — this may take a few minutes...")
if err := consumer.Backfill(); err != nil {
log.Printf("Backfill error: %v", err)
}
}

// Start the SQS consumer goroutine (only when queue URL is configured)
if dashboardConfig.SQSQueueURL != "" {
go consumer.Run(ctx)
log.Printf("SQS consumer started")
}
}
}
}

// Create and start the HTTP server
srv, err := server.NewServer(dashboardConfig)
if err != nil {
log.Printf("Failed to create dashboard server: %v", err)
// os.Exit skips deferred calls, so release resources explicitly first.
cancel()
st.Close()
os.Exit(1)
}
srv.WithStore(st)

addr := fmt.Sprintf(":%d", dashboardConfig.Port)
log.Printf("Dashboard server starting on http://localhost%s", addr)
log.Printf("Press Ctrl+C to stop")

if err := srv.Start(addr, ctx); err != nil {
log.Printf("Server error: %v", err)
// os.Exit skips deferred calls, so release resources explicitly first.
cancel()
st.Close()
os.Exit(1)
}
}
2 changes: 2 additions & 0 deletions cmd/osde2e/main.go
@@ -16,6 +16,7 @@ import (
"github.com/openshift/osde2e/cmd/osde2e/arguments"
"github.com/openshift/osde2e/cmd/osde2e/cleanup"
"github.com/openshift/osde2e/cmd/osde2e/completion"
"github.com/openshift/osde2e/cmd/osde2e/dashboard"
"github.com/openshift/osde2e/cmd/osde2e/healthcheck"
"github.com/openshift/osde2e/cmd/osde2e/krknai"
"github.com/openshift/osde2e/cmd/osde2e/provision"
@@ -46,6 +47,7 @@ func init() {
root.AddCommand(completion.Cmd)
root.AddCommand(cleanup.Cmd)
root.AddCommand(krknai.Cmd)
root.AddCommand(dashboard.Cmd)
}

func main() {
1 change: 1 addition & 0 deletions configs/local/dashboard-build/.gitignore
@@ -0,0 +1 @@
osde2e