fix(migrations): propagate pg_migrate.sh failure to exit code #828
Merged
The `migrate` subcommand (and every other binary entrypoint) called ddl.RunMigrations() and discarded its returned error, so when pg_migrate.sh exited non-zero the process still reached os.Exit(0) in the switch's `case "migrate":` arm.

This silently broke the K8s migration Job's success contract: a failed migration ended in a successful-looking pod, the Job flipped to condition=Complete, the dependent serving Deployments' DependsOn unblocked, and pods rolled out against a half-migrated DB. Discovered today (2026-05-19) when a manually-killed 0201 backfill exited 2 but the bridge Deployment rolled anyway. The k8s/Pulumi side was behaving correctly given the inputs it got; the API binary was just reporting success on failure.

Fix: check the error and call os.Exit(1) at the migration site, before the switch. This affects every command (server, indexer, es-indexer, solana-indexer, migrate); none of them should run against a half-migrated DB.
Summary
`api/main.go:30` called `ddl.RunMigrations()` and discarded its returned error. When `pg_migrate.sh` exited non-zero, the process kept going and reached `os.Exit(0)` in `case "migrate":`, so a failed migration produced a successful-looking pod exit.

The fix checks the error and calls `os.Exit(1)` immediately so the failure propagates out of the container. It affects every binary entrypoint (`server`, `indexer`, `es-indexer`, `solana-indexer`, `migrate`); none should run against a half-migrated DB.

Background
Discovered today (2026-05-19). The migration Job in audius-k8s#496 is supposed to block its dependent Deployments via Pulumi `DependsOn` whenever the migration fails. During an incident, the 0201 backfill was manually killed (`pg_terminate_backend` on the stuck session) and `pg_migrate.sh` exited with code 2, but the bridge Deployment rolled out anyway.

Tracing the failure path:
1. `pg_migrate.sh` exited 2. ✓
2. `cmd.CombinedOutput()` in `ddl/run_migrations.go:19` returned `*exec.ExitError`. ✓
3. `RunMigrations()` printed `"Error running pg_migrate.sh: exit status 2"` and returned the error. ✓
4. `main.go:30` ignored the return value. ✗ ← bug
5. `case "migrate":` ran `os.Exit(0)`. Container exited 0.
6. The Job flipped to `condition=Complete`.
7. `DependsOn` unblocked → Deployments rolled.

K8s, Pulumi, and the Job spec all behaved correctly given the inputs they got; the API binary was reporting success on failure.
Test plan
- `bridge migrate` against a DB with a deliberately-broken migration (e.g., temporarily put a syntax error in the latest `ddl/migrations/*.sql`). Confirm exit code is now non-zero.
- `bridge migrate` against a clean DB. Confirm exit 0 and no behavior change.
- A failing `bridge migrate` invocation should leave the migration Job in `condition=Failed` and block the Deployment rollout (per the audius-k8s side's `DependsOn`).

🤖 Generated with Claude Code