Fix cross-test mock leak behind the flaky parallel test failures by blaipr · Pull Request #388 · ctrliq/ascender

blaipr · 2026-06-11T11:35:28Z

Merge PR #388 first — it makes full parallel Python test runs deterministic for reviewing everything else.

SUMMARY

Fixes #379 — and the root cause turned out to be none of the suspects in the issue. Six unit tests assigned a Mock directly onto the model manager, with no patch machinery:

task.model.objects.get = mock.Mock(return_value=job)   # task.model is the Job / AdHocCommand class

That permanently replaces Manager.get for the remainder of the pytest-xdist worker process. Whichever functional test files later shared that worker became the 'flaky' victims, and the leaked mock explains every observed failure family:

api/test_survey_spec.py: Job.objects.get(...) returned a MagicMock → json.loads(job.extra_vars) failed with 'must be str … not MagicMock' (up to 13 parametrized failures per run).
utils/test_update_model.py: update_model() returned the unit test's leftover in-memory Job built with Project(pk=1); the test saved it, inserting a row whose project_id=1 references nothing — which Django's whole-DB check_constraints() at teardown reported as the mysterious 'main_job … invalid foreign key' IntegrityError.
commands/test_secret_key_regeneration.py: same frankenjob → garbage decrypted start_args.

The xdist scheduling lottery decided which files shared a worker with the leakers each run — hence the random victim set, why every victim passed in isolation, and why pairwise reproductions failed.

How it was found

A temporary pytest plugin scanned every model manager for Mock instance attributes after each test and logged transitions per worker. Both leak sources (unit/test_tasks.py, unit/tasks/test_jobs.py) were caught within two full parallel runs, attributed to the exact tests.

The fix

All six sites become mocker.patch.object(task.model.objects, 'get', return_value=...) — same behavior inside the test, automatically unwound by pytest-mock at teardown.

ISSUE TYPE

Bug, Docs Fix or other nominal change

COMPONENT NAME

API

ASCENDER VERSION

awx: 25.4.1.dev5+gcda0899.d20260610

ADDITIONAL INFORMATION

Validation: five consecutive full parallel runs (py.test --create-db -n auto --dist=loadfile awx/main/tests/unit awx/main/tests/functional awx/conf/tests awx/sso/tests) all returned 3476 passed, 0 failed, with the leak detector confirming no manager is ever left mocked. Before the fix, roughly two of every three such runs failed.

Note: the TransactionTestCase classes called out in the original issue text are innocent — their teardown flush verifiably leaves all tables empty. Test-only change; production code untouched. Independent of all open PRs.

Six unit tests assigned a Mock directly onto the Job/AdHocCommand manager (task.model.objects.get = mock.Mock(return_value=job)) with no patch machinery, permanently replacing Manager.get for the remainder of the worker process. Under pytest-xdist, whichever functional test files later shared that worker became the 'flaky' victims: - api/test_survey_spec.py: Job.objects.get returned a MagicMock, so json.loads(job.extra_vars) blew up with 'must be str ... not MagicMock' (up to 13 parametrized failures per run) - utils/test_update_model.py: update_model() returned the unit test's leftover in-memory Job (built with Project(pk=1)); saving it inserted a job row whose project_id=1 references nothing, which Django's whole-DB check_constraints() at teardown reported as the mysterious 'main_job ... invalid foreign key' IntegrityError - commands/test_secret_key_regeneration.py: same frankenjob, decrypted start_args garbage The leaks were localized with a temporary pytest plugin that scanned every model manager for Mock instance attributes after each test and logged the transition - both leak sources reproduced within two full parallel runs. Fix: scope all six with mocker.patch.object(task.model.objects, 'get', return_value=...), which pytest-mock unwinds at teardown. Validation: five consecutive full parallel runs (py.test --create-db -n auto --dist=loadfile awx/main/tests/unit awx/main/tests/functional awx/conf/tests awx/sso/tests) all came back 3476 passed, 0 failed, with the leak detector confirming no manager is ever left mocked. Before the fix, roughly two of every three runs failed. Fixes ctrliq#379

cigamit

Thanks for tracking this down, was bugging me for a while but was so sporadic (only failed for me 1 in 10 times) that it was hard to pin down. Ran through the tests multiple times and it has not occurred again.

cigamit approved these changes Jun 11, 2026

View reviewed changes

cigamit merged commit 73fbea8 into ctrliq:main Jun 11, 2026

cigamit self-assigned this Jun 11, 2026

cigamit added the bug Something isn't working label Jun 11, 2026

blaipr mentioned this pull request Jun 11, 2026

Show elapsed time on the workflow job output toolbar #393

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fix cross-test mock leak behind the flaky parallel test failures#388

Fix cross-test mock leak behind the flaky parallel test failures#388
cigamit merged 1 commit into
ctrliq:mainfrom
blaipr:fix/transactiontestcase-flake

blaipr commented Jun 11, 2026 •

edited

Loading

Uh oh!

cigamit left a comment

Uh oh!

Reviewers

Assignees

Labels

Milestone

Development

Uh oh!

2 participants

Conversation

blaipr commented Jun 11, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

SUMMARY

How it was found

The fix

ISSUE TYPE

COMPONENT NAME

ASCENDER VERSION

ADDITIONAL INFORMATION

Uh oh!

cigamit left a comment

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Milestone

Development

Uh oh!

2 participants

blaipr commented Jun 11, 2026 •

edited

Loading