Fix cross-test mock leak behind the flaky parallel test failures#388
Merged
Conversation
Six unit tests assigned a Mock directly onto the Job/AdHocCommand manager (task.model.objects.get = mock.Mock(return_value=job)) with no patch machinery, permanently replacing Manager.get for the remainder of the worker process. Under pytest-xdist, whichever functional test files later shared that worker became the 'flaky' victims: - api/test_survey_spec.py: Job.objects.get returned a MagicMock, so json.loads(job.extra_vars) blew up with 'must be str ... not MagicMock' (up to 13 parametrized failures per run) - utils/test_update_model.py: update_model() returned the unit test's leftover in-memory Job (built with Project(pk=1)); saving it inserted a job row whose project_id=1 references nothing, which Django's whole-DB check_constraints() at teardown reported as the mysterious 'main_job ... invalid foreign key' IntegrityError - commands/test_secret_key_regeneration.py: same frankenjob, decrypted start_args garbage The leaks were localized with a temporary pytest plugin that scanned every model manager for Mock instance attributes after each test and logged the transition - both leak sources reproduced within two full parallel runs. Fix: scope all six with mocker.patch.object(task.model.objects, 'get', return_value=...), which pytest-mock unwinds at teardown. Validation: five consecutive full parallel runs (py.test --create-db -n auto --dist=loadfile awx/main/tests/unit awx/main/tests/functional awx/conf/tests awx/sso/tests) all came back 3476 passed, 0 failed, with the leak detector confirming no manager is ever left mocked. Before the fix, roughly two of every three runs failed. Fixes ctrliq#379
This was referenced Jun 11, 2026
cigamit
approved these changes
Jun 11, 2026
cigamit
left a comment
Contributor
There was a problem hiding this comment.
Thanks for tracking this down, was bugging me for a while but was so sporadic (only failed for me 1 in 10 times) that it was hard to pin down. Ran through the tests multiple times and it has not occurred again.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Merge PR #388 first — it makes full parallel Python test runs deterministic for reviewing everything else.
SUMMARY
Fixes #379 — and the root cause turned out to be none of the suspects in the issue. Six unit tests assigned a Mock directly onto the model manager, with no patch machinery:
That permanently replaces
Manager.getfor the remainder of the pytest-xdist worker process. Whichever functional test files later shared that worker became the 'flaky' victims, and the leaked mock explains every observed failure family:api/test_survey_spec.py:Job.objects.get(...)returned a MagicMock →json.loads(job.extra_vars)failed with 'must be str … not MagicMock' (up to 13 parametrized failures per run).utils/test_update_model.py:update_model()returned the unit test's leftover in-memory Job built withProject(pk=1); the test saved it, inserting a row whoseproject_id=1references nothing — which Django's whole-DBcheck_constraints()at teardown reported as the mysterious 'main_job … invalid foreign key' IntegrityError.commands/test_secret_key_regeneration.py: same frankenjob → garbage decryptedstart_args.The xdist scheduling lottery decided which files shared a worker with the leakers each run — hence the random victim set, why every victim passed in isolation, and why pairwise reproductions failed.
How it was found
A temporary pytest plugin scanned every model manager for Mock instance attributes after each test and logged transitions per worker. Both leak sources (
unit/test_tasks.py,unit/tasks/test_jobs.py) were caught within two full parallel runs, attributed to the exact tests.The fix
All six sites become
mocker.patch.object(task.model.objects, 'get', return_value=...)— same behavior inside the test, automatically unwound by pytest-mock at teardown.ISSUE TYPE
COMPONENT NAME
ASCENDER VERSION
ADDITIONAL INFORMATION
Validation: five consecutive full parallel runs (
py.test --create-db -n auto --dist=loadfile awx/main/tests/unit awx/main/tests/functional awx/conf/tests awx/sso/tests) all returned 3476 passed, 0 failed, with the leak detector confirming no manager is ever left mocked. Before the fix, roughly two of every three such runs failed.Note: the
TransactionTestCaseclasses called out in the original issue text are innocent — their teardown flush verifiably leaves all tables empty. Test-only change; production code untouched. Independent of all open PRs.