Skip to content

Issue 8124 8126 fix 7660 7682 2 6266 after review#8154

Open
melton-jason wants to merge 56 commits into
v7_12_0_7_basefrom
issue-8124-8126-fix-7660-7682-2-after-review-2
Open

Issue 8124 8126 fix 7660 7682 2 6266 after review#8154
melton-jason wants to merge 56 commits into
v7_12_0_7_basefrom
issue-8124-8126-fix-7660-7682-2-after-review-2

Conversation

@melton-jason
Copy link
Copy Markdown
Contributor

@melton-jason melton-jason commented Jun 1, 2026

Taken from #8142 (comment) by @CarolineDenis:

Fixes #8124
Fixes #8126
Fixes #8101
Fixes #8065

Checklist

  • Self-review the PR after opening it to make sure the changes look good and
    self-explanatory (or properly documented)
  • Add relevant issue to release milestone
  • Add pr to documentation list
  • Add automated tests
  • Add a reverse migration if a migration is present in the PR
  • Add migration function to
    def fix_schema_config(stdout: WriteToStdOut | None = None):

Testing instructions

Summary by CodeRabbit

  • Bug Fixes

    • Corrected schema configuration references in data migrations
    • Fixed discipline and tree definition linking issues
    • Improved admin and permission initialization logic
    • Enhanced query logging controls based on debug settings
  • Refactor

    • Streamlined migration utilities for improved efficiency
    • Simplified multi-database routing during migrations
    • Optimized permission seeding with per-user checks
    • Restructured role assignment logic using set-based operations
  • Chores

    • Removed legacy debugging code and unused migration helpers
    • Updated function signatures to remove unnecessary parameters

Fix: compare full rule definitions before deleting anything in uniqueness

(cherry picked from commit c61a530)
(cherry picked from commit b1f0346)
(cherry picked from commit d4c0231)
(cherry picked from commit bab5083)
(cherry picked from commit 11b6608)
(cherry picked from commit af4c963)
(cherry picked from commit 57bca64)
(cherry picked from commit e07aaeb)
(cherry picked from commit b183031)
(cherry picked from commit 2e9e4ad)
(cherry picked from commit 2254fc4)
(cherry picked from commit d8bd82d)
@melton-jason
Copy link
Copy Markdown
Contributor Author

@coderabbitai review

@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented Jun 4, 2026

✅ Action performed

Review finished.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
specifyweb/backend/permissions/initialize.py (1)

89-92: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Duplicate UserPolicy assignment.

UserPolicy is assigned on line 89 and again on line 91. The second assignment is redundant.

🧹 Suggested fix
 def assign_users_to_roles(apps=apps) -> None:
     Role = apps.get_model('permissions', 'Role')
     UserPolicy = apps.get_model('permissions', 'UserPolicy')
     Agent = apps.get_model('specify', 'Agent')
-    UserPolicy = apps.get_model('permissions', 'UserPolicy')
     UserRole = apps.get_model('permissions', 'UserRole')
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@specifyweb/backend/permissions/initialize.py` around lines 89 - 92, Remove
the redundant reassignment of UserPolicy: the code calls
apps.get_model('permissions', 'UserPolicy') twice (once assigned to UserPolicy
on the same block alongside Agent and UserRole); delete the duplicate line so
UserPolicy is only assigned once and keep the existing Agent and UserRole
assignments intact to avoid shadowing or confusion.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@specifyweb/specify/migration_utils/tectonic_ranks.py`:
- Around line 97-106: Tighten root-node detection by requiring both parent=None
and rankid=0 when identifying roots: update the Exists filter on TectonicUnit
(used in the TectonicUnitTreeDef.objects.annotate root_node_exists) to include
rankid=0 alongside parent=None and definition=OuterRef("pk"), and make the same
change in the revert/delete path that currently selects root candidates by name
(the logic around lines referenced for deletion) so both forward and revert
paths use parent=None AND rankid=0 when locating root TectonicUnit rows.

In `@specifyweb/specify/migrations/0002_geo.py`:
- Line 75: The migration call to create_default_collection_types currently omits
the migration DB alias and thus writes to the default DB; change the call in the
RunPython migration in 0002_geo.py to pass the migration alias (use
schema_editor.connection.alias, e.g. create_default_collection_types(apps,
schema_editor.connection.alias)), then update the
create_default_collection_types function signature to accept a db_alias
parameter and ensure all DB operations inside (queries/creates/updates)
explicitly use .using(db_alias) or the equivalent to avoid cross-database
writes.

---

Outside diff comments:
In `@specifyweb/backend/permissions/initialize.py`:
- Around line 89-92: Remove the redundant reassignment of UserPolicy: the code
calls apps.get_model('permissions', 'UserPolicy') twice (once assigned to
UserPolicy on the same block alongside Agent and UserRole); delete the duplicate
line so UserPolicy is only assigned once and keep the existing Agent and
UserRole assignments intact to avoid shadowing or confusion.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro Plus

Run ID: 1694a881-7df6-4fad-8cb1-edb41d3c353b

📥 Commits

Reviewing files that changed from the base of the PR and between 3202d36 and 39ec012.

📒 Files selected for processing (16)
  • specifyweb/backend/businessrules/migration_utils.py
  • specifyweb/backend/businessrules/migrations/0004_catnum_uniquerule.py
  • specifyweb/backend/businessrules/uniqueness_rules.py
  • specifyweb/backend/patches/migration_utils.py
  • specifyweb/backend/permissions/initialize.py
  • specifyweb/backend/stored_queries/execution.py
  • specifyweb/backend/stored_queries/utils.py
  • specifyweb/specify/api/utils.py
  • specifyweb/specify/management/commands/run_key_migration_functions.py
  • specifyweb/specify/migration_utils/default_cots.py
  • specifyweb/specify/migration_utils/sp7_schemaconfig.py
  • specifyweb/specify/migration_utils/tectonic_ranks.py
  • specifyweb/specify/migration_utils/update_schema_config.py
  • specifyweb/specify/migrations/0002_geo.py
  • specifyweb/specify/migrations/0008_ageCitations_fix.py
  • specifyweb/specify/migrations/0009_tectonic_ranks.py

Comment thread specifyweb/specify/migration_utils/tectonic_ranks.py
def consolidated_python_django_migration_operations(apps, schema_editor):
db_alias = schema_editor.connection.alias or 'migrator'
create_default_collection_types(apps, using=db_alias)
create_default_collection_types(apps)
Copy link
Copy Markdown

@coderabbitai coderabbitai Bot Jun 4, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Pass the migration DB alias into create_default_collection_types to avoid cross-database writes.

This call now executes on the default connection, while the migration is alias-aware elsewhere; on non-default aliases, this can populate/update records in the wrong database.

Proposed fix (cross-file)
--- a/specifyweb/specify/migrations/0002_geo.py
+++ b/specifyweb/specify/migrations/0002_geo.py
@@
-        create_default_collection_types(apps)
+        create_default_collection_types(apps, using=db_alias)
--- a/specifyweb/specify/migration_utils/default_cots.py
+++ b/specifyweb/specify/migration_utils/default_cots.py
@@
-def create_default_collection_types(apps):
+def create_default_collection_types(apps, using='default'):
@@
-    for collection in Collection.objects.filter(collectionobjecttype__isnull=True):
+    for collection in Collection.objects.using(using).filter(collectionobjecttype__isnull=True):
@@
-        cot, created = Collectionobjecttype.objects.get_or_create(
+        cot, created = Collectionobjecttype.objects.using(using).get_or_create(
@@
-        Collectionobject.objects.filter(
+        Collectionobject.objects.using(using).filter(
             collection=collection).update(collectionobjecttype=cot)
         collection.collectionobjecttype = cot
-        collection.save()
+        collection.save(using=using)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@specifyweb/specify/migrations/0002_geo.py` at line 75, The migration call to
create_default_collection_types currently omits the migration DB alias and thus
writes to the default DB; change the call in the RunPython migration in
0002_geo.py to pass the migration alias (use schema_editor.connection.alias,
e.g. create_default_collection_types(apps, schema_editor.connection.alias)),
then update the create_default_collection_types function signature to accept a
db_alias parameter and ensure all DB operations inside (queries/creates/updates)
explicitly use .using(db_alias) or the equivalent to avoid cross-database
writes.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've implemented a more general way of ensuring all current and future functions within run_key_migration_functions will be executed with the same database connection that Migrations are run in.
The important changes are included in
9f9df79 (this PR), can you take a look and give me your thoughts?

The only difference between the migrations database and the default database is the user that is used to connect.
Ideally, the functions executed as part of run_key_migration_functions should never require the elevated level of permissions that is provided to the migrations user (the ability to perform DDL statements like dropping and creating tables, indices, etc.).
With this in mind, I am generally on the fence about whether we should even allow the migrator user to perform run_key_migration_functions: they should only be data movements, and never DDL statements.

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
gh api repos/specify/specify7/commits/9f9df797b631157353253296059e5cb7f2ec73fd --jq '.files[] | "=== \(.filename) ===\n\(.patch)"'

Length of output: 10197


🏁 Script executed:

#!/bin/bash
# Check how 'migrations' alias is set up in settings
rg -n "migrations" specifyweb/settings/__init__.py -A 3 -B 3

# Check if there's a fallback/guard for the 'migrations' alias
rg -n "migrations" specifyweb/specify/migration_utils/router.py

Length of output: 748



@melton-jason The approach is elegant — a ContextVar-based router is the right tool here: it's thread-safe, requires no signature changes across utility functions, and scales automatically to future functions added to run_key_migration_functions. A few thoughts:

What works well:

  • The ContextVar/MigrationRouter combination is clean and idiomatic Django. Returning None implicitly when the flag is unset means it has zero impact on normal ORM calls outside of run_key_migration_functions.
  • The 'migrations' alias is always defined in settings/__init__.py, so there's no risk of ConnectionDoesNotExist.
  • The @contextmanager decorator makes _GeneratorContextManager a ContextDecorator (since Python 3.2), so the decorator-style example in the docstring (@use_migration_connection()) is valid.

One cosmetic nit:
The module-level docstring in router.py appears between the use_migration_connection function and the MigrationRouter class, rather than at the top or directly above the class. It would read more clearly if placed directly above class MigrationRouter: as a class docstring or a # comment block.

On the design question you raised:
Your instinct is reasonable. Since run_key_migration_functions is intended to be DML-only, unconditionally routing it through the 'migrations' connection does hand it DDL-level credentials unnecessarily. A few options:

  1. Keep current approach — favors parity with migration execution context; acceptable if the migrator user's extra privileges don't pose a security concern in your deployment.
  2. Remove use_migration_connection() from run_key_migration_functions — it would then use 'default' (app user, DML-only credentials), which matches the principle of least privilege. The risk is that future contributors accidentally add DDL-requiring calls.
  3. Add an opt-in flag (e.g. --use-migration-connection) — makes the intent explicit and allows both cases.

Option 2 seems like the safer long-term default given your stated goal, but it's your call based on how the command is invoked in practice.

Overall the original concern about cross-database writes is now properly resolved via this approach.

@github-project-automation github-project-automation Bot moved this from 📋Back Log to Dev Attention Needed in General Tester Board Jun 4, 2026
Comment on lines +521 to +525
# REFACTOR: See if we can simplify this even further. We should be able
# to collapse this query -> iteration -> check workflow to a single
# query.
# That would eliminate the N + 1 problem with this current approach,
# where every scoped rule needs to be evaluated.
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(note for discussion)

I left this comment in the code, but I'll leave this as a comment on the PR for visibility.
There was a noticeable change in the asymptotic growth (time complexity) with this change.

Noticeably, the previous implementation of fix_global_default_rules:

  • Iterated over each distinct non-null DisciplineID
  • For each DisciplineID, performed a minimum of 2 queries to identify the "duplicate" scoped Uniqueness Rule and another 2 queries to delete them

Importantly, the number of Queries and iteration only grew on the number of Disciplines. So assuming the number of Disciplines remained the same, there could be any number of added Uniqueness Rules within those Disciplines without seeing a slowdown or additional database queries.
(This is only partly/theoretically true: there will be a slowdown for those queries if there is a sufficiently large amount of Uniqueness Rules and the queries are performed on unindexed fields or in a way that can't utilize the indexes, though I'm holding that slowdown as negligible or constant between the two approaches).

I've cleaned up the new approach and eliminated extra queries in 6e37e1e (this PR), but there still is a change in the approach:

  • Now we make a DB query to fetch all Uniqueness Rules that are scoped (have a non-null DisciplineID)
  • For each rule, we check whether there is an identical unscoped Uniqueness Rule
  • If there is an identified duplicate, we delete the Uniqueness Rule

Our "bottleneck" is now largely on the amount of scoped Uniqueness Rules, rather than the number of Disciplines.
Practically, I'm not sure how much of a slowdown this would introduce.
If we have the time, we can try and resolve this slowdown (I imagine we can use and filter on annotations on the query for scoped rules for duplicate rules, or just revert back to something like the past approach if there are no bugs), and/or we can evaluate how much of a slowdown this introduces.

# The ordering here for checks here is intentional: it's more likely a
# user has Sp 7 permissions than being an admin, so we do the former
# check first
if is_sp6_user_permissions_migrated(user=user, apps=apps):
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(note here on why I brought back is_sp6_user_permissions_migrated)

I brought back is_sp6_user_permissions_migrated here.
It was needed for the following workflow:

  • A user is an admin in Specify 6, and should not be an admin in Specify 7
  • In this scenario, the user's Specify 7 Institution Admin privileges have been explicitly revoked

Previously, because this is a part of Run Key Migration Functions, the user in question would be re-made into an Institutional Admin, even if they were explicitly removed as an admin.

With is_sp6_user_permissions_migrated, we assume that users that don't have Sp 7 permissions can't have been intentionally removed as Specify 7 Institution Admins, so we can safely migrate their Admin preference. Otherwise, we can't tell if the user was intentionally removed as an admin

Comment thread specifyweb/specify/migration_utils/tectonic_ranks.py
def consolidated_python_django_migration_operations(apps, schema_editor):
db_alias = schema_editor.connection.alias or 'migrator'
create_default_collection_types(apps, using=db_alias)
create_default_collection_types(apps)
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've implemented a more general way of ensuring all current and future functions within run_key_migration_functions will be executed with the same database connection that Migrations are run in.
The important changes are included in
9f9df79 (this PR), can you take a look and give me your thoughts?

The only difference between the migrations database and the default database is the user that is used to connect.
Ideally, the functions executed as part of run_key_migration_functions should never require the elevated level of permissions that is provided to the migrations user (the ability to perform DDL statements like dropping and creating tables, indices, etc.).
With this in mind, I am generally on the fence about whether we should even allow the migrator user to perform run_key_migration_functions: they should only be data movements, and never DDL statements.

@melton-jason
Copy link
Copy Markdown
Contributor Author

@coderabbitai review

@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented Jun 5, 2026

✅ Action performed

Review finished.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
specifyweb/backend/businessrules/uniqueness_rules.py (1)

531-548: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Bind this cleanup transaction to the same DB alias as the ORM writes.

With the migration-router changes in this PR, UniquenessRule can be written on a non-default alias, but transaction.atomic() here still opens a transaction on the default connection. If an error occurs mid-cleanup, the deletes on the routed DB are no longer protected by this atomic block.

Suggested fix
+    db_alias = router.db_for_write(UniquenessRule)
-    with transaction.atomic():
+    with transaction.atomic(using=db_alias):
         # REFACTOR: See if we can simplify this even further. We should be able
         # to collapse this query -> iteration -> check workflow to a single
         # query.
         # That would eliminate the N + 1 problem with this current approach,
         # where every scoped rule needs to be evaluated.
-        for rule in UniquenessRule.objects.exclude(discipline__isnull=True).prefetch_related("uniquenessrulefield_set"):
+        for rule in (
+            UniquenessRule.objects.using(db_alias)
+            .exclude(discipline__isnull=True)
+            .prefetch_related("uniquenessrulefield_set")
+        ):

If this helper is migration-only, passing schema_editor.connection.alias down from the caller would be safer still.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@specifyweb/backend/businessrules/uniqueness_rules.py` around lines 531 - 548,
The transaction.atomic() block must be bound to the same DB alias used for
writing UniquenessRule records; change the cleanup to accept an explicit DB
alias (e.g. pass schema_editor.connection.alias from the caller if
migration-only) and use transaction.atomic(using=alias) and operate on the
routed DB by calling UniquenessRule.objects.using(alias).exclude(...) and use
.using(alias) for the queryset deletes
(rule.uniquenessrulefield_set.all().using(alias).delete() and
rule.delete(using=alias) or delete via the queryset) so all reads/deletes are
executed and rolled back on the same connection.
♻️ Duplicate comments (1)
specifyweb/specify/migration_utils/default_cots.py (1)

33-63: ⚠️ Potential issue | 🟠 Major

Keep these migration helpers alias-aware.

Dropping the explicit DB alias here means 0002_geo now runs these reads/writes on Django’s default connection unless every caller re-routes them some other way. On non-default migration runs, that seeds and backfills the wrong database.

In Django 4.2 `RunPython` migrations, do queryset reads/writes and `model.save()` automatically use `schema_editor.connection.alias`, or must migrations explicitly call `.using(schema_editor.connection.alias)` / `save(using=...)`?

Also applies to: 65-86, 111-123

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@specifyweb/specify/migration_utils/default_cots.py` around lines 33 - 63, The
helper create_default_discipline_for_tree_defs (and the other helpers at the
noted ranges) currently perform queries and .save() calls without honoring the
migration DB alias; change their signatures to accept (apps, schema_editor), get
alias = schema_editor.connection.alias, and perform all queries as
Model.objects.using(alias).all() (e.g., Discipline.objects.using(alias).all(),
Institution.objects.using(alias).all()) and call .save(using=alias) when saving
tree definition instances; update any other helpers mentioned (lines 65-86,
111-123) to follow the same pattern so reads/writes run on
schema_editor.connection.alias rather than the default DB.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@specifyweb/backend/businessrules/uniqueness_rules.py`:
- Around line 198-206: The migration gate and cleanup hard-code the default DB
and cache key, causing DB-alias leakage; update
_initial_businessrules_migration_applied and
_cached_businessrules_migration_applied to accept a connection or alias (or
derive it from the registry/registry.apps) instead of using
connections["default"], and use a cache key that includes the connection.alias;
update validate_unique to pass the right alias/connection when calling
_cached_businessrules_migration_applied. In fix_global_default_rules (migration
0008_fix_global_default_rules.py) run cleanup against schema_editor.connection
by wrapping operations in
transaction.atomic(using=schema_editor.connection.alias) and perform
queries/deletes with Model.objects.using(schema_editor.connection.alias) (or
QuerySet.using) so all reads/writes occur on the migration-provided
connection/transaction. Ensure all references mention the functions
_initial_businessrules_migration_applied,
_cached_businessrules_migration_applied, validate_unique, and
fix_global_default_rules/schema_editor.connection.alias.

In `@specifyweb/specify/migrations/0009_tectonic_ranks.py`:
- Around line 17-46: Narrow the reverse deletion to only remove the exact
default chain created by this migration by adding checks for the exact
parent/child chain and names instead of deleting any "Root" or leaf with that
name: update the TectonicUnit queryset to require a Root node whose single child
is named "Tectonic Unit" (and that child’s single child is "Tectonic Subunit" if
applicable), ensure counts/no extra siblings (e.g. annotate child_count via
Subquery/Count and require expected counts), and similarly tighten the
TectonicUnitTreeDefItem filter to assert the exact tree shape (Root -> "Tectonic
Unit" -> "Tectonic Subunit") via Exists/Subquery checks against parent and
definitionitem relationships before calling .delete() so only the specific chain
created by this migration is removed.

---

Outside diff comments:
In `@specifyweb/backend/businessrules/uniqueness_rules.py`:
- Around line 531-548: The transaction.atomic() block must be bound to the same
DB alias used for writing UniquenessRule records; change the cleanup to accept
an explicit DB alias (e.g. pass schema_editor.connection.alias from the caller
if migration-only) and use transaction.atomic(using=alias) and operate on the
routed DB by calling UniquenessRule.objects.using(alias).exclude(...) and use
.using(alias) for the queryset deletes
(rule.uniquenessrulefield_set.all().using(alias).delete() and
rule.delete(using=alias) or delete via the queryset) so all reads/deletes are
executed and rolled back on the same connection.

---

Duplicate comments:
In `@specifyweb/specify/migration_utils/default_cots.py`:
- Around line 33-63: The helper create_default_discipline_for_tree_defs (and the
other helpers at the noted ranges) currently perform queries and .save() calls
without honoring the migration DB alias; change their signatures to accept
(apps, schema_editor), get alias = schema_editor.connection.alias, and perform
all queries as Model.objects.using(alias).all() (e.g.,
Discipline.objects.using(alias).all(), Institution.objects.using(alias).all())
and call .save(using=alias) when saving tree definition instances; update any
other helpers mentioned (lines 65-86, 111-123) to follow the same pattern so
reads/writes run on schema_editor.connection.alias rather than the default DB.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro Plus

Run ID: 823940ce-81d2-4931-804a-bacdc15b9a66

📥 Commits

Reviewing files that changed from the base of the PR and between 39ec012 and 9ddf13e.

📒 Files selected for processing (9)
  • specifyweb/backend/businessrules/uniqueness_rules.py
  • specifyweb/settings/__init__.py
  • specifyweb/specify/management/commands/run_key_migration_functions.py
  • specifyweb/specify/migration_utils/default_cots.py
  • specifyweb/specify/migration_utils/misc_migrations.py
  • specifyweb/specify/migration_utils/router.py
  • specifyweb/specify/migration_utils/tectonic_ranks.py
  • specifyweb/specify/migrations/0002_geo.py
  • specifyweb/specify/migrations/0009_tectonic_ranks.py
💤 Files with no reviewable changes (1)
  • specifyweb/specify/migration_utils/misc_migrations.py
✅ Files skipped from review due to trivial changes (1)
  • specifyweb/settings/init.py
🚧 Files skipped from review as they are similar to previous changes (1)
  • specifyweb/specify/management/commands/run_key_migration_functions.py

Comment on lines +198 to +206
# If we're in a migration where businessrules have not been loaded and/or
# the initial businessrule migration has not been applied, then skip
# checking the rule for now.
# Note that the former can exist where the latter does: if we're reversing
# a migration which does not have a dependency on businessrules (so the
# businessrules app does not need to be loaded) but the businessrule
# migration is still applied
if not businessrule_app_is_ready(registry) or not _cached_businessrules_migration_applied():
return
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
set -euo pipefail

FILE="specifyweb/backend/businessrules/uniqueness_rules.py"

echo "== File: $FILE =="
wc -l "$FILE"

echo
echo "== Function locations =="
rg -n "def (_initial_businessrules_migration_applied|_cached_businessrules_migration_applied|businessrule_app_is_ready)\b" "$FILE" || true

echo
echo "== Cache definition / usage (near _uniqueness_migration_cache) =="
rg -n "_uniqueness_migration_cache" "$FILE" || true

echo
echo "== validate_unique snippet around lines 170-240 =="
nl -ba "$FILE" | sed -n '160,240p'

# Print around the helper defs (by ranges around the matched line numbers)
echo
echo "== Top of file imports/context (first 120 lines) =="
nl -ba "$FILE" | sed -n '1,120p'

Repository: specify/specify7

Length of output: 810


🏁 Script executed:

#!/bin/bash
set -euo pipefail

FILE="specifyweb/backend/businessrules/uniqueness_rules.py"

echo "== Cache definition block =="
sed -n '40,140p' "$FILE" | cat -n

echo
echo "== _initial_businessrules_migration_applied() body =="
sed -n '80,120p' "$FILE" | cat -n

echo
echo "== businessrule_app_is_ready(registry) body =="
sed -n '96,140p' "$FILE" | cat -n

echo
echo "== _cached_businessrules_migration_applied() body =="
sed -n '100,140p' "$FILE" | cat -n

echo
echo "== validate_unique early-return block (context) =="
sed -n '160,220p' "$FILE" | cat -n

Repository: specify/specify7

Length of output: 13770


🏁 Script executed:

#!/bin/bash
set -euo pipefail

FILE="specifyweb/backend/businessrules/uniqueness_rules.py"

rg -n "def fix_global_default_rules" "$FILE"
# Print around the function
sed -n '220,340p' "$FILE" | cat -n

Repository: specify/specify7

Length of output: 5308


🏁 Script executed:

#!/bin/bash
set -euo pipefail

FILE="specifyweb/backend/businessrules/uniqueness_rules.py"

echo "== fix_global_default_rules() around its definition =="
sed -n '480,560p' "$FILE" | cat -n

echo
echo "== Any transaction.atomic usage in the file =="
rg -n "transaction\.atomic\(" "$FILE"

Repository: specify/specify7

Length of output: 3446


🏁 Script executed:

#!/bin/bash
set -euo pipefail
rg -n "fix_global_default_rules\(" -S specifyweb | head -n 50

Repository: specify/specify7

Length of output: 276


🏁 Script executed:

#!/bin/bash
set -euo pipefail

FILE="specifyweb/backend/businessrules/migrations/0008_fix_global_default_rules.py"

echo "== File =="
wc -l "$FILE"
echo
echo "== Contents =="
cat -n "$FILE"

Repository: specify/specify7

Length of output: 781


🏁 Script executed:

#!/bin/bash
set -euo pipefail
FILE="specifyweb/backend/businessrules/uniqueness_rules.py"
rg -n "router" "$FILE" | head -n 50
rg -n "from django\.db import .*router" "$FILE" || true

Repository: specify/specify7

Length of output: 176


Fix DB-alias leakage in the businessrules migration gate and the global-rule cleanup.

  • validate_unique() calls _cached_businessrules_migration_applied(), but both _initial_businessrules_migration_applied() and _cached_businessrules_migration_applied() hard-code connections["default"] and cache key "default". On non-default DB aliases this can check the wrong migration state and incorrectly skip uniqueness enforcement.
  • fix_global_default_rules() wraps cleanup in with transaction.atomic(): (default connection) and performs deletes/reads without tying them to schema_editor.connection.alias. Migration 0008_fix_global_default_rules.py ignores schema_editor, so the cleanup may run against the wrong database/transaction boundary.
Suggested fix (migration gate)
-def _initial_businessrules_migration_applied():
+def _initial_businessrules_migration_applied(db_alias: str):
     return any(
         app == "businessrules" and migration_name == "0001_initial"
         for app, migration_name in MigrationRecorder(
-            connections["default"]
+            connections[db_alias]
         ).applied_migrations()
     )

-def _cached_businessrules_migration_applied() -> bool:
-    cache_key = "default"
+def _cached_businessrules_migration_applied(db_alias: str) -> bool:
+    cache_key = db_alias
     cache_is_active, is_set = _uniqueness_migration_cache.get(cache_key, default=False)
     if cache_is_active and is_set:
         return True
     is_applied = _initial_businessrules_migration_applied(db_alias)
     if cache_is_active and is_applied:
         _uniqueness_migration_cache.set(cache_key, is_applied)
     return is_applied

+    db_alias = instance._state.db or router.db_for_write(model, instance=instance)
-    if not businessrule_app_is_ready(registry) or not _cached_businessrules_migration_applied():
+    if not businessrule_app_is_ready(registry) or not _cached_businessrules_migration_applied(db_alias):
         return
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@specifyweb/backend/businessrules/uniqueness_rules.py` around lines 198 - 206,
The migration gate and cleanup hard-code the default DB and cache key, causing
DB-alias leakage; update _initial_businessrules_migration_applied and
_cached_businessrules_migration_applied to accept a connection or alias (or
derive it from the registry/registry.apps) instead of using
connections["default"], and use a cache key that includes the connection.alias;
update validate_unique to pass the right alias/connection when calling
_cached_businessrules_migration_applied. In fix_global_default_rules (migration
0008_fix_global_default_rules.py) run cleanup against schema_editor.connection
by wrapping operations in
transaction.atomic(using=schema_editor.connection.alias) and perform
queries/deletes with Model.objects.using(schema_editor.connection.alias) (or
QuerySet.using) so all reads/writes occur on the migration-provided
connection/transaction. Ensure all references mention the functions
_initial_businessrules_migration_applied,
_cached_businessrules_migration_applied, validate_unique, and
fix_global_default_rules/schema_editor.connection.alias.

Comment on lines +17 to +46
TectonicUnit.objects.annotate(
has_children_nodes=Exists(
TectonicUnit.objects.filter(
parent=OuterRef("pk")
)
)
).filter(
parent__isnull=True,
has_children_nodes=False,
name="Root"
).delete()

# Delete the Root TectonicUnit rank if there are no nodes in the tree and
# no children rank reference the Root rank
TectonicUnitTreeDefItem.objects.annotate(
has_nodes=Exists(
TectonicUnit.objects.filter(
definitionitem=OuterRef("pk")
)
),
has_child_rank=Exists(
TectonicUnitTreeDefItem.objects.filter(
parent=OuterRef("pk")
)
)
).filter(
has_nodes=False,
has_child_rank=False,
name="Root"
).delete()
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | 🏗️ Heavy lift

Tighten the rollback scope before deleting tectonic data.

These reverse helpers still match on generic names/leaf-ness, not on records this migration can prove it created. A rollback can therefore strip pre-existing custom trees that happen to use Root, Tectonic Unit, Tectonic Subunit, etc. Please scope the deletes to the exact default chain/tree shape created by the forward migration instead of pruning by name alone.

Also applies to: 55-103

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@specifyweb/specify/migrations/0009_tectonic_ranks.py` around lines 17 - 46,
Narrow the reverse deletion to only remove the exact default chain created by
this migration by adding checks for the exact parent/child chain and names
instead of deleting any "Root" or leaf with that name: update the TectonicUnit
queryset to require a Root node whose single child is named "Tectonic Unit" (and
that child’s single child is "Tectonic Subunit" if applicable), ensure counts/no
extra siblings (e.g. annotate child_count via Subquery/Count and require
expected counts), and similarly tighten the TectonicUnitTreeDefItem filter to
assert the exact tree shape (Root -> "Tectonic Unit" -> "Tectonic Subunit") via
Exists/Subquery checks against parent and definitionitem relationships before
calling .delete() so only the specific chain created by this migration is
removed.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

Status: Dev Attention Needed

3 participants