Skip to content

Enhancements to the aggregation-based eager load mechanism#6158

Open
cperezabo wants to merge 12 commits into
mongodb:masterfrom
cperezabo:eager-load-missing
Open

Enhancements to the aggregation-based eager load mechanism#6158
cperezabo wants to merge 12 commits into
mongodb:masterfrom
cperezabo:eager-load-missing

Conversation

@cperezabo

@cperezabo cperezabo commented Jun 15, 2026

Copy link
Copy Markdown
Contributor

Hi folks, this PR closes a few gaps in eager_load. Each case below either wasn't supported or crashed before but now they resolve in the same single aggregation as the rest.

1. Associations defined only on a subclass

When you query through a superclass, eager_load now finds associations declared only on a subclass.

class Gadget
  include Mongoid::Document
end

class Speaker < Gadget
  has_many :cables
end

Gadget.eager_load(:cables) # resolves :cables through Speaker

Works for has_many, has_one, has_and_belongs_to_many and belongs_to.

2. References that live inside embedded documents

A belongs_to sitting in an embedded document is now eager-loaded through its embedding parent, with no extra query:

class Device
  include Mongoid::Document
end

class Port
  include Mongoid::Document
  embedded_in :computer
  belongs_to :device
end

class Computer
  include Mongoid::Document
  embeds_one :port
end

Computer.eager_load(port: :device).first.port.device # no extra query

This holds true regardless of how deep the embedding goes, even when the embedded chain is attached to a referenced parent.

Desk.eager_load(computer: { port: { pin: :device } }) # referenced → embedded → embedded → referenced

has_and_belongs_to_many inside an embedded document works too — each element keeps its own matches:

class Port
  include Mongoid::Document
  embedded_in :computer
  has_and_belongs_to_many :devices
end

Computer.eager_load(port: :devices)

3. Polymorphic belongs_to

A polymorphic belongs_to can't be a single $lookup (its target collection changes per row), so it's resolved with one extra aggregation that fans out per type:

class Cartridge
  include Mongoid::Document
  belongs_to :device, polymorphic: true # Printer, Scanner, ...
end

Cartridge.eager_load(:device) # root query + one faceted query, regardless of how many types

Everything resolves in a single aggregation, except the polymorphic case, which adds exactly one faceted query.


While trying these features out in a real project I also ran into two pre-existing eager_load bugs, fixed here in their own commits:

  • Model.eager_load([]).first used to raise instead of just returning the documents; now it falls back to the normal path.
  • a has_many whose target shares its collection with sibling subclasses pulled those siblings in too (crashing while materializing one whose discriminator didn't resolve); it now filters by the target subclass, the way a normal query does.

@cperezabo cperezabo requested a review from a team as a code owner June 15, 2026 23:25
@cperezabo cperezabo requested a review from jamis June 15, 2026 23:25
@cperezabo cperezabo force-pushed the eager-load-missing branch from b49c10f to e52ae5a Compare June 16, 2026 01:58

@jamis jamis left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is elegant! Definitely a challenge to reason about cleanly, which gives me a bit of (maintenance-related) concern, but I think adding some documentation to the private methods will help with that.

Comment thread lib/mongoid/association/eager_loadable.rb Outdated

# Fetch every target in one aggregation: a $facet with a $lookup branch per
# type. Returns the targets as { type => { primary_key => document } }.
def fetch_polymorphic_targets(assoc, keys_by_type)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The cross_cluster_inclusions method now (correctly) skips polymorphic associations, but I think that just means this method (fetch_polymorphic_targets) needs to make sure a single generated pipeline at this end does not attempt to reference collections in disparate clusters. The risk here is that someone might issue an eager_load on polymorphic collections that exist in different clusters, and the fetch at this point will then fail (silently or otherwise).

@cperezabo

Copy link
Copy Markdown
Contributor Author

My intention is, once this PR is merged, refactor and reificate some abstractions so this can be easier to maintain and extend. I haven't do it because I didn't want to make the PR longer. I decided to keep the already existing god-class and its imperative code as-is and move forward.
So, this is the first step: tests + implementation that works.
Then, refactoring.

@cperezabo

cperezabo commented Jun 18, 2026

Copy link
Copy Markdown
Contributor Author

The good part is that everything is covered with easy-to-read tests. So they are well defined guardrails to make refactoring.

@jamis

jamis commented Jun 18, 2026

Copy link
Copy Markdown
Contributor

My intention is, once this PR is merged, refactor and reificate some abstractions so this can be easier to maintain and extend. I haven't do it because I didn't want to make the PR longer. I decided to keep the already existing god-class and its imperative code as-is and move forward. So, this is the first step: tests + implementation that works. Then, refactoring.

I can appreciate that. However, if the refactoring and other work relates directly to this feature, I'd rather see a single complete PR than a series of PR's that continue iterating on the same theme. On the other hand, if the refactoring is broader in scope and does not directly relate to this feature, then it makes sense to do it in a separate PR.

The goal (on my end) is to make sure any code that lands in master is production ready. If I were planning to cut a new release of Mongoid tomorrow, would this PR be of sufficient quality? If (heaven forbid!) you were to be hit by a bus tomorrow, and your proposed refactorings never land, would the PR as it currently stands be "good enough"? I think it's almost there; if you make at least the changes I proposed, I'd be happy to merge it at that point.

@cperezabo

Copy link
Copy Markdown
Contributor Author

I already did the changes you mentioned. You can see the commits just above your last comment haha. I'm now working in a full refactoring by the way.

@jamis

jamis commented Jun 18, 2026

Copy link
Copy Markdown
Contributor

Sorry for the delay -- there was a very recent change to the i18n gem that broke our CI runs with Ruby 3.1, and I was trying to get it fixed. :) This looks great, thank you! Could you pull in the latest changes from master, so I can get the tests here green?

A polymorphic belongs_to has no single target collection, so it cannot be expressed as a $lookup. After the root documents are loaded, the polymorphic targets are fetched with a single aggregation whose $facet has one branch per distinct type, then assigned back to each document by type and key.
A $lookup only reaches collections in the root's own database, so eager loading
a polymorphic belongs_to whose target is stored in another database (or cluster)
silently returned nil for that target: the generated $facet referenced a
collection that was not there.

Split the types by location: the ones sharing the root's database are still
fetched together in a single $facet, while a type kept elsewhere is read
directly through its own model, which connects with that model's client.
@cperezabo cperezabo force-pushed the eager-load-missing branch from cb9d2cc to 109e3a9 Compare June 22, 2026 01:46
@cperezabo

Copy link
Copy Markdown
Contributor Author

Hi @jamis, can you take a look at the changes?

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR enhances Mongoid’s aggregation-based Criteria#eager_load implementation to cover additional association shapes and to close a couple of previously-crashing edge cases, aiming to keep eager loading to a single aggregation where possible (with one additional faceted query for polymorphic belongs_to).

Changes:

  • Extend inclusion resolution so eager_load can find associations declared only on subclasses of the queried class.
  • Introduce a new $lookup pipeline builder that supports distributing looked-up results into embedded documents (including deeply nested embedded paths).
  • Add a polymorphic belongs_to eager-load path that resolves targets with one additional fan-out query (and handles out-of-database targets).

Reviewed changes

Copilot reviewed 11 out of 11 changed files in this pull request and generated 4 comments.

Show a summary per file
File Description
spec/mongoid/criteria/includable_spec.rb Adds extensive coverage for subclass-only associations, embedded-reference chains, polymorphic eager loading, and prior crash/regression cases.
lib/mongoid/criteria/includable.rb Changes inclusion extraction to resolve eager-load associations from descendants as well as the parent class.
lib/mongoid/association/relatable.rb Adds a many_to_many? helper used by embedded distribution logic.
lib/mongoid/association/eager.rb Removes the old $lookup execution path from the generic eager preloader (now handled elsewhere).
lib/mongoid/association/eager_loadable.rb Refactors $lookup eager loading to use the new pipeline builder and adds polymorphic post-processing.
lib/mongoid/association/eager_load/* New pipeline builder, inclusion tree, embedded distributor, and polymorphic preloading/target resolution components.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread lib/mongoid/association/eager_load/embedded_distributor.rb
Comment on lines +120 to +127
# The matches that belong to a single embedded element.
def correlated_matches(element)
{ '$filter' => {
'input' => "$#{temporary_field}",
'as' => 'match',
'cond' => { match_operator => [ "$$match.#{@foreign_field}", "#{element}.#{@local_field}" ] }
} }
end
Comment thread lib/mongoid/association/eager_loadable.rb Outdated
Comment on lines +128 to +136
def resolve_inclusion_associations(parent_class, relation, is_eager_load)
if association = parent_class.reflect_on_association(relation)
return [ association ]
end

return [] unless is_eager_load

parent_class.descendants.filter_map { |sub| sub.reflect_on_association(relation) }
end

@jamis jamis left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like this refactoring -- it cleans a lot of things up nicely. There are a few things I've noted, and Copilot raises some good points as well (though I think some of them are probably pre-existing issues that Copilot is only surfacing because they were moved around in the refactoring; if so, feel free to indicate that and we can address them in a different PR).

Comment thread lib/mongoid/association/eager_load/inclusion.rb
Comment thread lib/mongoid/association/eager_load/inclusion.rb
Comment thread lib/mongoid/association/eager_load/inclusion_tree.rb
Comment thread lib/mongoid/association/eager_load/lookup_pipeline.rb
Comment thread lib/mongoid/association/eager_load/polymorphic_preloader.rb
Comment thread lib/mongoid/association/eager_load/polymorphic_targets.rb Outdated
Comment thread lib/mongoid/association/eager.rb Outdated
A $lookup joins only within the same client and database, but the fallback
check compared only the client. An inclusion whose target uses
store_in(database:) -- same client, different database -- silently returned
empty results instead of falling back. Compare the database as well.
Distributing a referenced association nested in an embeds_one used
$mergeObjects, which synthesized the embedded document even when it was
missing -- an absent port became { device: [...] }. Guard the merge so a
missing embeds_one stays absent instead of being fabricated from its matches.
When more than one subclass defines an association with the same name but a
different target, eager_load emitted one $lookup per subclass all writing to
the same field, so the last overwrote the rest and a sibling's matches silently
disappeared. Look each subclass's inclusion, with its own nested children, into
its own temporary field and route every document to its own by the discriminator.
The new eager-load classes are implementation detail, so mark them @api
private, and add YARD to the public methods that lacked it.

Removing parameters from Eager#initialize changed its public API. Keep the
use_lookup/pipeline parameters and the $lookup branch in #run, and have
preload_for_lookup delegate to a new Eager.run class method that wraps
new(...).run.

Update the eager_spec warning expectation for the new cluster/database message.
@cperezabo

Copy link
Copy Markdown
Contributor Author

Well, I've addressed:

  • @api private on the new classes (@jamis)
  • YARD where it was missing (@jamis)
  • restored Eager signature to prevent breaking change (@jamis)
  • cross-database fallback (Copilot)
  • embeds_one synthesis (Copilot)
  • same-name-across-subclasses overwrite (Copilot)

Regarding this, I couldn't reproduce, so left it

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants