Enhancements to the aggregation-based eager load mechanism#6158
Enhancements to the aggregation-based eager load mechanism#6158cperezabo wants to merge 12 commits into
Conversation
b49c10f to
e52ae5a
Compare
jamis
left a comment
There was a problem hiding this comment.
This is elegant! Definitely a challenge to reason about cleanly, which gives me a bit of (maintenance-related) concern, but I think adding some documentation to the private methods will help with that.
|
|
||
| # Fetch every target in one aggregation: a $facet with a $lookup branch per | ||
| # type. Returns the targets as { type => { primary_key => document } }. | ||
| def fetch_polymorphic_targets(assoc, keys_by_type) |
There was a problem hiding this comment.
The cross_cluster_inclusions method now (correctly) skips polymorphic associations, but I think that just means this method (fetch_polymorphic_targets) needs to make sure a single generated pipeline at this end does not attempt to reference collections in disparate clusters. The risk here is that someone might issue an eager_load on polymorphic collections that exist in different clusters, and the fetch at this point will then fail (silently or otherwise).
|
My intention is, once this PR is merged, refactor and reificate some abstractions so this can be easier to maintain and extend. I haven't do it because I didn't want to make the PR longer. I decided to keep the already existing god-class and its imperative code as-is and move forward. |
|
The good part is that everything is covered with easy-to-read tests. So they are well defined guardrails to make refactoring. |
I can appreciate that. However, if the refactoring and other work relates directly to this feature, I'd rather see a single complete PR than a series of PR's that continue iterating on the same theme. On the other hand, if the refactoring is broader in scope and does not directly relate to this feature, then it makes sense to do it in a separate PR. The goal (on my end) is to make sure any code that lands in master is production ready. If I were planning to cut a new release of Mongoid tomorrow, would this PR be of sufficient quality? If (heaven forbid!) you were to be hit by a bus tomorrow, and your proposed refactorings never land, would the PR as it currently stands be "good enough"? I think it's almost there; if you make at least the changes I proposed, I'd be happy to merge it at that point. |
|
I already did the changes you mentioned. You can see the commits just above your last comment haha. I'm now working in a full refactoring by the way. |
|
Sorry for the delay -- there was a very recent change to the i18n gem that broke our CI runs with Ruby 3.1, and I was trying to get it fixed. :) This looks great, thank you! Could you pull in the latest changes from master, so I can get the tests here green? |
A polymorphic belongs_to has no single target collection, so it cannot be expressed as a $lookup. After the root documents are loaded, the polymorphic targets are fetched with a single aggregation whose $facet has one branch per distinct type, then assigned back to each document by type and key.
A $lookup only reaches collections in the root's own database, so eager loading a polymorphic belongs_to whose target is stored in another database (or cluster) silently returned nil for that target: the generated $facet referenced a collection that was not there. Split the types by location: the ones sharing the root's database are still fetched together in a single $facet, while a type kept elsewhere is read directly through its own model, which connects with that model's client.
cb9d2cc to
109e3a9
Compare
|
Hi @jamis, can you take a look at the changes? |
There was a problem hiding this comment.
Pull request overview
This PR enhances Mongoid’s aggregation-based Criteria#eager_load implementation to cover additional association shapes and to close a couple of previously-crashing edge cases, aiming to keep eager loading to a single aggregation where possible (with one additional faceted query for polymorphic belongs_to).
Changes:
- Extend inclusion resolution so
eager_loadcan find associations declared only on subclasses of the queried class. - Introduce a new
$lookuppipeline builder that supports distributing looked-up results into embedded documents (including deeply nested embedded paths). - Add a polymorphic
belongs_toeager-load path that resolves targets with one additional fan-out query (and handles out-of-database targets).
Reviewed changes
Copilot reviewed 11 out of 11 changed files in this pull request and generated 4 comments.
Show a summary per file
| File | Description |
|---|---|
| spec/mongoid/criteria/includable_spec.rb | Adds extensive coverage for subclass-only associations, embedded-reference chains, polymorphic eager loading, and prior crash/regression cases. |
| lib/mongoid/criteria/includable.rb | Changes inclusion extraction to resolve eager-load associations from descendants as well as the parent class. |
| lib/mongoid/association/relatable.rb | Adds a many_to_many? helper used by embedded distribution logic. |
| lib/mongoid/association/eager.rb | Removes the old $lookup execution path from the generic eager preloader (now handled elsewhere). |
| lib/mongoid/association/eager_loadable.rb | Refactors $lookup eager loading to use the new pipeline builder and adds polymorphic post-processing. |
| lib/mongoid/association/eager_load/* | New pipeline builder, inclusion tree, embedded distributor, and polymorphic preloading/target resolution components. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| # The matches that belong to a single embedded element. | ||
| def correlated_matches(element) | ||
| { '$filter' => { | ||
| 'input' => "$#{temporary_field}", | ||
| 'as' => 'match', | ||
| 'cond' => { match_operator => [ "$$match.#{@foreign_field}", "#{element}.#{@local_field}" ] } | ||
| } } | ||
| end |
| def resolve_inclusion_associations(parent_class, relation, is_eager_load) | ||
| if association = parent_class.reflect_on_association(relation) | ||
| return [ association ] | ||
| end | ||
|
|
||
| return [] unless is_eager_load | ||
|
|
||
| parent_class.descendants.filter_map { |sub| sub.reflect_on_association(relation) } | ||
| end |
jamis
left a comment
There was a problem hiding this comment.
I like this refactoring -- it cleans a lot of things up nicely. There are a few things I've noted, and Copilot raises some good points as well (though I think some of them are probably pre-existing issues that Copilot is only surfacing because they were moved around in the refactoring; if so, feel free to indicate that and we can address them in a different PR).
A $lookup joins only within the same client and database, but the fallback check compared only the client. An inclusion whose target uses store_in(database:) -- same client, different database -- silently returned empty results instead of falling back. Compare the database as well.
Distributing a referenced association nested in an embeds_one used
$mergeObjects, which synthesized the embedded document even when it was
missing -- an absent port became { device: [...] }. Guard the merge so a
missing embeds_one stays absent instead of being fabricated from its matches.
When more than one subclass defines an association with the same name but a different target, eager_load emitted one $lookup per subclass all writing to the same field, so the last overwrote the rest and a sibling's matches silently disappeared. Look each subclass's inclusion, with its own nested children, into its own temporary field and route every document to its own by the discriminator.
The new eager-load classes are implementation detail, so mark them @api private, and add YARD to the public methods that lacked it. Removing parameters from Eager#initialize changed its public API. Keep the use_lookup/pipeline parameters and the $lookup branch in #run, and have preload_for_lookup delegate to a new Eager.run class method that wraps new(...).run. Update the eager_spec warning expectation for the new cluster/database message.
|
Well, I've addressed:
Regarding this, I couldn't reproduce, so left it |
Hi folks, this PR closes a few gaps in
eager_load. Each case below either wasn't supported or crashed before but now they resolve in the same single aggregation as the rest.1. Associations defined only on a subclass
When you query through a superclass,
eager_loadnow finds associations declared only on a subclass.Works for
has_many,has_one,has_and_belongs_to_manyandbelongs_to.2. References that live inside embedded documents
A
belongs_tositting in an embedded document is now eager-loaded through its embedding parent, with no extra query:This holds true regardless of how deep the embedding goes, even when the embedded chain is attached to a referenced parent.
has_and_belongs_to_manyinside an embedded document works too — each element keeps its own matches:3. Polymorphic
belongs_toA polymorphic
belongs_tocan't be a single$lookup(its target collection changes per row), so it's resolved with one extra aggregation that fans out per type:Everything resolves in a single aggregation, except the polymorphic case, which adds exactly one faceted query.
While trying these features out in a real project I also ran into two pre-existing
eager_loadbugs, fixed here in their own commits:Model.eager_load([]).firstused to raise instead of just returning the documents; now it falls back to the normal path.has_manywhose target shares its collection with sibling subclasses pulled those siblings in too (crashing while materializing one whose discriminator didn't resolve); it now filters by the target subclass, the way a normal query does.