Add histogram bucket aggregation support by varun-st · Pull Request #5 · varun-st/OpenSearch

varun-st · 2026-03-27T06:19:17Z

Description

Adds support for histogram bucket aggregations in the DSL Calcite plugin.

Changes

Architecture:
• Refactors GroupingInfo into two interfaces:
• FieldGrouping: For field-based aggregations (terms, multi_terms)
• ExpressionGrouping: For computed aggregations (histogram)
• Histogram expressions are computed via LogicalProject before LogicalAggregate

Implementation:
• HistogramGrouping: Builds expression FLOOR((field - offset) / interval) * interval + offset
• HistogramBucketShape: Converts to InternalHistogram results
• SimpleFieldGrouping: Replaces direct GroupingInfo usage for field-based aggregations
• CollationResolver: Maps _key to projected column names for sorting

Testing:
• 4 unit tests for HistogramGrouping
• 13 unit tests for HistogramBucketShape
• 3 integration tests in DslLogicalPlanIntegrationIT

Implements histogram aggregation translation using expression-based grouping. Splits GroupingInfo into FieldGrouping and ExpressionGrouping interfaces. Histogram expressions are computed via LogicalProject before aggregation.

nssuresh2007 · 2026-03-29T14:47:07Z

+        RexNode fieldRef = builder.makeInputRef(field.getType(), field.getIndex());
+        RexNode intervalLiteral = builder.makeExactLiteral(BigDecimal.valueOf(interval));
+        RexNode offsetLiteral = builder.makeExactLiteral(BigDecimal.valueOf(offset));
+
+        RexNode adjusted = builder.makeCall(SqlStdOperatorTable.MINUS, fieldRef, offsetLiteral);
+        RexNode divided = builder.makeCall(SqlStdOperatorTable.DIVIDE, adjusted, intervalLiteral);
+        RexNode floored = builder.makeCall(SqlStdOperatorTable.FLOOR, divided);
+        RexNode multiplied = builder.makeCall(SqlStdOperatorTable.MULTIPLY, floored, intervalLiteral);


nit: Please add a comment on what this formula is for easy understanding

nssuresh2007 · 2026-03-29T17:08:41Z

We may need to update AggregationTreeWalker as well where it uses getFieldNames to generate granularity key [Link].

When we have a query where same field is used for histogram but different interval, then this clash could happen right?

{ "aggs": { "coarse": { "histogram": { "field": "price", "interval": 1000 } }, "fine": { "histogram": { "field": "price", "interval": 100 } } } }

nssuresh2007 · 2026-03-29T17:09:47Z

+
+    @Override
+    public String getProjectedColumnName() {
+        return fieldName + PROJECTED_COLUMN_SUFFIX;


Same clash issue can happen here since the projected column suffix remains same if same field name is used multiple times

nssuresh2007 · 2026-03-29T17:36:45Z

+        if (field == null) {
+            throw ConversionException.invalidField(fieldName);
+        }


Shouldn't we handle null as well, since it is expected that some fields might be null? Not sure if throwing Exception here would be correct behavior

nssuresh2007 · 2026-03-29T17:58:38Z

+        // InternalHistogram requires EmptyBucketInfo when minDocCount is 0, but DSL doesn't support
+        // extended bounds. Use minDocCount of 1 to avoid this requirement. Actual filtering is done
+        // in AggregationResponseBuilder before calling this method.
+        long minDocCount = agg.minDocCount() == 0 ? 1 : agg.minDocCount();


Instead of this approach, can't we identify the min and max buckets from the response and fill in the gaps with empty buckets before generating the response?

nssuresh2007 · 2026-03-29T17:59:52Z

        // - Response contains InternalValueCount with correct value (count=3)
    }
+
+    public void testHistogramAggregation() throws Exception {


Lets add the following nested aggregation scenarios as well:

histogram -> metric (histogram with sub-agg)
terms -> histogram (histogram nested under terms)
histogram -> terms (terms nested under histogram)
histogram -> histogram (nested histograms on different fields)
sibling histograms on the same field with different intervals

Add histogram bucket aggregation support

f2d6f3e

Implements histogram aggregation translation using expression-based grouping. Splits GroupingInfo into FieldGrouping and ExpressionGrouping interfaces. Histogram expressions are computed via LogicalProject before aggregation.

nssuresh2007 reviewed Mar 29, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add histogram bucket aggregation support#5

Add histogram bucket aggregation support#5
varun-st wants to merge 1 commit into
datafusion-dsl-metric-aggfrom
pr/histogram-bucket

varun-st commented Mar 27, 2026

Uh oh!

nssuresh2007 Mar 29, 2026

Uh oh!

nssuresh2007 Mar 29, 2026

Uh oh!

nssuresh2007 Mar 29, 2026

Uh oh!

nssuresh2007 Mar 29, 2026

Uh oh!

nssuresh2007 Mar 29, 2026

Uh oh!

nssuresh2007 Mar 29, 2026 •

edited

Loading

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

varun-st commented Mar 27, 2026

Description

Changes

Uh oh!

nssuresh2007 Mar 29, 2026

Choose a reason for hiding this comment

Uh oh!

nssuresh2007 Mar 29, 2026

Choose a reason for hiding this comment

Uh oh!

nssuresh2007 Mar 29, 2026

Choose a reason for hiding this comment

Uh oh!

nssuresh2007 Mar 29, 2026

Choose a reason for hiding this comment

Uh oh!

nssuresh2007 Mar 29, 2026

Choose a reason for hiding this comment

Uh oh!

nssuresh2007 Mar 29, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

nssuresh2007 Mar 29, 2026 •

edited

Loading