
Add range and date_range bucket aggregation support #7

Open
varun-st wants to merge 1 commit into pr/date-histogram-from-datafusion from pr/range-aggregation-from-datafusion

varun-st (Owner) commented Apr 3, 2026

Description

This PR adds support for range and date_range bucket aggregations in the DataFusion query executor, enabling range-based bucketing of numeric and date fields.

Changes

• RangeGrouping & DateRangeGrouping: Implement expression-based grouping using SQL CASE WHEN expressions to match values against range boundaries
• RangeBucketTranslator & DateRangeBucketTranslator: Handle translation between OpenSearch aggregation builders and DataFusion execution
• Date math support: Parse and resolve date expressions such as now-7d/d at query planning time, using a user-configurable timezone
• RangeUtils: Shared utility for generating range bucket keys
• InternalDateRange constructor: Made public for consistency with the other Internal* aggregation classes
• Comprehensive tests: 49 unit and integration tests covering range matching, date math, timezone handling, and edge cases
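For illustration, a minimal sketch of the kind of key generation a utility like RangeUtils performs. It assumes keys follow OpenSearch's default `from-to` form, with `*` for an unbounded side and Java's default `Double.toString` formatting; `RangeKeySketch` and `key` are hypothetical names, not the PR's actual RangeUtils API.

```java
// Hypothetical helper; mirrors (by assumption) OpenSearch's default range key format.
final class RangeKeySketch {
    // Returns the user-supplied key when present, otherwise a generated "from-to" key.
    static String key(String explicitKey, double from, double to) {
        if (explicitKey != null) {
            return explicitKey;
        }
        return bound(from) + "-" + bound(to);
    }

    // Infinite bounds represent unbounded sides and are rendered as "*".
    private static String bound(double value) {
        return Double.isInfinite(value) ? "*" : Double.toString(value);
    }
}
```

Under these assumptions, `key(null, Double.NEGATIVE_INFINITY, 100.0)` yields `*-100.0` and `key(null, 100.0, 200.0)` yields `100.0-200.0`.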

Implementation Details

Range Matching Strategy:
• Uses CASE WHEN expressions: CASE WHEN field >= from AND field < to THEN 'key' ... END
• Boundary semantics: [from, to) - inclusive lower bound, exclusive upper bound
• Supports unbounded ranges (missing from or to) via infinite bound values
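The matching strategy above can be illustrated without Calcite. This sketch renders the CASE WHEN as SQL text and evaluates the same first-match, [from, to) rule in memory. The class and method names are hypothetical, and unlike the description (which mentions infinity values), this sketch simply omits the comparison for an unbounded side.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of the CASE WHEN range-matching strategy.
final class CaseWhenSketch {
    // Infinite from/to values model unbounded sides.
    record Range(String key, double from, double to) {}

    // Renders one WHEN branch per range; an unbounded side contributes no predicate.
    static String toSql(String field, List<Range> ranges) {
        StringBuilder sb = new StringBuilder("CASE");
        for (Range r : ranges) {
            List<String> conds = new ArrayList<>();
            if (!Double.isInfinite(r.from())) {
                conds.add(field + " >= " + r.from());
            }
            if (!Double.isInfinite(r.to())) {
                conds.add(field + " < " + r.to());
            }
            String predicate = conds.isEmpty() ? "TRUE" : String.join(" AND ", conds);
            // Escape single quotes so keys stay valid SQL string literals.
            sb.append(" WHEN ").append(predicate)
              .append(" THEN '").append(r.key().replace("'", "''")).append("'");
        }
        return sb.append(" ELSE NULL END").toString();
    }

    // Same semantics in memory: first range with from <= value < to wins; null if none.
    static String bucketFor(double value, List<Range> ranges) {
        for (Range r : ranges) {
            if (value >= r.from() && value < r.to()) {
                return r.key();
            }
        }
        return null;
    }
}
```

Because the lower bound is inclusive and the upper bound exclusive, a value of exactly 100.0 falls into a `100.0-200.0` range rather than a `*-100.0` range.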

Date Math Resolution:
• Expressions such as now-7d/d, now/M, and 2024-01-01 are resolved at query planning time
• Uses OpenSearch's DateMathParser, so date math is evaluated the same way as in native OpenSearch
• Respects user-specified timezone (defaults to UTC)
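To show why the timezone matters during resolution, here is a simplified, hypothetical stand-in for date math built on java.time. It is not OpenSearch's DateMathParser: it only handles `now`, `+/-N` offsets in y, M, d, h, an optional `/unit` round-down, and plain ISO dates, whereas the real parser supports more units and anchored expressions such as `2024-01-01||+1M`.

```java
import java.time.Instant;
import java.time.LocalDate;
import java.time.ZoneId;
import java.time.ZonedDateTime;
import java.time.temporal.ChronoUnit;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical, simplified date math resolver (not OpenSearch's DateMathParser).
final class DateMathSketch {
    // "now", optional +/-N<unit> offsets, optional "/<unit>" rounding; units: y, M, d, h.
    private static final Pattern NOW_EXPR = Pattern.compile("now((?:[+-]\\d+[yMdh])*)(?:/([yMdh]))?");
    private static final Pattern OFFSET = Pattern.compile("([+-])(\\d+)([yMdh])");

    // Resolves the expression to epoch milliseconds, applying the zone before rounding.
    static long resolve(String expr, Instant now, ZoneId zone) {
        if (expr.matches("\\d{4}-\\d{2}-\\d{2}")) {
            // Absolute date: start of that day in the requested zone.
            return LocalDate.parse(expr).atStartOfDay(zone).toInstant().toEpochMilli();
        }
        Matcher m = NOW_EXPR.matcher(expr);
        if (!m.matches()) {
            throw new IllegalArgumentException("unsupported date math: " + expr);
        }
        ZonedDateTime t = now.atZone(zone);
        Matcher off = OFFSET.matcher(m.group(1));
        while (off.find()) {
            long amount = Long.parseLong(off.group(2));
            t = t.plus(off.group(1).equals("-") ? -amount : amount, unit(off.group(3)));
        }
        if (m.group(2) != null) {
            t = roundDown(t, m.group(2));
        }
        return t.toInstant().toEpochMilli();
    }

    private static ChronoUnit unit(String u) {
        return switch (u) {
            case "y" -> ChronoUnit.YEARS;
            case "M" -> ChronoUnit.MONTHS;
            case "d" -> ChronoUnit.DAYS;
            default -> ChronoUnit.HOURS;
        };
    }

    // Rounds down to the start of the unit in the zone's local time.
    private static ZonedDateTime roundDown(ZonedDateTime t, String u) {
        return switch (u) {
            case "y" -> t.withDayOfYear(1).truncatedTo(ChronoUnit.DAYS);
            case "M" -> t.withDayOfMonth(1).truncatedTo(ChronoUnit.DAYS);
            case "d" -> t.truncatedTo(ChronoUnit.DAYS);
            default -> t.truncatedTo(ChronoUnit.HOURS);
        };
    }
}
```

With now = 2024-03-15T10:30Z, `now/d` resolves to 2024-03-15T00:00Z in UTC but to 2024-03-15T04:00Z in America/New_York (UTC-4 after the March 10 DST switch), so the same expression produces different range boundaries depending on the timezone.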

Limitations:
• Overlapping ranges are not supported: CASE WHEN assigns each value to the first matching range only, whereas OpenSearch counts it in every range that contains it
• Script parameter not supported (field-based only)
• Format parameter hardcoded to RAW
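To make the overlapping-ranges limitation concrete, this hypothetical sketch counts the same values under both semantics: first match only (what a CASE WHEN produces) and every matching range (the multi-bucket behavior described above).

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical comparison of first-match vs. multi-bucket range counting.
final class OverlapSketch {
    record Range(String key, double from, double to) {
        // [from, to): inclusive lower bound, exclusive upper bound.
        boolean contains(double v) {
            return v >= from && v < to;
        }
    }

    // CASE WHEN semantics: a value lands only in the first range that contains it.
    static Map<String, Integer> firstMatchCounts(double[] values, List<Range> ranges) {
        return count(values, ranges, true);
    }

    // Multi-bucket semantics: a value is counted in every range that contains it.
    static Map<String, Integer> allMatchCounts(double[] values, List<Range> ranges) {
        return count(values, ranges, false);
    }

    private static Map<String, Integer> count(double[] values, List<Range> ranges, boolean stopAtFirst) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (Range r : ranges) {
            counts.put(r.key(), 0);
        }
        for (double v : values) {
            for (Range r : ranges) {
                if (r.contains(v)) {
                    counts.merge(r.key(), 1, Integer::sum);
                    if (stopAtFirst) {
                        break;
                    }
                }
            }
        }
        return counts;
    }
}
```

With ranges a = [0, 10) and b = [5, 15) and values 3, 7, 12, first-match counting gives a=2, b=1 while multi-bucket counting gives a=2, b=2; the value 7 is the one that diverges.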

Testing

```bash
./gradlew :sandbox:plugins:dsl-query-executor:test
```

All 155 tests pass (49 for range/date_range aggregations)

Dependencies

This PR depends on the histogram aggregation PR (#XXXX), as it uses the ExpressionGrouping interface introduced there.

Related Documentation

• RANGE_AGGREGATION_COMPARISON.md: Detailed comparison with OpenSearch native implementation

- Implement RangeGrouping and DateRangeGrouping with CASE WHEN expressions
- Add RangeBucketTranslator and DateRangeBucketTranslator
- Support date math expressions (now-7d/d) with user-configurable timezone
- Add RangeUtils utility for key generation
```java
}

@Override
public RexNode buildExpression(RelDataType inputRowType, RexBuilder builder) throws ConversionException {
```


RangeGrouping and DateRangeGrouping seem to have a lot of code overlap. Is it possible to reuse the code?
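One possible way to share the code, sketched under assumptions: a hypothetical generic base class owns the CASE WHEN construction, and each subclass only converts a raw bound into a literal. To stay self-contained it builds SQL text instead of Calcite RexNodes, and it treats date bounds as ISO-8601 instants rather than running date math; none of these class names come from the PR.

```java
import java.time.Instant;
import java.util.ArrayList;
import java.util.List;

// Hypothetical shared base: owns CASE WHEN construction; subclasses only render bounds.
abstract class AbstractRangeGroupingSketch<B> {
    // A raw range as given by the aggregation builder; a null bound means unbounded.
    record RawRange<R>(String key, R from, R to) {}

    private final String field;
    private final List<RawRange<B>> ranges;

    protected AbstractRangeGroupingSketch(String field, List<RawRange<B>> ranges) {
        this.field = field;
        this.ranges = List.copyOf(ranges);
    }

    // The only type-specific step: turn a raw bound into a comparable SQL literal.
    protected abstract String toLiteral(B bound);

    // Shared logic: one WHEN branch per range with [from, to) semantics, then ELSE NULL.
    final String buildCase() {
        StringBuilder sb = new StringBuilder("CASE");
        for (RawRange<B> r : ranges) {
            List<String> conds = new ArrayList<>();
            if (r.from() != null) {
                conds.add(field + " >= " + toLiteral(r.from()));
            }
            if (r.to() != null) {
                conds.add(field + " < " + toLiteral(r.to()));
            }
            String predicate = conds.isEmpty() ? "TRUE" : String.join(" AND ", conds);
            sb.append(" WHEN ").append(predicate)
              .append(" THEN '").append(r.key().replace("'", "''")).append("'");
        }
        return sb.append(" ELSE NULL END").toString();
    }
}

// Numeric ranges: bounds are already numbers.
final class NumericRangeSketch extends AbstractRangeGroupingSketch<Double> {
    NumericRangeSketch(String field, List<RawRange<Double>> ranges) {
        super(field, ranges);
    }

    @Override
    protected String toLiteral(Double bound) {
        return Double.toString(bound);
    }
}

// Date ranges: bounds are ISO-8601 instants, compared as epoch milliseconds.
final class DateRangeSketch extends AbstractRangeGroupingSketch<String> {
    DateRangeSketch(String field, List<RawRange<String>> ranges) {
        super(field, ranges);
    }

    @Override
    protected String toLiteral(String bound) {
        return Long.toString(Instant.parse(bound).toEpochMilli());
    }
}
```

In this layout the date-specific work (date math, timezone) would live entirely in the date subclass's bound conversion, so the CASE construction is written once.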

```java
}

@Override
public GroupingInfo getGrouping(DateRangeAggregationBuilder agg) throws ConversionException {
```


Can we add validation that throws a ConversionException whenever an unsupported parameter is specified? For example, the "format" option in date_range, which we do not support.
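A minimal sketch of the requested validation, assuming the translator can read whether format and script were set. `DateRangeParams` is a hypothetical snapshot of those parameters, and the nested `ConversionException` is a local stand-in, not the executor's real class.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical validation sketch for unsupported date_range parameters.
final class UnsupportedParamValidationSketch {
    // Local stand-in for the executor's ConversionException.
    static final class ConversionException extends Exception {
        ConversionException(String message) {
            super(message);
        }
    }

    // Hypothetical snapshot of date_range parameters; null means "not specified".
    record DateRangeParams(String field, String format, String script) {}

    // Rejects parameters the DataFusion path cannot honor instead of silently ignoring them.
    static void validate(DateRangeParams params) throws ConversionException {
        List<String> unsupported = new ArrayList<>();
        if (params.format() != null) {
            unsupported.add("format");
        }
        if (params.script() != null) {
            unsupported.add("script");
        }
        if (!unsupported.isEmpty()) {
            throw new ConversionException("date_range: unsupported parameter(s) " + unsupported);
        }
        if (params.field() == null) {
            throw new ConversionException("date_range: only field-based aggregations are supported");
        }
    }
}
```

Collecting every unsupported parameter before throwing lets a single error message report all of them at once.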


Remove this file from the PR, since design docs are not checked in to the repo.

varun-st (Owner, Author):

Sorry, it was unintentional. I'll remove it in the next revision.

Comment on lines +76 to +77
```java
operands.add(builder.makeNullLiteral(stringType));
return builder.makeCall(SqlStdOperatorTable.CASE, operands);
```


We need to check that ranges is non-empty, otherwise we would end up with an empty CASE statement here, right? (Same comment for DateRangeGrouping.)
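A sketch of the suggested guard, using strings in place of RexNodes. As we read the diff context, the operand list holds WHEN/THEN pairs followed by a trailing ELSE null literal, so with no ranges it would contain only that single ELSE operand. `CaseOperandsSketch` and its nested `ConversionException` are hypothetical stand-ins.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of a non-empty guard before building CASE operands.
final class CaseOperandsSketch {
    // Local stand-in for the executor's ConversionException.
    static final class ConversionException extends Exception {
        ConversionException(String message) {
            super(message);
        }
    }

    record Range(String key, double from, double to) {}

    // Operand layout: WHEN/THEN pairs first, then a trailing ELSE (the null literal).
    static List<String> caseOperands(String field, List<Range> ranges) throws ConversionException {
        if (ranges.isEmpty()) {
            // Without this guard the list would hold only the ELSE operand.
            throw new ConversionException("range aggregation on [" + field + "] has no ranges");
        }
        List<String> operands = new ArrayList<>();
        for (Range r : ranges) {
            operands.add(field + " >= " + r.from() + " AND " + field + " < " + r.to()); // WHEN
            operands.add("'" + r.key() + "'");                                           // THEN
        }
        operands.add("NULL"); // ELSE
        return operands;
    }
}
```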

```java
private DateRangeBucketTranslator translator;
private DateRangeAggregationBuilder agg;

@Before
```


Tests need to extend OpenSearchTestCase, like the existing tests do, and then the annotations can be removed.
