Add range and date_range bucket aggregation support #7
- Implement RangeGrouping and DateRangeGrouping with CASE WHEN expressions
- Add RangeBucketTranslator and DateRangeBucketTranslator
- Support date math expressions (now-7d/d) with user-configurable timezone
- Add RangeUtils utility for key generation
}

@Override
public RexNode buildExpression(RelDataType inputRowType, RexBuilder builder) throws ConversionException {
RangeGrouping and DateRangeGrouping seem to have a lot of code overlap. Can the shared logic be pulled out and reused?
}

@Override
public GroupingInfo getGrouping(DateRangeAggregationBuilder agg) throws ConversionException {
Can we add validation so that a ConversionException is thrown when any unsupported parameter is specified? For example, the "format" option in date_range, which we do not support.
Remove this file from the PR, since design docs are not checked in to the repo.
Sorry, it was unintentional. I'll remove it in the next revision.
operands.add(builder.makeNullLiteral(stringType));
return builder.makeCall(SqlStdOperatorTable.CASE, operands);
We need to check that ranges is non-empty; otherwise we would end up with an empty CASE statement here, right? (The same comment applies to DateRangeGrouping.)
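One possible shape for that guard, as a minimal sketch: the class and method names are hypothetical, and IllegalArgumentException stands in for the PR's ConversionException so the snippet compiles on its own.

```java
import java.util.List;

final class RangeGuardSketch {
    /**
     * Rejects an aggregation that defines no ranges, so that no CASE
     * expression without WHEN branches is ever built.
     * (Stand-in exception type; the PR would presumably throw ConversionException.)
     */
    static void requireNonEmpty(List<?> ranges, String aggName) {
        if (ranges == null || ranges.isEmpty()) {
            throw new IllegalArgumentException(
                "Aggregation [" + aggName + "] must define at least one range");
        }
    }

    public static void main(String[] args) {
        requireNonEmpty(List.of("0-10"), "price_ranges"); // passes silently
        try {
            requireNonEmpty(List.of(), "price_ranges");
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Calling a check like this at the top of buildExpression in both grouping classes would fail fast, before any operands are assembled.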
private DateRangeBucketTranslator translator;
private DateRangeAggregationBuilder agg;

@Before
Tests need to extend OpenSearchTestCase, like the existing tests, and then we can remove the annotations.
Description
This PR adds support for range and date_range bucket aggregations in the DataFusion query executor, enabling range-based bucketing of numeric and date fields.
Changes
• RangeGrouping & DateRangeGrouping: Implement expression-based grouping using SQL CASE WHEN expressions to match values against range boundaries
• RangeBucketTranslator & DateRangeBucketTranslator: Handle translation between OpenSearch aggregation builders and DataFusion execution
• Date math support: Parse and resolve date expressions like now-7d/d at query planning time with user-configurable timezone
• RangeUtils: Shared utility for generating range bucket keys
• InternalDateRange constructor: Made public for consistency with other Internal* aggregation classes
• Comprehensive tests: 49 unit and integration tests covering range matching, date math, timezone handling, and edge cases
Implementation Details
Range Matching Strategy:
• Uses CASE WHEN expressions: CASE WHEN field >= from AND field < to THEN 'key' ... END
• Boundary semantics: [from, to) - inclusive lower bound, exclusive upper bound
• Supports unbounded ranges with infinity values
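The matching rules above can be illustrated without Calcite as a plain-Java sketch. It models what the generated CASE WHEN evaluates to for a single value: branches are tried in order, each range is [from, to), infinite bounds stand for open sides, and no match yields null (the ELSE NULL branch). The class names and the key format ("*" for an open side) are assumptions for illustration, not the actual RangeUtils output.

```java
import java.util.List;

final class RangeCaseSketch {
    /** One bucket with an inclusive lower and exclusive upper bound. */
    static final class Range {
        final double from;
        final double to;

        Range(double from, double to) {
            this.from = from;
            this.to = to;
        }

        /** Hypothetical key format: "from-to", with "*" for an unbounded side. */
        String key() {
            String lo = Double.isInfinite(from) ? "*" : Double.toString(from);
            String hi = Double.isInfinite(to) ? "*" : Double.toString(to);
            return lo + "-" + hi;
        }
    }

    /** Mirrors CASE WHEN v >= from AND v < to THEN key ... ELSE NULL END. */
    static String bucketKey(double value, List<Range> ranges) {
        for (Range r : ranges) {
            if (value >= r.from && value < r.to) {
                return r.key();
            }
        }
        return null;
    }

    public static void main(String[] args) {
        List<Range> ranges = List.of(
            new Range(Double.NEGATIVE_INFINITY, 50),
            new Range(50, 100),
            new Range(100, Double.POSITIVE_INFINITY));
        System.out.println(bucketKey(50.0, ranges)); // 50.0-100.0 (lower bound is inclusive)
    }
}
```

A value exactly equal to a boundary lands in the bucket that starts there, never in the one that ends there.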
Date Math Resolution:
• Expressions like now-7d/d, now/M, 2024-01-01 resolved at query planning time
• Uses OpenSearch's DateMathParser, so expressions are parsed the same way as in native OpenSearch queries
• Respects user-specified timezone (defaults to UTC)
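To make the timezone effect concrete, here is a sketch of what resolving the single expression now-7d/d amounts to, written with java.time rather than OpenSearch's DateMathParser. The class and method names are hypothetical, and "now" is passed in explicitly so the result is reproducible.

```java
import java.time.Instant;
import java.time.ZoneId;
import java.time.ZonedDateTime;
import java.time.temporal.ChronoUnit;

final class DateMathSketch {
    /**
     * Illustrates "now-7d/d": subtract seven days, then round down to the
     * start of that day in the given zone. This is a hand-written stand-in,
     * not a call into OpenSearch's DateMathParser.
     */
    static Instant nowMinus7dRoundedToDay(Instant now, ZoneId zone) {
        return ZonedDateTime.ofInstant(now, zone)
            .minusDays(7)
            .truncatedTo(ChronoUnit.DAYS)
            .toInstant();
    }

    public static void main(String[] args) {
        Instant now = Instant.parse("2024-03-15T10:30:00Z");
        System.out.println(nowMinus7dRoundedToDay(now, ZoneId.of("UTC")));          // 2024-03-08T00:00:00Z
        System.out.println(nowMinus7dRoundedToDay(now, ZoneId.of("Asia/Kolkata"))); // 2024-03-07T18:30:00Z
    }
}
```

The same expression yields different range boundaries depending on the zone, because "start of day" is a local-time concept. Resolving it once at planning time means the boundary is then a constant in the generated expression.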
Limitations:
• Overlapping ranges not supported: CASE WHEN returns only the first matching branch, whereas OpenSearch counts a document in every range it falls into
• Script parameter not supported (field-based only)
• Format parameter hardcoded to RAW
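The overlapping-range limitation can be seen in a small plain-Java comparison. firstMatch models CASE WHEN evaluation (stop at the first true branch), while allMatches models multi-bucket behaviour where a value is counted in every range containing it. All names here are hypothetical and the ranges are made up for the example.

```java
import java.util.ArrayList;
import java.util.List;

final class OverlapSketch {
    static final class Range {
        final String key;
        final double from;
        final double to;

        Range(String key, double from, double to) {
            this.key = key;
            this.from = from;
            this.to = to;
        }

        boolean contains(double v) {
            return v >= from && v < to; // [from, to)
        }
    }

    /** CASE WHEN semantics: only the first matching branch wins. */
    static String firstMatch(double value, List<Range> ranges) {
        for (Range r : ranges) {
            if (r.contains(value)) {
                return r.key;
            }
        }
        return null;
    }

    /** Multi-bucket semantics: every matching range receives the value. */
    static List<String> allMatches(double value, List<Range> ranges) {
        List<String> keys = new ArrayList<>();
        for (Range r : ranges) {
            if (r.contains(value)) {
                keys.add(r.key);
            }
        }
        return keys;
    }

    public static void main(String[] args) {
        List<Range> ranges = List.of(new Range("low", 0, 20), new Range("mid", 10, 30));
        System.out.println(firstMatch(15, ranges)); // low
        System.out.println(allMatches(15, ranges)); // [low, mid]
    }
}
```

With overlapping ranges, the "mid" bucket is undercounted under first-match evaluation, which is why such requests are listed as unsupported.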
Testing
./gradlew :sandbox:plugins:dsl-query-executor:test
All 155 tests pass (49 for range/date_range aggregations)
Dependencies
This PR depends on the histogram aggregation PR (#XXXX) as it uses the ExpressionGrouping interface introduced there.
Related Documentation
• RANGE_AGGREGATION_COMPARISON.md: Detailed comparison with OpenSearch native implementation