Add stats, extended_stats, and value_count support to DSL Calcite plugin#4
Add stats, extended_stats, and value_count support to DSL Calcite plugin#4varun-st wants to merge 1 commit into
Conversation
Implements metric aggregation translators for stats (min/max/sum/count/avg), extended_stats (variance/stddev/sum_of_squares), and value_count aggregations.
| * @return List of AggregateCall objects, one per metric value | ||
| * @throws ConversionException if the conversion fails | ||
| */ | ||
| List<AggregateCall> toAggregateCalls(T agg, AggregationConversionContext ctx) throws ConversionException; |
There was a problem hiding this comment.
Would it be better to make MetricTranslator itself to return a list of AggregateCall instead? That way we do not need to create this additional class?
There was a problem hiding this comment.
Makes sense. I have updated the origin MetricTranslator itself. Thanks.
| double min = values.get(1) != null ? ((Number) values.get(1)).doubleValue() : Double.POSITIVE_INFINITY; | ||
| double max = values.get(2) != null ? ((Number) values.get(2)).doubleValue() : Double.NEGATIVE_INFINITY; |
There was a problem hiding this comment.
Are we sure if returning positive and negative infinity matches existing behavior? Instead, can we return null when min and max does not exist?
There was a problem hiding this comment.
The existing implementation of OpenSearch uses these values - https://github.com/opensearch-project/OpenSearch/blob/main/server/src/main/java/org/opensearch/search/aggregations/metrics/StatsAggregator.java#L86-L88. Hence, tried to retain the same.
| @Override | ||
| public InternalAggregation toInternalAggregation(String name, List<Object> values) { | ||
| if (values == null || values.size() != METRIC_COUNT) { | ||
| return buildEmptyAggregation(name); |
There was a problem hiding this comment.
Just for my understanding - Would this cause issues if we just return null here?
There was a problem hiding this comment.
I was just trying to stick to the existing OpenSearch behaviour - https://github.com/opensearch-project/OpenSearch/blob/main/server/src/main/java/org/opensearch/search/aggregations/metrics/StatsAggregator.java#L249-L251.
Summary
Add support for stats, extended_stats, and value_count metric aggregations to the DSL Calcite plugin.
Changes
New Metric Aggregations
• StatsMetricTranslator: Implements stats aggregation returning 5 metrics (count, min, max, sum, avg)
• ExtendedStatsMetricTranslator: Implements extended_stats aggregation returning 7 metrics (count, min, max, sum, avg, variance, std_deviation)
• ValueCountMetricTranslator: Implements value_count aggregation returning document count
Implementation Details
Stats Aggregation
• Generates 5 AggregateCall objects for COUNT, MIN, MAX, SUM, AVG
• Maps to Calcite's standard SQL aggregate functions
• Returns InternalStats with proper empty aggregation handling
Extended Stats Aggregation
• Generates 7 AggregateCall objects including VAR_POP and STDDEV_POP
• Calculates sum_of_squares from variance using formula: sum_of_squares = variance * count + avg² * count
• Uses default sigma value (2.0) for std_deviation_bounds
• Returns InternalExtendedStats with proper empty aggregation handling
Value Count Aggregation
• Maps to Calcite's COUNT function
• Returns InternalValueCount with count value
Key Features
• Composite Metric Pattern: Stats and extended_stats use CompositeMetricTranslator interface for multi-value metrics
• Type Handling:
• COUNT returns BIGINT NOT NULL
• AVG, VAR_POP, STDDEV_POP return DOUBLE
• MIN, MAX, SUM preserve input field type
• Empty Aggregation Support: Proper handling when no data matches (count=0, min=+∞, max=-∞)
• Null Handling: All aggregations ignore null values (matching OpenSearch behavior)
Testing
• Comprehensive unit tests for all three translators
• Tests cover: valid values, null handling, empty results, type conversions, edge cases
• Integration tests added to DslLogicalPlanIntegrationIT
Files Added
• StatsMetricTranslator.java
• ExtendedStatsMetricTranslator.java
• ValueCountMetricTranslator.java
• StatsMetricTranslatorTest.java (180 lines, 11 tests)
• ExtendedStatsMetricTranslatorTest.java (197 lines, 11 tests)
• ValueCountMetricTranslatorTest.java (124 lines, 8 tests)
Files Modified
• AggregationTreeWalker.java: Added handling for composite metrics
• AggregationResponseBuilder.java: Added buildCompositeMetric() method
• CompositeMetricTranslator.java: New interface for multi-value metrics
• AggregationRegistry.java: Registered new aggregation types
• DslLogicalPlanIntegrationIT.java: Added integration tests
Testing
All unit tests pass. Integration tests verify end-to-end functionality with OpenSearch.