Skip to content

Add stats, extended_stats, and value_count support to DSL Calcite plugin#4

Open
varun-st wants to merge 1 commit into
feature/datafusion-dslfrom
pr/stats-aggregation
Open

Add stats, extended_stats, and value_count support to DSL Calcite plugin#4
varun-st wants to merge 1 commit into
feature/datafusion-dslfrom
pr/stats-aggregation

Conversation

@varun-st

Copy link
Copy Markdown
Owner

Summary

Add support for stats, extended_stats, and value_count metric aggregations to the DSL Calcite plugin.

Changes

New Metric Aggregations

StatsMetricTranslator: Implements stats aggregation returning 5 metrics (count, min, max, sum, avg)
ExtendedStatsMetricTranslator: Implements extended_stats aggregation returning 7 metrics (count, min, max, sum, avg, variance, std_deviation)
ValueCountMetricTranslator: Implements value_count aggregation returning document count

Implementation Details

Stats Aggregation

• Generates 5 AggregateCall objects for COUNT, MIN, MAX, SUM, AVG
• Maps to Calcite's standard SQL aggregate functions
• Returns InternalStats with proper empty aggregation handling

Extended Stats Aggregation

• Generates 7 AggregateCall objects including VAR_POP and STDDEV_POP
• Calculates sum_of_squares from variance using formula: sum_of_squares = variance * count + avg² * count
• Uses default sigma value (2.0) for std_deviation_bounds
• Returns InternalExtendedStats with proper empty aggregation handling

Value Count Aggregation

• Maps to Calcite's COUNT function
• Returns InternalValueCount with count value

Key Features

Composite Metric Pattern: Stats and extended_stats use CompositeMetricTranslator interface for multi-value metrics
Type Handling:
• COUNT returns BIGINT NOT NULL
• AVG, VAR_POP, STDDEV_POP return DOUBLE
• MIN, MAX, SUM preserve input field type
Empty Aggregation Support: Proper handling when no data matches (count=0, min=+∞, max=-∞)
Null Handling: All aggregations ignore null values (matching OpenSearch behavior)

Testing

• Comprehensive unit tests for all three translators
• Tests cover: valid values, null handling, empty results, type conversions, edge cases
• Integration tests added to DslLogicalPlanIntegrationIT

Files Added

• StatsMetricTranslator.java
• ExtendedStatsMetricTranslator.java
• ValueCountMetricTranslator.java
• StatsMetricTranslatorTest.java (180 lines, 11 tests)
• ExtendedStatsMetricTranslatorTest.java (197 lines, 11 tests)
• ValueCountMetricTranslatorTest.java (124 lines, 8 tests)

Files Modified

• AggregationTreeWalker.java: Added handling for composite metrics
• AggregationResponseBuilder.java: Added buildCompositeMetric() method
• CompositeMetricTranslator.java: New interface for multi-value metrics
• AggregationRegistry.java: Registered new aggregation types
• DslLogicalPlanIntegrationIT.java: Added integration tests

Testing

All unit tests pass. Integration tests verify end-to-end functionality with OpenSearch.

Implements metric aggregation translators for stats (min/max/sum/count/avg), extended_stats (variance/stddev/sum_of_squares), and value_count aggregations.
* @return List of AggregateCall objects, one per metric value
* @throws ConversionException if the conversion fails
*/
List<AggregateCall> toAggregateCalls(T agg, AggregationConversionContext ctx) throws ConversionException;

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it be better to make MetricTranslator itself to return a list of AggregateCall instead? That way we do not need to create this additional class?

Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense. I have updated the origin MetricTranslator itself. Thanks.

Comment on lines +109 to +110
double min = values.get(1) != null ? ((Number) values.get(1)).doubleValue() : Double.POSITIVE_INFINITY;
double max = values.get(2) != null ? ((Number) values.get(2)).doubleValue() : Double.NEGATIVE_INFINITY;

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are we sure if returning positive and negative infinity matches existing behavior? Instead, can we return null when min and max does not exist?

Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Override
public InternalAggregation toInternalAggregation(String name, List<Object> values) {
if (values == null || values.size() != METRIC_COUNT) {
return buildEmptyAggregation(name);

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just for my understanding - Would this cause issues if we just return null here?

Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants