[GLUTEN-12260][VL] Fix CheckOverflowTransformer using wrong child type for cast decision in Spark-33 by Xtpacz · Pull Request #12261 · apache/gluten

Xtpacz · 2026-06-08T03:45:12Z

What changes are proposed in this pull request?

CheckOverflowTransformer reads original.child.dataType to decide whether to insert a cast. In Spark 3.3, BinaryArithmetic.dataType returns left.dataType rather than the arithmetic result type. After child transformers apply rescale optimizations, the actual output type can differ from the Spark-declared type, so the cast is wrongly skipped.

The resulting Substrait plan contains decimal types that do not match the function signatures. Velox's SimpleFunction validation rejects it, and ColumnarPartialProjectRule makes the entire Project fall back to the JVM. The result is still correct (via fallback), but native acceleration is lost.

Reproducer

CREATE TABLE t1 (val BIGINT) USING parquet;
CREATE TABLE t2 (val BIGINT) USING parquet;
INSERT INTO t1 VALUES (200);
INSERT INTO t2 VALUES (100), (100), (100), (100), (100);

SELECT
    a.val,
    (a.val - COALESCE(SUM(b.val), 0) / 5.0)
        / (COALESCE(SUM(b.val), 0) / 5.0) AS growth_rate
FROM t1 a CROSS JOIN t2 b
GROUP BY a.val;

Root cause:

gluten/gluten-substrait/src/main/scala/org/apache/gluten/expression/UnaryExpressionTransformer.scala

Line 90 in fc90a79

original.child.dataType,

This reads the Spark expression's declared type instead of the transformer's actual output type.

Fix

- original.child.dataType,
+ child.dataType,
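
To make the one-line change concrete, here is a minimal, self-contained sketch of the cast decision. It does not use Gluten or Spark classes: `Dec`, `Expr`, `Transformer`, and `needsCast` are hypothetical stand-ins, the decimal types are made up for illustration, and the rule "skip the cast when the child type already equals the target type" is an assumption about how the decision is made.

```scala
// Hypothetical stand-ins; none of these are Gluten or Spark classes.
final case class Dec(precision: Int, scale: Int)

// A Spark-side expression: only its declared type matters here.
final case class Expr(declaredType: Dec)

// A transformer wraps the original expression but may produce a different
// type after optimizations such as rescaling.
final case class Transformer(original: Expr, dataType: Dec)

object CastDecision {
  // Assumed rule: a cast is needed only when the child type differs from the target.
  def needsCast(childType: Dec, target: Dec): Boolean = childType != target

  def main(args: Array[String]): Unit = {
    val target = Dec(38, 10)                      // type the overflow check must produce
    val original = Expr(Dec(38, 10))              // declared type happens to equal the target
    val child = Transformer(original, Dec(38, 9)) // actual output type after transformation

    // Decision based on the declared type: the cast is skipped.
    println(needsCast(child.original.declaredType, target)) // false
    // Decision based on the transformer's output type: the cast is inserted.
    println(needsCast(child.dataType, target))              // true
  }
}
```

In this toy model the first call corresponds to reading original.child.dataType and the second to reading child.dataType.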

How was this patch tested?

Before fix — Project falls back to JVM:

== Final Plan ==
* Project (17)                                         ← JVM, codegen id=3
+- VeloxColumnarToRow (16)                             ← extra C2R conversion
   +- ^ RegularHashAggregateExecTransformer (14)
      +- ^ VeloxBroadcastNestedLoopJoinExecTransformer (13)
         :- ^ InputIteratorTransformer (7)
         :  +- BroadcastQueryStage (5)
         :     +- ColumnarBroadcastExchange (4)
         :        +- RowToVeloxColumnar (3)
         :           +- * ColumnarToRow (2)
         :              +- BatchScan (1)
         +- ^ InputIteratorTransformer (12)
            +- RowToVeloxColumnar (10)
               +- * ColumnarToRow (9)
                  +- BatchScan (8)

After fix — Project runs natively in Velox:

== Final Plan ==
VeloxColumnarToRow (17)
+- ^ ProjectExecTransformer (15)                       ← native Velox Project
   +- ^ RegularHashAggregateExecTransformer (14)
      +- ^ VeloxBroadcastNestedLoopJoinExecTransformer (13)
         :- ^ InputIteratorTransformer (7)
         :  +- BroadcastQueryStage (5)
         :     +- ColumnarBroadcastExchange (4)
         :        +- RowToVeloxColumnar (3)
         :           +- * ColumnarToRow (2)
         :              +- BatchScan (1)
         +- ^ InputIteratorTransformer (12)
            +- RowToVeloxColumnar (10)
               +- * ColumnarToRow (9)
                  +- BatchScan (8)

Key differences:

The Project node changes from Project (17) (JVM, * = codegen) to ProjectExecTransformer (15) (Velox native, ^ = transformer)
VeloxColumnarToRow moves from before Project (a forced conversion to feed the JVM) to after Project (deferred until output)
The Aggregate→Project pipeline stays in Velox without a break
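
The comparison above can also be checked mechanically on the plan text. The following is a rough sketch, not part of the patch or of Gluten's test utilities: `PlanCheck`, `operators`, and `projectRunsNatively` are hypothetical names, and the parsing assumes the plan is formatted like the two listings above.

```scala
object PlanCheck {
  // Operator name of each plan line: strip tree-drawing characters and the
  // "*" (codegen) / "^" (transformer) markers, then keep the leading identifier.
  def operators(plan: String): Seq[String] =
    plan.linesIterator
      .map(_.dropWhile(c => " :+-*^".contains(c)))
      .map(_.takeWhile(_.isLetterOrDigit))
      .filter(_.nonEmpty)
      .toSeq

  // True when the plan contains a native Project and no JVM Project.
  def projectRunsNatively(plan: String): Boolean = {
    val ops = operators(plan)
    ops.contains("ProjectExecTransformer") && !ops.contains("Project")
  }
}
```

Passing the "before" plan text to `PlanCheck.projectRunsNatively` should give false and the "after" plan text should give true.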

github-actions · 2026-06-08T03:45:42Z

Run Gluten Clickhouse CI on x86

Xtpacz · 2026-06-08T06:47:14Z

@zhouyuan @philo-he could you have a look at the PR?

Xtpacz · 2026-06-09T14:32:46Z

@JkSelf @wForget could you have a look at the PR?

wForget · 2026-06-10T03:43:18Z

I ran the reproducer SQL you provided, but it doesn't seem to reproduce the issue. Is there something I'm missing?

Spark version: 3.5.1
Gluten version: 1.5.0

> For BinaryArithmetic, Spark's .dataType returns left.dataType rather than the arithmetic result type.

Furthermore, this description seems incorrect; BinaryArithmetic.dataType should be the result type.

philo-he

The changes look good to me. Could you rebase the code to check whether the CI failure goes away?

Xtpacz · 2026-06-10T08:37:43Z

@wForget Thanks for the correction!
My environment is Spark 3.3.4 + Gluten 1.6.0.

Xtpacz · 2026-06-10T08:38:30Z

> The changes look good to me. Could you rebase the code to check whether the CI failure goes away?

I will rebase. Thanks!

wForget · 2026-06-10T09:04:15Z

@Xtpacz The execution plan in your screenshot does not appear to match the reproducer SQL (the reproducer SQL does not have an id_count column).

Xtpacz · 2026-06-10T09:33:20Z

@wForget Sorry for the confusion. I habitually anonymize column names, which caused the mismatch with the screenshot. It doesn't affect the actual result, though. Let me rerun with the exact reproducer SQL and update here.

INSERT INTO tb_sink
SELECT
    a.val,
    (a.val - COALESCE(SUM(b.val), 0) / 5.0)
        / (COALESCE(SUM(b.val), 0) / 5.0) AS growth_rate
FROM t1 a CROSS JOIN t2 b
GROUP BY a.val;

wForget · 2026-06-10T10:20:00Z

I suspect it might be related to apache/spark#36698. cc @ulysses-you, could you please take a look?

zhouyuan · 2026-06-10T14:41:08Z

@Xtpacz
Starting with Spark 3.4, the data type logic changed: the result type is now derived from the children's decimal data types:
https://github.com/apache/spark/blob/branch-3.4/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/arithmetic.scala#L221
It is actually an optimization that skips the cast in some cases.

So it looks like the issue only exists in Spark 3.3. If I understand correctly, there is a fallback in your test. Could you please check the driver log for it? There should be a message like "Fallback due to xxx".

Xtpacz · 2026-06-11T03:26:43Z

@zhouyuan Thanks for your review. I have confirmed that this issue only exists in Spark 3.3, as you mentioned. Here is the fallback log from the driver on Spark 3.3:

INFO org.apache.spark.sql.execution.GlutenFallbackReporter: Validation failed for plan: Project[QueryId=4], due to: 
 - Native validation failed: 
   |- Validation failed due to exception caught at file:SubstraitToVeloxPlanValidator.cc line:1450 function:validate, thrown from file:ExprCompiler.cpp line:311 function:compileCall, reason:Found incompatible return types for 'divide' (DECIMAL(38, 9) vs. DECIMAL(38, 10)) for input types (DECIMAL(29, 6), DECIMAL(27, 6)).

This confirms that CheckOverflowTransformer uses original.child.dataType (which is left.dataType for BinaryArithmetic in Spark 3.3), which causes the decimal type mismatch.
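
As a sanity check on the types in that log, the sketch below applies Spark's decimal division result-type rule as it behaves with spark.sql.decimalOperations.allowPrecisionLoss at its default of true. `DecimalDivideType`, `Dec`, `adjust`, and `divide` are hypothetical names written for this illustration, not Spark or Velox APIs, and the negative-scale branch of Spark's adjustment is omitted. That Velox derives the same type is an assumption.

```scala
object DecimalDivideType {
  val MaxPrecision = 38
  val MinimumAdjustedScale = 6

  final case class Dec(precision: Int, scale: Int)

  // When the ideal precision exceeds the maximum, keep the integer digits and
  // shrink the scale, but never below min(scale, MinimumAdjustedScale).
  def adjust(precision: Int, scale: Int): Dec =
    if (precision <= MaxPrecision) Dec(precision, scale)
    else {
      val intDigits = precision - scale
      val minScale = math.min(scale, MinimumAdjustedScale)
      Dec(MaxPrecision, math.max(MaxPrecision - intDigits, minScale))
    }

  // Division: scale = max(6, s1 + p2 + 1), precision = p1 - s1 + s2 + scale.
  def divide(l: Dec, r: Dec): Dec = {
    val scale = math.max(MinimumAdjustedScale, l.scale + r.precision + 1)
    val precision = l.precision - l.scale + r.scale + scale
    adjust(precision, scale)
  }

  def main(args: Array[String]): Unit = {
    // Input types taken from the fallback log.
    println(divide(Dec(29, 6), Dec(27, 6))) // Dec(38,9)
    // Hypothetical left operand that is one digit narrower.
    println(divide(Dec(28, 6), Dec(27, 6))) // Dec(38,10)
  }
}
```

Under this rule the input types from the log give DECIMAL(38, 9), which suggests that DECIMAL(38, 10) is the type carried in the plan. The second call is purely illustrative: it shows that a one-digit difference in the left operand's precision is enough to shift the result scale from 9 to 10.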

github-actions Bot added the CORE works for Gluten Core label Jun 8, 2026

philo-he approved these changes Jun 10, 2026

Xtpacz added 2 commits June 10, 2026 16:47

[VL] Fix CheckOverflowTransformer using wrong child type for cast dec…

e37b064

…ision

rerun CI

375b9f2

Xtpacz force-pushed the fix-decimal-pb branch from 8d7c088 to 375b9f2 on June 10, 2026 08:47

rerun CI

a4d8a79

philo-he changed the title ~~[VL] Fix CheckOverflowTransformer using wrong child type for cast decision~~ [GLUTEN-12260][VL] Fix CheckOverflowTransformer using wrong child type for cast decision Jun 11, 2026

zhouyuan changed the title ~~[GLUTEN-12260][VL] Fix CheckOverflowTransformer using wrong child type for cast decision~~ [GLUTEN-12260][VL] Fix CheckOverflowTransformer using wrong child type for cast decision in Spark-33 Jun 11, 2026

4 participants