Backend
VL (Velox)
Bug description
ExpandExecTransformer currently converts Spark ExpandExec.projections to Substrait ExpandRel mostly according to each projection expression's own type. However, Spark's ExpandExec.output attributes are the schema contract for each Expand output column.
In some multiple-distinct aggregate plans, especially with decimal expressions, CASE WHEN, round(avg(decimal)), or duplicated grouping keys, a projection expression's Spark/Gluten transformer type can differ from the corresponding ExpandExec.output attribute type. This may produce an ExpandRel whose per-column projection types are inconsistent with the output schema.
Expected behavior:
Gluten should ensure that every Expand projection column is type-compatible with the corresponding ExpandExec.output attribute before Substrait conversion. If the backend cannot support the required alignment in native Expand, validation should fail cleanly and fall back, instead of producing an invalid native plan or failing later.
Actual behavior:
Plans can hit native planning/validation failures or fallback reasons such as:
Reason: The projections type does not match across different rows in the same column. Got: DECIMAL(27, 10), DECIMAL(28, 10)
Retriable: False
Expression: projections_[j][i]->type()->equivalent(*type)
Function: ExpandNode
File: /home/jenkins/agent/workspace/di-spark/Gluten/CI_spark_gluten/ep/build-velox/build/velox_ep/velox/core/PlanNode.cpp
Line: 511
Stack trace:
# 0 _ZN8facebook5velox7process10StackTraceC1Ei
# 1 _ZN8facebook5velox14VeloxExceptionC1EPKcmS3_St17basic_string_viewIcSt11char_traitsIcEES7_S7_S7_bNS1_4TypeES7_
# 2 _ZN8facebook5velox6detail14veloxCheckFailINS0_14VeloxUserErrorERKSsEEvRKNS1_18VeloxCheckFailArgsET0_
# 3 _ZN8facebook5velox4core10ExpandNodeC1ESsSt6vectorIS3_ISt10shared_ptrIKNS1_10ITypedExprEESaIS7_EESaIS9_EES3_ISsSaISsEES4_IKNS1_8PlanNodeEE
# 4 _ZN6gluten29SubstraitToVeloxPlanConverter11toVeloxPlanERKN9substrait9ExpandRelE
# 5 _ZN6gluten29SubstraitToVeloxPlanConverter11toVeloxPlanERKN9substrait3RelE
# 6 _ZN6gluten29SubstraitToVeloxPlanConverter11toVeloxPlanERKN9substrait12AggregateRelE
# 7 _ZN6gluten29SubstraitToVeloxPlanConverter11toVeloxPlanERKN9substrait3RelE
# 8 _ZN6gluten29SubstraitToVeloxPlanConverter11toVeloxPlanERKN9substrait10ProjectRelE
# 9 _ZN6gluten29SubstraitToVeloxPlanConverter11toVeloxPlanERKN9substrait3RelE
# 10 _ZN6gluten29SubstraitToVeloxPlanConverter11toVeloxPlanERKN9substrait10ProjectRelE
# 11 _ZN6gluten29SubstraitToVeloxPlanConverter11toVeloxPlanERKN9substrait3RelE
# 12 _ZN6gluten29SubstraitToVeloxPlanConverter11toVeloxPlanERKN9substrait7RelRootE
# 13 _ZN6gluten29SubstraitToVeloxPlanConverter11toVeloxPlanERKN9substrait4PlanE
# 14 _ZN6gluten18VeloxPlanConverter11toVeloxPlanERKN9substrait4PlanESt6vectorINS1_18ReadRel_LocalFilesESaIS6_EE
# 15 _ZN6gluten12VeloxRuntime20createResultIteratorERKSsRKSt6vectorISt10shared_ptrINS_14ResultIteratorEESaIS6_EERKSt13unordered_mapISsSsSt4hashISsESt8equal_toISsESaISt4pairIS1_SsEEE
# 16 Java_org_apache_gluten_vectorized_PlanEvaluatorJniWrapper_nativeCreateKernelWithIterator
# 17 0x00007f94490186e7
at org.apache.gluten.vectorized.PlanEvaluatorJniWrapper.nativeCreateKernelWithIterator(Native Method)
at org.apache.gluten.vectorized.NativePlanEvaluator.createKernelWithBatchIterator(NativePlanEvaluator.java:69)
at org.apache.gluten.backendsapi.velox.VeloxIteratorApi.genFinalStageIterator(VeloxIteratorApi.scala:255)
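The failing check above (projections_[j][i]->type()->equivalent(*type)) appears to require that every projection row produces the same type in a given column. The following is a minimal Python model of that rule, not Velox code: types are plain strings, the function name is hypothetical, and comparing every row against the first row is an assumption made for illustration.

```python
def find_column_type_mismatches(projections):
    """Model of a per-column type consistency check over Expand projections.

    `projections` is a list of rows; each row is a list of type names, one
    per output column. Returns a list of (row, column, expected, got) tuples.
    Using the first row as the reference type is an assumption of this sketch.
    """
    mismatches = []
    if not projections:
        return mismatches
    reference_row = projections[0]
    for j, row in enumerate(projections[1:], start=1):
        for i, col_type in enumerate(row):
            if col_type != reference_row[i]:
                mismatches.append((j, i, reference_row[i], col_type))
    return mismatches


# Two projection rows that disagree on the decimal precision of column 1,
# mirroring the types reported in the error message.
rows = [
    ["DATE", "DECIMAL(27, 10)"],
    ["DATE", "DECIMAL(28, 10)"],
]
mismatches = find_column_type_mismatches(rows)
```

Here `mismatches` contains a single entry for row 1, column 1, which is the situation the native ExpandNode constructor rejects.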
Reproducer
CREATE TABLE pending_events (
  order_id BIGINT,
  pending_date DATE,
  pending_reason STRING,
  pending_timestamp BIGINT
) USING PARQUET;

CREATE TABLE verified_events (
  order_id BIGINT,
  verified_timestamp BIGINT
) USING PARQUET;

INSERT INTO pending_events VALUES
  (1001, DATE '2026-04-20', 'A', 0),
  (1002, DATE '2026-04-20', 'A', 0),
  (1003, DATE '2026-04-20', 'B', 0),
  (1004, DATE '2026-04-21', 'A', 0);

INSERT INTO verified_events VALUES
  (1001, 43200),   -- 12.0 hours
  (1002, 108000),  -- 30.0 hours
  (1003, 198000),  -- 55.0 hours
  (1004, 176400);  -- 49.0 hours

WITH sla_calc AS (
  SELECT
    p.pending_date,
    p.pending_reason,
    p.order_id,
    ROUND(
      CAST(v.verified_timestamp - p.pending_timestamp AS DECIMAL(38, 18))
        / CAST(3600 AS DECIMAL(38, 18)),
      1
    ) AS sla_hours
  FROM pending_events p
  JOIN verified_events v
    ON p.order_id = v.order_id
)
SELECT
  pending_date,
  pending_reason,
  COUNT(DISTINCT order_id) AS total_order,
  ROUND(AVG(sla_hours), 1) AS avg_sla_hours,
  COUNT(DISTINCT CASE WHEN sla_hours > 24 THEN order_id END) AS backlog_24,
  COUNT(DISTINCT CASE WHEN sla_hours > 48 THEN order_id END) AS backlog_48
FROM sla_calc
GROUP BY pending_date, pending_reason;

DROP TABLE IF EXISTS pending_events;
DROP TABLE IF EXISTS verified_events;
Possible fix direction
Before converting Expand projections to Substrait:
- Compare each projection expression type with the corresponding ExpandExec.output attribute type.
- Align null literals directly to the output type.
- For other mismatches, either insert an explicit cast in a place supported by the backend, or push the cast into a pre-project if native Expand cannot contain scalar functions.
- During validation, return ValidationResult.failed(...) for unsupported alignment cases so planning can fall back cleanly.
- Keep a hard assertion in doTransform as a guard, so that an invalid native ExpandRel is never generated even after validation passes.
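The steps above can be sketched as a small language-neutral model. This is not Gluten code: `Expr`, `align_expr`, `align_projections`, and the `cast_supported` flag are hypothetical names introduced for illustration, types are plain strings, and the pre-project variant is not modeled. A failure reason returned here stands in for what validation would report before falling back.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Expr:
    """Hypothetical stand-in for a projection expression."""
    kind: str                      # "attribute", "literal_null", or "cast"
    data_type: str                 # e.g. "DECIMAL(28, 10)"
    child: Optional["Expr"] = None


def align_expr(expr, target_type, cast_supported):
    """Return an expression of target_type, or None if it cannot be aligned."""
    if expr.data_type == target_type:
        return expr                                  # already matches the output attribute
    if expr.kind == "literal_null":
        return Expr("literal_null", target_type)     # retype the null literal directly
    if cast_supported:
        return Expr("cast", target_type, child=expr) # wrap in an explicit cast
    return None                                      # caller should fail validation


def align_projections(projections, output_types, cast_supported=True):
    """Align every projection row to output_types.

    Returns (aligned_rows, None) on success or (None, reason) on failure.
    """
    aligned = []
    for r, row in enumerate(projections):
        if len(row) != len(output_types):
            return None, f"row {r} has {len(row)} columns, expected {len(output_types)}"
        new_row = []
        for c, (expr, target) in enumerate(zip(row, output_types)):
            fixed = align_expr(expr, target, cast_supported)
            if fixed is None:
                return None, f"cannot align row {r} column {c}: {expr.data_type} -> {target}"
            new_row.append(fixed)
        aligned.append(new_row)
    return aligned, None


output_types = ["DATE", "DECIMAL(28, 10)"]
rows = [
    [Expr("attribute", "DATE"), Expr("attribute", "DECIMAL(27, 10)")],
    [Expr("attribute", "DATE"), Expr("literal_null", "NULL")],
]
aligned, reason = align_projections(rows, output_types)
rejected, reject_reason = align_projections(rows, output_types, cast_supported=False)
```

With casts allowed, the mismatched decimal column is wrapped in a cast and the null literal is retyped. With `cast_supported=False`, the function returns a reason string instead of a plan, which corresponds to the clean validation failure described in the list.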
This issue was written with the assistance of AI.
Gluten version
Gluten-1.3
Spark version
Spark-3.2.x
Spark configurations
No response
System information
No response
Relevant logs