@marin-ma marin-ma commented Dec 4, 2025

Fix the transformer stage ID increasing continuously across different SQL queries.

When AQE is off, the fix can be directly applied.

When AQE is on, vanilla Spark makes the CollapseCodegenStages rule stateful:
https://github.com/apache/spark/blob/branch-4.0/sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/AdaptiveSparkPlanExec.scala#L153-L156

However, in Gluten with AQE on, the columnar rules are applied to each query stage individually, and no stateful context shared across query stages is visible to the columnar rules.
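
To make the difference concrete, here is a toy Python model. It is not Spark or Gluten code: `CollapseRule`, `apply`, and the operator-group names are hypothetical. It only contrasts one rule instance reused across the query stages of a query with a fresh instance per query stage.

```python
import itertools


class CollapseRule:
    """Toy stand-in for a stateful rule: each instance owns its own counter."""

    def __init__(self):
        self._ids = itertools.count(1)

    def apply(self, groups):
        # Tag every operator group in this query stage with the next id.
        return [(group, next(self._ids)) for group in groups]


# Two query stages of one hypothetical query.
query_stages = [["scan+filter", "project"], ["aggregate"]]

# One instance created once and reused: ids keep increasing within the query.
shared = CollapseRule()
reused_ids = [shared.apply(stage) for stage in query_stages]

# A fresh instance per query stage (no shared state): ids restart every time.
fresh_ids = [CollapseRule().apply(stage) for stage in query_stages]

print(reused_ids)
print(fresh_ids)
```

In this sketch the reused instance numbers the groups 1, 2, 3, while the per-stage instances give the second stage an id of 1 again, which is the situation the columnar rules are in when no state is shared between query stages.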

There seems to be no way to use a stateful counter to generate incremental IDs across different query stages. This PR adds a new rule that updates the stage IDs after the physical plan is generated.
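
The idea of renumbering after planning can be sketched with a toy plan tree in Python. This is an illustration under assumptions, not the rule added by this PR: `Node`, `renumber_stage_ids`, and `stage_ids` are hypothetical names, the children-first numbering order is an arbitrary choice, and the tree is mutated in place for brevity, whereas real Spark plans are immutable.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    name: str
    # Only nodes standing in for a transformer stage carry an id (hypothetical model).
    stage_id: Optional[int] = None
    children: List["Node"] = field(default_factory=list)


def renumber_stage_ids(root: Node) -> Node:
    """Reassign ids 1, 2, 3, ... to stage nodes, visiting children first."""
    counter = 0

    def visit(node: Node) -> None:
        nonlocal counter
        for child in node.children:
            visit(child)
        if node.stage_id is not None:
            counter += 1
            node.stage_id = counter

    visit(root)
    return root


def stage_ids(node: Node) -> List[int]:
    """Collect stage ids in pre-order (parent before children)."""
    own = [node.stage_id] if node.stage_id is not None else []
    return own + [i for child in node.children for i in stage_ids(child)]


# A finished plan whose stage nodes carry stale ids (7 and 9).
plan = Node("stage", 9, [Node("exchange", None, [Node("stage", 7, [Node("scan")])])])
renumber_stage_ids(plan)
print(stage_ids(plan))
```

Because the pass runs over the complete plan, it needs no counter shared between query stages: whatever ids the per-stage rules produced are simply overwritten.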

Related issue: #11251

@github-actions github-actions bot added the CORE and VELOX labels Dec 4, 2025

github-actions bot commented Dec 4, 2025

Run Gluten Clickhouse CI on x86

@marin-ma marin-ma force-pushed the regenerate-transform-stageid branch from d3313ea to e45c64d Compare December 10, 2025 08:17
@marin-ma marin-ma marked this pull request as ready for review December 10, 2025 14:30

@marin-ma
Contributor Author

@zhztheplayer @zhouyuan Could you help to review? Thanks!

Labels

CLICKHOUSE, CORE, VELOX
