Remove unused enable_row_group_maxmin_index option #11277

EpsilonPrime · 2025-12-10T14:52:30Z

This option has been unused since October 2023 when the filter push-down framework was refactored in PR #3301 (commit a03e2b3).

History:

- Added in PR #2457, "[GLUTEN-2456][CH] Support use push down filter to skip rowgroups in parquet input format reader" (Aug 2023), to enable row group filtering based on min/max statistics in Parquet files
- Removed from C++ backend in PR #3301, "[GLUTEN-3297][CH] Refactor filter push down framework in gluten, support orc FPD and reuse parquet FPD in CH." (Oct 2023), during filter push-down refactor, only 2.5 months after initial implementation
- Replaced by more sophisticated page-level filtering in PR #4634, "[GLUTEN-3582] Support PageIndex" (Mar 2024), which uses Parquet Page Index
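
For context, the idea behind min/max row group filtering can be sketched as follows. This is a minimal illustration, not the removed Gluten or ClickHouse code: the class `RowGroupPruningSketch`, the `RowGroupStats` type, and the `rowGroupsToRead` method are hypothetical names invented for this example, and the statistics are hard-coded rather than read from a Parquet footer.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustrative only: these names are hypothetical and are not Gluten, ClickHouse, or Parquet APIs.
class RowGroupPruningSketch {

    // Min/max statistics for one column within one row group.
    static final class RowGroupStats {
        final int index;
        final long min;
        final long max;

        RowGroupStats(int index, long min, long max) {
            this.index = index;
            this.min = min;
            this.max = max;
        }
    }

    // Returns the indices of row groups that may contain a value in [lo, hi].
    // A row group is skipped only when its [min, max] range cannot overlap the predicate range.
    static List<Integer> rowGroupsToRead(List<RowGroupStats> groups, long lo, long hi) {
        List<Integer> keep = new ArrayList<>();
        for (RowGroupStats g : groups) {
            if (g.max >= lo && g.min <= hi) {
                keep.add(g.index);
            }
        }
        return keep;
    }

    public static void main(String[] args) {
        List<RowGroupStats> groups = Arrays.asList(
            new RowGroupStats(0, 1, 100),
            new RowGroupStats(1, 101, 200),
            new RowGroupStats(2, 201, 300));
        // Predicate: 150 <= col <= 250. Row group 0 cannot match and is skipped.
        System.out.println(rowGroupsToRead(groups, 150, 250)); // prints [1, 2]
    }
}
```

Page-level filtering applies the same overlap test at a finer granularity, using per-page statistics instead of per-row-group statistics.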

The configuration option and proto field were left as dead code:

- Default value was always false (disabled)
- C++ backends (both ClickHouse and Velox) never accessed this field
- Modern Parquet readers enable row group filtering by default

This removes:

- ParquetReadOptions.enable_row_group_maxmin_index proto field
- spark.gluten.sql.parquet.maxmin.index configuration option
- All references in LocalFilesNode.java and IcebergLocalFilesNode.java

What changes are proposed in this pull request?

How was this patch tested?

github-actions · 2025-12-10T14:53:04Z

Run Gluten Clickhouse CI on x86

github-actions · 2025-12-10T15:00:54Z

Run Gluten Clickhouse CI on x86

github-actions · 2025-12-10T15:03:36Z

Run Gluten Clickhouse CI on x86

rui-mo

Thanks, this change makes sense to me. Would you please take a look at the CH workflow failure?

iceberg/test/scala/org/apache/gluten/execution/iceberg/ClickHouseIcebergHiveTableSupport.scala:53: error: value ENABLE_PARQUET_ROW_GROUP_MAX_MIN_INDEX is not a member of object org.apache.gluten.config.GlutenConfig
00:37:55 [ERROR] .set(GlutenConfig.ENABLE_PARQUET_ROW_GROUP_MAX_MIN_INDEX.key, "true")

github-actions · 2025-12-11T13:50:06Z

Run Gluten Clickhouse CI on x86

Updates documentation to reflect completed PRs:
- PR apache#11277: Remove unused enable_row_group_maxmin_index
- PR apache#11278: Replace output_schema with ProjectRel

Changes:
1. SubstraitDiffAnalysis.md:
   - Mark completed migrations with ✅
   - Update diff count: 262 → ~200 lines
   - Reorganize priority matrix into completed/pending
   - Add updated recommendations post-PR completion
   - Add progress metrics tracking
2. SubstraitUnfork-NextSteps.md (NEW):
   - Actionable next steps ranked by effort/impact
   - Recommended path: Upgrade to v0.77.0 first
   - Incremental alternatives with time estimates
   - 10 specific tasks with step-by-step guidance
   - Decision framework for upgrade vs incremental
   - Progress tracker table
   - Success criteria checklist

Next recommended actions:
1. Verify JOIN_TYPE changes (30 min quick win)
2. Upgrade to v0.77.0 for free wins (6-8 hours)
3. Migrate column_types to AdvancedExtension (2-3 hours)

Estimated remaining effort: 40-60 hours for complete unfork
Target: <100 line diff or all modifications in AdvancedExtension

github-actions bot added the CORE (works for Gluten Core) and DATA_LAKE labels Dec 10, 2025

EpsilonPrime mentioned this pull request Dec 10, 2025

feat: add enable_row_group_maxmin_index to ParquetReadOptions substrait-io/substrait#882

Closed

EpsilonPrime force-pushed the remove-unused-parquet-maxmin-index-option branch from 2e75d81 to d3f1ead Compare December 10, 2025 15:00

EpsilonPrime force-pushed the remove-unused-parquet-maxmin-index-option branch from d3f1ead to 1b24e8d Compare December 10, 2025 15:03

FelixYBW requested a review from rui-mo December 11, 2025 04:34

rui-mo reviewed Dec 11, 2025

fix tests

3962374

github-actions bot added the CLICKHOUSE label Dec 11, 2025

EpsilonPrime requested a review from rui-mo December 11, 2025 16:28
