Remove unused enable_row_group_maxmin_index option #11277
base: main
Run Gluten Clickhouse CI on x86
2e75d81 to d3f1ead
This option has been unused since October 2023 when the filter push-down framework was refactored in PR apache#3301 (commit a03e2b3).

History:
- Added in PR apache#2457 (Aug 2023) to enable row group filtering based on min/max statistics in Parquet files
- Removed from C++ backend in PR apache#3301 (Oct 2023) during filter push-down refactor, only 2.5 months after initial implementation
- Replaced by more sophisticated page-level filtering in PR apache#4634 (Mar 2024) which uses Parquet Page Index

The configuration option and proto field were left as dead code:
- Default value was always false (disabled)
- C++ backends (both ClickHouse and Velox) never accessed this field
- Modern Parquet readers enable row group filtering by default

This removes:
- ParquetReadOptions.enable_row_group_maxmin_index proto field
- spark.gluten.sql.parquet.maxmin.index configuration option
- All references in LocalFilesNode.java and IcebergLocalFilesNode.java
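For readers unfamiliar with the feature this option once toggled, the idea of row group filtering on min/max statistics can be sketched roughly as follows. This is a minimal illustration only, not Gluten's or any Parquet reader's actual code: the class `RowGroupMinMaxSketch`, the `RowGroupStats` type, and the `rowGroupsToRead` helper are hypothetical names, and the sketch assumes a single long column with a `col > threshold` predicate.

```java
import java.util.ArrayList;
import java.util.List;

public class RowGroupMinMaxSketch {
    // Hypothetical per-row-group statistics for one long column.
    // Real Parquet footers carry similar min/max values per column chunk.
    public static final class RowGroupStats {
        public final int index;
        public final long min;
        public final long max;

        public RowGroupStats(int index, long min, long max) {
            this.index = index;
            this.min = min;
            this.max = max;
        }
    }

    // Returns the indices of row groups that may contain a value
    // strictly greater than threshold. A group whose max is at or
    // below the threshold cannot match and is skipped without reading.
    public static List<Integer> rowGroupsToRead(List<RowGroupStats> groups, long threshold) {
        List<Integer> keep = new ArrayList<>();
        for (RowGroupStats g : groups) {
            if (g.max > threshold) {
                keep.add(g.index);
            }
        }
        return keep;
    }

    public static void main(String[] args) {
        List<RowGroupStats> groups = List.of(
            new RowGroupStats(0, 1, 100),
            new RowGroupStats(1, 101, 200),
            new RowGroupStats(2, 201, 300));
        // Predicate: col > 150. Group 0 (max 100) is skipped.
        System.out.println(rowGroupsToRead(groups, 150)); // prints [1, 2]
    }
}
```

Page-level filtering, as mentioned in the history, applies the same kind of check at a finer granularity, so fewer non-matching values need to be decoded.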
d3f1ead to 1b24e8d
rui-mo left a comment
Thanks, this change makes sense to me. Would you please take a look at the CH workflow failure?
iceberg/test/scala/org/apache/gluten/execution/iceberg/ClickHouseIcebergHiveTableSupport.scala:53: error: value ENABLE_PARQUET_ROW_GROUP_MAX_MIN_INDEX is not a member of object org.apache.gluten.config.GlutenConfig
00:37:55 [ERROR] .set(GlutenConfig.ENABLE_PARQUET_ROW_GROUP_MAX_MIN_INDEX.key, "true")
Updates documentation to reflect completed PRs:
- PR apache#11277: Remove unused enable_row_group_maxmin_index
- PR apache#11278: Replace output_schema with ProjectRel

Changes:
1. SubstraitDiffAnalysis.md:
   - Mark completed migrations with ✅
   - Update diff count: 262 → ~200 lines
   - Reorganize priority matrix into completed/pending
   - Add updated recommendations post-PR completion
   - Add progress metrics tracking
2. SubstraitUnfork-NextSteps.md (NEW):
   - Actionable next steps ranked by effort/impact
   - Recommended path: Upgrade to v0.77.0 first
   - Incremental alternatives with time estimates
   - 10 specific tasks with step-by-step guidance
   - Decision framework for upgrade vs incremental
   - Progress tracker table
   - Success criteria checklist

Next recommended actions:
1. Verify JOIN_TYPE changes (30 min quick win)
2. Upgrade to v0.77.0 for free wins (6-8 hours)
3. Migrate column_types to AdvancedExtension (2-3 hours)

Estimated remaining effort: 40-60 hours for complete unfork
Target: <100 line diff or all modifications in AdvancedExtension