Skip to content

Conversation

@EpsilonPrime
Copy link
Contributor

This option has been unused since October 2023 when the filter push-down framework was refactored in PR #3301 (commit a03e2b3).

History:

The configuration option and proto field were left as dead code:

  • Default value was always false (disabled)
  • C++ backends (both ClickHouse and Velox) never accessed this field
  • Modern Parquet readers enable row group filtering by default

This removes:

  • ParquetReadOptions.enable_row_group_maxmin_index proto field
  • spark.gluten.sql.parquet.maxmin.index configuration option
  • All references in LocalFilesNode.java and IcebergLocalFilesNode.java

What changes are proposed in this pull request?

How was this patch tested?

@github-actions github-actions bot added CORE works for Gluten Core DATA_LAKE labels Dec 10, 2025
@github-actions
Copy link

Run Gluten Clickhouse CI on x86

@github-actions
Copy link

Run Gluten Clickhouse CI on x86

This option has been unused since October 2023 when the filter push-down
framework was refactored in PR apache#3301 (commit a03e2b3).

History:
- Added in PR apache#2457 (Aug 2023) to enable row group filtering based on
  min/max statistics in Parquet files
- Removed from C++ backend in PR apache#3301 (Oct 2023) during filter push-down
  refactor, only 2.5 months after initial implementation
- Replaced by more sophisticated page-level filtering in PR apache#4634 (Mar 2024)
  which uses Parquet Page Index

The configuration option and proto field were left as dead code:
- Default value was always false (disabled)
- C++ backends (both ClickHouse and Velox) never accessed this field
- Modern Parquet readers enable row group filtering by default

This removes:
- ParquetReadOptions.enable_row_group_maxmin_index proto field
- spark.gluten.sql.parquet.maxmin.index configuration option
- All references in LocalFilesNode.java and IcebergLocalFilesNode.java
@EpsilonPrime EpsilonPrime force-pushed the remove-unused-parquet-maxmin-index-option branch from d3f1ead to 1b24e8d Compare December 10, 2025 15:03
@github-actions
Copy link

Run Gluten Clickhouse CI on x86

@FelixYBW FelixYBW requested a review from rui-mo December 11, 2025 04:34
Copy link
Contributor

@rui-mo rui-mo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, this change makes sense to me. Would you please take a look at the CH workflow failure?

iceberg/test/scala/org/apache/gluten/execution/iceberg/ClickHouseIcebergHiveTableSupport.scala:53: error: value ENABLE_PARQUET_ROW_GROUP_MAX_MIN_INDEX is not a member of object org.apache.gluten.config.GlutenConfig
00:37:55 [ERROR] .set(GlutenConfig.ENABLE_PARQUET_ROW_GROUP_MAX_MIN_INDEX.key, "true")

@github-actions
Copy link

Run Gluten Clickhouse CI on x86

@EpsilonPrime EpsilonPrime requested a review from rui-mo December 11, 2025 16:28
EpsilonPrime pushed a commit to EpsilonPrime/gluten that referenced this pull request Dec 12, 2025
Updates documentation to reflect completed PRs:
- PR apache#11277: Remove unused enable_row_group_maxmin_index
- PR apache#11278: Replace output_schema with ProjectRel

Changes:
1. SubstraitDiffAnalysis.md:
   - Mark completed migrations with ✅
   - Update diff count: 262 → ~200 lines
   - Reorganize priority matrix into completed/pending
   - Add updated recommendations post-PR completion
   - Add progress metrics tracking

2. SubstraitUnfork-NextSteps.md (NEW):
   - Actionable next steps ranked by effort/impact
   - Recommended path: Upgrade to v0.77.0 first
   - Incremental alternatives with time estimates
   - 10 specific tasks with step-by-step guidance
   - Decision framework for upgrade vs incremental
   - Progress tracker table
   - Success criteria checklist

Next recommended actions:
1. Verify JOIN_TYPE changes (30 min quick win)
2. Upgrade to v0.77.0 for free wins (6-8 hours)
3. Migrate column_types to AdvancedExtension (2-3 hours)

Estimated remaining effort: 40-60 hours for complete unfork
Target: <100 line diff or all modifications in AdvancedExtension
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

CLICKHOUSE CORE works for Gluten Core DATA_LAKE

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants