fix: handle complex projections in ordering validation #20362
Draft
adriangb wants to merge 1 commit into apache:main
Previously, `get_projected_output_ordering` used `ordered_column_indices_from_projection`, which was all-or-nothing: if any expression in the projection wasn't a simple `Column`, it returned `None` for the entire projection, even if the sort columns themselves were simple column refs.

Replace it with `resolve_sort_column_projection`, which only requires sort-column positions to resolve to simple `Column`s. Each ordering is now evaluated independently: orderings on simple column refs get validated with statistics even when other projection expressions are complex.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Summary
- Replace `ordered_column_indices_from_projection` with `resolve_sort_column_projection`, which only requires sort-column positions to resolve to simple `Column` expressions, rather than failing the entire projection if any expression is complex
- `get_projected_output_ordering`: orderings on simple column refs get validated with min/max statistics even when other projection expressions are complex (e.g. `a + 1`)

Problem: After projection pushdown, complex expressions in `ProjectionExprs` are common (e.g. `SELECT a + 1 AS x, b, c FROM t ORDER BY b`). The old `ordered_column_indices_from_projection` was all-or-nothing: it failed on `BinaryExpr(a+1)` at index 0 and returned `None` for the entire projection, even though the ordering on `b` (index 1) maps cleanly to a simple `Column`. With multi-file groups, this caused valid orderings to be unnecessarily dropped.

Test plan
- `cargo test -p datafusion-datasource` (97 tests pass)
- `cargo test -p datafusion-sqllogictest --test sqllogictests -- parquet_sorted_statistics` (passes)

🤖 Generated with Claude Code
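
For reviewers who want the behavioural difference in isolation, here is a minimal standalone sketch of the all-or-nothing check versus the per-ordering check described in the Problem paragraph. It is not the code in this PR and does not use DataFusion's types: `ProjExpr`, `all_or_nothing`, and `resolve_sort_columns` are hypothetical names invented for illustration, and real projections carry physical expressions rather than this two-variant enum.

```rust
/// Hypothetical stand-in for one expression in a projection.
/// (Assumption: the real code works on physical expressions, not this enum.)
#[derive(Debug, Clone, PartialEq)]
enum ProjExpr {
    /// A plain reference to a column of the underlying file schema, by index.
    Column(usize),
    /// Anything that is not a plain column reference, e.g. `a + 1`.
    Complex(String),
}

/// Sketch of the old, all-or-nothing behaviour: every projection expression
/// must be a simple column, otherwise the whole projection yields `None`.
fn all_or_nothing(projection: &[ProjExpr]) -> Option<Vec<usize>> {
    projection
        .iter()
        .map(|expr| match expr {
            ProjExpr::Column(idx) => Some(*idx),
            ProjExpr::Complex(_) => None,
        })
        .collect()
}

/// Sketch of the new behaviour: only the output positions that an ordering
/// actually sorts on must resolve to simple columns. Other positions may hold
/// arbitrary expressions without affecting the result.
fn resolve_sort_columns(
    projection: &[ProjExpr],
    sort_positions: &[usize],
) -> Option<Vec<usize>> {
    sort_positions
        .iter()
        .map(|&pos| match projection.get(pos)? {
            ProjExpr::Column(idx) => Some(*idx),
            ProjExpr::Complex(_) => None,
        })
        .collect()
}
```

Modelling `SELECT a + 1 AS x, b, c FROM t ORDER BY b` as `[Complex("a + 1"), Column(1), Column(2)]` with the ordering on output position 1, `all_or_nothing` returns `None` while `resolve_sort_columns` resolves the sort key to file column 1. An ordering on `x` (position 0) still returns `None`, since that position holds a complex expression.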