fix: The limit_pushdown physical optimization rule removes limits in some cases leading to incorrect results #20048
Conversation
I'm looking into the CI failures. I guess I forgot to run the tests before making the PR 🙃 From what I can tell so far, it looks like my change improves/fixes things in the tests it breaks.
avantgardnerio left a comment
From what I can tell, the tests are wrong and this was behaving incorrectly before.
I'd like to see copy & pasted results with and without the optimizer rule:
- if they don't match prior to this PR, we have an issue we may need to hotfix in prior versions
- if they do match after this PR is applied, I approve of this PR
Force-pushed a5f6ad4 to 5ade46f
Found one more bug. I updated the PR description and added a test + fix for that bug.
3 99 82
3 99 79
3 98 79
3 97 96
The previous result was wrong. Look at the query directly above this. It is identical except that this query adds OFFSET 3 LIMIT 2. None of the rows in the previous expected test output are in the rows returned by the previous query, and the new expected output I added is just rows 4-5 in that query.
What was happening here is the inner OFFSET/LIMITs were being removed by the physical optimizer rule.
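To see why dropping the inner OFFSET/LIMIT corrupts the result, here is a minimal sketch in plain Rust, with iterator `skip`/`take` standing in for `GlobalLimitExec` and hypothetical integer rows standing in for the real test table:

```rust
/// Apply OFFSET `skip` and LIMIT `fetch` to a slice, the way a
/// GlobalLimitExec would to its input partition.
fn offset_limit(rows: &[i32], skip: usize, fetch: usize) -> Vec<i32> {
    rows.iter().copied().skip(skip).take(fetch).collect()
}

fn main() {
    // Hypothetical input rows; the real test uses a window-function query.
    let rows: Vec<i32> = (1..=10).collect();

    // Inner subquery with OFFSET 3 LIMIT 2 keeps only rows 4 and 5.
    let with_inner_limit = offset_limit(&rows, 3, 2);
    println!("inner kept: {:?}", with_inner_limit); // [4, 5]

    // If the optimizer silently removes the inner OFFSET/LIMIT, the outer
    // query consumes all ten rows, so it can return rows the correct plan
    // never produces.
    assert_ne!(with_inner_limit.len(), rows.len());
}
```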
Also, when I disable the limit_pushdown rule, I get the new results I added here
17)------BoundedWindowAggExec: wdw=[lead(b.c1,Int64(1)) ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING: Field { "lead(b.c1,Int64(1)) ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING": nullable Int64 }, frame: ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING], mode=[Sorted]
18)--------ProjectionExec: expr=[1 as c1]
19)----------PlaceholderRowExec
04)------GlobalLimitExec: skip=0, fetch=3
I think this is a better plan than before: GlobalLimitExecs are pushed into the union, and before they were not pushed below CoalescePartitionsExec.
avantgardnerio left a comment
Nice find, I think you fixed a long standing bug. @alamb do we need to backport this to any previous releases, given that they were producing incorrect results?
What I think we should do is file a ticket with some example queries where this bug results in incorrect results. I think such queries will help us understand the impact, and depending on that we can decide whether to backport this change.
alamb left a comment
Thank you @masonh22 and @avantgardnerio -- this makes sense to me.
I also made a small follow-on PR to update the tests to use insta.
let after_optimize =
    LimitPushdown::new().optimize(outer_limit, &ConfigOptions::new())?;
let expected = [
    "GlobalLimitExec: skip=1, fetch=3",
Without the code change in this PR, the actual plan looks like
"SortExec: TopK(fetch=4), expr=[c1@0 ASC], preserve_partitioning=[false]"
" EmptyExec"
Note the offset (aka the skip) was dropped
Which issue does this PR close?
Follow on to #20048
Rationale for this change
While reviewing #20048 and verifying test coverage, it was hard for me to see the test differences (because the formatting was not great).
What changes are included in this PR?
Port the tests to use insta rather than `assert_eq`.
Are these changes tested?
Yes, only tests.
Are there any user-facing changes?
The LimitPushdown physical optimizer rule removes GlobalLimitExec without pushing the fetch into DataSourceExec, silently ignoring LIMIT on projected MemTable queries. Disable the rule until the upstream fix (apache/datafusion#20048) is released in 52.2. Also unpack dictionary-encoded columns at registration time and update LocalQuery default queries to use log_entries.
Cherry-pick summary (v46→v47): source commit 9e4cda9 (fix: limit_pushdown removes limits incorrectly (apache#20048) (#394)); cherry-picked cleanly. Upstream PR apache#20048 (not in v47). Test coverage: adequate (adds 2 regression tests for the two bugs fixed). Tests: `cargo nextest run -p datafusion-physical-optimizer` passed.
Which issue does this PR close?
None
Rationale for this change
Bug 1: When pushing down limits, we recurse down the physical plan accumulating limits until we reach a node where we can't push the limit down further. At this point, we insert another limit executor (or push it into the current node, if that node supports it). After this, we continue recursing to try to find more limits to push down. If we do find another, we remove it, but we don't set the `GlobalRequirements::satisfied` field back to false, meaning we don't always re-insert this limit.
Bug 2: When we're pushing down a limit with a skip/offset and no fetch/limit and we run into a node that supports fetch, we set `GlobalRequirements::satisfied` to true. This is wrong: the limit is not satisfied because fetch doesn't support skip/offset. Instead, we should set `GlobalRequirements::satisfied` to true only if skip/offset is 0.
What changes are included in this PR?
This includes a one-line change to the push down limit logic that fixes the issue.
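The two flag-handling mistakes can be sketched in plain Rust. This is a simplified, hypothetical model of the rule's bookkeeping, not the real DataFusion implementation: the struct name mirrors `GlobalRequirements`, but both helper functions are invented here for illustration.

```rust
/// Simplified stand-in for the rule's accumulated limit state.
#[derive(Debug, Clone, Copy)]
struct GlobalRequirements {
    skip: usize,
    fetch: Option<usize>,
    satisfied: bool,
}

/// Bug 2: pushing `fetch` into a node that supports fetch only satisfies
/// the requirement when there is no remaining skip, because a bare fetch
/// cannot express an offset.
fn after_push_into_fetch_node(req: GlobalRequirements) -> GlobalRequirements {
    GlobalRequirements { satisfied: req.skip == 0, ..req }
}

/// Bug 1: when another limit operator is removed during the same traversal,
/// the requirement must be marked unsatisfied again so that a limit gets
/// re-inserted further down.
fn after_removing_inner_limit(
    req: GlobalRequirements,
    inner_fetch: usize,
) -> GlobalRequirements {
    let fetch = Some(req.fetch.map_or(inner_fetch, |f| f.min(inner_fetch)));
    GlobalRequirements { fetch, satisfied: false, ..req }
}

fn main() {
    let req = GlobalRequirements { skip: 3, fetch: Some(2), satisfied: false };
    // With a nonzero skip, a fetch-only node must not mark us satisfied.
    assert!(!after_push_into_fetch_node(req).satisfied);
    // With skip == 0, pushing the fetch down does satisfy the requirement.
    let zero = GlobalRequirements { skip: 0, ..req };
    assert!(after_push_into_fetch_node(zero).satisfied);
    // Removing another limit always resets `satisfied`.
    assert!(!after_removing_inner_limit(zero, 5).satisfied);
    println!("ok");
}
```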
Are these changes tested?
I added a test that replicates the issue and fails without this change.
Are there any user-facing changes?
No