[DRAFT] SPMD PP gather weights and write custom vjp by NuojCheng · Pull Request #3071 · AI-Hypercomputer/maxtext

NuojCheng · 2026-02-03T18:05:50Z

Description

This PR refactors the Pipeline Parallelism (PP) core logic to improve efficiency and memory management when using circular pipelining. The key highlights include the introduction of a Buffer Sliding Window (BSW) for weights, the implementation of a custom VJP for scanned pipeline iterations, and support for scanning over pipeline repeats.

Key Changes

1. Configuration & Types

New Config Options: Added scan_pipeline_repeats to allow jax.lax.scan over pipeline repeats.
Mesh Updates: Updated deepseek_batchsplit configuration to include the stage axis in mesh_axes and logical_axis_rules.
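
To illustrate what scanning over repeats means, here is a minimal sketch, not the code from this PR. It assumes each repeat's weights are stacked along a leading axis; `run_repeats` and `one_repeat` are hypothetical names, and the toy `tanh` layer stands in for running all pipeline stages once.

```python
import jax
import jax.numpy as jnp


def run_repeats(x, stacked_weights):
  """Applies one toy 'repeat' per leading slice of stacked_weights via lax.scan."""

  def one_repeat(carry, w):
    # Stand-in for running every pipeline stage once with this repeat's weights.
    return jnp.tanh(carry @ w), None

  out, _ = jax.lax.scan(one_repeat, x, stacked_weights)
  return out


num_repeats, dim = 3, 4
weights = jnp.stack([jnp.eye(dim) * (i + 1) for i in range(num_repeats)])
x = jnp.ones((2, dim))
scanned = run_repeats(x, weights)

# Reference: the same computation written as an unrolled Python loop.
unrolled = x
for i in range(num_repeats):
  unrolled = jnp.tanh(unrolled @ weights[i])
```

The scanned form traces the repeat body once instead of once per repeat, which is the usual reason to prefer jax.lax.scan over a Python loop.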

2. Pipeline Core (`pipeline.py`)

Buffer Sliding Window (BSW): Introduced BSW to manage weight gathering more efficiently. It maintains a buffer for weight copies that are all-gathered over the FSDP axis using shard_map.
Custom VJP Implementation: Replaced the standard nn.scan with a custom-defined VJP for pipeline iterations. This enables manual gradient checkpointing: instead of storing heavy state, the backward pass re-runs the forward pass for each iteration, which saves memory.
Loop State Refactor: loop_state now carries bsw and weights through iterations.
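
The sketch below shows one way a loop state could carry a sliding weight buffer alongside the sharded weights. It is an illustration under stated assumptions, not the PR's implementation: the two-slot window (current and next repeat), the dictionary keys other than bsw and weights, and the helper names are all hypothetical, and a plain reshape stands in for the all-gather over the FSDP axis that the PR performs with shard_map.

```python
import jax
import jax.numpy as jnp

NUM_REPEATS, NUM_SHARDS, DIM = 4, 2, 6
ROWS_PER_SHARD = DIM // NUM_SHARDS


def gather_repeat(sharded_weights, repeat_idx):
  """Stand-in for an FSDP all-gather: joins the shards of one repeat's weights."""
  shards = sharded_weights[repeat_idx % NUM_REPEATS]  # [NUM_SHARDS, ROWS, DIM]
  return shards.reshape(DIM, DIM)


def init_loop_state(x, sharded_weights):
  # Assumed layout: bsw holds gathered weights for the current and next repeat.
  bsw = (gather_repeat(sharded_weights, 0), gather_repeat(sharded_weights, 1))
  return {"x": x, "bsw": bsw, "weights": sharded_weights}


def step(loop_state, repeat_idx):
  current, upcoming = loop_state["bsw"]
  x = jnp.tanh(loop_state["x"] @ current)
  # Slide the window: the prefetched slot becomes current, then gather the
  # repeat after it (the index wraps around at the end of the loop).
  new_bsw = (upcoming, gather_repeat(loop_state["weights"], repeat_idx + 2))
  return {"x": x, "bsw": new_bsw, "weights": loop_state["weights"]}, None


key = jax.random.PRNGKey(0)
sharded = 0.1 * jax.random.normal(key, (NUM_REPEATS, NUM_SHARDS, ROWS_PER_SHARD, DIM))
x0 = jnp.ones((2, DIM))
final_state, _ = jax.lax.scan(step, init_loop_state(x0, sharded), jnp.arange(NUM_REPEATS))

# Reference: use fully gathered weights directly, without the buffer.
full = sharded.reshape(NUM_REPEATS, DIM, DIM)
expected = x0
for r in range(NUM_REPEATS):
  expected = jnp.tanh(expected @ full[r])
```

In this toy version the buffer only changes when the weights are gathered, not what is computed, so the result matches the unbuffered reference.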

3. Layers & Utils

pipeline_utils.py: A new utility module containing helper functions for FSDP axis indexing, logical spec manipulation, and the create_scanned_function factory for the custom VJP.
DeepSeek Support: Updated deepseek_batchsplit.py to ensure megablox.gmm is used when pipeline parallelism is enabled.
Layer Sharding: Modified attention_op.py and moe.py to skip logical rules when using PP, relying instead on the pipeline's sharding logic.

4. Testing

Updated pipeline_parallelism_test.py to reflect changes in supported configurations.
Added skips for non-circular pipelines and FP8 configurations, which are currently incompatible with the new BSW/Custom VJP logic.

Implementation Details

The custom VJP (run_scanned_custom_bwd) performs a backward scan to accumulate gradients. It reconstructs the curr_loop_state by combining saved lightweight states with the original bsw and weights before applying jax.vjp to the iteration step function.
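
The pattern described above can be sketched with jax.custom_vjp on a toy loop. This is a simplified illustration rather than the PR's code: `step`, `run_scanned`, `run_scanned_fwd`, and `run_scanned_bwd` are hypothetical names, the lightweight state is a single activation array, and one stacked array `bsw` stands in for both bsw and weights.

```python
import jax
import jax.numpy as jnp


def step(x, bsw, i):
  """Toy iteration step: lightweight state x, heavy gathered weights bsw."""
  return jnp.tanh(x @ bsw[i])


@jax.custom_vjp
def run_scanned(x, bsw):
  def body(carry, i):
    return step(carry, bsw, i), None

  out, _ = jax.lax.scan(body, x, jnp.arange(bsw.shape[0]))
  return out


def run_scanned_fwd(x, bsw):
  def body(carry, i):
    # Save only the lightweight input of each iteration, not its intermediates.
    return step(carry, bsw, i), carry

  out, saved_x = jax.lax.scan(body, x, jnp.arange(bsw.shape[0]))
  return out, (saved_x, bsw)


def run_scanned_bwd(residuals, g_out):
  saved_x, bsw = residuals

  def body(carry, inputs):
    g_x, g_bsw = carry
    x_i, i = inputs
    # Rebuild this iteration's inputs and re-run its forward pass under jax.vjp.
    _, vjp_fn = jax.vjp(lambda x, w: step(x, w, i), x_i, bsw)
    g_x, g_w = vjp_fn(g_x)
    return (g_x, g_bsw + g_w), None

  init = (g_out, jnp.zeros_like(bsw))
  xs = (saved_x, jnp.arange(bsw.shape[0]))
  (g_x, g_bsw), _ = jax.lax.scan(body, init, xs, reverse=True)
  return g_x, g_bsw


run_scanned.defvjp(run_scanned_fwd, run_scanned_bwd)


def reference(x, bsw):
  # Same loop, differentiated by JAX's default autodiff.
  for i in range(bsw.shape[0]):
    x = step(x, bsw, i)
  return x


key = jax.random.PRNGKey(0)
bsw = 0.3 * jax.random.normal(key, (3, 4, 4))
x = jnp.ones((2, 4))
grads = jax.grad(lambda a, w: jnp.sum(run_scanned(a, w) ** 2), argnums=(0, 1))(x, bsw)
ref_grads = jax.grad(lambda a, w: jnp.sum(reference(a, w) ** 2), argnums=(0, 1))(x, bsw)
```

The forward rule stores one small activation per iteration, and the backward scan walks the iterations in reverse, recomputing each step before pulling the cotangent through it.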

Tests

Please describe how you tested this change, and include any instructions and/or
commands to reproduce.

Checklist

Before submitting this PR, please make sure (put X in square brackets):

I have performed a self-review of my code. For an optional AI review, add the gemini-review label.
I have necessary comments in my code, particularly in hard-to-understand areas.
I have run end-to-end tests and provided workload links above if applicable.
I have made or will make corresponding changes to the doc if needed, including adding new documentation pages to the relevant Table of Contents (toctree directive) as explained in our documentation.

codecov · 2026-02-03T18:15:33Z

Codecov Report

❌ Patch coverage is 88.88889% with 13 lines in your changes missing coverage. Please review.

Files with missing lines	Patch %	Lines
src/MaxText/layers/pipeline.py	88.88%	7 Missing and 6 partials ⚠️

codecov · 2026-02-09T19:56:31Z

Codecov Report

❌ Patch coverage is 86.56716% with 27 lines in your changes missing coverage. Please review.

Files with missing lines	Patch %	Lines
src/maxtext/utils/pipeline_utils.py	85.18%	8 Missing and 8 partials ⚠️
src/maxtext/layers/pipeline.py	88.23%	5 Missing and 5 partials ⚠️
src/maxtext/layers/decoders.py	80.00%	1 Missing ⚠️

simple fix on debug sharding log add all gather insertion per repeat working all gather insertion clean version fsdp+pp bug free add bsw checkpoint split bsw all gather into two add custom vjp

NuojCheng force-pushed the chengnuojin-pp-separate-weights branch 2 times, most recently from 6c22238 to 28f98ff · February 9, 2026 19:29

NuojCheng added pull ready draft Draft PR and removed pull ready labels Feb 9, 2026

NuojCheng force-pushed the chengnuojin-pp-separate-weights branch from 28f98ff to 64b37ff · February 9, 2026 22:23

NuojCheng added pull ready and removed pull ready labels Feb 9, 2026

NuojCheng force-pushed the chengnuojin-pp-separate-weights branch 8 times, most recently from a7d38d0 to 9a36099 · February 18, 2026 17:16

NuojCheng force-pushed the chengnuojin-pp-separate-weights branch 2 times, most recently from d05c015 to e521a58 · February 24, 2026 23:14

NuojCheng added pull ready and removed pull ready labels Feb 24, 2026

gagika and others added 3 commits February 25, 2026 17:38

Enable grain input pipeline save and restore for distillation.

8456095

simple fix on debug sharding log add all gather insertion per repeat working all gather insertion clean version fsdp+pp bug free add bsw checkpoint split bsw all gather into two add custom vjp

enable pp with batch split ds

8424d63

move remat to fwd

94812d9

NuojCheng force-pushed the chengnuojin-pp-separate-weights branch from e521a58 to 94812d9 · February 25, 2026 17:38

NuojCheng added 2 commits February 25, 2026 19:18

update custom vjp

6d94f5a

add another layer of custom vjp

286e066

NuojCheng force-pushed the chengnuojin-pp-separate-weights branch from 51e6713 to 286e066 · February 26, 2026 00:16

NuojCheng added the pull ready label Feb 26, 2026

NuojCheng removed the pull ready label Feb 26, 2026

NuojCheng wants to merge 5 commits into main from chengnuojin-pp-separate-weights

2 participants
