[XPU] Fix PD + MTP by cmcamdy · Pull Request #6495 · PaddlePaddle/FastDeploy

cmcamdy · 2026-02-24T09:31:48Z

Motivation

On P cards (XPU), PD + MTP hangs. This PR fixes that.

Modifications

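1. In token_processor.py, in spec (speculative decoding) mode, the check for whether a task carries a valid value now uses the accept num to decide whether to exit early.
2. The speculate_update.xpu operator is aligned with the GPU implementation. In addition, to fit item 1, D must still update seq_lens_decoder when the stop flag is true. Otherwise, when D emits EOS in its very first round, the task stays stuck in its slot, neither run (execute_model) nor scheduled (schedule), and the only possible outcome is a client timeout.
3. Fix the seq_lens_this_time value of D in its first round (when it receives the task from P). The correct value is length + 1 (P's mtp = 1). With this fix, some stuttering in the output on P cards is reduced.
4. Fix the draft tokens of D in its first round (when it receives the task from P). They must be taken from the request; otherwise pre_process picks up garbage tokens.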
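
Items 3 and 4 can be pictured with a small, self-contained sketch. It is not the FastDeploy code: `PrefillTask`, `DecodeSlot`, `init_first_round`, and the field `draft_token_ids` are hypothetical names chosen for illustration, `length` is simply the value the description calls `length`, and treating the `+ 1` as "one extra token per MTP step on P" is an assumption drawn from the note "P's mtp = 1".

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class PrefillTask:
    """Hypothetical stand-in for the task that P hands over to D."""
    length: int                                   # the `length` mentioned in item 3
    draft_token_ids: List[int] = field(default_factory=list)  # tokens carried in the request


@dataclass
class DecodeSlot:
    """Hypothetical per-slot state on the D side."""
    seq_lens_this_time: int = 0
    draft_tokens: List[int] = field(default_factory=list)


def init_first_round(slot: DecodeSlot, task: PrefillTask, p_mtp_steps: int = 1) -> DecodeSlot:
    """First-round setup on D, as described in items 3 and 4 (toy model)."""
    # Item 3: length + 1 when P runs with mtp = 1 (generalising to p_mtp_steps is an assumption).
    slot.seq_lens_this_time = task.length + p_mtp_steps
    # Item 4: take the draft tokens from the request instead of whatever the buffer held.
    slot.draft_tokens = list(task.draft_token_ids)
    return slot


slot = init_first_round(DecodeSlot(draft_tokens=[-1, -1]), PrefillTask(length=1, draft_token_ids=[11, 22]))
print(slot.seq_lens_this_time, slot.draft_tokens)
```

The point of the sketch is only that both values come from the incoming task; the real buffers in the model runner are tensors indexed by slot, which this toy model leaves out.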

Usage or Command

Accuracy Tests

Before fix 4 (prompt: {"role": "user", "content": "你好，你是谁？"}, i.e. "Hello, who are you?"):

- After fix 4

21B-A3B TP1, benchmark comparison

Checklist

Add at least a tag in the PR title.
- Tag list: [[FDConfig],[APIServer],[Engine], [Scheduler], [PD Disaggregation], [Executor], [Graph Optimization], [Speculative Decoding], [RL], [Models], [Quantization], [Loader], [OP], [KVCache], [DataProcessor], [BugFix], [Docs], [CI], [Optimization], [Feature], [Benchmark], [Others], [XPU], [HPU], [GCU], [DCU], [Iluvatar], [Metax]]
- You can add new tags based on the PR content, but the semantics must be clear.
Format your code, run pre-commit before commit.
Add unit tests. Please write the reason in this PR if no unit tests.
Provide accuracy results.
If the current PR is submitting to the release branch, make sure the PR has been submitted to the develop branch, then cherry-pick it to the release branch with the [Cherry-Pick] PR tag.

paddle-bot · 2026-02-24T09:31:53Z

Thanks for your contribution!

codecov-commenter · 2026-02-24T11:19:07Z

Codecov Report

❌ Patch coverage is 0% with 10 lines in your changes missing coverage. Please review.
⚠️ Please upload report for BASE (develop@7b1d787). Learn more about missing BASE report.

Files with missing lines	Patch %	Lines
fastdeploy/worker/gpu_model_runner.py	0.00%	4 Missing and 1 partial ⚠️
...tdeploy/model_executor/xpu_pre_and_post_process.py	0.00%	4 Missing ⚠️
fastdeploy/spec_decode/mtp.py	0.00%	1 Missing ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##             develop    #6495   +/-   ##
==========================================
  Coverage           ?   70.41%           
==========================================
  Files              ?      394           
  Lines              ?    53860           
  Branches           ?     8463           
==========================================
  Hits               ?    37925           
  Misses             ?    13204           
  Partials           ?     2731
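
As a quick sanity check on the figures in the diff above, the short sketch below recomputes the totals. It assumes the usual Codecov convention that the coverage ratio is hits divided by hits + misses + partials; the variable names are illustrative.

```python
# Figures copied from the Codecov diff above.
hits, misses, partials = 37925, 13204, 2731

# Every tracked line is a hit, a miss, or a partial.
lines = hits + misses + partials

# Coverage as a percentage, rounded to two decimals like the report.
coverage = round(100 * hits / lines, 2)

print(lines, coverage)
```

The recomputed line count and percentage match the `Lines` and `Coverage` rows of the report.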

Flag	Coverage Δ
GPU	`70.41% <0.00%> (?)`

zhupengyang

LGTM

Deleter-D

LGTM

Copilot

Pull request overview

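This PR aims to fix the hang that occurs on XPU cards in PD (Prefill-Decode disaggregation) + MTP (Multi-Token Prediction) mode. It does so mainly through the following changes:

Changes:

Fixes the task check in speculative decoding mode so that the decision to exit early is based on accept_num
Aligns the XPU speculate_update operator with the GPU implementation, and updates seq_lens_decoder when the stop flag is true so that tasks do not get stuck
Fixes the seq_lens_this_time and draft_tokens values of the Decoder in the first round, when it receives the task from Prefill
Adds several XPU operators to support task recovery and state updates for speculative decoding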
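
The first two changes can be illustrated with a toy model of a single decode slot. This is a sketch under stated assumptions, not the XPU kernel or the token processor: `Slot`, `update_slot`, and `should_exit_early` are hypothetical names, "no valid value" is assumed to mean an accept num of zero, and the pre-fix behavior is modeled simply as "a stopped slot is left untouched".

```python
from dataclasses import dataclass


@dataclass
class Slot:
    """Hypothetical per-slot state; the real code keeps these in tensors."""
    seq_lens_decoder: int
    stop_flag: bool = False


def update_slot(slot: Slot, accept_num: int, update_when_stopped: bool) -> Slot:
    """Toy version of the seq_lens_decoder update for one slot."""
    if slot.stop_flag and not update_when_stopped:
        # Modeled pre-fix behavior (assumption): a stopped slot is not advanced.
        return slot
    # Modeled post-fix behavior: advance by the accepted tokens even if stopped.
    slot.seq_lens_decoder += accept_num
    return slot


def should_exit_early(accept_num: int) -> bool:
    """Assumption: a task has nothing to process when no token was accepted."""
    return accept_num <= 0


# D emits EOS in its first round: the stop flag is already set.
before = update_slot(Slot(seq_lens_decoder=10, stop_flag=True), accept_num=1, update_when_stopped=False)
after = update_slot(Slot(seq_lens_decoder=10, stop_flag=True), accept_num=1, update_when_stopped=True)
print(before.seq_lens_decoder, after.seq_lens_decoder)
```

In the toy model only the second call records progress for the stopped slot, which is the state change the PR description says D needs so that the task does not linger in its slot.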

Reviewed changes

Copilot reviewed 21 out of 21 changed files in this pull request and generated 12 comments.

File	Description
fastdeploy/worker/xpu_model_runner.py	Initializes seq_lens_this_time and draft_tokens for the D side in PD mode; changes the skip_save_output logic; adds mask_rollback initialization
fastdeploy/worker/gpu_model_runner.py	Mirrors the XPU change: initializes seq_lens_this_time and draft_tokens in PD mode
fastdeploy/spec_decode/mtp.py	Sets the correct seq_lens_this_time_buffer value in PD mode
fastdeploy/model_executor/xpu_pre_and_post_process.py	Updates the speculate_update call and the speculate_save_output arguments; supports mask_rollback
custom_ops/xpu_ops/src/plugin/src/wrapper/mtp_wrapper/speculate_update.cpp	Adds the wrapper implementation of the speculate_update operator
custom_ops/xpu_ops/src/plugin/src/wrapper/mtp_wrapper/speculate_set_value_by_flags.cpp	Makes parameters non-const to allow write-back
custom_ops/xpu_ops/src/plugin/src/wrapper/mtp_wrapper/speculate_set_stop_value_multi_seqs.cpp	Adds a min_tokens parameter to support the minimum-token check
custom_ops/xpu_ops/src/plugin/src/wrapper/mtp_wrapper/recover_spec_decode_task.cpp	Adds the recover_spec_decode_task operator wrapper
custom_ops/xpu_ops/src/plugin/src/kernel/kunlun3cpp/mtp_kernel/speculate_update.xpu	Adds the XPU kernel implementing speculate_update
custom_ops/xpu_ops/src/plugin/src/kernel/kunlun3cpp/mtp_kernel/speculate_set_value_by_flags.xpu	Changes the logic to handle stop_flags correctly
custom_ops/xpu_ops/src/plugin/src/kernel/kunlun3cpp/mtp_kernel/speculate_set_stop_value_multi_seqs.xpu	Adds the min_tokens check
custom_ops/xpu_ops/src/plugin/src/kernel/kunlun3cpp/mtp_kernel/speculate_schedule_cache.xpu	Fixes data types and adds a memory barrier
custom_ops/xpu_ops/src/plugin/src/kernel/kunlun3cpp/mtp_kernel/recover_spec_decode_task.xpu	Adds the XPU kernel implementing task recovery
custom_ops/xpu_ops/src/plugin/include/xpu/plugin.h	Adds function declarations and changes parameter types
custom_ops/xpu_ops/src/ops/recover_decode_task.cc	Extended to support speculative decoding task recovery
custom_ops/xpu_ops/src/ops/pybind/pybind.cc	Updates the Python bindings for the new operator signatures
custom_ops/xpu_ops/src/ops/mtp/speculate_update.cc	Adds the speculate_update operator implementation
custom_ops/xpu_ops/src/ops/mtp/speculate_set_value_by_flags.cc	Makes parameters non-const
custom_ops/xpu_ops/src/ops/mtp/speculate_set_stop_value_multi_seqs.cc	Adds the min_tokens parameter
custom_ops/xpu_ops/src/ops/mtp/speculate_save_output.cc	Extends the arguments to support skip_prefill and preempted_idx
custom_ops/gpu_ops/speculate_decoding/speculate_update.cu	Adds a TODO comment about the seq_lens_decoder update

fix pd + mtp

5405f6a

cmcamdy had a problem deploying to Metax_ci February 24, 2026 09:31 — with GitHub Actions Error

fix code style

f8994b7

cmcamdy temporarily deployed to Metax_ci February 24, 2026 09:36 — with GitHub Actions Inactive

cmcamdy temporarily deployed to Metax_ci February 25, 2026 02:05 — with GitHub Actions Inactive

fix PD + MTP, D get P's first token

3be13bb

cmcamdy force-pushed the xpu_mtp_pd branch from 2bf82f7 to 3be13bb Compare February 25, 2026 05:15

cmcamdy had a problem deploying to Metax_ci February 25, 2026 05:16 — with GitHub Actions Error

cmcamdy temporarily deployed to Metax_ci February 25, 2026 05:16 — with GitHub Actions Inactive

add anno for gpu(speculate_update)

cb94a7a

cmcamdy force-pushed the xpu_mtp_pd branch from 9cdde3a to cb94a7a Compare February 25, 2026 07:08

cmcamdy had a problem deploying to Metax_ci February 25, 2026 07:08 — with GitHub Actions Error

Merge branch 'develop' into xpu_mtp_pd

bb4e538

cmcamdy temporarily deployed to Metax_ci February 25, 2026 07:08 — with GitHub Actions Inactive

update draft insertv1

b837468

cmcamdy temporarily deployed to Metax_ci February 26, 2026 04:24 — with GitHub Actions Inactive

Merge branch 'develop' into xpu_mtp_pd

03a7f29

cmcamdy temporarily deployed to Metax_ci February 26, 2026 11:40 — with GitHub Actions Inactive

zhupengyang previously approved these changes Feb 27, 2026

Deleter-D previously approved these changes Feb 27, 2026

Jiang-Jia-Jun requested a review from Copilot February 27, 2026 08:07

Copilot started reviewing on behalf of Jiang-Jia-Jun February 27, 2026 08:07

Copilot AI reviewed Feb 27, 2026

fix wapper & kernel

0a01cd4

cmcamdy dismissed stale reviews from Deleter-D and zhupengyang via 0a01cd4 February 27, 2026 09:03

cmcamdy had a problem deploying to Metax_ci February 27, 2026 09:03 — with GitHub Actions Error

fix wapper

0e7f78c

cmcamdy had a problem deploying to Metax_ci February 27, 2026 09:08 — with GitHub Actions Error

fix code stype

a881d62

cmcamdy temporarily deployed to Metax_ci February 27, 2026 09:12 — with GitHub Actions Inactive

Jiang-Jia-Jun approved these changes Feb 27, 2026

Jiang-Jia-Jun merged commit 1344727 into PaddlePaddle:develop Feb 27, 2026
20 of 24 checks passed

Jiang-Jia-Jun merged 10 commits into PaddlePaddle:develop from cmcamdy:xpu_mtp_pd

6 participants
