
[XPU] decouple split_kv_cache and block_attn #6489

Open
RuohengMa wants to merge 1 commit into PaddlePaddle:develop from RuohengMa:decouple

Conversation

@RuohengMa
Contributor

Motivation

decouple split_kv_cache and block_attn

Modifications

decouple split_kv_cache and block_attn

Usage or Command

decouple split_kv_cache and block_attn

Accuracy Tests

None

Checklist

  • Add at least one tag in the PR title.
    • Tag list: [[FDConfig],[APIServer],[Engine], [Scheduler], [PD Disaggregation], [Executor], [Graph Optimization], [Speculative Decoding], [RL], [Models], [Quantization], [Loader], [OP], [KVCache], [DataProcessor], [BugFix], [Docs], [CI], [Optimization], [Feature], [Benchmark], [Others], [XPU], [HPU], [GCU], [DCU], [Iluvatar], [Metax]]
    • You can add new tags based on the PR content, but the semantics must be clear.
  • Format your code; run pre-commit before committing.
  • Add unit tests. If there are no unit tests, explain why in this PR.
  • Provide accuracy results.
  • If the current PR is submitting to the release branch, make sure the PR has been submitted to the develop branch, then cherry-pick it to the release branch with the [Cherry-Pick] PR tag.

@paddle-bot

paddle-bot bot commented Feb 24, 2026

Thanks for your contribution!

@paddle-bot paddle-bot bot added the XPU label Feb 24, 2026
@codecov-commenter

Codecov Report

✅ All modified and coverable lines are covered by tests.
⚠️ Please upload report for BASE (develop@60e75ea).

Additional details and impacted files
@@            Coverage Diff             @@
##             develop    #6489   +/-   ##
==========================================
  Coverage           ?   68.75%           
==========================================
  Files              ?      391           
  Lines              ?    52809           
  Branches           ?     8225           
==========================================
  Hits               ?    36308           
  Misses             ?    13856           
  Partials           ?     2645           
Flag Coverage Δ
GPU 68.75% <ø> (?)

Flags with carried forward coverage won't be shown.


Comment on lines +41 to +55
template <typename TC, typename TS>
struct SplitRopeTypeTrait {
  using E_Scale = TS;
  using D_Scale = TS;
};
template <>
struct SplitRopeTypeTrait<bfloat16, bfloat16> {
  using E_Scale = bfloat16;
  using D_Scale = float;
};
template <>
struct SplitRopeTypeTrait<int8_t, bfloat16> {
  using E_Scale = bfloat16;
  using D_Scale = bfloat16;
};


Can this code be deleted now?
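For context, the trait in question maps a (cache dtype, scale dtype) pair to the encoder/decoder scale types, and the specializations can be verified at compile time. A minimal standalone sketch, assuming a stand-in `bfloat16` struct (the real kernel uses the framework's bf16 type):

```cpp
#include <cstdint>
#include <type_traits>

// Stand-in for the framework's bfloat16 type (assumption for illustration only).
struct bfloat16 {};

// Primary template: both scale types default to the scale dtype TS.
template <typename TC, typename TS>
struct SplitRopeTypeTrait {
  using E_Scale = TS;
  using D_Scale = TS;
};

// bf16 cache with bf16 scales: the decoder scale widens to float.
template <>
struct SplitRopeTypeTrait<bfloat16, bfloat16> {
  using E_Scale = bfloat16;
  using D_Scale = float;
};

// int8 cache with bf16 scales: both scales stay bf16.
template <>
struct SplitRopeTypeTrait<int8_t, bfloat16> {
  using E_Scale = bfloat16;
  using D_Scale = bfloat16;
};

// Compile-time checks that each specialization resolves as intended.
static_assert(std::is_same_v<SplitRopeTypeTrait<bfloat16, bfloat16>::D_Scale, float>);
static_assert(std::is_same_v<SplitRopeTypeTrait<int8_t, bfloat16>::D_Scale, bfloat16>);
static_assert(std::is_same_v<SplitRopeTypeTrait<float, float>::E_Scale, float>);
```

If the trait truly has no remaining call sites after the decoupling, such static checks make it easy to confirm nothing still depends on it before deleting.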

self.rope_3d,
)


Extra whitespace.

Comment on lines +252 to +281
'''
# q = q * k_scales_inv
if is_cache_int8 and has_zp:
    if enc_batch > 0 and is_prefix_cache:
        origin_shape = q_enc.shape
        q_enc_reshaped = paddle.view(
            q_enc,
            [total_enc_len, kv_num_heads, num_heads // kv_num_heads, head_dim])
        q_enc_reshaped = q_enc_reshaped * paddle.view(k_scales_inv, [1, kv_num_heads, 1, head_dim])
        q_enc = paddle.view(q_enc_reshaped, origin_shape)

        # q_enc_reshaped = paddle.reshape(
        #     q_enc,
        #     [total_enc_len, kv_num_heads, num_heads // kv_num_heads, head_dim])
        # q_enc_reshaped = q_enc_reshaped * paddle.reshape(k_scales_inv, [1, kv_num_heads, 1, head_dim])
        # q_enc = paddle.reshape(q_enc_reshaped, q_enc.shape)
    if dec_batch > 0:
        origin_shape = q_dec.shape
        q_dec_reshaped = paddle.view(
            q_dec,
            [total_dec_len, kv_num_heads, num_heads // kv_num_heads, head_dim])
        q_dec_reshaped = q_dec_reshaped * paddle.view(k_scales_inv, [1, kv_num_heads, 1, head_dim])
        q_dec = paddle.view(q_dec_reshaped, origin_shape)

        # q_dec_reshaped = paddle.reshape(
        #     q_dec,
        #     [total_dec_len, kv_num_heads, num_heads // kv_num_heads, head_dim])
        # q_dec_reshaped = q_dec_reshaped * paddle.reshape(k_scales_inv, [1, kv_num_heads, 1, head_dim])
        # q_dec = paddle.reshape(q_dec_reshaped, q_dec.shape)
'''


This looks like the logic for handling asymmetric cache quantization? It should probably be kept.
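The commented-out Python above broadcasts `k_scales_inv` (shape `[kv_num_heads, head_dim]`) over the groups of query heads that share a kv head. A hypothetical scalar C++ sketch of the same multiply, with all names and the row-major layout assumed for illustration:

```cpp
#include <cstddef>
#include <vector>

// Scale q in place by per-kv-head inverse key scales. q is logically
// [total_len, kv_num_heads, num_heads / kv_num_heads, head_dim] stored
// contiguously; k_scales_inv is [kv_num_heads, head_dim]. Every group of
// (num_heads / kv_num_heads) query heads shares its kv head's scale.
void scale_q_by_kv_inv(std::vector<float>& q,
                       const std::vector<float>& k_scales_inv,
                       std::size_t total_len, std::size_t num_heads,
                       std::size_t kv_num_heads, std::size_t head_dim) {
  const std::size_t group = num_heads / kv_num_heads;
  for (std::size_t t = 0; t < total_len; ++t)
    for (std::size_t kv = 0; kv < kv_num_heads; ++kv)
      for (std::size_t g = 0; g < group; ++g)
        for (std::size_t d = 0; d < head_dim; ++d)
          q[((t * kv_num_heads + kv) * group + g) * head_dim + d] *=
              k_scales_inv[kv * head_dim + d];
}
```

This is only a readability sketch of the broadcast semantics; the PR's actual behavior depends on whether this path moved into the fused kernel or was dropped.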

# if shift:
#     out[total_enc_len:, :] = out[total_enc_len:, :] + shift
# if smooth:
#     out[total_enc_len:, :] = out[total_enc_len:, :] * smooth


Same logic as the above: wasn't all of this originally in the C code?
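The commented-out snippet applies an optional per-channel additive `shift` and then a multiplicative `smooth` to the decoder rows of `out` (rows from `total_enc_len` on). A hypothetical host-side sketch of that post-processing, names assumed:

```cpp
#include <cstddef>
#include <vector>

// Apply optional shift (add) then smooth (multiply), both of length `dim`,
// to the decoder portion of `out`: rows [total_enc_len, rows). Passing
// nullptr skips that transform, mirroring the `if shift:` / `if smooth:`
// guards in the commented-out Python.
void apply_shift_smooth(std::vector<float>& out, std::size_t dim,
                        std::size_t total_enc_len,
                        const std::vector<float>* shift,
                        const std::vector<float>* smooth) {
  const std::size_t rows = out.size() / dim;
  for (std::size_t r = total_enc_len; r < rows; ++r)
    for (std::size_t d = 0; d < dim; ++d) {
      float& v = out[r * dim + d];
      if (shift) v += (*shift)[d];
      if (smooth) v *= (*smooth)[d];
    }
}
```

If this logic did previously live in the C kernel, keeping an equivalent (fused or host-side) path seems necessary for smooth-quant style checkpoints; worth confirming in the review.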


3 participants