Remove non-blockwise 8-bit optimizer and legacy quantization CUDA/HIP kernels by matthewdouglas · Pull Request #1880 · bitsandbytes-foundation/bitsandbytes

matthewdouglas · 2026-02-23T19:53:26Z

Summary

Follows up on #1871 (which removed the Python-side APIs) by deleting the corresponding C++, CUDA, and HIP implementation code. No Python callers remain for any of this.

What was removed

CUDA/HIP kernels (csrc/kernels.cu, csrc/kernels.hip):

kPreconditionOptimizerStatic8bit1State
kOptimizerStatic8bit1State
kPreconditionOptimizerStatic8bit2State
kOptimizerStatic8bit2State
kPercentileClipping<float/half, 2048, 4>
kQuantize / kDequantize (global/non-blockwise variants)
All associated #define macros and explicit template instantiation blocks

Ops-level launchers (csrc/ops.cu, csrc/ops.hip):

optimizerStatic8bit<T, OPTIMIZER>
percentileClipping<T>()
quantize() / dequantize()

Headers (csrc/kernels.cuh, csrc/kernels_hip.cuh, csrc/ops.cuh, csrc/ops_hip.cuh):

All declarations for the above

Python interface (csrc/pythonInterface.cpp):

MAKE_FUNC8 + 8 instantiations (adam/momentum/rmsprop/lion × fp32/fp16)
MAKE_CFUNC8 + 8 C-exported wrappers (cadam_static_8bit_grad_32/16, etc.)
cquantize / cdequantize
cpercentile_clipping_g32 / cpercentile_clipping_g16

The blockwise variants (optimizerStatic8bitBlockwise, kOptimizerStatic8bit1StateBlockwise, kOptimizerStatic8bit2StateBlockwise) are untouched.
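For readers unfamiliar with the distinction, the following is a minimal sketch of why per-block scaling differs from a single global scale. It uses plain linear absmax int8 quantization as a stand-in, which is an assumption made for illustration; it is not the bitsandbytes kernels, and all function names here are hypothetical.

```python
# Illustrative only: linear absmax int8 quantization in pure Python.
# This is NOT the bitsandbytes implementation; names are hypothetical.


def quantize_global(values):
    """Quantize with one absmax scale shared by the whole tensor."""
    absmax = max(abs(v) for v in values) or 1.0
    codes = [round(v / absmax * 127) for v in values]
    return codes, absmax


def dequantize_global(codes, absmax):
    """Map int8 codes back to floats using the single global scale."""
    return [c / 127 * absmax for c in codes]


def quantize_blockwise(values, blocksize):
    """Quantize each run of `blocksize` values with its own absmax scale."""
    codes, scales = [], []
    for start in range(0, len(values), blocksize):
        block = values[start:start + blocksize]
        absmax = max(abs(v) for v in block) or 1.0
        scales.append(absmax)
        codes.extend(round(v / absmax * 127) for v in block)
    return codes, scales


def dequantize_blockwise(codes, scales, blocksize):
    """Map int8 codes back to floats using the scale of each code's block."""
    return [c / 127 * scales[i // blocksize] for i, c in enumerate(codes)]


def max_abs_error(original, restored):
    """Largest absolute reconstruction error over paired values."""
    return max(abs(a - b) for a, b in zip(original, restored))


# Small values in the first block, large values in the second.
values = [0.01, -0.02, 0.015, 0.005, 100.0, -50.0, 25.0, 75.0]

g_codes, g_scale = quantize_global(values)
b_codes, b_scales = quantize_blockwise(values, blocksize=4)

global_restored = dequantize_global(g_codes, g_scale)
block_restored = dequantize_blockwise(b_codes, b_scales, blocksize=4)
```

With a single scale, the large values in the second half dominate and the small values in the first half all collapse to code 0. With one scale per block, the small values keep their resolution.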

Agent guide updates

Removed stale references to percentile_clipping, the block_wise parameter, optimizer_update_8bit (non-blockwise), and the non-blockwise dispatch branch from agents/api_surface.md, agents/architecture_guide.md, and agents/security_guide.md.

github-actions · 2026-02-23T19:58:21Z

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

matthewdouglas

PR Review: #1880 — Remove non-blockwise 8-bit optimizer and legacy quantization CUDA/HIP kernels

Follow-up to #1871: mechanically removes the C++/CUDA/HIP implementations of functions whose Python-side APIs were already deleted. The scope is exactly right — only code with confirmed-zero Python callers is deleted, and the blockwise variants are untouched throughout.

No blocking issues.

What was checked:

Python caller verification: grep across the entire Python tree confirms zero remaining references to all removed symbols (optimizer_update_8bit, percentile_clipping, cquantize, cdequantize, cadam_static_8bit_*, etc.). The deletions are safe.
Blockwise path untouched: optimizerStatic8bitBlockwise, kOptimizerStatic8bit1StateBlockwise, kOptimizerStatic8bit2StateBlockwise, and all their header declarations/instantiations remain in place. Verified in both kernels.cu and kernels.hip.
32-bit variant not accidentally removed: The remaining kPreconditionOptimizer32bit1State / kPreconditionOptimizer32bit2State references in the diff are the 32-bit ops-path kernels, which are distinct from the deleted 8-bit static ones. No over-deletion.
CUDA/HIP symmetry: Both backends receive identical removals (quantize/dequantize launchers, optimizerStatic8bit, percentileClipping, template instantiations, header declarations). ✓
The +1 in csrc/ops.cu: Fixes indentation on MAKE_optimizerStatic8bitBlockwise(half, ADAM); which was over-indented inside the now-removed MAKE_optimizerStatic8bit block. Correct.
Agent docs: api_surface.md (parameter table and deprecated-symbols table), architecture_guide.md (ops list and dispatch pseudocode), and security_guide.md (trusted hot paths) are all consistently updated. The architecture guide's dispatch pseudocode now accurately shows only the uint8 → blockwise branch.
Security: Clear (internal author, no new dependencies)
Downstream impact: None (Python API already removed in #1871; no new breakage)
Tests: Adequate (no Python callers → no test changes needed; full suite passes)
CI: All pass (CUDA 11.8–13.x, ROCm 6.2–7.2, CPU x64/aarch64/macOS/Windows, XPU, Lint)
Serialization: Not affected
torch.compile: Not affected
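The Python caller verification in the checklist above could be reproduced with a short script along these lines. This is a sketch, not the command actually used in the review: the helper name `find_references` is hypothetical, and the pattern list only covers the symbols the review names explicitly, with `\w*` standing in for the `cadam_static_8bit_*` wildcard.

```python
import re
from pathlib import Path

# Patterns for the removed symbols named in the review. Word boundaries keep
# surviving blockwise names (which only share a prefix) from matching.
REMOVED_SYMBOL_PATTERNS = [
    r"\boptimizer_update_8bit\b",
    r"\bpercentile_clipping\b",
    r"\bcquantize\b",
    r"\bcdequantize\b",
    r"\bcadam_static_8bit_\w*",
]


def find_references(root, patterns=REMOVED_SYMBOL_PATTERNS):
    """Return (path, line number, matched text) for each line in a .py file
    under `root` that mentions a removed symbol (first match per line)."""
    combined = re.compile("|".join(patterns))
    hits = []
    for path in sorted(Path(root).rglob("*.py")):
        text = path.read_text(encoding="utf-8", errors="replace")
        for lineno, line in enumerate(text.splitlines(), start=1):
            match = combined.search(line)
            if match:
                hits.append((str(path), lineno, match.group(0)))
    return hits
```

Calling `find_references(".")` from the root of a checkout returns an empty list when no Python file still mentions a removed symbol, which is the condition the review describes.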

Remove unused kernels

6018d9a

matthewdouglas added this to the v0.50.0 milestone Feb 23, 2026

matthewdouglas added the ROCm and CUDA labels Feb 23, 2026

matthewdouglas merged commit d5cb0f2 into main Feb 23, 2026
227 of 229 checks passed

matthewdouglas merged 1 commit into main from remove-unused-kernels
