Remove non-blockwise 8-bit optimizer and legacy quantization CUDA/HIP kernels#1880
matthewdouglas merged 1 commit into main
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
matthewdouglas left a comment
PR Review: #1880 — Remove non-blockwise 8-bit optimizer and legacy quantization CUDA/HIP kernels
Follow-up to #1871: mechanically removes the C++/CUDA/HIP implementations of functions whose Python-side APIs were already deleted. The scope is exactly right: only code with confirmed zero Python callers is deleted, and the blockwise variants are untouched throughout.
No blocking issues.
What was checked:
- Python caller verification: grep across the entire Python tree confirms zero remaining references to all removed symbols (optimizer_update_8bit, percentile_clipping, cquantize, cdequantize, cadam_static_8bit_*, etc.). The deletions are safe.
- Blockwise path untouched: optimizerStatic8bitBlockwise, kOptimizerStatic8bit1StateBlockwise, kOptimizerStatic8bit2StateBlockwise, and all their header declarations/instantiations remain in place. Verified in both kernels.cu and kernels.hip.
- 32-bit variant not accidentally removed: The remaining kPreconditionOptimizer32bit1State/kPreconditionOptimizer32bit2State references in the diff are the 32-bit ops-path kernels, which are distinct from the deleted 8-bit static ones. No over-deletion.
- CUDA/HIP symmetry: Both backends receive identical removals (quantize/dequantize launchers, optimizerStatic8bit, percentileClipping, template instantiations, header declarations). ✓
- The +1 in csrc/ops.cu: Fixes indentation on MAKE_optimizerStatic8bitBlockwise(half, ADAM);, which was over-indented inside the now-removed MAKE_optimizerStatic8bit block. Correct.
- Agent docs: api_surface.md (parameter table and deprecated-symbols table), architecture_guide.md (ops list and dispatch pseudocode), and security_guide.md (trusted hot paths) are all consistently updated. The architecture guide's dispatch pseudocode now accurately shows only the uint8 → blockwise branch.
- Security: Clear (internal author, no new dependencies)
- Downstream impact: None (Python API already removed in #1871; no new breakage)
- Tests: Adequate (no Python callers → no test changes needed; full suite passes)
- CI: All pass (CUDA 11.8–13.x, ROCm 6.2–7.2, CPU x64/aarch64/macOS/Windows, XPU, Lint)
- Serialization: Not affected
- torch.compile: Not affected
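The grep-based caller check in the first bullet can be approximated with a short script. This is a minimal sketch, not the command the reviewer actually ran: the symbol list is copied from the review, the find_callers helper is hypothetical, and it assumes the Python sources live under a single root directory. Exact symbols are matched on word boundaries so that a longer name that merely starts with a removed symbol (for example a blockwise variant) is not reported as a caller.

```python
import re
from pathlib import Path

# Symbols whose native implementations were removed (taken from the review).
# An entry ending in "_" is treated as a prefix, mirroring cadam_static_8bit_*.
REMOVED_SYMBOLS = [
    "optimizer_update_8bit",
    "percentile_clipping",
    "cquantize",
    "cdequantize",
    "cadam_static_8bit_",
]


def find_callers(root, symbols=REMOVED_SYMBOLS):
    """Return (path, line_no, symbol) for every .py line under root that mentions a symbol."""
    patterns = {}
    for sym in symbols:
        # Left word boundary always; right word boundary only for exact symbols,
        # so "optimizer_update_8bit" does not match "optimizer_update_8bit_blockwise".
        suffix = "" if sym.endswith("_") else r"\b"
        patterns[sym] = re.compile(r"\b" + re.escape(sym) + suffix)

    hits = []
    for path in sorted(Path(root).rglob("*.py")):
        text = path.read_text(encoding="utf-8", errors="replace")
        for line_no, line in enumerate(text.splitlines(), start=1):
            for sym, pattern in patterns.items():
                if pattern.search(line):
                    hits.append((str(path), line_no, sym))
    return hits
```

Calling find_callers on the package directory and getting an empty list corresponds to the review's "zero remaining references" finding; any hit points at a file and line that would break once the native symbols are gone.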
Summary
Follows up on #1871 (which removed the Python-side APIs) by deleting the corresponding C++, CUDA, and HIP implementation code. No Python callers remain for any of this.
What was removed
CUDA/HIP kernels (csrc/kernels.cu, csrc/kernels.hip):
- kPreconditionOptimizerStatic8bit1State
- kOptimizerStatic8bit1State
- kPreconditionOptimizerStatic8bit2State
- kOptimizerStatic8bit2State
- kPercentileClipping<float/half, 2048, 4>
- kQuantize / kDequantize (global/non-blockwise variants)
- #define macros and explicit template instantiation blocks

Ops-level launchers (csrc/ops.cu, csrc/ops.hip):
- optimizerStatic8bit<T, OPTIMIZER>
- percentileClipping<T>()
- quantize() / dequantize()

Headers (csrc/kernels.cuh, csrc/kernels_hip.cuh, csrc/ops.cuh, csrc/ops_hip.cuh):

Python interface (csrc/pythonInterface.cpp):
- MAKE_FUNC8 + 8 instantiations (adam/momentum/rmsprop/lion × fp32/fp16)
- MAKE_CFUNC8 + 8 C-exported wrappers (cadam_static_8bit_grad_32/16, etc.)
- cquantize / cdequantize
- cpercentile_clipping_g32 / cpercentile_clipping_g16

The blockwise variants (optimizerStatic8bitBlockwise, kOptimizerStatic8bit1StateBlockwise, kOptimizerStatic8bit2StateBlockwise) are untouched.

Agent guide updates
Removed stale references to percentile_clipping, the block_wise parameter, optimizer_update_8bit (non-blockwise), and the non-blockwise dispatch branch from agents/api_surface.md, agents/architecture_guide.md, and agents/security_guide.md.
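The difference between the removed global kQuantize/kDequantize path and the blockwise path that remains can be illustrated with a toy example. This is a minimal sketch assuming plain linear absmax int8 quantization in pure Python; it is not the scheme the CUDA/HIP kernels implement, and every function name here is hypothetical. It shows why per-block scales matter: with one global scale, a large value anywhere in the tensor forces small values elsewhere to round to zero, while per-block scales keep them representable.

```python
def absmax_quantize(values):
    """Linear int8 quantization of one group of values sharing a single absmax scale."""
    scale = max((abs(v) for v in values), default=0.0)
    if scale == 0.0:
        # All-zero (or empty) group: every code is 0 and the scale is unused.
        return [0] * len(values), 0.0
    codes = [round(v / scale * 127) for v in values]  # codes lie in [-127, 127]
    return codes, scale


def quantize_global(values):
    # Non-blockwise: the whole tensor is one group with one scale.
    return [absmax_quantize(values)]


def quantize_blockwise(values, block_size):
    # Blockwise: each run of block_size elements gets its own scale.
    return [
        absmax_quantize(values[i:i + block_size])
        for i in range(0, len(values), block_size)
    ]


def dequantize(groups):
    """Map (codes, scale) groups back to floats, concatenated in order."""
    out = []
    for codes, scale in groups:
        out.extend(c / 127 * scale for c in codes)
    return out


if __name__ == "__main__":
    # Small values in the first block, large values in the second block.
    vals = [0.01, -0.02, 0.015, 0.005, 100.0, -50.0, 25.0, 10.0]
    print(dequantize(quantize_global(vals))[:4])        # small values collapse to 0.0
    print(dequantize(quantize_blockwise(vals, 4))[:4])  # small values survive
```

With a block size of 4, the second block has the same absmax (100.0) as the whole tensor, so its reconstruction is identical under both schemes; only the first block benefits from its own, much smaller scale.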