Summary
Add warp-level intrinsics for shuffle operations and warp reductions.
Words to implement
| Word | Stack effect | Description |
|---|---|---|
| `SHFL-DOWN` | `( val offset -- result )` | Warp shuffle down (read from lane + offset) |
| `SHFL-UP` | `( val offset -- result )` | Warp shuffle up (read from lane - offset) |
| `SHFL-XOR` | `( val mask -- result )` | Warp shuffle XOR (butterfly; read from lane XOR mask) |
| `SHFL-IDX` | `( val idx -- result )` | Warp shuffle from a specific lane (indexed read) |
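
To make the intended lane semantics concrete, here is a small host-side Python model of the four words. It is only a sketch: the function names are hypothetical, it assumes a full 32-lane warp with all lanes active, and it follows the CUDA convention that a lane whose source falls outside the warp keeps its own value. `shfl_idx` uses one index for the whole warp for simplicity, although on hardware each lane may supply its own.

```python
WARP_SIZE = 32  # assumed warp width, matching the proposed WARP-SIZE constant


def shfl_down(vals, offset):
    """Lane i reads from lane i + offset; out-of-range lanes keep their own value."""
    return [vals[i + offset] if i + offset < WARP_SIZE else vals[i]
            for i in range(WARP_SIZE)]


def shfl_up(vals, offset):
    """Lane i reads from lane i - offset; out-of-range lanes keep their own value."""
    return [vals[i - offset] if i - offset >= 0 else vals[i]
            for i in range(WARP_SIZE)]


def shfl_xor(vals, mask):
    """Lane i reads from lane i XOR mask (butterfly exchange)."""
    return [vals[i ^ mask] if (i ^ mask) < WARP_SIZE else vals[i]
            for i in range(WARP_SIZE)]


def shfl_idx(vals, idx):
    """Every lane reads from the same source lane (index wraps modulo warp size)."""
    return [vals[idx % WARP_SIZE] for _ in range(WARP_SIZE)]


# Example: each lane holds its own lane number.
lanes = list(range(WARP_SIZE))
down_by_one = shfl_down(lanes, 1)   # lane 0 now holds lane 1's value
swapped_pairs = shfl_xor(lanes, 1)  # neighbouring lanes exchange values
```

Here `vals` is a list holding one value per lane, so each function returns what every lane would see as `result` after the shuffle.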
Motivation
- Needed for high-performance reductions (e.g., sum across a warp without shared memory)
- Used in split-K matmul variants
- Warp-level operations avoid shared memory round-trips
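
As an illustration of the reduction use case, the sketch below models a warp sum in Python using only shuffle steps. The helper names are hypothetical and a full 32-lane warp is assumed. The shuffle-down tree leaves the total in lane 0 after five steps; the XOR butterfly leaves the total in every lane.

```python
WARP_SIZE = 32  # assumed warp width


def shfl_down(vals, offset):
    """Lane i reads from lane i + offset; out-of-range lanes keep their own value."""
    return [vals[i + offset] if i + offset < WARP_SIZE else vals[i]
            for i in range(WARP_SIZE)]


def shfl_xor(vals, mask):
    """Lane i reads from lane i XOR mask."""
    return [vals[i ^ mask] for i in range(WARP_SIZE)]


def warp_sum_down(vals):
    """Tree reduction: offsets 16, 8, 4, 2, 1. Only lane 0 holds the full sum."""
    vals = list(vals)
    offset = WARP_SIZE // 2
    while offset > 0:
        shuffled = shfl_down(vals, offset)
        vals = [a + b for a, b in zip(vals, shuffled)]
        offset //= 2
    return vals[0]


def warp_sum_butterfly(vals):
    """Butterfly all-reduce: after log2(32) XOR steps every lane holds the sum."""
    vals = list(vals)
    mask = WARP_SIZE // 2
    while mask > 0:
        shuffled = shfl_xor(vals, mask)
        vals = [a + b for a, b in zip(vals, shuffled)]
        mask //= 2
    return vals


total = warp_sum_down(range(WARP_SIZE))           # sum of 0..31
all_lanes = warp_sum_butterfly(range(WARP_SIZE))  # same total in every lane
```

Neither variant touches a shared buffer: each step exchanges values directly between lanes, which is the property the words above are meant to expose.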
Implementation notes
- Maps to `nvvm.shfl.sync` intrinsics in NVVM
- Full warp mask (`0xFFFFFFFF`) can be the default
- May also want `WARP-SIZE` (constant 32) and `LANE-ID` words
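
A rough Python model of what the mask and the two helper words could compute, assuming a one-dimensional thread block where the lane is the thread index modulo 32. All names here are hypothetical and are not taken from the codebase.

```python
WARP_SIZE = 32          # value the proposed WARP-SIZE word would push
FULL_MASK = 0xFFFFFFFF  # one bit per lane, all 32 lanes participating


def lane_id(thread_idx):
    """Lane index within the warp (assumes a 1D thread block)."""
    return thread_idx % WARP_SIZE


def warp_id(thread_idx):
    """Index of the warp within the block (assumes a 1D thread block)."""
    return thread_idx // WARP_SIZE


def lane_in_mask(mask, lane):
    """True if the given lane's bit is set in the participation mask."""
    return bool((mask >> lane) & 1)


# Thread 37 of a 1D block is lane 5 of warp 1.
example = (warp_id(37), lane_id(37))
```

Defaulting to the full mask keeps the stack effects in the table unchanged; a partial mask would need either an extra stack argument or separate words, which is left open here.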
Priority
Nice to have; needed for advanced GPU optimization patterns.