Shared memory support for GPU scratchpad #4

@tetsuo-cpp

Description

Summary

Add support for GPU shared memory (__shared__ in CUDA) to enable tiled algorithms.

Motivation

Tiled matmul is the baseline practical GPU implementation. Without shared memory, we're limited to a naive matmul, which is roughly 100x slower. Shared memory acts as a programmer-managed cache shared by all threads in a block.
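For reference, the standard CUDA version of the pattern this issue enables looks like the following. This is a textbook sketch, not project code; it assumes N is a multiple of the tile size and omits bounds checks:

```cuda
#define TILE 16

// Each block stages TILE x TILE sub-matrices of A and B in shared
// memory, then every thread in the block reuses the staged data.
__global__ void tiledMatmul(const float *A, const float *B, float *C, int N) {
    __shared__ float As[TILE][TILE];   // programmer-managed cache,
    __shared__ float Bs[TILE][TILE];   // visible to all threads in the block

    int row = blockIdx.y * TILE + threadIdx.y;
    int col = blockIdx.x * TILE + threadIdx.x;
    float acc = 0.0f;

    for (int t = 0; t < N / TILE; ++t) {
        // Each thread loads one element of each tile from global memory.
        As[threadIdx.y][threadIdx.x] = A[row * N + t * TILE + threadIdx.x];
        Bs[threadIdx.y][threadIdx.x] = B[(t * TILE + threadIdx.y) * N + col];
        __syncthreads();               // wait until the tiles are fully loaded

        for (int k = 0; k < TILE; ++k)
            acc += As[threadIdx.y][k] * Bs[k][threadIdx.x];
        __syncthreads();               // wait before overwriting the tiles
    }
    C[row * N + col] = acc;
}
```

Each element of A and B is read from global memory once per tile instead of once per multiply-add, which is where the large speedup over the naive kernel comes from.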

Design considerations

  • Need a way to declare shared memory (size, element type)
  • Need a way to read from and write to shared memory
  • Possible syntax: a SHARED <name> <size> declaration, reusing the existing @ and ! words for access
  • MLIR mapping: memref.alloc in GPU shared address space, or gpu.alloc with appropriate memory space
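To make the syntax idea concrete, a hypothetical sketch of what a tile load might look like under the proposed declaration. None of this is final: the SHARED word, the f32 element annotation, and a BARRIER word for block synchronization are all assumptions:

```forth
SHARED tile 256 f32     \ declare a 256-element shared-memory region

tid a @                 \ fetch this thread's element from global memory
tile tid !              \ store it into the shared tile via the existing ! word
BARRIER                 \ block-level sync (needed, but also not yet designed)
tile tid @              \ any thread in the block can now read the staged value
```

Note that a synchronization word would have to land alongside this feature; staging data in shared memory is only safe once all threads in the block have finished writing their elements.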

Implementation notes

  • May require extending !forth.stack or adding a new shared memory type
  • The memref should be in GPU shared address space (address space 3 for NVVM)
  • Allocation should be at kernel entry, not per-thread
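A rough sketch of the intended lowering, assuming the upstream gpu and memref dialects (exact attribute spelling varies across MLIR versions; `#gpu.address_space<workgroup>` corresponds to NVVM address space 3):

```mlir
// Hypothetical lowering for a 256-element shared declaration.
// A workgroup attribution on gpu.func allocates once per block,
// at kernel entry, in the shared address space — not per-thread.
gpu.func @kernel(%arg0: memref<?xf32>)
    workgroup(%tile : memref<256xf32, #gpu.address_space<workgroup>>)
    kernel {
  // ... loads/stores on %tile lower from the Forth @ / ! words ...
  gpu.return
}
```

Using a workgroup attribution (rather than a mid-kernel memref.alloc) sidesteps the per-thread allocation problem, since the attribution is part of the kernel's signature.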

Priority

High — required for any practical tiled GPU algorithm.


Labels: enhancement
