In this chapter, we'll create a basic CUDA program that performs vector addition using the GPU. The program consists of a CUDA kernel that adds corresponding elements from two input vectors and stores the result in an output vector.
Let's go through the code step by step:
- Kernel Function (`vectorAdd`):
  - The `vectorAdd` function is the heart of our CUDA program. It runs on the GPU and performs the vector addition.
  - It takes four arguments:
    - `A`: Pointer to the first input vector.
    - `B`: Pointer to the second input vector.
    - `C`: Pointer to the output vector (where the result will be stored).
    - `size`: Size of the vectors (number of elements).
  - Inside the kernel, each thread computes the sum of corresponding elements from `A` and `B` and stores the result in `C`.
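A kernel matching this description might look like the minimal sketch below. The `float` element type and the one-dimensional index calculation are assumptions, since the text names neither; the bounds check is there because a launch can create more threads than there are elements.

```cuda
#include <cuda_runtime.h>

// Sketch of the kernel described above.
// Assumption: the vectors hold float values (the text does not name a type).
__global__ void vectorAdd(const float *A, const float *B, float *C, int size)
{
    // Global index of this thread across the whole one-dimensional grid.
    int i = blockIdx.x * blockDim.x + threadIdx.x;

    // The grid may contain more threads than elements, so guard the access.
    if (i < size) {
        C[i] = A[i] + B[i];
    }
}
```

Each thread handles exactly one element, so the launch has to supply at least `size` threads in total.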
- Main Function:
  - The `main` function sets up the host (CPU) and device (GPU) memory, initializes input vectors, launches the kernel, and retrieves the result.
  - Key steps in the `main` function:
    - Allocate memory for vectors (`h_A`, `h_B`, and `h_C`) on the host.
    - Initialize input vectors (`h_A` and `h_B`) with sample values.
    - Allocate memory for vectors (`d_A`, `d_B`, and `d_C`) on the device (GPU).
    - Copy data from host to device using `cudaMemcpy`.
    - Launch the `vectorAdd` kernel with appropriate block and grid dimensions.
    - Copy the result back from the device to the host.
    - Print the result (output vector `h_C`).
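Put together, the steps listed for `main` could be sketched as the program below. This is an illustration rather than the chapter's exact listing: the vector length of 8, the sample values, the `float` element type, and the block size of 256 are all assumptions, and error checking is left out to keep the flow visible.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Assumption: float elements; each thread adds one pair of elements.
__global__ void vectorAdd(const float *A, const float *B, float *C, int size)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < size) {
        C[i] = A[i] + B[i];
    }
}

int main()
{
    const int size = 8;                          // assumed sample length
    const size_t bytes = size * sizeof(float);

    // Allocate host vectors.
    float *h_A = new float[size];
    float *h_B = new float[size];
    float *h_C = new float[size];

    // Initialize the inputs with sample values (assumed: i and 2 * i).
    for (int i = 0; i < size; ++i) {
        h_A[i] = static_cast<float>(i);
        h_B[i] = 2.0f * i;
    }

    // Allocate device vectors.
    float *d_A = nullptr, *d_B = nullptr, *d_C = nullptr;
    cudaMalloc(&d_A, bytes);
    cudaMalloc(&d_B, bytes);
    cudaMalloc(&d_C, bytes);

    // Copy the inputs from host to device.
    cudaMemcpy(d_A, h_A, bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(d_B, h_B, bytes, cudaMemcpyHostToDevice);

    // Launch enough blocks to cover every element (ceiling division).
    const int threadsPerBlock = 256;             // assumed block size
    const int numBlocks = (size + threadsPerBlock - 1) / threadsPerBlock;
    vectorAdd<<<numBlocks, threadsPerBlock>>>(d_A, d_B, d_C, size);

    // Copy the result back; this copy waits for the kernel to finish.
    cudaMemcpy(h_C, d_C, bytes, cudaMemcpyDeviceToHost);

    // Print the output vector.
    for (int i = 0; i < size; ++i) {
        std::printf("%g ", h_C[i]);
    }
    std::printf("\n");

    // Release device memory, then host memory.
    cudaFree(d_A);
    cudaFree(d_B);
    cudaFree(d_C);
    delete[] h_A;
    delete[] h_B;
    delete[] h_C;
    return 0;
}
```

The host vectors are allocated with `new[]` here so that they can be released with `delete[]`; code that uses `malloc` would pair it with `free` instead.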
- Memory Allocation and Transfer:
  - We allocate memory for vectors on both the host and the device.
  - `cudaMalloc` allocates memory on the device.
  - `cudaMemcpy` transfers data between host and device.
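Both `cudaMalloc` and `cudaMemcpy` return a `cudaError_t`, so a failed allocation or copy can be detected instead of going unnoticed. The sketch below shows one common way to do that. `CUDA_CHECK` and `copyToDevice` are hypothetical helper names introduced for illustration, not part of the CUDA API, and the `float` element type is an assumption.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical helper macro: abort with a message if a runtime call fails.
#define CUDA_CHECK(call)                                                 \
    do {                                                                 \
        cudaError_t err_ = (call);                                       \
        if (err_ != cudaSuccess) {                                       \
            std::fprintf(stderr, "CUDA error: %s (%s:%d)\n",             \
                         cudaGetErrorString(err_), __FILE__, __LINE__);  \
            std::exit(EXIT_FAILURE);                                     \
        }                                                                \
    } while (0)

// Hypothetical helper: allocate a device buffer and copy a host vector into it.
// The caller owns the returned pointer and must release it with cudaFree.
float *copyToDevice(const float *h_src, int size)
{
    float *d_dst = nullptr;
    const size_t bytes = size * sizeof(float);
    CUDA_CHECK(cudaMalloc(&d_dst, bytes));
    CUDA_CHECK(cudaMemcpy(d_dst, h_src, bytes, cudaMemcpyHostToDevice));
    return d_dst;
}
```

With such a helper, `float *d_A = copyToDevice(h_A, size);` covers both the allocation and the host-to-device transfer. The last argument of `cudaMemcpy` selects the direction: `cudaMemcpyHostToDevice` for the inputs and `cudaMemcpyDeviceToHost` for the result.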
- Kernel Launch:
  - We launch the `vectorAdd` kernel using the `<<<numBlocks, threadsPerBlock>>>` syntax.
  - `numBlocks` sets the number of blocks in the grid, and `threadsPerBlock` sets the number of threads in each block.
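The block count is usually derived from the vector length by ceiling division, so that the grid covers every element even when the length is not a multiple of the block size. A minimal sketch, where `blocksFor` is a hypothetical helper name and `size` is assumed to be positive:

```cuda
#include <cuda_runtime.h>

// Hypothetical helper: smallest number of blocks whose threads cover
// `size` elements, given `threadsPerBlock` threads per block.
// Assumes size > 0 and threadsPerBlock > 0.
inline int blocksFor(int size, int threadsPerBlock)
{
    // Ceiling division: round up instead of truncating.
    return (size + threadsPerBlock - 1) / threadsPerBlock;
}
```

For 1000 elements and 256 threads per block this gives 4 blocks, or 1024 threads in total. The 24 surplus threads have no element to process, which is why the kernel needs to compare its index against `size`. A launch would then read `vectorAdd<<<blocksFor(size, 256), 256>>>(d_A, d_B, d_C, size);`.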
- Clean Up:
  - We free the allocated device memory using `cudaFree`.
  - We also delete the host vectors (`h_A`, `h_B`, and `h_C`) to avoid memory leaks.
- Compile the Code:
  - Open your terminal or command prompt.
  - Navigate to the folder containing `vector_addition.cu`.
  - Compile the code using `nvcc` (NVIDIA CUDA Compiler): `nvcc vector_addition.cu -o vector_addition`
- Run the Executable:
  - Execute the compiled binary: `./vector_addition`
  - You'll see the result of vector addition printed to the console.

