In CUDA/OpenGL interoperability, CUDA usually generates the data and OpenGL renders the corresponding graphics. There are two common ways to combine them:
1. Use a PBO (Pixel Buffer Object) in OpenGL: CUDA generates pixel data and OpenGL displays it directly.
2. Use a VBO (Vertex Buffer Object) in OpenGL: CUDA generates vertex data and OpenGL renders it.
Both methods rest on the same idea: the OpenGL buffer object is mapped into CUDA's address space, meaning a CUDA device pointer is made to point at the OpenGL buffer's storage. The data in the buffer therefore never has to be copied into separate CUDA memory. CUDA's parallel kernels compute directly into the buffer, and OpenGL then renders from it.
As an example, we use CUDA to generate 16 points whose positions change over time and display them on the screen.
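To see how the 16 points are distributed over threads: the complete code below launches a single 4×4 block, and each thread computes x = blockIdx.x*blockDim.x + threadIdx.x (likewise for y) and writes vertex y*width + x. The host-side C++ sketch below emulates that index arithmetic to confirm that one 4×4 block touches every vertex index 0 to 15 exactly once. Dim3 and countWritesPerVertex are hypothetical helpers for illustration, not CUDA types or functions.

```cpp
#include <cassert>
#include <vector>

// Hypothetical host-side stand-in for CUDA's dim3 (not the CUDA type).
struct Dim3 {
    unsigned x, y, z;
};

// Emulates the kernel's index arithmetic on the host: for every thread in
// the launch, compute (x, y) and the linear vertex index y * width + x,
// and count how many threads write each vertex.
std::vector<int> countWritesPerVertex(Dim3 grid, Dim3 block,
                                      unsigned width, unsigned height) {
    std::vector<int> writes(width * height, 0);
    for (unsigned by = 0; by < grid.y; ++by)
        for (unsigned bx = 0; bx < grid.x; ++bx)
            for (unsigned ty = 0; ty < block.y; ++ty)
                for (unsigned tx = 0; tx < block.x; ++tx) {
                    unsigned x = bx * block.x + tx;
                    unsigned y = by * block.y + ty;
                    // A launch that overshoots the grid would write past the
                    // buffer; here we assert instead of writing out of range.
                    assert(x < width && y < height);
                    ++writes[y * width + x];
                }
    return writes;
}
```

Calling countWritesPerVertex(Dim3{1, 1, 1}, Dim3{4, 4, 1}, 4, 4) returns 16 counters that are all 1, which is why a single 4×4 block is enough for this example.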
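Steps (these use the legacy cudaGL* buffer-object functions, which are deprecated in current CUDA releases in favor of cudaGraphicsGLRegisterBuffer, cudaGraphicsMapResources, cudaGraphicsResourceGetMappedPointer and cudaGraphicsUnmapResources):
1. Select the CUDA device that interoperates with OpenGL:
status = cudaGLSetGLDevice(0);
2. Register the OpenGL buffer object with CUDA:
status = cudaGLRegisterBufferObject(this->VBO);
3. Map the buffer object, so that a CUDA device pointer points to the buffer's storage:
// Map the buffer object
float4* position;
status = cudaGLMapBufferObject((void**)&position, this->VBO);
4. Run the kernel:
// Launch one 4x4 block: one thread per point
dim3 dimBlock(4, 4, 1);
dim3 dimGrid(1);
KernelFunc<<<dimGrid, dimBlock>>>(position, clock(), 4, 4);
cudaDeviceSynchronize();  // wait for the kernel to finish
5. Unmap the buffer object so that OpenGL can use it again:
status = cudaGLUnmapBufferObject(this->VBO);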
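The map and unmap calls (steps 3 and 5) must always come in pairs: while a buffer is mapped by CUDA, OpenGL should not access it, so an early return between the two leaves the buffer unusable for rendering. One common C++ way to guarantee the unmap is a scope guard. The sketch below is a generic, hypothetical helper (ScopedUnmap is not part of CUDA); in this article's code it could be constructed right after a successful cudaGLMapBufferObject call with a lambda that calls cudaGLUnmapBufferObject(this->VBO). The demo function uses a fake counter instead of a real CUDA call.

```cpp
#include <cassert>
#include <functional>
#include <utility>

// Hypothetical scope guard: runs the stored cleanup action exactly once,
// when the guard goes out of scope (including on early return).
class ScopedUnmap {
public:
    explicit ScopedUnmap(std::function<void()> unmap) : unmap_(std::move(unmap)) {
        assert(unmap_ && "cleanup action must be callable");
    }
    ~ScopedUnmap() {
        if (unmap_) unmap_();
    }
    // Non-copyable so the cleanup cannot run twice.
    ScopedUnmap(const ScopedUnmap&) = delete;
    ScopedUnmap& operator=(const ScopedUnmap&) = delete;

private:
    std::function<void()> unmap_;
};

// Demo: the "unmap" here only increments a counter.
int runWithEarlyReturn(bool failLaunch, int& unmapCalls) {
    ScopedUnmap guard([&unmapCalls] { ++unmapCalls; });
    if (failLaunch) {
        return -1;  // early return: the guard still runs the unmap
    }
    return 0;
}
```

Both the failing and the succeeding path increment the counter once, so the cleanup no longer depends on every exit path remembering to call it.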
Note: if the kernel writes through the mapped CUDA pointer beyond the size of the OpenGL buffer object, the subsequent unmap call fails.
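One way to avoid this is to check on the host, before launching, that width × height vertices of four floats each fit in the buffer. When the point count is not a multiple of the block size, round the grid size up and have the kernel skip threads with x >= width or y >= height. The host-side part is sketched below. The function names are hypothetical, and the buffer size in bytes is assumed to be known, for example from the size passed to glBufferData or from glGetBufferParameteriv with GL_BUFFER_SIZE.

```cpp
#include <cassert>
#include <cstddef>

// Host-side mirror of CUDA's float4: four floats, 16 bytes per vertex.
struct Float4 {
    float x, y, z, w;
};
static_assert(sizeof(Float4) == 16, "expected 16-byte vertices");

// Bytes the kernel writes for a width x height grid of vertices.
std::size_t requiredBytes(std::size_t width, std::size_t height) {
    return width * height * sizeof(Float4);
}

// True if the mapped buffer is large enough for every write.
bool fitsInBuffer(std::size_t width, std::size_t height, std::size_t bufferBytes) {
    return requiredBytes(width, height) <= bufferBytes;
}

// Number of blocks needed along one dimension (ceiling division).
unsigned blocksFor(unsigned points, unsigned threadsPerBlock) {
    assert(threadsPerBlock > 0);
    return (points + threadsPerBlock - 1) / threadsPerBlock;
}
```

For the 4×4 example the buffer must hold at least 256 bytes. A 100×100 grid with 16×16 blocks needs 7×7 blocks, and the in-kernel bounds check discards the threads that fall outside the grid.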
The complete code is as follows:
#include "GenVertex.cuh"
#include <cstdio>
#include <time.h>

__global__ void KernelFunc(float4* position, float time, unsigned int width, unsigned int height)
{
    unsigned int x = blockIdx.x*blockDim.x + threadIdx.x;
    unsigned int y = blockIdx.y*blockDim.y + threadIdx.y;
    // Skip threads outside the width x height grid (never write past the buffer)
    if (x >= width || y >= height) return;

    // Map the grid coordinates to [-1, 1)
    float u = x / (float)width;
    float v = y / (float)height;
    u = u*2.0f - 1.0f;
    v = v*2.0f - 1.0f;

    // Height of a time-dependent sine/cosine wave
    float freq = 4.0f;
    float w = sinf(u*freq + time*0.001f)*cosf(v*freq + time*0.001f)*0.5f;
    position[y*width + x] = make_float4(u*10, w*10, v*10, 1.0f);
}

GenVertex::GenVertex()
{
    this->setup();
}

GenVertex::~GenVertex()
{
}

void GenVertex::setup()
{
    cudaError_t status;
    // Select the device used for OpenGL interop
    status = cudaGLSetGLDevice(0);
    if (status != cudaSuccess) {
        puts("setup Device failed!");
    }
}

void GenVertex::setVBO(unsigned int vbo)
{
    this->VBO = vbo;
    cudaError_t status;
    // Register the OpenGL buffer object with CUDA
    status = cudaGLRegisterBufferObject(this->VBO);
    if (status != cudaSuccess) {
        puts("Register buffer object failed!");
    }
}

void GenVertex::createVtxWithCuda()
{
    cudaError_t status;
    // Map the buffer object
    float4* position;
    status = cudaGLMapBufferObject((void**)&position, this->VBO);
    if (status != cudaSuccess) {
        puts("map buffer object failed!");
        return;  // position is not valid, so do not launch the kernel
    }

    // Run the kernel: one 4x4 block, one thread per point
    dim3 dimBlock(4, 4, 1);
    dim3 dimGrid(1);
    KernelFunc<<<dimGrid, dimBlock>>>(position, clock(), 4, 4);
    cudaDeviceSynchronize();  // wait for the kernel to finish

    // Unmap so that OpenGL can render from the buffer
    status = cudaGLUnmapBufferObject(this->VBO);
    if (status != cudaSuccess) {
        puts("unmap buffer object failed!");
    }
}
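To check what the kernel writes (for instance after reading the buffer back with glGetBufferSubData), a CPU reference of the same formula can help. The sketch below is a hypothetical host-side port of KernelFunc in plain C++ (Vertex and referenceVertices are illustrative names), using std::sin and std::cos in place of sinf and cosf. GPU and CPU results may differ in the last bits, so compare them with a tolerance.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

struct Vertex {
    float x, y, z, w;
};

// Host-side port of KernelFunc: same grid-to-[-1, 1) mapping, same wave,
// same scale factor of 10. Vertex (x, y) is stored at index y * width + x.
std::vector<Vertex> referenceVertices(float time, unsigned width, unsigned height) {
    assert(width > 0 && height > 0);
    std::vector<Vertex> out(width * height);
    const float freq = 4.0f;
    for (unsigned y = 0; y < height; ++y) {
        for (unsigned x = 0; x < width; ++x) {
            float u = x / static_cast<float>(width) * 2.0f - 1.0f;
            float v = y / static_cast<float>(height) * 2.0f - 1.0f;
            float w = std::sin(u * freq + time * 0.001f) *
                      std::cos(v * freq + time * 0.001f) * 0.5f;
            out[y * width + x] = Vertex{u * 10, w * 10, v * 10, 1.0f};
        }
    }
    return out;
}
```

One consequence of this mapping is visible in the reference: with width = 4, u takes the values -1, -0.5, 0 and 0.5, so the 4×4 grid spans -10 to 5 on the x and z axes rather than being centered on the origin.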