Interoperation between CUDA and OpenGL

Keywords: C++

CUDA/OpenGL interoperability typically means that CUDA generates the data and OpenGL renders it. There are two ways to combine them:

1. Use a PBO (Pixel Buffer Object) in OpenGL: CUDA generates the pixel data and OpenGL renders it directly.

2. Use a VBO (Vertex Buffer Object) in OpenGL: CUDA generates the vertex data and OpenGL renders it.

Both methods rest on the same idea: map the OpenGL buffer object into CUDA's address space, so that a CUDA device pointer refers directly to the OpenGL buffer. The data then never has to be copied between OpenGL and CUDA memory. CUDA's parallel kernels compute the data in place, and OpenGL renders it straight from the same buffer.

  

As an example, CUDA is used here to generate 16 points (a 4 x 4 grid) whose positions change over time, and OpenGL displays them on the screen.

Steps:

1. Select the CUDA device that will interoperate with OpenGL

status=cudaGLSetGLDevice(0);

2. Register the buffer object with CUDA

status = cudaGLRegisterBufferObject(this->VBO);

3. Map the buffer object so that a CUDA pointer refers to the buffer object's storage

// Mapping buffer object
    float4* position;
    status=cudaGLMapBufferObject((void**)&position, this->VBO);

4. Run the kernel function

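// Run the kernel function: one block of 4 x 4 threads, one thread per point
    dim3 dimBlock(4, 4, 1);
    dim3 dimGrid(1);
    KernelFunc<<<dimGrid, dimBlock>>>(position, clock(), 4, 4);
    cudaDeviceSynchronize(); // Wait for the kernel to finish (replaces the deprecated cudaThreadSynchronize)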

5. Unmap the buffer object

status=cudaGLUnmapBufferObject(this->VBO);
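
The cudaGL* calls in the steps above, like cudaThreadSynchronize, are marked deprecated in current CUDA toolkits, which provide the cudaGraphics* interop API instead. The following is a minimal sketch of the same register/map/run/unmap sequence with that API, not the author's code. The helper names registerVbo and updateVbo and the timeMs parameter are hypothetical, and KernelFunc is assumed to be defined in the same .cu file. An OpenGL context must be current and the VBO's storage must already be allocated.

```cuda
#include <cstdio>
#include <cuda_runtime.h>
#include <cuda_gl_interop.h> // include your OpenGL loader header before this one, if you use one

// Defined in the article's complete code (assumed to be in the same .cu file).
__global__ void KernelFunc(float4* position, float time,
                           unsigned int width, unsigned int height);

// Hypothetical helper: register the VBO once, after its storage has been allocated.
// WriteDiscard tells CUDA that it overwrites the buffer without reading it.
cudaGraphicsResource_t registerVbo(unsigned int vbo)
{
    cudaGraphicsResource_t resource = nullptr;
    cudaError_t status = cudaGraphicsGLRegisterBuffer(
        &resource, vbo, cudaGraphicsRegisterFlagsWriteDiscard);
    if (status != cudaSuccess) {
        printf("register failed: %s\n", cudaGetErrorString(status));
        return nullptr;
    }
    return resource;
}

// Hypothetical per-frame helper: map, run the kernel, unmap.
bool updateVbo(cudaGraphicsResource_t resource, float timeMs)
{
    const unsigned int width = 4, height = 4;

    if (cudaGraphicsMapResources(1, &resource, 0) != cudaSuccess) {
        return false;
    }

    float4* position = nullptr;
    size_t mappedBytes = 0;
    cudaError_t status = cudaGraphicsResourceGetMappedPointer(
        (void**)&position, &mappedBytes, resource);

    // The mapped size lets us refuse to launch if the kernel would write past the buffer.
    if (status == cudaSuccess && mappedBytes < width * height * sizeof(float4)) {
        status = cudaErrorInvalidValue;
    }
    if (status == cudaSuccess) {
        KernelFunc<<<dim3(1), dim3(width, height, 1)>>>(position, timeMs, width, height);
        status = cudaGetLastError(); // reports launch-configuration errors
    }

    // Unmap even on failure so that OpenGL regains access to the buffer.
    cudaError_t unmapStatus = cudaGraphicsUnmapResources(1, &resource, 0);
    return status == cudaSuccess && unmapStatus == cudaSuccess;
}
```

In this sketch registerVbo would be called once and updateVbo once per frame before drawing. When the buffer is no longer needed, cudaGraphicsUnregisterResource(resource) releases the registration before the VBO is deleted.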

Effects:

[Figure: the 16 generated points rendered on screen]

Note: If the CUDA kernel writes through the mapped pointer beyond the end of the OpenGL buffer object, the subsequent unmap call fails. The buffer object must therefore be at least as large as the data the kernel writes: 16 float4 values, or 256 bytes, in this example.
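
One way to guard against this is to compare the number of bytes the kernel will write with the size the VBO was allocated with before launching. The sketch below is host-only C++ under stated assumptions: Float4 is a stand-in for CUDA's float4 so that it compiles without the CUDA headers, and requiredBytes, fitsInBuffer and blocksFor are hypothetical helper names.

```cpp
#include <cstddef>

// Stand-in for CUDA's float4 (four 32-bit floats, 16 bytes).
struct Float4 { float x, y, z, w; };

// Bytes the kernel writes for a width x height grid of points.
constexpr std::size_t requiredBytes(unsigned int width, unsigned int height)
{
    return static_cast<std::size_t>(width) * height * sizeof(Float4);
}

// True when every write to position[y*width + x] stays inside the buffer.
constexpr bool fitsInBuffer(unsigned int width, unsigned int height,
                            std::size_t bufferBytes)
{
    return requiredBytes(width, height) <= bufferBytes;
}

// Blocks needed to cover n elements with blockSize threads each (rounds up).
constexpr unsigned int blocksFor(unsigned int n, unsigned int blockSize)
{
    return (n + blockSize - 1) / blockSize;
}
```

For the 4 x 4 example, requiredBytes(4, 4) is 256, so the size passed to glBufferData must be at least 256 bytes. If the grid size is rounded up with blocksFor, some threads fall outside the grid, and the kernel then needs a bounds check before it writes.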

The complete code is as follows:

  

#include "GenVertex.cuh"
#include <cuda_gl_interop.h> // cudaGL* interop functions
#include <stdio.h>           // puts
#include <time.h>            // clock


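// Each thread computes one point of a width x height grid.
__global__ void KernelFunc(float4* position, float time, unsigned int width, unsigned int height)
{
    unsigned int x = blockIdx.x*blockDim.x + threadIdx.x;
    unsigned int y = blockIdx.y*blockDim.y + threadIdx.y;
    // Never write outside the grid (and therefore outside the mapped buffer)
    if (x >= width || y >= height) return;
    // Map the grid coordinates to the range [-1, 1)
    float u = x / (float)width;
    float v = y / (float)height;
    u = u*2.0f - 1.0f;
    v = v*2.0f - 1.0f;
    // The height is a sine/cosine wave that shifts with time
    float freq = 4.0f;
    float w = sinf(u*freq + time*0.001f)*cosf(v*freq + time*0.001f)*0.5f;
    position[y*width + x] = make_float4(u*10, w*10, v*10, 1.0f);
}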

GenVertex::GenVertex()
{
    this->setup();
}

GenVertex::~GenVertex()
{
}

void GenVertex::setup() {
    cudaError_t status;
    // Select the CUDA device used for OpenGL interoperation
    status=cudaGLSetGLDevice(0);
    if (status != cudaSuccess) {
        puts("setup Device failed!");
    }
}

void GenVertex::setVBO(unsigned int vbo) {
    this->VBO = vbo;
    cudaError_t status;
    status = cudaGLRegisterBufferObject(this->VBO);
    if (status != cudaSuccess) {
        puts("Register buffer object failed!");
    }
}


void GenVertex::createVtxWithCuda()
{
    cudaError_t status;
    // Map the buffer object so that position points at the VBO's storage
    float4* position = nullptr;
    status=cudaGLMapBufferObject((void**)&position, this->VBO);
    if (status != cudaSuccess) {
        puts("map buffer object failed!");
        return; // position is not valid, so do not launch the kernel
    }
    // Run the kernel: one block of 4 x 4 threads, one thread per point
    dim3 dimBlock(4, 4, 1);
    dim3 dimGrid(1);
    KernelFunc<<<dimGrid, dimBlock>>>(position, clock(), 4, 4);
    cudaDeviceSynchronize(); // Wait for the kernel to finish
    // Unmap so that OpenGL can use the buffer again
    status=cudaGLUnmapBufferObject(this->VBO);
    if (status != cudaSuccess) {
        puts("unmap buffer object failed!");
    }
}
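
To sanity-check what the kernel writes into the VBO, the same arithmetic can be run on the CPU and compared with the buffer contents. This is a rough sketch, not part of the original code: Float4 is a stand-in for CUDA's float4 and referencePositions is a hypothetical name. GPU results may differ from it in the last few bits, so compare with a tolerance.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Stand-in for CUDA's float4 so the sketch compiles without the CUDA headers.
struct Float4 { float x, y, z, w; };

// CPU version of KernelFunc: the two loops replace the thread indices.
std::vector<Float4> referencePositions(float time, unsigned int width, unsigned int height)
{
    std::vector<Float4> out(static_cast<std::size_t>(width) * height);
    const float freq = 4.0f;
    for (unsigned int y = 0; y < height; ++y) {
        for (unsigned int x = 0; x < width; ++x) {
            // Map the grid coordinates to the range [-1, 1)
            float u = x / static_cast<float>(width) * 2.0f - 1.0f;
            float v = y / static_cast<float>(height) * 2.0f - 1.0f;
            float w = std::sin(u*freq + time*0.001f) * std::cos(v*freq + time*0.001f) * 0.5f;
            out[y*width + x] = Float4{u*10.0f, w*10.0f, v*10.0f, 1.0f};
        }
    }
    return out;
}
```

Calling referencePositions(t, 4, 4) with the same time value that was passed to the kernel yields the 16 expected points in the same order as the VBO.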

Posted by DarrenL on Wed, 02 Oct 2019 15:52:05 -0700