
User-defined buffers in the Arena SDK

May 21, 2026. Updated on May 22, 2026.

Overview

This knowledge base article explains how to use user-defined buffers with the Arena SDK. User-defined buffers let you provide your own memory pointer to the Arena SDK instead of having the SDK allocate memory automatically. This gives you control over where image data resides. When combined with GPU memory and GPUDirect RDMA hardware, this enables zero-copy workflows in which the camera data is written directly to GPU VRAM.

Note: For information on choosing an appropriate buffer count, see the companion KB article How to choose an appropriate buffer count for the Arena SDK.

Prerequisites

For user-defined buffers (basic):

  • Arena SDK: Version with user-defined buffer support.
  • Memory: Any valid memory pointer (malloc, cudaMalloc, mmap, etc.)

For zero-copy GPUDirect RDMA (advanced):

  • Network Interface Card: RDMA-capable NIC with GPUDirect RDMA support.
  • GPU: NVIDIA GPU with CUDA support.
  • Software: For NVIDIA: CUDA Toolkit installed.
  • Drivers: For NVIDIA: nvidia-peermem kernel module.
  • Network: Proper RDMA configuration on NIC and network infrastructure (RoCEv2).
  • Operating System: Ubuntu 24.04 LTS (Noble Numbat)

Important: Not all RDMA-capable NICs support GPUDirect RDMA. The NIC must specifically support peer-to-peer DMA and have drivers that integrate with CUDA.

Memory Location

The fundamental difference between the classic StartStream call and user-defined buffers is where the image buffer memory physically resides in your computer.

Classic StartStream (automatic allocation)

pDevice->StartStream(10);
  • The Arena SDK internally calls void* buffers = malloc(10 * bufferSize);
  • Physical location: System RAM.
  • Accessible by: GPU, but only after cudaMemcpy (transfer over PCIe).
  • Total allocation: 10 buffers × 12 MB = 120 MB in system RAM.
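The allocation arithmetic above can be sketched as follows. This is only an illustration: the helper name totalPoolBytes and the constant kExampleBufferSize are hypothetical, and the 12 MB figure is the example value from the bullet above (taking 1 MB as 1024 × 1024 bytes). In practice the per-buffer size would come from the camera's PayloadSize node.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical helper: total bytes needed for `numBuffers` buffers of
// `bufferSize` bytes each. Returns 0 if the product would overflow size_t,
// so the caller can treat 0 as an error for a non-empty request.
constexpr std::size_t totalPoolBytes(std::size_t bufferSize, std::size_t numBuffers) {
    return (numBuffers != 0 && bufferSize > SIZE_MAX / numBuffers)
               ? 0
               : bufferSize * numBuffers;
}

// Illustrative figure only: 12 MB per buffer, with 1 MB = 1024 * 1024 bytes.
constexpr std::size_t kExampleBufferSize = 12u * 1024u * 1024u;
```

Calling totalPoolBytes(kExampleBufferSize, 10) gives the 120 MB pool size used in the examples in this article.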

User-defined buffers with system RAM

void* myBuffers = malloc(120 * 1024 * 1024);  // 120 MB of system RAM
// Provide your system memory pointer to the Arena SDK
// (bufferList holds UserSuppliedBuffer entries that point into myBuffers)
pDevice->StartStream(bufferList);
  • Provide your system memory pointer to the Arena SDK through pDevice->StartStream.
  • Physical location: System RAM, the same as the classic approach. You control the allocation; the Arena SDK uses your memory.
  • Still requires cudaMemcpy for GPU processing.
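One possible way to turn a single system-RAM allocation into a buffer list is sketched below. To keep the sketch compilable without the Arena SDK, it uses a stand-in struct, UserSuppliedBufferSketch, that mirrors the fields of the SDK's UserSuppliedBuffer as described later in this article. The helper name sliceSystemPool is hypothetical.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <vector>

// Stand-in that mirrors the SDK's UserSuppliedBuffer fields so this sketch
// builds without the Arena SDK. With the real SDK, use the SDK's own type.
struct UserSuppliedBufferSketch {
    std::uint8_t* pData;
    std::size_t bufferSize;
    void* userContext;
    UserSuppliedBufferSketch(std::uint8_t* data, std::size_t size, void* ctx = nullptr)
        : pData(data), bufferSize(size), userContext(ctx) {}
};

// Hypothetical helper: slice one contiguous pool into `numBuffers`
// back-to-back descriptors of `bufferSize` bytes each.
std::vector<UserSuppliedBufferSketch> sliceSystemPool(void* pool,
                                                      std::size_t bufferSize,
                                                      std::size_t numBuffers) {
    std::vector<UserSuppliedBufferSketch> list;
    list.reserve(numBuffers);
    auto* base = static_cast<std::uint8_t*>(pool);
    for (std::size_t i = 0; i < numBuffers; ++i) {
        list.emplace_back(base + i * bufferSize, bufferSize);
    }
    return list;
}
```

With the SDK, a list built this way is what would be handed to StartStream. The pool has to stay allocated for as long as the stream is running and should be released with free() only after StopStream().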

User-defined buffers with GPU memory (no RDMA)

void* gpuBuffers;
cudaMalloc(&gpuBuffers, 120 * 1024 * 1024);  // 120 MB of GPU VRAM
pDevice->StartStream(gpuBufferList);
  • Physical location: GPU VRAM (on graphics card).
  • Data flow: Camera → NIC → System RAM → PCIe → GPU VRAM.
  • An internal copy from system RAM to the GPU is still required.
  • The Arena SDK writes to GPU memory, but the data passes through system RAM first.

User-defined buffers with GPUDirect RDMA

void* gpuBuffers;
cudaMalloc(&gpuBuffers, 120 * 1024 * 1024);  // 120 MB of GPU VRAM
pDevice->StartStream(gpuBufferList);
  • RDMA-capable NIC + nvidia-peermem loaded.
  • Physical location: GPU VRAM (on graphics card).
  • Data flow: Camera → NIC → PCIe → GPU VRAM.
  • Zero-copy: NIC writes directly to GPU, bypassing system RAM entirely.
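Because this path depends on nvidia-peermem being loaded, it can help to check for the module before streaming. The sketch below is one way to do that on Linux, assuming the standard /proc/modules listing, where the module appears under the name nvidia_peermem (with an underscore). The function names are hypothetical. A loaded module is necessary but not sufficient: the NIC and its driver must still support GPUDirect RDMA.

```cpp
#include <fstream>
#include <istream>
#include <sstream>
#include <string>

// Returns true if `moduleName` is the first field of any line in a
// /proc/modules-style listing (one module per line, name first).
bool moduleListed(std::istream& modules, const std::string& moduleName) {
    std::string line;
    while (std::getline(modules, line)) {
        std::istringstream fields(line);
        std::string name;
        if (fields >> name && name == moduleName) {
            return true;
        }
    }
    return false;
}

// Convenience wrapper for Linux. Returns false if /proc/modules cannot be
// opened or the module is not listed.
bool nvidiaPeermemLoaded() {
    std::ifstream modules("/proc/modules");
    return modules && moduleListed(modules, "nvidia_peermem");
}
```

An application could call nvidiaPeermemLoaded() at startup and fall back to system-RAM buffers, or report a configuration error, when it returns false.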

Understanding the components

User-defined buffers (Arena SDK feature)

  • You provide a memory pointer, and the Arena SDK uses it instead of allocating its own.
  • Works with ANY memory type: system RAM (malloc), GPU memory (cudaMalloc), pinned memory (cudaMallocHost), shared memory (mmap).
  • Does NOT require RDMA or GPU.
  • Use case: Control memory allocation strategy.

GPU memory (CUDA feature)

  • Memory allocated on GPU using cudaMalloc().
  • Does not automatically enable zero-copy; without RDMA, data still goes through system RAM first.
  • Use case: Keep data on GPU for processing, reduce copy operations.

GPUDirect RDMA (NIC vendor feature)

  • Technology that allows the NIC to write directly to GPU memory, bypassing CPU and system RAM entirely.
  • Requires an RDMA-capable NIC with GPUDirect support (Mellanox/NVIDIA, Broadcom, or Intel models), the nvidia-peermem driver (NIC-agnostic) loaded, and GPU memory allocated with cudaMalloc.
  • Must be combined with user-defined buffers pointing to GPU memory.
  • Use case: zero-copy workflows for GPU-accelerated processing.

UserSuppliedBuffer API

The UserSuppliedBuffer struct is how you provide buffers to the Arena SDK:

struct UserSuppliedBuffer
{
    uint8_t* pData;      // Pointer to the actual buffer
    size_t bufferSize;   // Size of the buffer in bytes
    void* userContext;   // Optional handle for user metadata
    UserSuppliedBuffer(uint8_t* data, size_t size, 
                       void* ctx = nullptr);
};
  • pData: Pointer to your allocated buffer (system RAM or GPU VRAM).
  • bufferSize: Size of this buffer in bytes (typically from PayloadSize node).
  • userContext: Optional metadata pointer that travels with this buffer (retrieved via GetPrivateDataPtr()).

Function signature

void StartStream(const std::vector<UserSuppliedBuffer>& bufferList);

Example code

// Step 1: Get buffer size
size_t bufferSize = static_cast<size_t>(Arena::GetNodeValue<int64_t>(
    pDevice->GetNodeMap(), "PayloadSize"));
const int numBuffers = 10;
// Step 2: Allocate GPU memory pool
void* d_bufferPool;
cudaError_t err = cudaMalloc(&d_bufferPool, 
                              bufferSize * numBuffers);
if (err != cudaSuccess) {
    std::cerr << "GPU allocation failed: " 
              << cudaGetErrorString(err) << std::endl;
    return -1;
}
// Step 3: Create buffer list with GPU pointers
std::vector<UserSuppliedBuffer> bufferList;
bufferList.reserve(numBuffers);
for (int i = 0; i < numBuffers; i++) {
    uint8_t* bufferPtr = 
        static_cast<uint8_t*>(d_bufferPool) + 
        (i * bufferSize);
    bufferList.emplace_back(bufferPtr, bufferSize);
}
// Step 4: Start streaming
pDevice->StartStream(bufferList);
// Step 5: Get images
for (int i = 0; i < 100; i++) {
    Arena::IImage* pImage = pDevice->GetImage(2000);
    // At this point, image data is in GPU memory
    // pImage->GetData() returns a GPU memory address
    //////////////////////////////////////////////////////////////////////////
    // STILL NEED TO PROCESS THIS DATA ON THE GPU - SEE CUDA KERNEL SECTION //
    //////////////////////////////////////////////////////////////////////////
    pDevice->RequeueBuffer(pImage);
}
// Step 6: Cleanup
pDevice->StopStream();
bufferList.clear();
cudaFree(d_bufferPool);

Understanding GPU processing with CUDA kernels

The most common way to process GPU data is with CUDA kernels. A CUDA kernel is a function that runs on the GPU and processes data in parallel across thousands of threads. In the context of image processing, each thread typically handles one pixel, allowing the entire image to be processed simultaneously.

__global__ void processKernel(uint8_t* image, size_t width, size_t height) {
    // Calculate which pixel this thread processes
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x < width && y < height) {
        size_t idx = y * width + x;
        image[idx] = 255 - image[idx];  // Process this pixel
    }
}
  • __global__: Keyword indicating this function runs on the GPU but is called from CPU code
  • uint8_t* image: Pointer to image data. With user-defined GPU buffers, this points to GPU VRAM (not system RAM)
  • size_t width, height: Image dimensions passed to the kernel
  • blockIdx.x, blockIdx.y: Which block (tile) this thread belongs to in the grid
  • blockDim.x, blockDim.y: Number of threads per block in each dimension (16×16 = 256 threads)
  • threadIdx.x, threadIdx.y: This thread’s position within its block (0-15 in each dimension)
  • int x, y: The pixel coordinates this thread will process, calculated from block and thread indices
  • if (x < width && y < height): Bounds check to ensure thread only processes pixels inside the image
  • size_t idx = y * width + x: Converts 2D pixel coordinates to 1D array index (images stored row-by-row)
  • image[idx] = 255 - image[idx]: The actual processing (this example inverts pixel values)
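For checking kernel output, a CPU reference of the same operation can be useful. The sketch below mirrors the kernel's row-major indexing, with two nested loops taking the place of the per-thread (x, y) calculation. Like the kernel, it assumes one byte per pixel (for example, an 8-bit mono format). The function name invertOnCpu is hypothetical.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// CPU reference for the inversion kernel. Each (x, y) pair visited by the
// loops corresponds to the pixel one GPU thread would handle.
void invertOnCpu(std::vector<std::uint8_t>& image, std::size_t width, std::size_t height) {
    for (std::size_t y = 0; y < height; ++y) {
        for (std::size_t x = 0; x < width; ++x) {
            const std::size_t idx = y * width + x;  // row-major 2D -> 1D index
            image[idx] = static_cast<std::uint8_t>(255 - image[idx]);
        }
    }
}
```

To compare against the GPU result, the processed buffer would first need to be copied back to host memory (for example, with cudaMemcpy), since a GPU address cannot be dereferenced from CPU code.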

Once you have defined your CUDA kernel, it will need to be launched from the CPU:

// Get image from GPU buffer
Arena::IImage* pImage = pDevice->GetImage(2000);
 
// Configure kernel launch parameters
dim3 threads(16, 16);
dim3 blocks((pImage->GetWidth() + 15) / 16, 
            (pImage->GetHeight() + 15) / 16);
 
// Launch kernel
processKernel<<<blocks, threads>>>(
    const_cast<uint8_t*>(pImage->GetData()),
    pImage->GetWidth(),
    pImage->GetHeight());
 
// Wait for GPU to finish
cudaDeviceSynchronize();
 
// Safe to requeue now
pDevice->RequeueBuffer(pImage);
  • dim3 threads(16, 16): Defines the number of threads per block. Creates 16×16 = 256 threads per block. Each block processes a 16×16 tile of the image.
  • dim3 blocks((width + 15) / 16, (height + 15) / 16): Calculates how many blocks are needed to cover the entire image. The formula (width + 15) / 16 rounds up to ensure all pixels are covered. For a 1024×768 image, this creates 64×48 = 3,072 blocks.
  • <<<blocks, threads>>>: CUDA kernel launch syntax. The triple angle brackets specify the grid configuration: blocks defines how many blocks to create, threads defines how many threads per block. This launches 3,072 blocks × 256 threads = 786,432 parallel threads.
  • const_cast<uint8_t*>(pImage->GetData()): Removes the const qualifier from the image data pointer. With user-defined GPU buffers, GetData() returns a GPU memory address (not system RAM). The const_cast is needed because the kernel will modify the data.
  • pImage->GetWidth(), pImage->GetHeight(): Pass image dimensions to the kernel so each thread knows the image boundaries.
  • cudaDeviceSynchronize(): Blocks CPU execution until the GPU kernel completes. Kernel launches are asynchronous (they return immediately without waiting). You must synchronize before requeuing the buffer, otherwise you’ll requeue while the GPU is still processing it, causing undefined behavior.
  • pDevice->RequeueBuffer(pImage): Returns the buffer to Arena SDK for reuse.
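The grid-size arithmetic described above can be checked on the host with a small sketch. The names ceilDiv, GridSketch, and gridFor are hypothetical. The 16-pixel tile matches the dim3 threads(16, 16) configuration used in this article.

```cpp
#include <cstddef>

// Integer ceiling division: smallest q such that q * d >= n (d must be > 0).
constexpr std::size_t ceilDiv(std::size_t n, std::size_t d) {
    return (n + d - 1) / d;
}

// Plain-data summary of a 2D launch configuration.
struct GridSketch {
    std::size_t blocksX;
    std::size_t blocksY;
    std::size_t totalThreads;
};

// Number of tile x tile blocks needed to cover a width x height image,
// plus the total number of threads that such a launch creates.
constexpr GridSketch gridFor(std::size_t width, std::size_t height, std::size_t tile = 16) {
    return GridSketch{ceilDiv(width, tile), ceilDiv(height, tile),
                      ceilDiv(width, tile) * ceilDiv(height, tile) * tile * tile};
}
```

For a 1024×768 image, gridFor returns 64 × 48 blocks and 786,432 threads, matching the figures above. For an image whose dimensions are not multiples of 16, such as 1000×750, it returns 63 × 47 blocks and 758,016 threads. The 8,016 threads that fall outside the image are the ones the kernel's bounds check turns into no-ops.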

Working with buffer metadata (userContext)

The userContext field in UserSuppliedBuffer allows you to attach metadata to each buffer. This metadata pointer travels with the buffer and can be retrieved via GetPrivateDataPtr() when you get an image. The userContext is an opaque pointer; Arena SDK stores it and returns it to you unchanged, but never reads or validates the data. This is useful for tracking which buffer is being used, collecting statistics, or debugging buffer reuse patterns.

// Create metadata (this example uses simple integers)
int bufferIDs[10];
for (int i = 0; i < 10; i++) {
    bufferIDs[i] = i;  // Metadata: just the buffer number
}
// Attach metadata when creating buffers (inside the Step 3 loop, where i is the buffer index)
bufferList.emplace_back(
    bufferPtr,        // Buffer pointer
    bufferSize,       // Buffer size  
    &bufferIDs[i]     // Metadata pointer (third parameter)
);
// Later, retrieve metadata
Arena::IImage* pImage = pDevice->GetImage(timeout);
int* bufferID = (int*)pImage->GetPrivateDataPtr();
printf("Using buffer %d\n", *bufferID);
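Because the SDK treats userContext as an opaque pointer, the metadata does not have to be a plain integer. The sketch below shows one possible pattern: a small per-buffer statistics struct, updated each time a buffer is delivered. The names BufferStats, makeStats, and recordDelivery are hypothetical. The void* parameter stands in for the value returned by GetPrivateDataPtr(), whose exact return type is assumed here rather than taken from the SDK headers.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical per-buffer metadata.
struct BufferStats {
    int id;
    std::uint64_t timesDelivered;
};

// Creates one stats entry per buffer. The address of each element would be
// passed as the third constructor argument of the matching buffer entry.
std::vector<BufferStats> makeStats(int numBuffers) {
    std::vector<BufferStats> stats;
    stats.reserve(static_cast<std::size_t>(numBuffers));
    for (int i = 0; i < numBuffers; ++i) {
        stats.push_back(BufferStats{i, 0});
    }
    return stats;
}

// Called with the context pointer retrieved for a delivered image.
// A null context (no metadata attached) is ignored.
void recordDelivery(void* userContext) {
    auto* stats = static_cast<BufferStats*>(userContext);
    if (stats != nullptr) {
        ++stats->timesDelivered;
    }
}
```

The metadata must outlive the stream, and the vector must not be resized after element addresses have been handed out, because the SDK only stores the pointers and does not copy the data.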
		

© 2026 LUCID Vision Labs Inc.