airpack.deploy.trt_utils

Utility code for performing inference using the TensorRT framework with settings optimized for AIR-T hardware.

Module Contents

airpack.deploy.trt_utils.make_cuda_context(gpu_index=0)

Initializes a CUDA context for use with the selected GPU and makes it active.

The context is created with flags that enable device-mapped (pinned) memory, which supports zero-copy operations on the AIR-T.

Parameters

gpu_index (int) – Which GPU in the system to use; defaults to the first GPU (index 0)

Return type

pycuda.driver.Context
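A minimal sketch of how such a context could be created with PyCUDA directly is shown below. This is not the library's implementation, only an illustration of the behavior described above; `make_cuda_context_sketch` is a hypothetical name, and the `MAP_HOST` flag is the PyCUDA flag that permits device-mapped host allocations.

# Sketch only -- assumes PyCUDA is installed and a CUDA-capable GPU is present.
def make_cuda_context_sketch(gpu_index=0):
    # Imported inside the function so the sketch can be read without PyCUDA
    import pycuda.driver as cuda
    cuda.init()
    device = cuda.Device(gpu_index)
    # MAP_HOST allows page-locked host memory to be mapped into the
    # device address space (zero-copy access)
    context = device.make_context(flags=cuda.ctx_flags.MAP_HOST)
    return context

The returned context is active on the calling thread; callers are responsible for popping or detaching it when done.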

class airpack.deploy.trt_utils.MappedBuffer(num_elems, dtype)

A device-mapped memory buffer for sharing data between CPU and GPU.

Once created, the host field can be used to access the memory from CPU as a numpy.ndarray, and the device field can be used to access the memory from the GPU.

Example usage:

import numpy
from airpack.deploy.trt_utils import MappedBuffer

# Create a buffer of 16 single-precision floats
buffer = MappedBuffer(num_elems=16, dtype=numpy.float32)
# Zero the buffer by writing to it on the CPU
buffer.host[:] = 0.0
# Pass the device pointer to an API that works with GPU buffers
func_that_uses_gpu_buffer(buffer.device)

Note

Device-mapped memory is meant for Jetson embedded GPUs like the one found on the AIR-T, where both the host and device pointers refer to the same physical memory. Using this type of memory buffer on desktop GPUs will be very slow.

Parameters
  • num_elems (int) – Number of elements in the created buffer

  • dtype (numpy.dtype) – Data type of an element (e.g., numpy.float32 or numpy.int16)

Variables
  • host (numpy.ndarray) – Access to the buffer from the CPU

  • device (CUdeviceptr) – Access to the buffer from the GPU
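The host/device pairing can be illustrated with a plain-NumPy stand-in that needs no GPU. FakeMappedBuffer below is illustrative only, not part of the library: on the AIR-T, host and device are two views of the same physical memory, which the mock imitates with a single array and its base address.

# Conceptual stand-in for MappedBuffer (no CUDA required)
import numpy

class FakeMappedBuffer:
    """One allocation, two views.

    On real hardware, ``host`` is a numpy view of page-locked memory and
    ``device`` is the CUDA device pointer to the same physical bytes.
    Here the "device pointer" is just the array's base address.
    """
    def __init__(self, num_elems, dtype):
        self.host = numpy.zeros(num_elems, dtype=dtype)
        self.device = self.host.ctypes.data  # stand-in device pointer

buf = FakeMappedBuffer(num_elems=16, dtype=numpy.float32)
buf.host[:] = 1.5  # write through the CPU view; a real GPU kernel
                   # given buf.device would see the same bytes

Because both fields alias one allocation, no explicit host-to-device copy is ever needed; that is the zero-copy property the Note above describes.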