airpack.deploy.trt_utils
Utility code for performing inference using the TensorRT framework with settings optimized for AIR-T hardware.
Module Contents
airpack.deploy.trt_utils.make_cuda_context(gpu_index=0)

Initializes a CUDA context for the selected GPU and makes it active. The context is created with flags that allow the use of device-mapped (pinned) memory, which supports zero-copy operations on the AIR-T.
- Parameters:
    gpu_index (int) – Which GPU in the system to use; defaults to the first GPU (index 0)
- Return type:
    pycuda.driver.Context
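A minimal usage sketch (the try/finally cleanup pattern is an assumption; pop() and detach() are standard pycuda.driver.Context methods):

    from airpack.deploy.trt_utils import make_cuda_context

    # Create a context on the first GPU and make it active on this thread
    ctx = make_cuda_context(gpu_index=0)
    try:
        # ... run TensorRT inference or other GPU work here ...
        pass
    finally:
        ctx.pop()     # deactivate the context on this thread
        ctx.detach()  # release the context's resources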
class airpack.deploy.trt_utils.MappedBuffer(num_elems, dtype)

A device-mapped memory buffer for sharing data between the CPU and GPU.

Once created, the host field can be used to access the memory from the CPU as a numpy.ndarray, and the device field can be used to access the memory from the GPU.

Example usage:

    # Create a buffer of 16 single-precision floats
    buffer = MappedBuffer(num_elems=16, dtype=numpy.float32)

    # Zero the buffer by writing to it on the CPU
    buffer.host[:] = 0.0

    # Pass the device pointer to an API that works with GPU buffers
    func_that_uses_gpu_buffer(buffer.device)
Note
Device-mapped memory is meant for Jetson embedded GPUs like the one found on the AIR-T, where both the host and device pointers refer to the same physical memory. Using this type of memory buffer on desktop GPUs will be very slow.
- Parameters:
    num_elems (int) – Number of elements in the created buffer
    dtype (numpy.dtype) – Data type of an element (e.g., numpy.float32 or numpy.int16)
- Variables:
    host (numpy.ndarray) – Access to the buffer from the CPU
    device (CUdeviceptr) – Access to the buffer from the GPU
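Putting the two together, the sketch below feeds a pair of mapped buffers to a TensorRT execution context. The engine file name, buffer sizes, and binding order are hypothetical; deserialize_cuda_engine() and execute_v2() are standard TensorRT Python API calls, and execute_v2() takes a list of device pointers, which is where the device field fits in:

    import numpy as np
    import tensorrt as trt

    from airpack.deploy.trt_utils import MappedBuffer, make_cuda_context

    # The CUDA context must be active before allocating mapped buffers
    ctx = make_cuda_context(gpu_index=0)

    # Deserialize a previously built engine (file name is hypothetical)
    logger = trt.Logger(trt.Logger.WARNING)
    with open("model.plan", "rb") as f:
        engine = trt.Runtime(logger).deserialize_cuda_engine(f.read())
    trt_context = engine.create_execution_context()

    # One mapped buffer per engine binding; sizes are assumed for illustration
    input_buf = MappedBuffer(num_elems=2048, dtype=np.float32)
    output_buf = MappedBuffer(num_elems=10, dtype=np.float32)

    input_buf.host[:] = 0.0  # write input samples on the CPU (zero-copy)
    trt_context.execute_v2(bindings=[input_buf.device, output_buf.device])
    result = output_buf.host.copy()  # read the inference result on the CPU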