Bixia Zheng 14d708ab72 [TF:TRT] Handle out of GPU memory when creating TensorRT execution context.
Previously, we used ICudaEngine::createExecutionContext to create a TensorRT
execution context along with the GPU memory needed to execute the CUDA engine.
This API doesn't handle running out of GPU memory properly and instead
propagates an exception. This change uses
ICudaEngine::createExecutionContextWithoutDeviceMemory to create a TensorRT
execution context without any GPU memory, and lets TF-TRT allocate the needed
GPU memory. To keep track of that GPU memory, we wrap the TensorRT execution
context and its associated GPU memory in a new class called ExecutionContext.
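
The ownership pattern described above can be sketched roughly as follows.
This is an illustration under stated assumptions, not the code from this
change: FakeEngine, FakeContext and Allocator are hypothetical stand-ins so
the sketch compiles without TensorRT, and the Create factory and its
return-empty-on-failure convention are assumptions. The stand-in methods
mirror the TensorRT calls getDeviceMemorySize,
createExecutionContextWithoutDeviceMemory and setDeviceMemory.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <memory>
#include <utility>

// Hypothetical stand-in for nvinfer1::IExecutionContext.
struct FakeContext {
  void* device_memory = nullptr;
  void setDeviceMemory(void* memory) { device_memory = memory; }
};

// Hypothetical stand-in for nvinfer1::ICudaEngine.
struct FakeEngine {
  std::size_t device_memory_size = 0;
  std::size_t getDeviceMemorySize() const { return device_memory_size; }
  // Creates a context without allocating any device memory for it.
  FakeContext* createExecutionContextWithoutDeviceMemory() {
    return new FakeContext();
  }
};

// Hypothetical allocator standing in for the TF-TRT GPU allocator. It
// returns nullptr instead of throwing when a request exceeds `limit`.
struct Allocator {
  std::size_t limit;
  void* allocate(std::size_t bytes) {
    return bytes <= limit ? std::malloc(bytes) : nullptr;
  }
  void deallocate(void* memory) { std::free(memory); }
};

// Owns both the execution context and the memory handed to it, so the two
// are always released together.
class ExecutionContext {
 public:
  ExecutionContext() = default;
  ExecutionContext(const ExecutionContext&) = delete;
  ExecutionContext& operator=(const ExecutionContext&) = delete;
  ExecutionContext(ExecutionContext&& other) noexcept
      : allocator_(other.allocator_),
        memory_(other.memory_),
        context_(std::move(other.context_)) {
    other.allocator_ = nullptr;
    other.memory_ = nullptr;
  }

  // Destroy the context before freeing the memory it was using.
  ~ExecutionContext() {
    context_.reset();
    if (memory_ != nullptr) allocator_->deallocate(memory_);
  }

  // Returns an empty wrapper (context() == nullptr) if allocation fails,
  // so the caller can handle out-of-memory without an exception.
  static ExecutionContext Create(FakeEngine* engine, Allocator* allocator) {
    assert(engine != nullptr && allocator != nullptr);
    const std::size_t bytes = engine->getDeviceMemorySize();
    void* memory = nullptr;
    if (bytes > 0) {
      memory = allocator->allocate(bytes);
      if (memory == nullptr) return ExecutionContext();
    }
    std::unique_ptr<FakeContext> context(
        engine->createExecutionContextWithoutDeviceMemory());
    if (context == nullptr) {
      if (memory != nullptr) allocator->deallocate(memory);
      return ExecutionContext();
    }
    context->setDeviceMemory(memory);
    return ExecutionContext(allocator, memory, std::move(context));
  }

  FakeContext* context() const { return context_.get(); }

 private:
  ExecutionContext(Allocator* allocator, void* memory,
                   std::unique_ptr<FakeContext> context)
      : allocator_(allocator), memory_(memory), context_(std::move(context)) {}

  Allocator* allocator_ = nullptr;
  void* memory_ = nullptr;
  std::unique_ptr<FakeContext> context_;
};
```

A caller would check context() for nullptr after Create and report a
resource error when it is empty, rather than relying on exception handling.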

PiperOrigin-RevId: 351895192
Change-Id: Ie01f0241578fadba8fad25bd110f937fd47082c8
2021-01-14 16:08:51 -08:00