Previously, in implicit batch mode, all nodes in the same graph used the same max batch size. With this CL, users can configure some nodes to use a different max batch size by setting an optional `_tftrt_op_max_batch_size` attribute on the node. In addition, any static batch size is treated the same way as a `_tftrt_op_max_batch_size` attribute annotation.
During segmentation, TF-TRT avoids putting nodes with different annotated max batch sizes into the same cluster.
- For static engines, if any node in the cluster is annotated with a custom max batch size, TF-TRT uses that value to build the static engine. Otherwise, TF-TRT uses the default max batch size from the conversion parameters;
- For dynamic engines, TF-TRT will still use the batch size detected at runtime as the max batch size.
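
The selection rules above can be sketched in plain Python. This is an illustrative model only, not the TF-TRT implementation: the function names and the use of `None` for an unannotated node are assumptions made for the example, and it assumes an unannotated node may share a cluster with annotated ones.

```python
from typing import Optional, Sequence


def compatible(a: Optional[int], b: Optional[int]) -> bool:
    """Hypothetical check: two nodes may share a cluster unless both carry
    an annotated max batch size and the two values differ."""
    return a is None or b is None or a == b


def static_engine_max_batch_size(
    annotations: Sequence[Optional[int]], default_max_batch_size: int
) -> int:
    """Pick the max batch size for a static engine built from one cluster.

    `annotations` holds one entry per node in the cluster: the annotated
    value, or None if the node has no annotation (assumed representation).
    """
    annotated = {a for a in annotations if a is not None}
    if len(annotated) > 1:
        # Segmentation is expected to prevent this situation.
        raise ValueError(f"conflicting max batch sizes in cluster: {sorted(annotated)}")
    # Use the custom value if any node has one, else the conversion default.
    return annotated.pop() if annotated else default_max_batch_size


def dynamic_engine_max_batch_size(runtime_batch_size: int) -> int:
    """Dynamic engines keep using the batch size detected at runtime."""
    return runtime_batch_size
```

For example, a cluster whose nodes are annotated as `[None, 8, 8]` with a default of 32 would build a static engine with max batch size 8, while a cluster with no annotations would fall back to 32.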
PiperOrigin-RevId: 338162727
Change-Id: I6aaf1157353676850dfb6c75b344149c55a8bc11