[XLA:GPU] Switch from specifying maxntid to reqntid.

maxntid specifies the max number of threads in a block, whereas reqntid
says that we will use *exactly* this many threads in a block.
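For context, this annotation is attached to the kernel through `!nvvm.annotations` metadata, which the NVPTX backend lowers to the corresponding PTX directive. A minimal sketch, with an illustrative kernel name and thread count that are not from this commit:

```llvm
; Hedged sketch: attaching the "reqntidx" annotation to a kernel.
; The NVPTX backend lowers this to ".reqntid 256, 1, 1" in the PTX;
; "maxntidx" would instead produce ".maxntid 256, 1, 1", which is
; only an upper bound on threads per block.
!nvvm.annotations = !{!0}
!0 = !{void ()* @my_kernel, !"reqntidx", i32 256}
```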

This doesn't have any effect on the benchmarks I ran, but we might as
well do it in case it helps ptxas make a better decision at some point
on some GPU.  At least it will save the next person who comes along
from repeating the investigation I just did.  :)

PiperOrigin-RevId: 177851116
Justin Lebar 2017-12-04 12:28:48 -08:00 committed by TensorFlower Gardener
parent 36ac0f3eb6
commit a1c29139cc


@@ -123,10 +123,12 @@ void UpdateLaunchDimensions(const LaunchDimensions& launch_dims, Thunk* thunk,
   llvm::ConstantInt* threads_per_block_ir_value = llvm::ConstantInt::get(
       llvm::IntegerType::get(llvm_context, /*NumBits=*/32),
       launch_dims.threads_per_block());
+  // Our launch bounds are exact, so we can specify them as reqntidx rather than
+  // maxntidx.
   nvvm_annotations_node->addOperand(llvm::MDNode::get(
       llvm_context,
       {llvm::ConstantAsMetadata::get(ir_kernel),
-       llvm::MDString::get(llvm_context, "maxntidx"),
+       llvm::MDString::get(llvm_context, "reqntidx"),
        llvm::ConstantAsMetadata::get(threads_per_block_ir_value)}));
 }
 }  // namespace