[XLA:GPU] Switch from specifying maxntid to reqntid.
maxntid specifies the max number of threads in a block, whereas reqntid says that we will use *exactly* this many threads in a block.

This doesn't have any effect on the benchmarks I ran, but we might as well do it in case it helps ptxas make a better decision at some point on some GPU. At least it will prevent the next person to come along from doing this same investigation I just did. :)

PiperOrigin-RevId: 177851116
parent 36ac0f3eb6
commit a1c29139cc
@@ -123,10 +123,12 @@ void UpdateLaunchDimensions(const LaunchDimensions& launch_dims, Thunk* thunk,
   llvm::ConstantInt* threads_per_block_ir_value = llvm::ConstantInt::get(
       llvm::IntegerType::get(llvm_context, /*NumBits=*/32),
       launch_dims.threads_per_block());
+  // Our launch bounds are exact, so we can specify them as reqntidx rather than
+  // maxntidx.
   nvvm_annotations_node->addOperand(llvm::MDNode::get(
       llvm_context,
       {llvm::ConstantAsMetadata::get(ir_kernel),
-       llvm::MDString::get(llvm_context, "maxntidx"),
+       llvm::MDString::get(llvm_context, "reqntidx"),
        llvm::ConstantAsMetadata::get(threads_per_block_ir_value)}));
 }
 
 }  // namespace
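
The distinction the commit message draws between the two annotations can be illustrated with a small standalone sketch. It only models the contract each annotation expresses (an upper bound versus an exact value) and does not touch LLVM or XLA. The names `ThreadBound`, `LaunchSatisfiesBound`, and `CheckExamples` are hypothetical and chosen for this illustration.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical names for illustration only; not part of XLA, LLVM, or NVVM.
enum class ThreadBound {
  kMaxNtid,  // kernel is launched with at most this many threads per block
  kReqNtid,  // kernel is launched with exactly this many threads per block
};

// Returns true if a launch using `launched` threads per block honors a bound
// of `declared` threads per block of the given kind.
bool LaunchSatisfiesBound(ThreadBound kind, uint32_t declared,
                          uint32_t launched) {
  switch (kind) {
    case ThreadBound::kMaxNtid:
      return launched <= declared;
    case ThreadBound::kReqNtid:
      return launched == declared;
  }
  return false;
}

// A few cases showing where the two contracts agree and where they differ.
void CheckExamples() {
  // A smaller block is fine under a maximum, but violates an exact bound.
  assert(LaunchSatisfiesBound(ThreadBound::kMaxNtid, 256, 128));
  assert(!LaunchSatisfiesBound(ThreadBound::kReqNtid, 256, 128));
  // An exact match satisfies both.
  assert(LaunchSatisfiesBound(ThreadBound::kMaxNtid, 256, 256));
  assert(LaunchSatisfiesBound(ThreadBound::kReqNtid, 256, 256));
}
```

Because the launch dimensions here are exact, as the added code comment says, every launch that satisfies the stricter `kReqNtid` contract also satisfies `kMaxNtid`. The switch only hands the compiler a stronger guarantee.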