Querying blocks_per_core_limit requires an active CUDA context. We'd
like to avoid this requirement for DeviceDescription and keep it as
stateless as possible, so that the device can be queried without
allocating any memory.
Currently blocks_per_core_limit is not being used anywhere. However, if
we'd like to add it back, we can do so with a dedicated function.
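One way such a dedicated function could look is sketched below. Every name here (the DeviceDescription fields, ActiveGpuContext, QueryBlocksPerCoreLimit) is a hypothetical stand-in rather than StreamExecutor's actual API, and the driver query is replaced by a stub so the sketch compiles without CUDA.

```cpp
#include <cassert>
#include <string>

// Hypothetical, stateless description: plain data that can be filled in
// without creating a GPU context or allocating device memory.
struct DeviceDescription {
  std::string name;
  int core_count = 0;
};

// Hypothetical token type. Holding one stands for "a context is active";
// a real implementation would wrap the driver's context handle.
class ActiveGpuContext {
 public:
  explicit ActiveGpuContext(int device_ordinal)
      : device_ordinal_(device_ordinal) {}
  int device_ordinal() const { return device_ordinal_; }

 private:
  int device_ordinal_;
};

// Dedicated query: the context requirement is visible in the signature
// instead of being hidden inside DeviceDescription.
// Stub only: a real version would ask the driver. 32 is a placeholder.
inline int QueryBlocksPerCoreLimit(const ActiveGpuContext& context) {
  (void)context;
  return 32;
}

inline void ExampleUsage() {
  DeviceDescription description;         // No context needed here.
  assert(description.core_count == 0);
  ActiveGpuContext context(/*device_ordinal=*/0);
  assert(QueryBlocksPerCoreLimit(context) == 32);  // Context required here.
}
```

The point of the split is that callers who only need the description never pay for a context, while the one query that needs a context says so in its parameter list.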
PiperOrigin-RevId: 244747460
Imported from GitHub PR #25011
New PR to continue the efforts started by @deven-amd in #20709 / #22669 / #24156.
This PR refactors the StreamExecutor GPU interfaces so that they can be shared between CUDA and ROCm. It is the first in a series of PRs.
Based on @timshen91's input, I've refactored the logic in #24156 so that:
- it contains changes only in stream_executor/....
- it does not remove any stream_executor/cuda/*.h, so that things outside of stream_executor don't break. All the types and functions in namespace cuda are now aliases of their namespace gpu counterparts. For example, namespace cuda { using CUDADriver = gpu::GpuDriver; }.
- all stream_executor/gpu/BUILD targets should be visible only to //third_party/tensorflow/stream_executor:__subpackages__.
- target stream_executor/gpu:X should be used only by stream_executor/cuda:cuda_X or stream_executor/rocm:rocm_X, not by cuda_Y. For example, cuda:cuda_platform should depend on cuda:cuda_driver, not gpu:gpu_driver.
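The aliasing approach described in the list above can be illustrated with a minimal sketch. The GpuDriver body, its DeviceCount() member, and the nesting under namespace stream_executor are assumptions made for the example, not the real class.

```cpp
#include <cassert>
#include <type_traits>

namespace stream_executor {
namespace gpu {
// Stand-in for the shared driver wrapper; the real class is much larger.
class GpuDriver {
 public:
  static int DeviceCount() { return 1; }  // Hypothetical member, placeholder value.
};
}  // namespace gpu

namespace cuda {
// Legacy name kept as a type alias so existing callers still compile.
using CUDADriver = gpu::GpuDriver;
}  // namespace cuda
}  // namespace stream_executor

// Both spellings name exactly the same type.
static_assert(std::is_same<stream_executor::cuda::CUDADriver,
                           stream_executor::gpu::GpuDriver>::value,
              "the alias must refer to the shared gpu type");

inline void ExampleLegacyCaller() {
  // Code written against the old cuda name keeps working unchanged.
  assert(stream_executor::cuda::CUDADriver::DeviceCount() == 1);
}
```

Because an alias introduces no new type, overloads and templates written against either name continue to match.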
Copybara import of the project:
- 267affbb73df9164baf4e62142fe7201e6a305ee [ROCm][CUDA] StreamExecutor logic for ROCm / CUDA platform by Wen-Heng (Jack) Chung <whchung@gmail.com>
- 04fac5bf358059bdb2cd4a3e092e52dc982ea7b0 Merge 267affbb73df9164baf4e62142fe7201e6a305ee into 5f8ea... by Wen-Heng (Jack) Chung <whchung@gmail.com>
COPYBARA_INTEGRATE_REVIEW=https://github.com/tensorflow/tensorflow/pull/25011 from ROCmSoftwarePlatform:google-upstream-pr-stream-executor-alt 267affbb73df9164baf4e62142fe7201e6a305ee
PiperOrigin-RevId: 231250990
- Maintain functionality, just move CalculateOccupancy() and CompareOccupancy() methods from device_description to cuda_gpu_executor
- Remove CUDA requirement in general class device_description
- Replace references to the UnqueryableDeviceParams struct with calls to CUDA's built-in occupancy calculation functions
- Update calls to the occupancy checking functions with the new changes
- Changes should provide more long-term reliability and will remove the need to manually update hardcoded data values for new GPU architectures
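As a rough illustration of comparing occupancy through a query instead of hardcoded per-architecture tables, the sketch below uses a callback as a stand-in for a driver call such as CUDA's cuOccupancyMaxActiveBlocksPerMultiprocessor. SuggestBetterBlockSize and FakeOccupancyQuery are hypothetical names, the device limits are placeholders, and this is not the CompareOccupancy() implementation referred to above.

```cpp
#include <algorithm>
#include <cassert>
#include <functional>

// Stand-in for a driver-side occupancy query: given threads per block,
// return how many blocks can be resident on one multiprocessor.
using OccupancyQuery = std::function<int(int threads_per_block)>;

// Returns candidate_threads if it keeps more threads resident per
// multiprocessor than current_threads, otherwise 0 ("keep current").
inline int SuggestBetterBlockSize(const OccupancyQuery& query,
                                  int current_threads,
                                  int candidate_threads) {
  const int current_resident = query(current_threads) * current_threads;
  const int candidate_resident = query(candidate_threads) * candidate_threads;
  return candidate_resident > current_resident ? candidate_threads : 0;
}

// Fake device used only for this sketch: at most 32 blocks and 2048
// threads resident per multiprocessor. Both limits are placeholders.
inline int FakeOccupancyQuery(int threads_per_block) {
  return std::min(32, 2048 / threads_per_block);
}

inline void ExampleUsage() {
  // 32 threads/block: 32 blocks * 32 threads = 1024 resident threads.
  // 256 threads/block: 8 blocks * 256 threads = 2048 resident threads.
  assert(SuggestBetterBlockSize(FakeOccupancyQuery, 32, 256) == 256);
  // Going the other way is not an improvement.
  assert(SuggestBetterBlockSize(FakeOccupancyQuery, 256, 32) == 0);
}
```

With the query supplied by the driver, no table of per-architecture limits has to be maintained in the caller.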
Step 1 of re-namespace'ing StreamExecutor into ::stream_executor.
This moves everything inside of stream_executor/..., and leaves a
namespace alias into ::perftools::gputools. The next steps will clean
up users to use the new namespace.
This is mostly a mechanical change, but it also includes a bunch of
non-mechanical changes that ideally would be split out into separate
patches. Unfortunately they all sort of need to be shoved in here for
various reasons:
- Forward declarations need to be in the same namespace as the actual
  types, so we need to change all forward declarations of
  StreamExecutor types in this one patch.
- Uses of these forward declarations need to be changed to the new
  namespace (or otherwise we need to add a namespace alias to the
  relevant header, but this is pretty ugly).
- Various initialization code needs to live in StreamExecutor's "real"
  namespace, so all this needs to be changed.
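The forward-declaration constraint can be seen in a small sketch. It assumes the legacy name is provided as a true C++ namespace alias; the Stream body and its ok() member are placeholders for illustration.

```cpp
#include <cassert>
#include <type_traits>

// Forward declaration in the real namespace. Declaring "class Stream;"
// in any other namespace would introduce a different, unrelated type.
namespace stream_executor {
class Stream;
}  // namespace stream_executor

// Legacy spelling kept working through a namespace alias. Once gputools
// is an alias, "namespace gputools { class Stream; }" inside perftools
// no longer compiles, so old forward declarations have to move.
namespace perftools {
namespace gputools = ::stream_executor;
}  // namespace perftools

// A user header may hold pointers to the incomplete type under either name.
struct StreamHolder {
  perftools::gputools::Stream* legacy_spelling = nullptr;
  stream_executor::Stream* new_spelling = nullptr;
};

// The actual definition, normally in its own header.
namespace stream_executor {
class Stream {
 public:
  bool ok() const { return true; }  // Hypothetical member for the demo.
};
}  // namespace stream_executor

static_assert(std::is_same<perftools::gputools::Stream,
                           stream_executor::Stream>::value,
              "both spellings must name the same type");

inline void ExampleUsage() {
  stream_executor::Stream stream;
  StreamHolder holder;
  holder.legacy_spelling = &stream;  // No conversion needed: same type.
  holder.new_spelling = &stream;
  assert(holder.legacy_spelling->ok());
}
```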
PiperOrigin-RevId: 193256128
Changes:
* error message that refers to removed `DefaultSession` method.
* -Wnull-conversion warnings
* the "_start_time" attr for recvs when the flag "--brain_enable_scheduling_for_recvs" is set.
* typo in tutorial data download progress message.
* a typo ("however their installing"=>"however installing").
* typo, rename "TensorFlow Mechanics" to "How To" to be consistent with the website.
* a typo ("subtact"=>"subtract").
* protobuf examples in comments in tensorflow::Example.proto.
* formula formatting in MNIST beginner tutorial
* negative fraction-of-queue-full stats
* protobuf inclusion path so that Android demo will build under Blaze.
* small typo (moderatly > moderately)
* Session.run() to check that tensor arguments come from the session's graph.
* another six import
* seq2seq typo in bazel command
Base CL: 108349164