The intermediate code is a result of separate compilation and linking, and removing it reduces TF's GPU wheel size.
PiperOrigin-RevId: 317081343
Change-Id: I603477b4499344aeec653765be78de11f392eac6
We currently use the following setup to select which compute architectures to compile for:
- ./configure allows specifying a set of CUDA compute architectures to compile for, e.g. '5.2,6.0'.
- .tf_configure.bazelrc maps this to an environment variable (TF_CUDA_COMPUTE_CAPABILITIES=5.2,6.0)
- cuda_configure.bzl turns this into compiler flags (copts) for clang, which the crosstool maps to nvcc if needed.
- The kernels are always compiled to both the virtual (ptx) and the real (sass) architecture.
This change adds support for specifying just real (sm_xy) or both virtual and real (compute_xy) compute architectures in TF_CUDA_COMPUTE_CAPABILITIES.
./configure is left unchanged; the old 'x.y' strings are mapped to 'compute_xy' in cuda_configure.bzl.
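As a rough illustration, the mapping described above could be sketched as follows. This is plain Python, not the actual Starlark in cuda_configure.bzl, and the function name is invented:

```python
def parse_compute_capabilities(env_value):
    """Split TF_CUDA_COMPUTE_CAPABILITIES into virtual and real architectures.

    'sm_xy'      -> real (SASS) only
    'compute_xy' -> both virtual (PTX) and real (SASS)
    'x.y'        -> legacy form, treated like 'compute_xy'
    """
    virtual, real = [], []
    for cap in env_value.split(","):
        cap = cap.strip()
        if cap.startswith("sm_"):
            real.append(cap)
        elif cap.startswith("compute_"):
            arch = cap[len("compute_"):]
            virtual.append("compute_" + arch)
            real.append("sm_" + arch)
        else:
            arch = cap.replace(".", "")  # e.g. '5.2' -> '52'
            virtual.append("compute_" + arch)
            real.append("sm_" + arch)
    return virtual, real
```

For example, '5.2,sm_60,compute_70' yields PTX for compute_52 and compute_70, and SASS for sm_52, sm_60, and sm_70.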
PiperOrigin-RevId: 313359468
Change-Id: I96c5b8b0a02b2ce62df27df7cc5272ddd42217aa
The `-no-as-needed` linker flag is position-sensitive (it only affects the `-l` flags that follow it), so we need to move it before the libraries to link.
This change uncovered that nccl doesn't properly declare its dependency
on `-lrt`, which is fixed. I suspect this started to be a problem in
f819114a2d.
This change also uncovered that some tests don't need to depend on nccl.
While `-no-as-needed` wasn't taking effect, nccl was simply dropped as
not needed.
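The required reordering can be illustrated with a small Python sketch (the function name and flag spellings are illustrative; the actual fix lives in the bazel link-flag plumbing):

```python
def place_no_as_needed_first(link_args):
    """Move '-Wl,--no-as-needed' ahead of the '-l' flags it must cover.

    Since --no-as-needed only affects libraries that come after it on the
    command line, it is a no-op when it appears after the '-l' flags.
    """
    flag = "-Wl,--no-as-needed"
    if flag not in link_args:
        return link_args
    rest = [a for a in link_args if a != flag]
    # Re-insert the flag just before the first library flag.
    for i, arg in enumerate(rest):
        if arg.startswith("-l"):
            return rest[:i] + [flag] + rest[i:]
    return rest + [flag]
```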
Previously, TF_CUDA_CONFIG_REPO would point to a pregenerated, checked-in configuration. This change has it point to a remote repository instead that generates the configuration during the build for the specific docker image. All supported configurations can be found in third_party/toolchains/remote_config/configs.bzl. Each tensorflow_rbe_config() macro creates a few remote repositories to point the TF_*_CONFIG_REPO environment variables to. The remote repository names are prefixed with the macro's name. For example, tensorflow_rbe_config(name = "ubuntu") will create @ubuntu_config_python, @ubuntu_config_cuda, @ubuntu_config_nccl, etc.
This change also introduces the platform_configure. All this rule does is create a remote repository with a single platform target for the tensorflow_rbe_config(). This will make the platforms defined in //third_party/toolchains/BUILD obsolete once remote config is fully rolled out.
PiperOrigin-RevId: 296065144
Change-Id: Ia54beeb771b28846444e27a2023f70abbd9f6ad5
Currently, the TF_*_CONFIG_REPO environment variables point to checked-in preconfig packages. After migrating to remote config they will point to remote repositories. The "config_repo_label" function ensures both ways continue to work.
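A minimal sketch of how such a helper might distinguish the two cases (plain Python, not the actual Starlark implementation; the exact label handling in the real function may differ):

```python
def config_repo_label(config_repo, target):
    """Return a label string valid for both preconfig packages and remote repos.

    config_repo is the value of a TF_*_CONFIG_REPO environment variable:
    either a bare remote repository like '@ubuntu_config_cuda', or a
    checked-in package path containing '//'.
    """
    if config_repo.startswith("@") and "//" not in config_repo:
        # Remote repository: targets live at the repository root.
        return config_repo + "//" + target
    # Checked-in preconfig package: append the target to the package path.
    return config_repo + target
```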
PiperOrigin-RevId: 295990961
Change-Id: I7637ff5298893d4ee77354e9b48f87b8c328c301
TF_NCCL_CONFIG_REPO follows the same pattern as used in the other *_configure rules. If set, TF_NCCL_CONFIG_REPO should point to a package with pregenerated configuration files.
PiperOrigin-RevId: 295804343
Change-Id: Ie1a69732fc3a538ccc3ed158c8ae79bda280514a
repository_ctx.execute() does not support uploading files from the source tree. I initially tried constructing a command that simply embeds the file's contents, but that did not work on Windows because the file is larger than 8192 characters. So my best idea was to compress the file locally, embed the compressed contents in the command, and uncompress it remotely. This works, but comes with the drawback that we need to compress the file first. That can't be done as part of the repository_rule either, because within one repository_rule every execute() runs either locally or remotely. I thus decided to check in the compressed version in the source tree. It's very much a temporary measure, as I'll add the ability to upload files to a future version of Bazel.
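The compress-and-embed idea can be sketched in a few lines of Python (the function name and the exact remote decode command are illustrative, not the actual implementation):

```python
import base64
import gzip


def embed_file_command(path):
    """Build a shell command that recreates `path` on a remote machine.

    Compressing and base64-encoding the file keeps the embedded payload
    small enough to fit within command-line length limits (cmd.exe on
    Windows caps commands at 8191 characters).
    """
    with open(path, "rb") as f:
        payload = base64.b64encode(gzip.compress(f.read())).decode("ascii")
    # The remote side decodes and decompresses back into the original file.
    return "echo %s | base64 -d | gunzip > %s" % (payload, path)
```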
PiperOrigin-RevId: 295787408
Change-Id: I1545dd86cdec7e4b20cba43d6a134ad6d1a08109
This change is in preparation for rolling out remote config. It will
allow us to inject environment variables from repository rules as
well as from the shell environment.
PiperOrigin-RevId: 295782466
Change-Id: I1eb61fca3556473e94f2f12c45ee5eb1fe51625b
Move get_cpu_value() to common.bzl and use it from cuda_configure and rocm_configure
PiperOrigin-RevId: 293807189
Change-Id: I2eb0ef0ab27a64060a99985bcab9ae4706f57fc5
Use a single python script (third_party/gpus/find_cuda_config.py) from configure.py and the different *_configure.bzl scripts to find the different CUDA library and header paths based on a set of environment variables.
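A heavily simplified sketch of the kind of environment-driven lookup such a script performs (the real find_cuda_config.py also locates cuDNN, NCCL, TensorRT, and per-library header/lib directories; the function name here is invented):

```python
import os


def find_cuda_paths(environ=os.environ):
    """Resolve a CUDA install directory from environment variables.

    Checks TF_CUDA_PATHS (comma-separated) first, then CUDA_TOOLKIT_PATH,
    then common defaults, returning the first base directory that contains
    the nvcc compiler, or None if nothing matches.
    """
    candidates = []
    if environ.get("TF_CUDA_PATHS"):
        candidates += environ["TF_CUDA_PATHS"].split(",")
    if environ.get("CUDA_TOOLKIT_PATH"):
        candidates.append(environ["CUDA_TOOLKIT_PATH"])
    candidates += ["/usr/local/cuda", "/usr"]
    for base in candidates:
        if os.path.exists(os.path.join(base, "bin", "nvcc")):
            return {
                "cuda_base": base,
                "cuda_include": os.path.join(base, "include"),
                "cuda_lib": os.path.join(base, "lib64"),
            }
    return None
```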
PiperOrigin-RevId: 243669844
We currently have two ways to disable NCCL support:
A) leave TF_NCCL_VERSION env variable undefined
B) bazel flag '--config=nonccl' or '--define=no_nccl_support=true'
After this change A) will build NCCL from source instead.
Add license to other binary targets, now that we ship NCCL with them.
PiperOrigin-RevId: 227342886
This previously resulted in undefined symbols because we would link the RDC code from the .pic.a files with the generated host code from the .a files. The two use different symbol names for kernel auto-registration.
The change effectively enforces that we link the host code from .pic.a.
PiperOrigin-RevId: 219918474
Note to users manually patching ptxas from a later toolkit version:
Building NCCL requires the same version of ptxas and nvlink.
PiperOrigin-RevId: 215911973
Instead of symlinking the install dir, copy the two files we need.
Symlinking a system dir like /usr is generally problematic, as it can quickly
lead to miscompiles for unrelated reasons. Furthermore, bazel will consider
it an error if /usr is linked in and contains a recursive symlink such as
/usr/bin/X11 -> .
PiperOrigin-RevId: 211842260
The nccl_configure.bzl rule generates two different BUILD files based on the chosen NCCL version. For NCCL 1, it aliases to the existing 'nccl_archive' http_repo on GitHub. For NCCL 2, it creates a target containing the NCCL 2 library and headers from the chosen install directory.
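The version dispatch could be sketched roughly as follows (plain Python rather than Starlark, with an invented function name; the library/header paths are assumptions about a typical NCCL 2 install layout):

```python
def nccl_build_file_content(nccl_version, install_path):
    """Return BUILD file text for the selected NCCL version."""
    if nccl_version.startswith("1"):
        # NCCL 1: alias to the open-source 'nccl_archive' repository.
        return 'alias(name = "nccl", actual = "@nccl_archive//:nccl")'
    # NCCL 2: wrap the library and headers from the local install directory.
    return "\n".join([
        'cc_library(',
        '    name = "nccl",',
        '    srcs = ["%s/lib/libnccl.so.2"],' % install_path,
        '    hdrs = ["%s/include/nccl.h"],' % install_path,
        '    visibility = ["//visibility:public"],',
        ')',
    ])
```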
PiperOrigin-RevId: 191718007