47 Commits

Author SHA1 Message Date
Christian Sigg
7f12fa50f1 Update NCCL to v2.7.3.
PiperOrigin-RevId: 317672859
Change-Id: Ice6b6ac0875f1b6f8daa6ad9d9539621b22e6666
2020-06-22 10:18:19 -07:00
Christian Sigg
0a541ad1cc Remove intermediate relocatable code stored in __nv_relfatbin sections, if objcopy is at least version 2.26 (which added support for --update-section).
The intermediate code is a result of separate compilation and linking, and removing it reduces TF's GPU wheel size.

PiperOrigin-RevId: 317081343
Change-Id: I603477b4499344aeec653765be78de11f392eac6
2020-06-18 05:12:04 -07:00
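The version gate described in commit 0a541ad1cc can be illustrated with a short sketch. This is not the code from the commit: the function name `objcopy_is_new_enough` is hypothetical, and the parsing assumes that the first line of the `objcopy --version` output ends with the binutils version, which may not hold for every distribution.

```python
import re

def objcopy_is_new_enough(version_output, minimum=(2, 26)):
    """Return True if the reported binutils version is at least `minimum`.

    Assumes the first line of `objcopy --version` ends with the version,
    e.g. "GNU objcopy (GNU Binutils) 2.30".
    """
    lines = version_output.splitlines()
    tokens = lines[0].split() if lines else []
    if not tokens:
        return False  # No output: conservatively skip the size optimization.
    # Only the leading "major.minor" of the last token is considered, so
    # suffixes such as "-44.base.el7" are ignored.
    match = re.match(r"(\d+)\.(\d+)", tokens[-1])
    if not match:
        return False  # Unknown format: conservatively skip as well.
    return (int(match.group(1)), int(match.group(2))) >= minimum
```

Comparing `(major, minor)` tuples avoids the pitfall of comparing version strings lexically, where "2.9" would sort after "2.26".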
TensorFlower Gardener
db76607503 Merge pull request from hlopko:fix_38205
PiperOrigin-RevId: 313411456
Change-Id: I1d57346eced1e20d77b9d0c941f30a79d61746c3
2020-05-27 10:12:54 -07:00
Christian Sigg
9b7b8f16f3 Support compiling for a separate set of virtual and real CUDA compute architectures.
We currently use the following setup to select which compute architectures to compile for:

- ./configure allows specifying a set of CUDA compute architectures to compile for, e.g. '5.2,6.0'.
- .tf_configure.bazelrc maps this to an environment variable (TF_CUDA_COMPUTE_CAPABILITIES=5.2,6.0)
- cuda_configure.bzl turns this into compiler flags (copts) for clang, which the crosstool maps to nvcc if needed.
- The kernels are always compiled to both the virtual (ptx) and the real (sass) architecture.

This change adds support for specifying just real (sm_xy) or both virtual and real (compute_xy) compute architectures in TF_CUDA_COMPUTE_CAPABILITIES.

./configure is left unchanged; the old 'x.y' strings are mapped to 'compute_xy' in cuda_configure.bzl.

PiperOrigin-RevId: 313359468
Change-Id: I96c5b8b0a02b2ce62df27df7cc5272ddd42217aa
2020-05-27 03:27:45 -07:00
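The mapping described in commit 9b7b8f16f3 can be sketched as follows. This is an illustration only, not the logic in cuda_configure.bzl: the function name `normalize_compute_capabilities` and the validation rules are assumptions. It accepts the three spellings the message mentions ('x.y', 'sm_xy', 'compute_xy') and rewrites the legacy 'x.y' form to 'compute_xy'.

```python
import re

def normalize_compute_capabilities(value):
    """Normalize a TF_CUDA_COMPUTE_CAPABILITIES-style string to a list.

    'sm_xy' (real only) and 'compute_xy' (virtual and real) pass through;
    the legacy 'x.y' form is mapped to 'compute_xy'.
    """
    result = []
    for entry in value.split(","):
        entry = entry.strip()
        if not entry:
            continue  # Tolerate stray commas.
        legacy = re.fullmatch(r"(\d+)\.(\d+)", entry)
        if legacy:
            entry = "compute_%s%s" % legacy.groups()
        elif not re.fullmatch(r"(sm|compute)_\d+", entry):
            raise ValueError("Invalid compute capability: %r" % entry)
        result.append(entry)
    return result
```

For example, `normalize_compute_capabilities("5.2,6.0")` yields `['compute_52', 'compute_60']`, which matches the statement that existing ./configure output keeps producing both ptx and sass.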
Marcel Hlopko
bb4c751414 Move -no-as-needed to the top of the linking command line
The `-no-as-needed` linker flag is position sensitive (it only affects
the -l flags that follow it), therefore we need to move it before the
libraries to link.

This change uncovered that nccl doesn't properly declare its dependency
on `-lrt`, which is fixed. I suspect this started to be a problem in
f819114a2d.

This change also uncovered that some tests don't need to depend on nccl.
While `-no-as-needed` wasn't taking effect, nccl was just left out as
not needed.
2020-05-25 12:15:22 +02:00
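The reordering described in commit bb4c751414 can be sketched on a plain argument list. The helper name `hoist_no_as_needed` and the `-Wl,-no-as-needed` spelling (the flag passed through a compiler driver) are assumptions for illustration; the actual change was made in the build configuration, not in Python.

```python
def hoist_no_as_needed(link_args, flag="-Wl,-no-as-needed"):
    """Return a copy of link_args with `flag` moved to the front.

    The flag only affects libraries listed after it, so placing it first
    makes it apply to every -l argument on the command line.
    """
    if flag not in link_args:
        return list(link_args)
    rest = [arg for arg in link_args if arg != flag]
    return [flag] + rest
```

With the flag first, a library such as nccl is no longer silently dropped as "not needed", which is why the missing `-lrt` dependency surfaced.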
Adrian Kuegel
43bc7a79f0 Simplify nccl_configure.
It can make use of the newly added cuda_gpu_architectures() macro.

PiperOrigin-RevId: 312301158
Change-Id: Ifa5229831f8d17093a0649b64457ae9d97ba6737
2020-05-19 10:09:55 -07:00
A. Unique TensorFlower
f819114a2d Remove direct dependency on the static libcudart; it is now linked dynamically
via the stub everywhere.

PiperOrigin-RevId: 301974345
Change-Id: I041786954d8aaa22bf76fdeeab48b08fbe7c2ec0
2020-03-20 00:16:43 -07:00
Jakob Buchgraber
09fe958fee Enable Remote Config for ROCM and CUDA RBE pre- and postsubmits
Previously, TF_CUDA_CONFIG_REPO pointed to a pregenerated, checked-in configuration. This change has it point to a remote repository instead, which generates the configuration during the build for the specific docker image. All supported configurations can be found in third_party/toolchains/remote_config/configs.bzl. Each tensorflow_rbe_config() macro creates a few remote repositories for the TF_*_CONFIG_REPO environment variables to point to. The remote repository names are prefixed with the macro's name. For example, tensorflow_rbe_config(name = "ubuntu") will create @ubuntu_config_python, @ubuntu_config_cuda, @ubuntu_config_nccl, etc.

This change also introduces the platform_configure rule. All this rule does is create a remote repository with a single platform target for the tensorflow_rbe_config(). This will make the platforms defined in //third_party/toolchains/BUILD obsolete once remote config is fully rolled out.

PiperOrigin-RevId: 296065144
Change-Id: Ia54beeb771b28846444e27a2023f70abbd9f6ad5
2020-02-19 15:05:06 -08:00
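The naming scheme from commit 09fe958fee can be sketched as below. The helper `rbe_config_repos` is hypothetical, and pairing each repository with a variable named TF_<KIND>_CONFIG_REPO is an assumption derived from the TF_*_CONFIG_REPO pattern in the message; only the three repository kinds named in the example are covered.

```python
def rbe_config_repos(name, kinds=("python", "cuda", "nccl")):
    """Map assumed TF_*_CONFIG_REPO variable names to remote repository names.

    Repository names follow the "@<macro name>_config_<kind>" pattern from
    the commit message.
    """
    return {
        "TF_%s_CONFIG_REPO" % kind.upper(): "@%s_config_%s" % (name, kind)
        for kind in kinds
    }
```

Calling `rbe_config_repos("ubuntu")` gives, among others, `"TF_CUDA_CONFIG_REPO": "@ubuntu_config_cuda"`.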
Jakob Buchgraber
478ea62407 Support remote repositories in TF_*_CONFIG_REPO environment variables
Currently TF_*_CONFIG_REPO environment variables point to checked in preconfig packages. After migrating to remote config they will point to remote repositories. The "config_repo_label" function ensures both ways continue to work.

PiperOrigin-RevId: 295990961
Change-Id: I7637ff5298893d4ee77354e9b48f87b8c328c301
2020-02-19 09:55:35 -08:00
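One way a helper like "config_repo_label" from commit 478ea62407 could tell the two cases apart is sketched here. Only the function name comes from the message; the signature, the use of plain strings instead of Bazel Label objects, and the rule "starts with @ and has no //" are assumptions.

```python
def config_repo_label(config_repo, target):
    """Build a label string for `target` inside `config_repo`.

    Assumption: a value such as "@ubuntu_config_cuda" (leading "@", no "//")
    names a remote repository; anything else is a checked-in package path.
    """
    if config_repo.startswith("@") and "//" not in config_repo:
        # Remote repository: the target lives relative to its root.
        return "%s//%s" % (config_repo, target)
    if target.startswith(":"):
        # Checked-in package, target directly in that package.
        return config_repo + target
    # Checked-in package, target in a sub-package.
    return config_repo + "/" + target
```

The package paths used to exercise this, such as `//some/preconfig/cuda`, are placeholders rather than paths from the repository.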
Jakob Buchgraber
ecb8befb32 nccl_configure: introduce environment variable TF_NCCL_CONFIG_REPO
TF_NCCL_CONFIG_REPO follows the same pattern as used in the other *_configure rules. If set, TF_NCCL_CONFIG_REPO should point to a package with pregenerated configuration files.

PiperOrigin-RevId: 295804343
Change-Id: Ie1a69732fc3a538ccc3ed158c8ae79bda280514a
2020-02-18 13:26:40 -08:00
Jakob Buchgraber
b7796f3c85 cuda_configure: make find_cuda_config() compatible with remote execution
repository_ctx.execute() does not support uploading files from the source tree. I initially tried constructing a command that simply embeds the file's contents. However, that did not work on Windows because the file is larger than 8192 characters. So my best idea was to compress it locally, embed the compressed contents in the command, and uncompress it remotely. This works, but comes with the drawback that we need to compress it first. This can't be done as part of the repository_rule either, because within one repository_rule every execute() runs either locally or remotely. I thus decided to check the compressed version into the source tree. It's very much a temporary measure, as I'll add the ability to upload files to a future version of Bazel.

PiperOrigin-RevId: 295787408
Change-Id: I1545dd86cdec7e4b20cba43d6a134ad6d1a08109
2020-02-18 12:24:05 -08:00
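The compress-and-embed idea from commit b7796f3c85 can be sketched in plain Python. This is not the checked-in mechanism: the function name `embed_script_in_command` is hypothetical, and the choice of zlib plus base64 is an assumption about how the payload might be made safe to place inside a single command string.

```python
import base64
import zlib

def embed_script_in_command(source):
    """Return a one-line Python program that runs `source` after inflating it.

    The script is zlib-compressed and base64-encoded locally, so the command
    carries a shorter, quote-safe payload that is decoded on the remote side.
    """
    compressed = zlib.compress(source.encode("utf-8"), 9)
    payload = base64.b64encode(compressed).decode("ascii")
    # The base64 alphabet contains no quote characters, so the payload can
    # sit inside a single-quoted string literal.
    return (
        "from zlib import decompress;"
        "from base64 import b64decode;"
        "exec(decompress(b64decode('%s')).decode('utf-8'))" % payload
    )
```

The returned string could be handed to an interpreter with `python -c`. Whether the result fits a given command-length limit still depends on how well the original file compresses.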
Jakob Buchgraber
f60fc7a072 remote config: replace all uses of os.environ by get_host_environ
This change is in preparation for rolling out remote config. It will
allow us to inject environment variables from repository rules as
well as from the shell environment.

PiperOrigin-RevId: 295782466
Change-Id: I1eb61fca3556473e94f2f12c45ee5eb1fe51625b
2020-02-18 12:19:55 -08:00
Ayush Dubey
dd73c93fd0 Update NCCL to 2.5.7-1.
PiperOrigin-RevId: 294928191
Change-Id: Idb2b74e0571e06d514eb2b4ff9707586aac4d3aa
2020-02-13 09:10:39 -08:00
A. Unique TensorFlower
99ec314b06 rocm_configure: share code with cuda_configure
Move get_cpu_value() to common.bzl and use it from cuda_configure and rocm_configure

PiperOrigin-RevId: 293807189
Change-Id: I2eb0ef0ab27a64060a99985bcab9ae4706f57fc5
2020-02-07 07:09:09 -08:00
A. Unique TensorFlower
9c948d12b2 cuda_configure: fix quadratic runtime due to label resolution
See for details: 62bd353452

PiperOrigin-RevId: 293548911
Change-Id: I66d77bc606458e3b40d2f9fde88770bc8f15da44
2020-02-06 02:15:48 -08:00
Sami
67edc16326 Make nccl bindings compilable with cuda 10.2
2019-12-05 16:49:20 -08:00
Christian Sigg
9621ac4de0 Remove # -*- Python -*- from Starlark files.
PiperOrigin-RevId: 268048675
2019-09-09 12:27:18 -07:00
A. Unique TensorFlower
77659d9c93 Rolling forward CL 252574722 and 252855085 with a fix.
Update NCCL from 2.3.5 to 2.4.7.

PiperOrigin-RevId: 253953922
2019-06-19 01:55:40 -07:00
Amit Patankar
7aaf36666b Automated rollback of commit 5b5ada737a
PiperOrigin-RevId: 252917011
2019-06-12 16:50:07 -07:00
A. Unique TensorFlower
a979d46b88 Fix header inclusion check in NCCL.
PiperOrigin-RevId: 252855085
2019-06-12 11:21:07 -07:00
A. Unique TensorFlower
5b5ada737a Update NCCL from 2.3.5 to 2.4.7.
PiperOrigin-RevId: 252574722
2019-06-11 02:15:15 -07:00
A. Unique TensorFlower
3ab0f000e4 Add missing CUDA header dependency to NCCL.
PiperOrigin-RevId: 251696588
2019-06-05 13:38:45 -07:00
TensorFlower Gardener
222219497c Merge pull request from xinan-jiang:dev/fix_nccl_build
PiperOrigin-RevId: 247561933
2019-05-09 23:59:06 -07:00
Xinan Jiang
2112f2e9d6 fix nccl build bug
2019-05-05 15:45:19 +08:00
A. Unique TensorFlower
8e9ca6617b Change how ./configure and bazel find CUDA paths.
Use a single python script (third_party/gpus/find_cuda_config.py) from configure.py and the different *_configure.bzl scripts to find the different CUDA library and header paths based on a set of environment variables.

PiperOrigin-RevId: 243669844
2019-04-15 12:55:59 -07:00
Anna R
3401519642 Internal change.
PiperOrigin-RevId: 234672242
2019-02-19 15:08:37 -08:00
A. Unique TensorFlower
a0b0a50328 Formatting BUILD and bzl files.
PiperOrigin-RevId: 234543907
2019-02-18 22:24:09 -08:00
A. Unique TensorFlower
ff91cd6910 Use CUDA tools from @local_config_cuda when building NCCL.
PiperOrigin-RevId: 230245994
2019-01-21 11:50:52 -08:00
A. Unique TensorFlower
f8c19045a1 Improve how the library to link is determined.
PiperOrigin-RevId: 229833513
2019-01-17 16:39:50 -08:00
A. Unique TensorFlower
a36fc1c38b Remove NCCL remote config and dummy config.
We currently have two ways to disable NCCL support:
A) leave TF_NCCL_VERSION env variable undefined
B) bazel flag '--config=nonccl' or '--define=no_nccl_support=true'

After this change A) will build NCCL from source instead.
Add license to other binary targets, now that we ship NCCL with them.

PiperOrigin-RevId: 227342886
2018-12-31 02:49:39 -08:00
A. Unique TensorFlower
cc3786bacb Roll-forward of CL 225051897 (adjust NCCL source so it can be built with clang).
PiperOrigin-RevId: 226293762
2018-12-20 00:47:53 -08:00
A. Unique TensorFlower
27d89c6b8e Automated rollback of commit d6a4685035
PiperOrigin-RevId: 225236744
2018-12-12 12:58:16 -08:00
A. Unique TensorFlower
d6a4685035 Improve build rules to compile NCCL from source, in particular for clang.
PiperOrigin-RevId: 225051897
2018-12-11 12:49:53 -08:00
Yifei Feng
f96c3fb234 Run buildifier on some files.
PiperOrigin-RevId: 222101413
2018-11-19 10:51:34 -08:00
A. Unique TensorFlower
cd4ae6610d Fix building NCCL from source in build modes that link non-PIC code (fastbuild, dbg).
This previously resulted in undefined symbols because we would link the RDC code from the .pic.a files with the generated host code from the .a files. The two use different symbol names for kernel auto-registration.

The change effectively enforces that we link the host code from .pic.a.

PiperOrigin-RevId: 219918474
2018-11-03 01:45:29 -07:00
A. Unique TensorFlower
456e6927c0 When building NCCL from source, do not rely on CUDA_TOOLKIT_PATH being defined to find bin2c.
PiperOrigin-RevId: 219698198
2018-11-01 14:26:09 -07:00
A. Unique TensorFlower
bd7de65f74 Fix NCCL OSS build for CUDA 10.
PiperOrigin-RevId: 218338355
2018-10-23 07:29:19 -07:00
A. Unique TensorFlower
8c072a519e Get rid of nvcc warning about including internal headers. See issue .
Use select for cpu-arch.

Auto-format file.

PiperOrigin-RevId: 218299288
2018-10-23 00:52:07 -07:00
Jason Furmanek
ac53355550 Make nccl2 bazel configuration platform independent
2018-10-10 16:34:16 +00:00
A. Unique TensorFlower
53faa313b7 Switch NCCL to build from open source (version 2.3.5-5) by default.
Note to users manually patching ptxas from a later toolkit version:
Building NCCL requires the same version of ptxas and nvlink.

PiperOrigin-RevId: 215911973
2018-10-05 08:53:12 -07:00
Jason Furmanek
09bf8eb99c white space removal
2018-09-26 05:26:54 +00:00
Jason Furmanek
7c2341501a Find NCCL2 debians in Tensorflow configure
2018-09-26 04:44:12 +00:00
A. Unique TensorFlower
b51f4c97e8 Fix nccl for remote builds.
Instead of symlinking the install dir, copy the two files we need.
Symlinking a system dir like /usr is generally problematic as it can quickly
lead to miscompiles for unrelated reasons. Furthermore, bazel will consider
it an error if /usr is linked in and contains a recursive symlink in
/usr/bin/X11 -> .

PiperOrigin-RevId: 211842260
2018-09-06 11:52:12 -07:00
Toby Boyd
e4c0dbcab8 internal change
PiperOrigin-RevId: 204832902
2018-07-16 17:10:20 -07:00
A. Unique TensorFlower
1fda7645d1 Add support for NCCL2. The configure script asks which version of NCCL to use. The default is still NCCL 1 from GitHub. If the user chooses NCCL 2, it asks for the install directory.
The nccl_configure.bzl generates two different BUILD files based on the chosen NCCL version. For NCCL 1, it aliases to the existing 'nccl_archive' http_repo on GitHub. For NCCL 2, it creates a target containing the NCCL 2 library and headers from the chosen install directory.

PiperOrigin-RevId: 191718007
2018-04-05 03:11:33 -07:00
Yifei Feng
0632564172 Remove deleted files.
2017-05-05 16:43:23 -07:00
A. Unique TensorFlower
8d393ea2fa Add cuda_clang build configuration that allows using clang as a CUDA compiler.
Change: 151705528
2017-03-30 08:54:57 -07:00