STT-tensorflow/tensorflow/python/training
Daniel Ellis c29e9f25e7 Handle garbage collection race condition.
An exception is thrown when objects that use `CapturableResourceDeleter` are garbage collected at the end of a program's life. This can happen in ordinary circumstances, such as when using `saved_model_cli` to inspect a model.

The cause of the exception appears to be a race condition with garbage collection between `CapturableResourceDeleter` and `ScopedTFFunction`. Both define a custom finalizer (`__del__`); `CapturableResourceDeleter`'s finalizer ultimately calls a concrete function, which calls an `_EagerDefinedFunction`, which attempts to load and execute a `ScopedTFFunction`.

When multiple objects in a reference cycle all become unreachable during the same garbage collection cycle, there is no guaranteed order in which they are collected. In the failing case, `ScopedTFFunction` is collected first and its underlying function is deleted. `CapturableResourceDeleter`'s finalizer then runs and fails, because the function it is trying to call is gone.
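
The ordering problem described above can be reproduced in plain Python without TensorFlow. The following is a minimal sketch under stated assumptions, not the code changed by this commit: `ScopedHandle`, `ResourceDeleter`, and `run_demo` are hypothetical stand-ins for `ScopedTFFunction`, `CapturableResourceDeleter`, and a driver. The `try`/`except` guard is just one possible way to tolerate either finalizer order, and may differ from the actual fix.

```python
import gc


class ScopedHandle:
    """Hypothetical stand-in for an object that owns a native function."""

    def __init__(self):
        self.alive = True

    def __del__(self):
        # Simulates deleting the underlying function.
        self.alive = False

    def call(self):
        if not self.alive:
            raise RuntimeError("underlying function already deleted")
        return "resource destroyed"


class ResourceDeleter:
    """Hypothetical stand-in for a deleter whose finalizer calls the handle."""

    def __init__(self, handle, log):
        self.handle = handle
        self.log = log
        handle.owner = self  # back-reference: forms a reference cycle

    def __del__(self):
        # The handle's finalizer may or may not have run already, so guard
        # the call instead of assuming the function still exists.
        try:
            self.log.append(self.handle.call())
        except RuntimeError as err:
            self.log.append(f"skipped: {err}")


def run_demo():
    """Builds the cycle, drops it, and returns what the finalizer logged."""
    log = []
    deleter = ResourceDeleter(ScopedHandle(), log)
    del deleter   # the cycle is now unreachable
    gc.collect()  # both finalizers run, in an unspecified order
    return log
```

Calling `run_demo()` returns a one-element list. Which of the two messages it holds depends on the order in which the collector ran the two finalizers, so callers should not rely on either outcome.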

PiperOrigin-RevId: 358292164
Change-Id: I9162d5de622f5c1ec9b2954647b9958a7d3d87b6
2021-02-18 17:00:03 -08:00
..
experimental PY2 removal cleanup 2021-01-15 16:48:57 -08:00
saving PY2 removal cleanup 2021-01-15 16:48:57 -08:00
tracking Handle garbage collection race condition. 2021-02-18 17:00:03 -08:00
__init__.py
adadelta_test.py Use self.evaluate for global_variables_initializer 2020-07-01 11:07:20 -07:00
adadelta.py
adagrad_da_test.py fix skip test for adagrad_da_test 2020-07-17 22:57:26 -07:00
adagrad_da.py
adagrad_test.py Make several graph-only tests run in a graph explicitly instead of using the run_v1 decorators (to avoid skipping them). 2020-07-09 14:43:18 -07:00
adagrad.py
adam_test.py Makes Optimizer._zeros_slot() copy XLA sharding from variable. 2021-02-02 08:57:05 -08:00
adam.py Fix documentation compatibility tag formatting. 2020-05-07 16:46:21 -07:00
basic_loops_test.py fixit for basic_loops_test 2020-07-20 14:01:18 -07:00
basic_loops.py
basic_session_run_hooks_test.py Move away from deprecated asserts 2020-06-30 16:10:22 -07:00
basic_session_run_hooks.py
BUILD The ftrl optimizer creates large constants for its slot variable initialization. Changing this to create a tensor instead to reduce peak memory usage when applying this to large variables. 2021-02-18 09:59:31 -08:00
checkpoint_management_test.py Issue: Test case breaking on windows due to wrong path 2020-12-30 23:59:10 -05:00
checkpoint_management.py Add an option to choose the I/O Device for saving and loading models for CheckpointManager. 2020-12-04 15:52:11 -08:00
checkpoint_ops_test.py Update run_v1_only test with proper reasons. 2020-07-21 17:18:50 -07:00
checkpoint_ops.py
checkpoint_state.proto
checkpoint_utils_test.py
checkpoint_utils.py Improve tf.train.list_variables API doc. 2020-07-08 16:10:45 -07:00
coordinator_test.py Move away from deprecated asserts 2020-06-30 16:10:22 -07:00
coordinator.py
device_setter_test.py
device_setter.py
distribute.py
distribution_strategy_context.py
evaluation_test.py Update evaluation_test to not rely on Keras metrics. 2020-06-11 10:57:06 -07:00
evaluation.py Remove 'Z' because it's a local time 2021-01-14 17:29:14 -08:00
ftrl_test.py Added beta parameter from FTRL paper to main optimizer class. 2020-08-06 13:01:42 -07:00
ftrl.py The ftrl optimizer creates large constants for its slot variable initialization. Changing this to create a tensor instead to reduce peak memory usage when applying this to large variables. 2021-02-18 09:59:31 -08:00
gen_training_ops.py Add BUILD rules for python/training and python/training/experimental 2020-08-31 09:53:04 -07:00
gradient_descent_test.py Update v1 only training/gradient_descent_test with graph scope. 2020-07-09 16:34:28 -07:00
gradient_descent.py
input_test.py Update run_deprecated_v1 tests with graph scope. 2020-07-24 15:54:03 -07:00
input.py
learning_rate_decay.py Move the learning_rate_decay code to keras to break the reverse dependency. 2020-03-30 11:06:36 -07:00
localhost_cluster_performance_test.py
momentum_test.py Merge pull request #41522 from redwrasse:momentum-test-loss-var 2020-08-24 08:04:57 -07:00
momentum.py
monitored_session_test.py Move away from deprecated asserts 2020-06-30 16:10:22 -07:00
monitored_session.py fix typos in python directory 2020-10-29 16:21:24 +03:00
moving_averages_test.py Makes slot_creator support copy sharding from the primary variable. 2021-01-15 16:17:29 -08:00
moving_averages.py Getting initialized value from the same device as the original variable, 2021-02-08 16:35:52 -08:00
optimizer_test.py [retry] Use same var key in _create_slots/get_slot in V1 optimizer 2021-01-28 10:48:00 -08:00
optimizer.py Makes Optimizer._zeros_slot() copy XLA sharding from variable. 2021-02-02 08:57:05 -08:00
proximal_adagrad_test.py Remove run_v1_only annotation from proximal_adagrad_test, saved_model_experimental_test, legacy base layer test 2020-07-09 18:46:33 -07:00
proximal_adagrad.py
proximal_gradient_descent_test.py Remove run_deprecated_v1 in proximal_gradient_descent_test. 2020-07-14 21:53:13 -07:00
proximal_gradient_descent.py
py_checkpoint_reader.py add BUILD file for python/util and refactor python/BUILD 2020-12-15 11:43:18 -08:00
quantize_training_test.py Update v1 only test with proper reason. 2020-06-04 12:39:12 -07:00
quantize_training_wrapper.cc [Build cleanup] Split "core_cpu_impl" into fine-grained targets (4/n). 2020-04-28 09:53:59 -07:00
quantize_training.py
queue_runner_impl.py
queue_runner_test.py Move away from deprecated asserts 2020-06-30 16:10:22 -07:00
queue_runner.py
rmsprop_test.py
rmsprop.py
saver_large_partitioned_variable_test.py Use self.evaluate for global_variables_initializer 2020-07-01 11:07:20 -07:00
saver_large_variable_test.py Move away from deprecated asserts 2020-06-30 16:10:22 -07:00
saver_test_utils.py
saver_test.py Update run_v1_only tests in saver_test with proper reasons. 2020-07-21 17:10:15 -07:00
saver.py Issue:#46004 2020-12-28 22:45:45 -05:00
server_lib_multiple_containers_test.py
server_lib_same_variables_clear_container_test.py fixit for server_lib container test. 2020-08-05 12:46:53 -07:00
server_lib_same_variables_clear_test.py
server_lib_same_variables_no_clear_test.py Update run_v1_only test with proper reason. 2020-07-20 10:02:09 -07:00
server_lib_sparse_job_test.py
server_lib_test.py Garbage collect old WorkerSession when the restarted master task create new one. 2020-08-03 11:31:26 -07:00
server_lib.py Implement __bool__ instead of __nonzero__ 2020-07-11 00:50:10 +02:00
session_manager_test.py Update session_manager_test wrt run_v1_only annotation. 2020-07-14 23:16:31 -07:00
session_manager.py Use __slots__ for small classes 2020-06-28 18:41:22 +02:00
session_run_hook.py
slot_creator_test.py Makes slot_creator support copy sharding from the primary variable. 2021-01-15 16:17:29 -08:00
slot_creator.py Makes slot_creator support copy sharding from the primary variable. 2021-01-15 16:17:29 -08:00
summary_io.py
supervisor_test.py Move away from deprecated asserts 2020-06-30 16:10:22 -07:00
supervisor.py
sync_replicas_optimizer_test.py Update run_v1_only tests with proper reasons. 2020-07-21 10:17:50 -07:00
sync_replicas_optimizer.py Remove "dummy queue" which is never closed. This is causing workers to hang when a training job stops early (e.g., using tuner along with tf.estimator.parameterized_train_and_evaluate or with TFX). 2021-02-10 16:41:42 -08:00
tensorboard_logging_test.py
tensorboard_logging.py Add a recursive import before running doctest. 2020-06-01 21:13:25 -07:00
training_ops_test.py Avoid "too many resources requested for launch" errors for SparseApplyFtrl 2021-02-01 11:34:11 -08:00
training_ops.py Add BUILD rules for python/training and python/training/experimental 2020-08-31 09:53:04 -07:00
training_util_test.py Fix GlobalStepTests to specify the collection 2020-07-17 15:35:49 -07:00
training_util.py
training.py
warm_starting_util_test.py Remove unnecessary eval() calls 2020-06-30 17:18:32 -07:00
warm_starting_util.py fix: Resolving incorrect value change error 2020-04-07 20:06:23 +09:00