STT-tensorflow/training at rename

experiments/STT-tensorflow

Daniel Ellis c29e9f25e7 Handle garbage collection race condition.

An exception is thrown when objects that use `CapturableResourceDeleter` are garbage collected at the end of a program's life. This can happen in ordinary circumstances, such as when using `saved_model_cli` to inspect a model.

The cause of the exception appears to be a race condition with garbage collection between `CapturableResourceDeleter` and `ScopedTFFunction`. Both define a custom finalizer (`__del__`); `CapturableResourceDeleter`'s finalizer ultimately calls a concrete function, which calls an `_EagerDefinedFunction`, which in turn attempts to load and execute a `ScopedTFFunction`.

When multiple objects in a reference cycle all become unreachable during the same garbage collection cycle, there is no guaranteed order in which they are collected. In the failing case, `ScopedTFFunction` is collected first and its underlying function is deleted. Later, `CapturableResourceDeleter`'s finalizer runs and fails, since the function it is trying to call is gone.
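The ordering problem described above can be illustrated with a minimal sketch. This is not the change made in this commit: `ScopedFunction` and `ResourceDeleter` are hypothetical stand-ins for `ScopedTFFunction` and `CapturableResourceDeleter`, and the finalizers are invoked by hand to simulate the two possible collection orders deterministically. The sketch assumes that one reasonable defense is for the dependent finalizer to tolerate a dependency that has already been finalized.

```python
class ScopedFunction:
    """Hypothetical stand-in for an object that owns a function handle."""

    def __init__(self):
        self.handle = object()  # pretend this is a native resource

    def call(self):
        if self.handle is None:
            raise RuntimeError("underlying function already deleted")
        return "resource destroyed"

    def __del__(self):
        # The finalizer releases the underlying handle.
        self.handle = None


class ResourceDeleter:
    """Hypothetical stand-in for an object whose finalizer calls a function."""

    def __init__(self, fn, log):
        self._fn = fn
        self._log = log
        self._done = False

    def __del__(self):
        # Run at most once, even if the finalizer is invoked again later.
        if self._done:
            return
        self._done = True
        try:
            self._log.append(self._fn.call())
        except RuntimeError as err:
            # The function was finalized first; skip instead of failing.
            self._log.append("skipped: " + str(err))


# Simulated bad order: the function is finalized before the deleter.
log_bad_order = []
fn_a = ScopedFunction()
deleter_a = ResourceDeleter(fn_a, log_bad_order)
fn_a.__del__()
deleter_a.__del__()

# Simulated good order: the deleter is finalized before the function.
log_good_order = []
fn_b = ScopedFunction()
deleter_b = ResourceDeleter(fn_b, log_good_order)
deleter_b.__del__()
fn_b.__del__()
```

In a real reference cycle the garbage collector chooses the order, so a finalizer that depends on another finalizable object has to be written to cope with either outcome.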

PiperOrigin-RevId: 358292164
Change-Id: I9162d5de622f5c1ec9b2954647b9958a7d3d87b6

2021-02-18 17:00:03 -08:00

PY2 removal cleanup

2021-01-15 16:48:57 -08:00

Handle garbage collection race condition.

2021-02-18 17:00:03 -08:00

__init__.py

…

adadelta_test.py

Use self.evaluate for global_variables_initializer

2020-07-01 11:07:20 -07:00

adadelta.py

…

adagrad_da_test.py

fix skip test for adagrad_da_test

2020-07-17 22:57:26 -07:00

adagrad_da.py

…

adagrad_test.py

Make several graph-only tests run in a graph explicitly instead of using the run_v1 decorators (to avoid skipping them).

2020-07-09 14:43:18 -07:00

adagrad.py

…

adam_test.py

Makes Optimizer._zeros_slot() copy XLA sharding from variable.

2021-02-02 08:57:05 -08:00

adam.py

Fix documentation compatibility tag formatting.

2020-05-07 16:46:21 -07:00

basic_loops_test.py

fixit for basic_loops_test

2020-07-20 14:01:18 -07:00

basic_loops.py

…

basic_session_run_hooks_test.py

Move away from deprecated asserts

2020-06-30 16:10:22 -07:00

basic_session_run_hooks.py

…

BUILD

The ftrl optimizer creates large constants for its slot variable initialization. Changing this to create a tensor instead to reduce peak memory usage when applying this to large variables.

2021-02-18 09:59:31 -08:00

checkpoint_management_test.py

Issue: Test case breaking on windows due to wrong path

2020-12-30 23:59:10 -05:00

checkpoint_management.py

Add an option to choose the I/O Device for saving and loading models for CheckpointManager.

2020-12-04 15:52:11 -08:00

checkpoint_ops_test.py

Update run_v1_only test with proper reasons.

2020-07-21 17:18:50 -07:00

checkpoint_ops.py

…

checkpoint_state.proto

…

checkpoint_utils_test.py

…

checkpoint_utils.py

Improve tf.train.list_variables API doc.

2020-07-08 16:10:45 -07:00

coordinator_test.py

Move away from deprecated asserts

2020-06-30 16:10:22 -07:00

coordinator.py

…

device_setter_test.py

…

device_setter.py

…

distribute.py

…

distribution_strategy_context.py

…

evaluation_test.py

Update evaluation_test to not rely on Keras metrics.

2020-06-11 10:57:06 -07:00

evaluation.py

Remove 'Z' because it's a local time

2021-01-14 17:29:14 -08:00

ftrl_test.py

Added beta parameter from FTRL paper to main optimizer class.

2020-08-06 13:01:42 -07:00

ftrl.py

The ftrl optimizer creates large constants for its slot variable initialization. Changing this to create a tensor instead to reduce peak memory usage when applying this to large variables.

2021-02-18 09:59:31 -08:00

gen_training_ops.py

Add BUILD rules for python/training and python/training/experimental

2020-08-31 09:53:04 -07:00

gradient_descent_test.py

Update v1 only training/gradient_descent_test with graph scope.

2020-07-09 16:34:28 -07:00

gradient_descent.py

…

input_test.py

Update run_deprecated_v1 tests with graph scope.

2020-07-24 15:54:03 -07:00

input.py

…

learning_rate_decay.py

Move the learning_rate_decay code to keras to break the reverse dependency.

2020-03-30 11:06:36 -07:00

localhost_cluster_performance_test.py

…

momentum_test.py

Merge pull request #41522 from redwrasse:momentum-test-loss-var

2020-08-24 08:04:57 -07:00

momentum.py

…

monitored_session_test.py

Move away from deprecated asserts

2020-06-30 16:10:22 -07:00

monitored_session.py

fix typos in python directory

2020-10-29 16:21:24 +03:00

moving_averages_test.py

Makes slot_creator support copy sharding from the primary variable.

2021-01-15 16:17:29 -08:00

moving_averages.py

Getting initialized value from the same device as the original variable,

2021-02-08 16:35:52 -08:00

optimizer_test.py

[retry] Use same var key in _create_slots/get_slot in V1 optimizer

2021-01-28 10:48:00 -08:00

optimizer.py

Makes Optimizer._zeros_slot() copy XLA sharding from variable.

2021-02-02 08:57:05 -08:00

proximal_adagrad_test.py

Remove run_v1_only annotation from proximal_adagrad_test, saved_model_experimental_test, legacy base layer test

2020-07-09 18:46:33 -07:00

proximal_adagrad.py

…

proximal_gradient_descent_test.py

Remove run_deprecated_v1 in proximal_gradient_descent_test.

2020-07-14 21:53:13 -07:00

proximal_gradient_descent.py

…

py_checkpoint_reader.py

add BUILD file for python/util and refactor python/BUILD

2020-12-15 11:43:18 -08:00

quantize_training_test.py

Update v1 only test with proper reason.

2020-06-04 12:39:12 -07:00

quantize_training_wrapper.cc

[Build cleanup] Split "core_cpu_impl" into fine-grained targets (4/n).

2020-04-28 09:53:59 -07:00

quantize_training.py

…

queue_runner_impl.py

…

queue_runner_test.py

Move away from deprecated asserts

2020-06-30 16:10:22 -07:00

queue_runner.py

…

rmsprop_test.py

…

rmsprop.py

…

saver_large_partitioned_variable_test.py

Use self.evaluate for global_variables_initializer

2020-07-01 11:07:20 -07:00

saver_large_variable_test.py

Move away from deprecated asserts

2020-06-30 16:10:22 -07:00

saver_test_utils.py

…

saver_test.py

Update run_v1_only tests in saver_test with proper reasons.

2020-07-21 17:10:15 -07:00

saver.py

Issue:#46004

2020-12-28 22:45:45 -05:00

server_lib_multiple_containers_test.py

…

server_lib_same_variables_clear_container_test.py

fixit for server_lib container test.

2020-08-05 12:46:53 -07:00

server_lib_same_variables_clear_test.py

…

server_lib_same_variables_no_clear_test.py

Update run_v1_only test with proper reason.

2020-07-20 10:02:09 -07:00

server_lib_sparse_job_test.py

…

server_lib_test.py

Garbage collect old WorkerSession when the restarted master task create new one.

2020-08-03 11:31:26 -07:00

server_lib.py

Implement __bool__ instead of __nonzero__

2020-07-11 00:50:10 +02:00

session_manager_test.py

Update session_manager_test wrt run_v1_only annotation.

2020-07-14 23:16:31 -07:00

session_manager.py

Use __slots__ for small classes

2020-06-28 18:41:22 +02:00

session_run_hook.py

…

slot_creator_test.py

Makes slot_creator support copy sharding from the primary variable.

2021-01-15 16:17:29 -08:00

slot_creator.py

Makes slot_creator support copy sharding from the primary variable.

2021-01-15 16:17:29 -08:00

summary_io.py

…

supervisor_test.py

Move away from deprecated asserts

2020-06-30 16:10:22 -07:00

supervisor.py

…

sync_replicas_optimizer_test.py

Update run_v1_only tests with proper reasons.

2020-07-21 10:17:50 -07:00

sync_replicas_optimizer.py

Remove "dummy queue" which is never closed. This is causing workers to hang when a training job stops early (e.g., using tuner along with tf.estimator.parameterized_train_and_evaluate or with TFX).

2021-02-10 16:41:42 -08:00

tensorboard_logging_test.py

…

tensorboard_logging.py

Add a recursive import before running doctest.

2020-06-01 21:13:25 -07:00

training_ops_test.py

Avoid "too many resources requested for launch" errors for SparseApplyFtrl

2021-02-01 11:34:11 -08:00

training_ops.py

Add BUILD rules for python/training and python/training/experimental

2020-08-31 09:53:04 -07:00

training_util_test.py

Fix GlobalStepTests to specify the collection

2020-07-17 15:35:49 -07:00

training_util.py

…

training.py

…

warm_starting_util_test.py

Remove unnecessary eval() calls

2020-06-30 17:18:32 -07:00

warm_starting_util.py

fix: Resolving incorrect value change error

2020-04-07 20:06:23 +09:00