Merge pull request #43930 from kopytjuk:timeseries-many-to-many-example

PiperOrigin-RevId: 347936838 Change-Id: I3e4740e3435a403a9ec85baaa9a1f1890d29d0f4
2020-12-16 18:37:56 -08:00 · 2020-12-16 18:37:56 -08:00 · dbaf939937
commit dbaf939937
parent 9d54a79234 3e9ac52680
1 changed files with 51 additions and 25 deletions
--- a/tensorflow/python/keras/preprocessing/timeseries.py
+++ b/tensorflow/python/keras/preprocessing/timeseries.py
@ -87,34 +87,60 @@ def timeseries_dataset_from_array(
    `shuffle=False`, the dataset will yield batches of sequences
    composed of the following indices:

-  ```
-  First sequence:  [0  2  4  6  8 10 12 14 16 18]
-  Second sequence: [3  5  7  9 11 13 15 17 19 21]
-  Third sequence:  [6  8 10 12 14 16 18 20 22 24]
-  ...
-  Last sequence:   [78 80 82 84 86 88 90 92 94 96]
-  ```
+    ```
+    First sequence:  [0  2  4  6  8 10 12 14 16 18]
+    Second sequence: [3  5  7  9 11 13 15 17 19 21]
+    Third sequence:  [6  8 10 12 14 16 18 20 22 24]
+    ...
+    Last sequence:   [78 80 82 84 86 88 90 92 94 96]
+    ```

-  In this case the last 3 data points are discarded since no full sequence
-  can be generated to include them (the next sequence would have started
-  at index 81, and thus its last step would have gone over 99).
+    In this case the last 3 data points are discarded since no full sequence
+    can be generated to include them (the next sequence would have started
+    at index 81, and thus its last step would have gone over 99).

-  Example 2: temporal regression. Consider an array `data` of scalar
-  values, of shape `(steps,)`. To generate a dataset that uses the past 10
-  timesteps to predict the next timestep, you would use:
+  Example 2: temporal regression. 
+    Consider an array `data` of scalar values, of shape `(steps,)`. 
+    To generate a dataset that uses the past 10
+    timesteps to predict the next timestep, you would use:

-  ```python
-  input_data = data
-  offset = 10
-  targets = data[offset:]
-  dataset = tf.keras.preprocessing.timeseries_dataset_from_array(
-      input_data, targets, sequence_length=offset)
-  for batch in dataset:
-    inputs, targets = batch
-    assert np.array_equal(inputs[0], data[:10])  # First sequence: steps [0-9]
-    assert np.array_equal(targets[0], data[10])  # Corresponding target: step 10
-    break
-  ```
+    ```python
+    input_data = data[:-10]
+    targets = data[10:]
+    dataset = tf.keras.preprocessing.timeseries_dataset_from_array(
+        input_data, targets, sequence_length=10)
+    for batch in dataset:
+      inputs, targets = batch
+      assert np.array_equal(inputs[0], data[:10])  # First sequence: steps [0-9]
+      assert np.array_equal(targets[0], data[10])  # Corresponding target: step 10
+      break
+    ```
+
+  Example 3: temporal regression for many-to-many architectures.
+    Consider two arrays of scalar values `X` and `Y`,
+    both of shape `(100,)`. The resulting dataset should consist samples with 
+    20 timestamps each. The samples should not overlap.
+    To generate a dataset that uses the current timestamp 
+    to predict the corresponding target timestep, you would use:
+
+    ```python
+    X = np.arange(100)
+    Y = X*2
+
+    sample_length = 20
+    input_dataset = tf.keras.preprocessing.timeseries_dataset_from_array(
+      X, None, sequence_length=sample_length, sequence_stride=sample_length)
+    target_dataset = tf.keras.preprocessing.timeseries_dataset_from_array(
+      Y, None, sequence_length=sample_length, sequence_stride=sample_length)
+
+    for batch in zip(input_dataset, target_dataset):
+      inputs, targets = batch
+      assert np.array_equal(inputs[0], X[:sample_length])
+
+      # second sample equals output timestamps 20-40
+      assert np.array_equal(targets[1], Y[sample_length:2*sample_length])
+      break
+    ```
  """
  if start_index and (start_index < 0 or start_index >= len(data)):
    raise ValueError('start_index must be higher than 0 and lower than the '