STT-tensorflow/tensorflow/python/keras/datasets
Devi Sandeep Endluri 7b1a726ec3 Ensure there are test samples for imdb dataset, when maxlen is low
With the current imdb.load_data(), the following results are seen
for different values of maxlen.

     load_data                   (len(x_train), len(x_test))
------------------------------------------------------------
imdb.load_data(maxlen=50)    -->    (1035, 0)
imdb.load_data(maxlen=100)   -->    (5736, 0)
imdb.load_data(maxlen=200)   -->    (25000, 3913)
imdb.load_data()             -->    (25000, 25000)

Analysis: When maxlen is low, the number of test samples can drop
to 0. This happens because the train and test data are concatenated
first, samples longer than maxlen are then removed, and the first
25,000 of what remains are taken as the training data.
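
The old ordering can be sketched with toy sequences (the helper name
load_data_old is hypothetical; this is a minimal sketch, not the
actual imdb.py code):

```python
def load_data_old(x_train, x_test, maxlen):
    # Old ordering: concatenate both splits, drop sequences longer
    # than maxlen, then take the first len(x_train) surviving samples
    # back as the training split. With a low maxlen, fewer than
    # len(x_train) samples survive, so the test split comes back empty.
    xs = x_train + x_test
    xs = [x for x in xs if len(x) <= maxlen]
    n_train = len(x_train)
    return xs[:n_train], xs[n_train:]

# Toy data: one short and one long sequence per split.
train = [[1] * 5, [1] * 15]
test = [[2] * 5, [2] * 15]

new_train, new_test = load_data_old(train, test, maxlen=10)
print(len(new_train), len(new_test))  # -> 2 0  (test split is empty)
```

Only two sequences survive the filter, and both are swallowed by the
training split, which reproduces the (1035, 0) pattern above in miniature.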

Fix: Filter each split first to remove the samples with length >
maxlen, and only then concatenate for further processing.
The following are the results after the fix.

     fixed load_data              (len(x_train), len(x_test))
------------------------------------------------------------
imdb.load_data(maxlen=50)    -->    (477, 558)
imdb.load_data(maxlen=100)   -->    (2773, 2963)
imdb.load_data(maxlen=200)   -->    (14244, 14669)
imdb.load_data()             -->    (25000, 25000)
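
The filter-first ordering can be sketched the same way (the helper
name load_data_fixed is hypothetical; a toy sketch of the idea, not
the actual imdb.py implementation):

```python
def load_data_fixed(x_train, x_test, maxlen):
    # Fixed ordering: filter each split by maxlen *first*. Both splits
    # shrink, but neither can be emptied by the other's samples; any
    # further processing (e.g. concatenation) happens afterwards.
    x_train = [x for x in x_train if len(x) <= maxlen]
    x_test = [x for x in x_test if len(x) <= maxlen]
    return x_train, x_test

# Same toy data: one short and one long sequence per split.
train = [[1] * 5, [1] * 15]
test = [[2] * 5, [2] * 15]

new_train, new_test = load_data_fixed(train, test, maxlen=10)
print(len(new_train), len(new_test))  # -> 1 1  (each split keeps its short sample)
```

Each split keeps its own short sequence, mirroring how the fixed
load_data returns non-empty test sets such as (477, 558) for maxlen=50.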
2020-06-19 07:04:19 -05:00
BUILD
__init__.py
boston_housing.py Prevent Keras dataset loading from affecting the global RNG seed. 2020-06-15 13:07:57 -07:00
cifar.py
cifar10.py
cifar100.py
fashion_mnist.py
imdb.py Ensure there are test samples for imdb dataset, when maxlen is low 2020-06-19 07:04:19 -05:00
mnist.py
reuters.py Prevent Keras dataset loading from affecting the global RNG seed. 2020-06-15 13:07:57 -07:00