With the current imdb.load_data(), the following results are observed
for different values of maxlen.
load_data (len(x_train), len(x_test))
------------------------------------------------------------
imdb.load_data(maxlen=50) --> (1035, 0)
imdb.load_data(maxlen=100) --> (5736, 0)
imdb.load_data(maxlen=200) --> (25000, 3913)
imdb.load_data() --> (25000, 25000)
Analysis: We can observe that when maxlen is low, the number of test
samples can be 0. This is because the train and test data are
concatenated first, then the samples with length > maxlen are removed,
and the first 25,000 of the remaining samples are treated as training
data, leaving few or no samples for the test split.
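A minimal sketch of the problematic behavior, on toy data (simplified; the real Keras implementation also handles labels, seeding, and index offsets, and load_data_current is a hypothetical name for illustration):

```python
def load_data_current(x_train, x_test, maxlen=None):
    # Sketch of the current behavior: concatenate the splits first,
    # filter by length second, and split at the ORIGINAL train size.
    n_train = len(x_train)
    xs = list(x_train) + list(x_test)
    if maxlen is not None:
        # remove samples with length > maxlen, as described above
        xs = [x for x in xs if len(x) <= maxlen]
    # short test samples that survive filtering are absorbed into
    # the training split, so the test split can end up empty
    return xs[:n_train], xs[n_train:]

# toy data: four "train" and four "test" sequences of varying length
train = [[1] * n for n in (5, 30, 40, 50)]
test = [[1] * n for n in (6, 35, 45, 55)]

tr, te = load_data_current(train, test, maxlen=10)
# only the two short sequences survive, and both land in the train
# split because the split point is still the original train size (4):
# len(tr) == 2, len(te) == 0
```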
Fix: Filter each split first to remove the samples with
length > maxlen, and only then concatenate for further processing.
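A sketch of the proposed fix under the same simplifying assumptions (load_data_fixed is a hypothetical name; the real change would go inside Keras's loader):

```python
def load_data_fixed(x_train, x_test, maxlen=None):
    # Filter each split independently BEFORE concatenating, so the
    # train/test boundary reflects the filtered sizes.
    if maxlen is not None:
        x_train = [x for x in x_train if len(x) <= maxlen]
        x_test = [x for x in x_test if len(x) <= maxlen]
    n_train = len(x_train)
    xs = list(x_train) + list(x_test)
    return xs[:n_train], xs[n_train:]

# same toy data as before
train = [[1] * n for n in (5, 30, 40, 50)]
test = [[1] * n for n in (6, 35, 45, 55)]

tr, te = load_data_fixed(train, test, maxlen=10)
# each split now keeps its own surviving sample:
# len(tr) == 1, len(te) == 1
```

With this ordering, a small maxlen shrinks both splits proportionally instead of emptying the test set, which matches the fixed results below.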
The following are the results after the fix.
fixed load_data (len(x_train), len(x_test))
------------------------------------------------------------
imdb.load_data(maxlen=50) --> (477, 558)
imdb.load_data(maxlen=100) --> (2773, 2963)
imdb.load_data(maxlen=200) --> (14244, 14669)
imdb.load_data() --> (25000, 25000)