Add notebook for CV

2021-09-14 11:50:35 -04:00 · 2021-09-14 11:50:35 -04:00 · 7085fd3ed3
commit 7085fd3ed3
parent ef8825f5f6
4 changed files with 263 additions and 2 deletions
--- a/notebooks/README.md
+++ b/notebooks/README.md
@@ -1,4 +1,5 @@
 # Python Notebooks for 🐸 STT

-1. Train a new Speech-to-Text model from scratch [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/coqui-ai/STT/blob/main/notebooks/train-your-first-coqui-STT-model.ipynb)
-2. Transfer learning (English --> Russian) [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/coqui-ai/STT/blob/main/notebooks/easy-transfer-learning.ipynb)
+1. Train a new Speech-to-Text model from scratch [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/coqui-ai/STT/blob/main/notebooks/train_your_first_coqui_STT_model.ipynb)
+2. Transfer learning (English --> Russian) [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/coqui-ai/STT/blob/main/notebooks/easy_transfer_learning.ipynb)
+3. Train a model with Common Voice data [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/coqui-ai/STT/blob/main/notebooks/train_with_common_voice.ipynb)
--- a/notebooks/easy_transfer_learning.ipynb
+++ b/notebooks/easy_transfer_learning.ipynb
--- a/notebooks/train_with_common_voice.ipynb
+++ b/notebooks/train_with_common_voice.ipynb
@@ -0,0 +1,260 @@
+{
+  "nbformat": 4,
+  "nbformat_minor": 5,
+  "metadata": {
+    "kernelspec": {
+      "display_name": "Python 3",
+      "language": "python",
+      "name": "python3"
+    },
+    "language_info": {
+      "codemirror_mode": {
+        "name": "ipython",
+        "version": 3
+      },
+      "file_extension": ".py",
+      "mimetype": "text/x-python",
+      "name": "python",
+      "nbconvert_exporter": "python",
+      "pygments_lexer": "ipython3",
+      "version": "3.8.5"
+    },
+    "colab": {
+      "name": "train-with-common-voice-data.ipynb",
+      "private_outputs": true,
+      "provenance": [],
+      "collapsed_sections": []
+    }
+  },
+  "cells": [
+    {
+      "cell_type": "markdown",
+      "metadata": {
+        "id": "f79d99ef"
+      },
+      "source": [
+        "# Train a 🐸 STT model with Common Voice data 💫\n",
+        "\n",
+        "👋 Hello and welcome to Coqui (🐸) STT \n",
+        "\n",
+        "The goal of this notebook is to show you a **typical workflow** for **training** and **testing** an STT model with 🐸 and data from Common Voice.\n",
+        "\n",
+        "In this notebook, we will:\n",
+        "\n",
+        "1. Download Common Voice data (pre-formatted for 🐸 STT)\n",
+        "2. Configure the training and testing runs\n",
+        "3. Train a new model\n",
+        "4. Test the model and display its performance\n",
+        "\n",
+        "So, let's jump right in!\n",
+        "\n",
+        "*PS - If you just want a working, off-the-shelf model, check out the [🐸 Model Zoo](https://www.coqui.ai/models)*"
+      ],
+      "id": "f79d99ef"
+    },
+    {
+      "cell_type": "code",
+      "metadata": {
+        "id": "fa2aec78"
+      },
+      "source": [
+        "## Install Coqui STT\n",
+        "! pip install -U pip\n",
+        "! pip install coqui_stt_training\n",
+        "## Install opus tools\n",
+        "! apt-get install -y libopusfile0 libopus-dev libopusfile-dev"
+      ],
+      "id": "fa2aec78",
+      "execution_count": null,
+      "outputs": []
+    },
+    {
+      "cell_type": "markdown",
+      "metadata": {
+        "id": "be5fe49c"
+      },
+      "source": [
+        "## ✅ Download pre-formatted Common Voice data (`kk`, Kazakh)\n",
+        "\n",
+        "**First things first**: we need some data.\n",
+        "\n",
+        "We're training a Speech-to-Text model, so we need some _speech_ and we need some _text_. Specifically, we want _transcribed speech_. Let's download some audio and transcripts.\n",
+        "\n",
+        "🐸 STT expects to find information about your data in a CSV file, where each line contains:\n",
+        "\n",
+        "1. the **path** to an audio file\n",
+        "2. the **size** of that audio file\n",
+        "3. the **transcript** of that audio file.\n",
+        "\n",
+        "To focus on model training, we formatted the Common Voice data for you already, and you will find CSV files for `{train,test,dev}.csv` in the data directory.\n",
+        "\n",
+        "Let's train a speech-to-text model 😊\n"
+      ],
+      "id": "be5fe49c"
+    },
+    {
+      "cell_type": "code",
+      "metadata": {
+        "scrolled": true,
+        "id": "53945462"
+      },
+      "source": [
+        "### Download pre-formatted Common Voice data\n",
+        "! wget https://coqui-ai-public-data.s3.amazonaws.com/cv/7.0/kk-data.tar\n",
+        "! tar -xf kk-data.tar"
+      ],
+      "id": "53945462",
+      "execution_count": null,
+      "outputs": []
+    },
+    {
+      "cell_type": "markdown",
+      "metadata": {
+        "id": "96e8b708"
+      },
+      "source": [
+        "### 👀 Take a look at the data"
+      ],
+      "id": "96e8b708"
+    },
+    {
+      "cell_type": "code",
+      "metadata": {
+        "id": "fa2aec77"
+      },
+      "source": [
+        "! ls kk-data\n",
+        "! wc -l kk-data/*.csv"
+      ],
+      "id": "fa2aec77",
+      "execution_count": null,
+      "outputs": []
+    },
+    {
+      "cell_type": "markdown",
+      "metadata": {
+        "id": "d9dfac21"
+      },
+      "source": [
+        "## ✅ Configure & set hyperparameters\n",
+        "\n",
+        "Coqui STT comes with a long list of hyperparameters you can tweak. We've set default values, but you will often want to set your own. You can use `initialize_globals_from_args()` to do this. \n",
+        "\n",
+        "You must **always** configure the paths to your data, and you must **always** configure your alphabet. Additionally, here we show how you can specify the size of hidden layers (`n_hidden`) and the number of epochs to train for (`epochs`), and how to initialize a new model from scratch (`load_train=\"init\"`).\n",
+        "\n",
+        "If you're training on a GPU, you can uncomment the (larger) training batch sizes for faster training."
+      ],
+      "id": "d9dfac21"
+    },
+    {
+      "cell_type": "code",
+      "metadata": {
+        "id": "d264fdec"
+      },
+      "source": [
+        "from coqui_stt_training.util.config import initialize_globals_from_args\n",
+        "\n",
+        "initialize_globals_from_args(\n",
+        "    train_files=[\"kk-data/train.csv\"],\n",
+        "    dev_files=[\"kk-data/dev.csv\"],\n",
+        "    test_files=[\"kk-data/test.csv\"],\n",
+        "    load_train=\"init\",\n",
+        "    n_hidden=200,\n",
+        "    epochs=1,\n",
+        "    beam_width=1,\n",
+        "    #train_batch_size=128,\n",
+        "    #dev_batch_size=128,\n",
+        "    #test_batch_size=128,\n",
+        ")"
+      ],
+      "id": "d264fdec",
+      "execution_count": null,
+      "outputs": []
+    },
+    {
+      "cell_type": "markdown",
+      "metadata": {
+        "id": "799c1425"
+      },
+      "source": [
+        "### 👀 View all config settings"
+      ],
+      "id": "799c1425"
+    },
+    {
+      "cell_type": "code",
+      "metadata": {
+        "id": "03b33d2b"
+      },
+      "source": [
+        "from coqui_stt_training.util.config import Config\n",
+        "\n",
+        "print(Config.to_json())"
+      ],
+      "id": "03b33d2b",
+      "execution_count": null,
+      "outputs": []
+    },
+    {
+      "cell_type": "markdown",
+      "metadata": {
+        "id": "ae82fd75"
+      },
+      "source": [
+        "## ✅ Train a new model\n",
+        "\n",
+        "Let's kick off a training run 🚀🚀🚀 (using the configuration you set above).\n",
+        "\n",
+        "This notebook should work on either a GPU or a CPU. However, if you're running this on _multiple_ GPUs, we want to use only one, because the sample dataset is too small to split across multiple GPUs."
+      ],
+      "id": "ae82fd75"
+    },
+    {
+      "cell_type": "code",
+      "metadata": {
+        "scrolled": true,
+        "id": "550a504e"
+      },
+      "source": [
+        "from coqui_stt_training.train import train\n",
+        "\n",
+        "train()"
+      ],
+      "id": "550a504e",
+      "execution_count": null,
+      "outputs": []
+    },
+    {
+      "cell_type": "markdown",
+      "metadata": {
+        "id": "9f6dc959"
+      },
+      "source": [
+        "## ✅ Test the model\n",
+        "\n",
+        "We made it! 🙌\n",
+        "\n",
+        "Let's kick off the testing run, which displays performance metrics.\n",
+        "\n",
+        "The settings we used here are for demonstration purposes, so you don't want to deploy this model into production. In this notebook we're focusing on the workflow itself, so it's forgivable 😇\n",
+        "\n",
+        "You can still train a more state-of-the-art model by finding better hyperparameters, so go for it 💪"
+      ],
+      "id": "9f6dc959"
+    },
+    {
+      "cell_type": "code",
+      "metadata": {
+        "id": "dd42bc7a"
+      },
+      "source": [
+        "from coqui_stt_training.evaluate import test\n",
+        "\n",
+        "test()"
+      ],
+      "id": "dd42bc7a",
+      "execution_count": null,
+      "outputs": []
+    }
+  ]
+}
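
The CSV layout described in `train_with_common_voice.ipynb` (audio path, file size, transcript) can be sanity-checked before a training run. The following is a minimal sketch, assuming the header names `wav_filename`, `wav_filesize`, and `transcript`; the helper `summarize_csv` and the demo rows are hypothetical and are not part of the notebook or of 🐸 STT.

```python
import csv
import os
import tempfile

# Header names are an assumption about the CSV layout; check your own files.
EXPECTED_COLUMNS = ["wav_filename", "wav_filesize", "transcript"]


def summarize_csv(csv_path):
    """Return (row_count, total_audio_bytes, set_of_transcript_characters)."""
    with open(csv_path, newline="", encoding="utf-8") as handle:
        reader = csv.DictReader(handle)
        header = reader.fieldnames or []
        missing = [name for name in EXPECTED_COLUMNS if name not in header]
        if missing:
            raise ValueError(f"{csv_path} is missing columns: {missing}")
        rows = 0
        total_bytes = 0
        characters = set()
        for row in reader:
            rows += 1
            total_bytes += int(row["wav_filesize"])
            # Collect every character seen in the transcripts.
            characters.update(row["transcript"] or "")
    return rows, total_bytes, characters


if __name__ == "__main__":
    # Tiny made-up CSV so the sketch runs without downloading anything.
    with tempfile.TemporaryDirectory() as tmp:
        demo_path = os.path.join(tmp, "train.csv")
        with open(demo_path, "w", newline="", encoding="utf-8") as handle:
            writer = csv.writer(handle)
            writer.writerow(EXPECTED_COLUMNS)
            writer.writerow(["clips/a.wav", "32044", "hello"])
            writer.writerow(["clips/b.wav", "48012", "thanks"])
        rows, total_bytes, characters = summarize_csv(demo_path)
        print(rows, total_bytes, "".join(sorted(characters)))
```

After the download cell has run, the same function could be pointed at `kk-data/train.csv`, `kk-data/dev.csv`, and `kk-data/test.csv`. The character set it returns is a quick way to spot unexpected symbols in the transcripts.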
--- a/notebooks/train_your_first_coqui_STT_model.ipynb
+++ b/notebooks/train_your_first_coqui_STT_model.ipynb
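
The configuration cell in `train_with_common_voice.ipynb` leaves the larger batch sizes commented out for CPU runs. One way to avoid editing the cell by hand is to build the keyword arguments in a small helper. This is a sketch only: `build_training_args` and its `use_gpu` flag are hypothetical names, and the values are the ones shown in the notebook.

```python
from pathlib import Path


def build_training_args(data_dir, use_gpu=False, batch_size=128):
    """Collect the keyword arguments used in the notebook into one dict."""
    data_dir = Path(data_dir)
    args = {
        "train_files": [(data_dir / "train.csv").as_posix()],
        "dev_files": [(data_dir / "dev.csv").as_posix()],
        "test_files": [(data_dir / "test.csv").as_posix()],
        "load_train": "init",  # start a new model from scratch
        "n_hidden": 200,
        "epochs": 1,
        "beam_width": 1,
    }
    if use_gpu:
        # Larger batches only when a GPU is available, as the notebook suggests.
        for split in ("train", "dev", "test"):
            args[f"{split}_batch_size"] = batch_size
    return args


if __name__ == "__main__":
    args = build_training_args("kk-data", use_gpu=True)
    for key, value in args.items():
        print(f"{key} = {value}")
    # With coqui_stt_training installed, the dict could be passed on like this:
    # from coqui_stt_training.util.config import initialize_globals_from_args
    # initialize_globals_from_args(**args)
```

Keeping the arguments in a plain dict also makes it easy to print or log the exact settings used for a run before training starts.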