STT/data
Josh Meyer 1c7539e9c9
Add missing space to russian sample alphabet
2021-07-27 05:40:20 -04:00
lm Run pre-commit hooks on all files 2021-05-18 13:45:52 +02:00
smoke_test Add missing space to russian sample alphabet 2021-07-27 05:40:20 -04:00
ted Merge of pull requests #49, #50, and #52. Fixes issues #2, #4, #11, #12, #46, #47, and #48 2016-10-13 15:15:39 -04:00
README.rst Run pre-commit hooks on all files 2021-05-18 13:45:52 +02:00
alphabet.txt Support custom alphabet mappings (Fixes #692) (#797) 2017-08-31 11:51:15 +02:00

README.rst

Language-Specific Data
======================

This directory contains language-specific data files. The most important are:

1. A list of unique characters for the target language (e.g. English) in ``data/alphabet.txt``. After installing the training code, run ``python -m coqui_stt_training.util.check_characters --help`` to see the options of a tool that creates an alphabet file from a list of training CSV files.

2. A script used to generate a binary n-gram language model: ``data/lm/generate_lm.py``.
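
To illustrate the idea behind the alphabet file in item 1, here is a minimal sketch that collects the unique characters from the transcripts of one or more training CSV files. It is not the implementation of ``check_characters``. The column names (``wav_filename``, ``wav_filesize``, ``transcript``), the one-character-per-line output layout, and the helper names ``collect_alphabet`` and ``format_alphabet`` are assumptions made for this example.

```python
import csv
import io


def collect_alphabet(csv_files, column="transcript"):
    """Return the sorted unique characters found in one column of several CSV files.

    `csv_files` is an iterable of open text file objects. The default column
    name "transcript" is an assumption about the training CSV layout.
    """
    chars = set()
    for csv_file in csv_files:
        reader = csv.DictReader(csv_file)
        for row in reader:
            # Updating a set with a string adds each of its characters.
            chars.update(row[column])
    return sorted(chars)


def format_alphabet(chars):
    """Render one character per line (the layout assumed here for alphabet.txt)."""
    return "".join(f"{c}\n" for c in chars)


# Hypothetical in-memory CSV standing in for a real training file.
sample = io.StringIO(
    "wav_filename,wav_filesize,transcript\n"
    "a.wav,1000,hello world\n"
    "b.wav,2000,hold on\n"
)
alphabet = collect_alphabet([sample])
print(format_alphabet(alphabet), end="")
```

With real data, you would pass file objects opened with ``open(path, newline="", encoding="utf-8")`` instead of the in-memory sample. The space character sorts first, so in this sketch it appears as a line containing a single space.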

For more information on how to build these resources from scratch, see the ``External scorer scripts`` section on `stt.readthedocs.io <https://stt.readthedocs.io/>`_.