GitHub - jarednielsen/speech2phone: Semi-supervised machine transcription of spoken audio into phonemes (speech units).

speech2phone

TODO

Mark when you've finished them.

(Kyle) Preprocessor caching
(Kyle) Preprocessor returns categorical distribution
(Kyle) Embedding baseline
(Seong) Python scripts (not notebooks) that use grid search, mag and save the plots in /visualizations for the following models:
- Random Forest
- XGBoost
- Gaussian Discriminant Analysis
- Naive Bayes
- Logistic Regression
- Principal Component Analysis
- Support Vector Machine
- K-Nearest Neighbors
- K-Means
- Gaussian Mixture Model
(Jared) Semi-supervised learning scripts (not notebooks) with fully-connected layer and 1-D CNN
- Self-training
- Co-training
- Pi-model
- Label propagation
- Label gradient alignment
- Using your model against itself
(anyone)

Directory Structure

embedding/: Learned embeddings, applied after preprocessing. For example, PCA.
experiments/class/: mag experiments on classification (phoneme boundaries given to model).
experiments/seg_class/: mag experiments on segmentation and classification (phoneme boundaries produced by model).
models/: custom model classes we've built.
preprocessing/: Loads data from files, caches it, and returns NumPy arrays.
results/: Assorted images/plots that are interesting and could be useful in the final report. For example, a PCA .png.
temp_{jared, kyle, seong}/: The equivalent of branches. Put work-in-progress here, and bring it out into the main system when it's done.
visualizations/: Examples of how to plot a Mel spectrogram, etc.

Testing

Run pytest test_main.py.
Add additional tests there. We'll use a single test module for now. pytest uses simple assert statements.
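A minimal sketch of what a test in test_main.py could look like. The function `normalize` is a hypothetical stand-in and is not part of speech2phone; a real test would import the code under test from the package instead.

```python
# Hypothetical example test for test_main.py. `normalize` is an
# illustrative stand-in, not a function from speech2phone.

def normalize(values):
    """Scale a list of numbers so they sum to 1."""
    total = sum(values)
    if total == 0:
        raise ValueError("cannot normalize values that sum to zero")
    return [v / total for v in values]


def test_normalize_sums_to_one():
    # pytest collects functions named test_* and reports any failed assert.
    result = normalize([1, 1, 2])
    assert abs(sum(result) - 1.0) < 1e-9
    assert result == [0.25, 0.25, 0.5]
```

Running pytest test_main.py would pick up `test_normalize_sums_to_one` automatically because of its `test_` prefix.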

Rules

The directory containing speech2phone must be on the environment variable PYTHONPATH.
To append it, run export PYTHONPATH="${PYTHONPATH}:/my/other/path".
For example, if I have /Users/jarednielsen/Desktop/speech2phone, then I must have /Users/jarednielsen/Desktop on my PYTHONPATH.
If that doesn't work because of conda, add a .pth file to the directory $HOME/path/to/anaconda/lib/pythonX.X/site-packages. This can be named anything, as long as it ends with .pth. A .pth file is just a newline-separated listing of the full path names of directories that will be added to your path on Python startup. For example, /anaconda3/envs/py36/lib/python3.6/site-packages/path.pth has the line /Users/jarednielsen/Desktop in it.
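To check whether either approach worked, a small script along these lines could help. This is only a sketch: `parent_on_path` is a hypothetical helper (not part of speech2phone), and the path in the example is the one used above.

```python
import os
import sys


def parent_on_path(package_dir, search_path=None):
    """Return True if the directory containing `package_dir` is on the search path."""
    if search_path is None:
        search_path = sys.path
    parent = os.path.dirname(os.path.normpath(package_dir))
    # Skip empty entries, which stand for the current working directory.
    return any(os.path.normpath(p) == parent for p in search_path if p)


if __name__ == "__main__":
    # Example path from the rules above; adjust to your own checkout.
    ok = parent_on_path("/Users/jarednielsen/Desktop/speech2phone")
    print("parent directory on sys.path:", ok)
```

If this prints False, Python will not be able to resolve `import speech2phone` from an arbitrary working directory.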
Use absolute imports everywhere. For example, import speech2phone or import speech2phone.preprocessing.
See speech2phone/__init__.py and speech2phone/preprocessing/__init__.py for examples of how to set up subpackages.
/preprocessing applies classic data processing methods (i.e. not learned) to the data, while /embedding applies learned methods. For example, Mel spectrogram stuff should be handled in /preprocessing.

boundary recognition

Approaches

Recurrent network
Merging adjacent segments (like piecewise linear regression), with the merge criterion based on a dynamic time-warping metric
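For reference, the classic dynamic time-warping distance can be sketched in plain Python as below. This shows only the distance itself. How it would feed into a merging criterion is left open here, and the function name `dtw_distance` is an assumption, not something from this repo.

```python
def dtw_distance(a, b):
    """Dynamic time-warping distance between two sequences of feature vectors.

    Each sequence is a list of equal-length lists of floats (one per frame).
    The local cost is the Euclidean distance between two frames.
    """
    def frame_cost(x, y):
        return sum((xi - yi) ** 2 for xi, yi in zip(x, y)) ** 0.5

    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] = best cumulative cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = frame_cost(a[i - 1], b[j - 1])
            cost[i][j] = step + min(
                cost[i - 1][j],      # b[j-1] is reused for another frame of a
                cost[i][j - 1],      # a[i-1] is reused for another frame of b
                cost[i - 1][j - 1],  # both sequences advance by one frame
            )
    return cost[n][m]
```

Two sequences that differ only in how long a frame is held, such as `[[0], [1], [2]]` and `[[0], [1], [1], [2]]`, have distance 0 under this definition, which is the property that makes it attractive for comparing segments of different lengths.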

`/embedding`

Options for embedding include:

spectrum
cepstrum
single linear layer (we could try this or just SGD)
more complex learned network
autoencoder
UMAP
t-SNE

These will all be specifiable by importing from the embedding module. The spectrum works pretty well as an embedding space, as we found by doing some PCA (see /visualizations/pca_embedding.png). I think we'll use it as a baseline.
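A rough sketch of the spectrum baseline followed by a PCA projection, using NumPy only. The names `spectrum_embedding` and `pca_project` are hypothetical and do not reflect the actual API of the embedding module, and the input here is synthetic noise rather than real phoneme clips.

```python
import numpy as np


def spectrum_embedding(signals):
    """Magnitude spectrum of each row of `signals` (shape: n_samples x n_points)."""
    return np.abs(np.fft.rfft(signals, axis=1))


def pca_project(features, n_components=2):
    """Project rows of `features` onto their top principal components via SVD."""
    centered = features - features.mean(axis=0)
    # Rows of vt are principal directions, ordered by decreasing singular value.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:n_components].T


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    fake_audio = rng.normal(size=(20, 256))  # synthetic stand-in for phoneme clips
    points = pca_project(spectrum_embedding(fake_audio))
    print(points.shape)  # (20, 2)
```

The 2-D points could then be scattered and colored by phoneme label to produce a plot in the spirit of pca_embedding.png.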

Things to try (add ideas here)

trinemes
dynamic time-warping
reapply models to TIMIT to quantify results (quantified semi-supervised learning)
use Mel spectrogram but give some time dependence (80 freq x 10 time)
using the activations from the neural network to try to predict the speaker, and then consider the ethical implications (a la "voiceprint" technology)
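For the Mel spectrogram idea with time dependence (80 freq x 10 time), one possible way to build such inputs is to slide a 10-frame window over an already-computed spectrogram. This is a sketch under assumptions: `stack_frames` is a hypothetical helper, and the spectrogram is assumed to be a NumPy array of shape (n_freq, n_frames).

```python
import numpy as np


def stack_frames(spectrogram, width=10, hop=1):
    """Cut a (n_freq, n_frames) spectrogram into (n_freq, width) windows.

    Returns an array of shape (n_windows, n_freq, width).
    """
    n_freq, n_frames = spectrogram.shape
    if n_frames < width:
        raise ValueError("spectrogram has fewer frames than the window width")
    starts = range(0, n_frames - width + 1, hop)
    return np.stack([spectrogram[:, s:s + width] for s in starts])
```

With an 80-band spectrogram of 25 frames, `stack_frames(spec)` yields 16 windows of shape (80, 10). Each window can be flattened for the classical models or kept 2-D for a CNN.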
