TiramisuSELD implements several speech event localization and detection architectures.
- Python 3.6+
- TensorFlow 2.2+: pip install tensorflow
Install TensorFlow: pip3 install tensorflow, or pip3 install tf-nightly (for using TFLite)
Install packages: python3 setup.py install
- To enable XLA, run
TF_XLA_FLAGS=--tf_xla_auto_jit=2 $python_train_script
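The same flag can also be set from Python when the training script is launched as a subprocess. This is a minimal sketch, assuming the training script is an ordinary Python file; `xla_env` and `run_with_xla` are hypothetical helper names and are not part of TiramisuSELD.

```python
import os
import subprocess
import sys


def xla_env(base_env=None):
    """Return a copy of the environment with XLA auto-clustering enabled."""
    env = dict(os.environ if base_env is None else base_env)
    # Same value as the shell command above; overwrites any existing TF_XLA_FLAGS.
    env["TF_XLA_FLAGS"] = "--tf_xla_auto_jit=2"
    return env


def run_with_xla(train_script, *script_args):
    """Run a training script (hypothetical path) with the XLA flag set."""
    command = [sys.executable, train_script, *script_args]
    return subprocess.run(command, env=xla_env(), check=True)
```

Calling `run_with_xla` with the path to a training script behaves like the shell command, with the variable scoped to the child process only.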
Clean up: python3 setup.py clean --all (this removes the contents of /build)
Example YAML Config Structure
speech_config: ...
model_config: ...
decoder_config: ...
learning_config:
  augmentations: ...
  dataset_config:
    train_paths: ...
    eval_paths: ...
    test_paths: ...
    tfrecords_dir: ...
  optimizer_config: ...
  running_config:
    batch_size: 8
    num_epochs: 20
    outdir: ...
    log_interval_steps: 500
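Before training, it can help to check that a config contains the sections listed above. The following is a minimal sketch, not part of TiramisuSELD: it assumes the YAML file has already been parsed into a plain dict (for example with a YAML loader), and it assumes that `augmentations`, `dataset_config`, `optimizer_config`, and `running_config` are nested under `learning_config`. The function name `missing_keys` and the placeholder values are hypothetical.

```python
# Expected keys, taken from the example structure (nesting is an assumption).
TOP_LEVEL = ("speech_config", "model_config", "decoder_config", "learning_config")
LEARNING = ("augmentations", "dataset_config", "optimizer_config", "running_config")
RUNNING = ("batch_size", "num_epochs", "outdir", "log_interval_steps")


def missing_keys(config):
    """Return dotted paths of expected keys that are absent from config."""
    missing = [key for key in TOP_LEVEL if key not in config]
    learning = config.get("learning_config")
    if isinstance(learning, dict):
        missing += [f"learning_config.{key}" for key in LEARNING if key not in learning]
        running = learning.get("running_config")
        if isinstance(running, dict):
            missing += [
                f"learning_config.running_config.{key}"
                for key in RUNNING
                if key not in running
            ]
    return missing


# Placeholder config: None and "..." stand in for values elided in the example.
example = {
    "speech_config": None,
    "model_config": None,
    "decoder_config": None,
    "learning_config": {
        "augmentations": None,
        "dataset_config": None,
        "optimizer_config": None,
        "running_config": {
            "batch_size": 8,
            "num_epochs": 20,
            "outdir": "...",
            "log_interval_steps": 500,
        },
    },
}

print(missing_keys(example))
```

The check only looks at key presence, so it reports a missing section without judging the values inside it.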
See examples for some predefined ASR models.