As much as possible the Persephone library strives to be a Python-only library to make installation as easy as possible. Due to the nature of processing sound files we have to interact with various utilities that are non-Python.
The Persephone library requires you have Python 3.5 or Python 3.6 installed. Currently Python 3.7 is not supported because we depend on Tensorflow which currently does not support Python 3.7 (see the relevant issue thread)
Installation from PyPi¶
The Persephone library is available on PyPi: https://pypi.org/project/persephone/
The easiest way to install is via the pip package manager
pip install persephone
The library depends on a few binaries being installed:
- Kaldi (Optional, required for pitch features support)
There are some bootstrap scripts that are used to provision a development environment which will install the required system packages from apt.
See the Configuration section for how to configure Persephone to use these binaries.
The library requires various binaries to be available and directories to be present in order to work. Various defaults are defined as per values in persephone/config.py to override any of these you can create a file called
settings.ini at the same base path that you are invoking Persephone from.
Once you have the binaries in the External binaries section installed you may need to configure the paths to them.
Here is an example of how to specify the path to required binaries in the
[PATHS] SOX_PATH = "sox" FFMPEG_PATH = "ffmpeg" KALDI_ROOT = "/home/oadams/tools/kaldi"
Here “sox” and “ffmpeg” must be available on the path and
KALDI_ROOT specifies an absolute path. Note that these paths can also be specified as absolute paths if you wish.
There’s a variety of filesystem paths that will be used for storage of data. Here is an example of how to specify paths in the
[PATHS] CORPORA_BASE_PATH = "./ourdata/original/" TGT_DIR = "./preprocessed_data" EXP_DIR = "./experiments"
CORPORA_BASE_PATH will specify the base for paths that contain the original un-preprocessed source corpora. The default for this is
TGT_DIR will specify the target directory to store preprocessed data in. The default for this is
EXP_DIR will specify the directory where experiment results are saved in. The default for this is