Paper Pre-print: Towards Streaming Speech-to-Avatar Synthesis
Repo cleaning in progress!
- Download and install Autodesk Maya 2022 or 2023 (other versions not tested).
- Clone this repository and cd into the wav2avatar directory:
  git clone https://github.com/tejasprabhune/wav2avatar
  cd wav2avatar
- Install wav2avatar as an editable module (see Maya Scripting):
  pip install -e .
  mayapy -m pip install -e .
- Install articulatory from https://github.com/articulatory/articulatory and s3prl from https://github.com/s3prl/s3prl (see s3prl notes).
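Because the package is installed into two interpreters (the system Python and mayapy), it can help to confirm that each one actually sees it. The following is a minimal sketch, not part of the repository: `is_importable` is a hypothetical helper, and the file name `check_install.py` is only an example. Run it once with `python` and once with `mayapy`.

```python
import importlib.util


def is_importable(module_name: str) -> bool:
    """Return True if the current interpreter can locate a top-level module."""
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    # Module names taken from the install steps; extend the tuple as needed.
    for name in ("wav2avatar", "numpy"):
        status = "ok" if is_importable(name) else "MISSING"
        print(f"{name}: {status}")
```

For example, `python check_install.py` followed by `mayapy check_install.py` shows whether either environment is missing a package.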
Given an EMA .npy file and a corresponding audio .wav file, you can directly
generate a midsagittal view of the avatar using the headless version of Maya
(Maya Standalone). Using the midsagittal.mb file provided in this repo,
you can use the headless_blast.py script as follows:
cd scripts
mayapy headless_blast.py --maya_file <MIDSAGITTAL_PATH> --ema <EMA_PATH> --wav <WAV_PATH> --outdir <OUTPUT_DIR_PATH>
mayapy headless_blast.py --maya_file ../wav2avatar/maya_models/midsagittal.mb --ema ../wav2avatar/inversion/ema/cj_journal/web/venture.npy --wav ../wav2avatar/inversion/ema/cj_journal/web/venture.wav
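To render several EMA/audio pairs, the same command can be assembled programmatically. This is a sketch under assumptions: `build_blast_command` and `render_pairs` are hypothetical helpers that are not in the repository, and only the flags shown above (`--maya_file`, `--ema`, `--wav`, `--outdir`) are used.

```python
import subprocess
from typing import Iterable, List, Optional, Tuple


def build_blast_command(maya_file: str, ema: str, wav: str,
                        outdir: Optional[str] = None) -> List[str]:
    """Assemble the headless_blast.py invocation as an argument list."""
    cmd = ["mayapy", "headless_blast.py",
           "--maya_file", maya_file, "--ema", ema, "--wav", wav]
    if outdir is not None:
        cmd += ["--outdir", outdir]
    return cmd


def render_pairs(pairs: Iterable[Tuple[str, str]], maya_file: str,
                 outdir: Optional[str] = None) -> None:
    """Run one headless render per (ema, wav) pair; stops on the first failure.

    Assumes mayapy is on PATH and the working directory is scripts/.
    """
    for ema, wav in pairs:
        subprocess.run(build_blast_command(maya_file, ema, wav, outdir),
                       check=True)
```

Passing the arguments as a list avoids shell quoting problems with paths that contain spaces.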
- Run inversion/wav2ema.py to generate a .npy file for your .wav audio:
  python wav2ema.py --model_dir <MODEL_DIR> --wav_name <WAV_NAME> --save_dir <SAVE_DIR>
  If you do not have a Transformer/BiGRU checkpoint for inversion, you can run inversion/linear_inversion.py with ckpts/lr_wlm_l9_mng_90_10hz.pkl as the checkpoint and put your .wav file within the .predict function call.
- Open maya_models/full_face_ema.py in Maya 2022/2023.
- Open offline/animate_ema.py in Maya by accessing Windows/General Editors/Script Editor, then using File/Open Script.
- Replace <EMA .NPY FILE PATH> with the path to the .npy file generated from inversion earlier.
- Right-click on the timeline at the bottom of Maya, select Audio/Import Audio, and navigate to and select the .wav file used for inversion. You will see the waveform show up in green over the timeline. If you set the current key to 0 and then click the Play icon on the right, you will hear your audio play inside Maya.
- Select all text in the Script Editor and press CTRL + Enter. This will reset all keyframes for all joints, then create new keyframes for every joint according to the .npy inversion file.
  If you navigate to the Outliner and select a sample joint (e.g. head_base), you should see many red bars in the timeline corresponding to every newly created keyframe.
- Set the current key to 0 and click the Play icon to the right of the timeline. You should simultaneously hear your audio and see the avatar animate.
There is a high chance that the avatar looks very warped if you are using a multi-speaker BiGRU or Transformer inversion model. This is an ongoing issue with inferring the resting position of the avatar during inversion for unseen speakers (linear inversion avoids it). To fix this, you may have to manually change the offset values at the bottom of offline/animate_ema.py in the Script Editor. For example, if the original file animates the ll in this way:
MayaUtils.animateZ("ll", ema_handler.maya_data["ll"], 0)
but the ll (lower lip) juts too far out (in the +Z direction), we can offset this joint so that every keyframe shifts backwards by 2 units (in the -Z direction):
MayaUtils.animateZ("ll", ema_handler.maya_data["ll"], -2)
Repeating this process for every joint until the avatar looks natural is currently the only way to achieve good offline animation (you can scrub through the timeline and verify that each modification brings the motion closer to natural speech). This is inconvenient, and we are working on avatar resting position inference (will update soon!).
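The effect of an offset can be pictured outside Maya. The sketch below assumes, based only on the description above, that the last argument to animateZ is added to every keyframe value. `shift_keyframes` is a hypothetical helper and the sample values are made up for illustration; the actual MayaUtils implementation is not shown here.

```python
from typing import List, Sequence


def shift_keyframes(values: Sequence[float], offset: float) -> List[float]:
    """Add a constant offset to every keyframe value along one axis."""
    return [v + offset for v in values]


# Made-up Z positions for the lower lip ("ll") over three keyframes.
ll_z = [1.5, 2.0, 2.5]

# An offset of -2 moves every keyframe 2 units in the -Z direction.
print(shift_keyframes(ll_z, -2))  # [-0.5, 0.0, 0.5]
```

Because the shift is constant, the shape of the motion is unchanged; only the joint's resting position moves.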
- Open wav2avatar/maya_models/stream_model.mb
- Open scripts/recv_wav2maya.py in Maya
- Run scripts/stream_wav2maya.py (replacing the model location in the code; a CLI is coming)
When you see "allocating shared memory", run
recv_wav2maya.py in Maya. When you see "listening", you should be able to
speak and see the corresponding animations in Maya.
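For readers unfamiliar with shared memory, the sketch below shows the general mechanism with the standard library (Python 3.8+): one side allocates a named block and writes bytes, the other attaches by name and reads them. It is not the data layout used by stream_wav2maya.py or recv_wav2maya.py; `round_trip` is a hypothetical helper that performs both roles in one process for illustration.

```python
from multiprocessing import shared_memory


def round_trip(payload: bytes) -> bytes:
    """Write payload into a new shared block, then read it back by name."""
    # "Sender" side: allocate a block large enough for the (non-empty) payload.
    writer = shared_memory.SharedMemory(create=True, size=len(payload))
    try:
        writer.buf[:len(payload)] = payload
        # "Receiver" side: attach to the same block using only its name.
        reader = shared_memory.SharedMemory(name=writer.name)
        try:
            return bytes(reader.buf[:len(payload)])
        finally:
            reader.close()
    finally:
        writer.close()
        writer.unlink()  # release the block once both sides are done


if __name__ == "__main__":
    print(round_trip(b"ema frame"))
```

In a real two-process setup, the receiver would need to learn the block name from the sender, and only one side should call `unlink()`.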
Note: mayapy in Maya 2022 uses Python 3.7, which predates the
multiprocessing.shared_memory module (added in Python 3.8), so you should run
mayapy -m pip install shared-memory38 before running the streaming scripts.
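Since multiprocessing.shared_memory is part of the standard library from Python 3.8 onward, the backport is only needed on older interpreters. A minimal sketch of that check follows; `needs_shared_memory_backport` is a hypothetical helper, not something the repository provides.

```python
import sys
from typing import Sequence


def needs_shared_memory_backport(
        version_info: Sequence[int] = sys.version_info) -> bool:
    """Return True when the interpreter is older than Python 3.8."""
    return tuple(version_info[:2]) < (3, 8)


if __name__ == "__main__":
    if needs_shared_memory_backport():
        print("Run: mayapy -m pip install shared-memory38")
    else:
        print("multiprocessing.shared_memory is available in the stdlib")
```

Running this file with mayapy reports whether that particular Maya installation needs the extra install step.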
Maya ships its own Python interpreter, mayapy, located in
C:\Program Files\Autodesk\Maya<VersionNumber>\bin\ (Windows) or
/usr/autodesk/Maya<VersionNumber>/bin/ (Linux).
Adding this directory to PATH lets us run mayapy <py_file>.py.
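The directory to add to PATH depends on the platform and Maya version. As a sketch, the two locations listed above can be built like this; `mayapy_dir` is a hypothetical helper, and custom install locations are not handled.

```python
from pathlib import PurePath, PurePosixPath, PureWindowsPath


def mayapy_dir(version: str, platform: str) -> PurePath:
    """Return the default directory containing mayapy.

    platform follows sys.platform conventions ("win32", "linux").
    """
    if platform.startswith("win"):
        return PureWindowsPath(r"C:\Program Files\Autodesk") / f"Maya{version}" / "bin"
    if platform.startswith("linux"):
        return PurePosixPath("/usr/autodesk") / f"Maya{version}" / "bin"
    raise ValueError(f"no default mayapy location known for {platform!r}")


if __name__ == "__main__":
    import sys
    print(mayapy_dir("2023", sys.platform))
```

The printed directory is the one to append to PATH so that `mayapy` resolves from any shell.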
The second setup step is to install numpy into this separate mayapy
environment by running mayapy -m pip install numpy.
This setup is needed to use the maya.cmds library, which lets us
generate whole Maya ASCII files and the polygons within them.
If you run into an error installing webrtcvad on Windows, use
pip install webrtcvad-wheels or mayapy -m pip install webrtcvad-wheels
for the corresponding mayapy installation.
s3prl does not support Windows, but we can work around this by cloning
the repo, removing every place where s3prl.hub requires the sox_io backend,
and then installing that local copy instead of the version from pip.
We do a similar thing for torchaudio if it throws an error.
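Finding the lines to edit by hand is easier with a quick scan of the cloned repository. The sketch below only lists Python lines that mention sox_io and does not modify anything. `find_mentions` is a hypothetical helper, and which of the reported lines can safely be removed is left to the reader.

```python
from pathlib import Path
from typing import List, Tuple


def find_mentions(root: str, needle: str = "sox_io") -> List[Tuple[str, int, str]]:
    """Return (file, line number, line text) for each .py line containing needle."""
    hits = []
    for path in sorted(Path(root).rglob("*.py")):
        text = path.read_text(encoding="utf-8", errors="ignore")
        for lineno, line in enumerate(text.splitlines(), start=1):
            if needle in line:
                hits.append((str(path), lineno, line.strip()))
    return hits


if __name__ == "__main__":
    import sys
    # Usage: python find_sox_io.py <path to cloned s3prl repo>
    for file, lineno, line in find_mentions(sys.argv[1] if len(sys.argv) > 1 else "."):
        print(f"{file}:{lineno}: {line}")
```

After editing the reported lines, the local copy can be installed from its directory with pip install -e . (and mayapy -m pip install -e . for the mayapy environment).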