----------------------------> Model Architecture <-----------------------
VAW-GAN (SP+CWT): A VAW-GAN system that converts both the spectrum and the CWT-based F0, without F0 conditioning on the decoder;
VAW-GAN (SP+F0+C): Converts the spectrum with a VAW-GAN conditioned on logarithm Gaussian (LG)-based F0 without CWT decomposition; F0 itself is converted with an LG-based linear transformation;
VAW-GAN (SP+CWT+C) (Proposed): Converts the spectrum with a VAW-GAN conditioned on CWT-based F0, where F0 itself is converted with VAW-GAN applied to its CWT decomposition.
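The three systems differ mainly in how F0 is represented and converted. The sketch below illustrates the two F0 operations named above: the LG (log-Gaussian) linear transformation used by VAW-GAN (SP+F0+C), and a CWT decomposition of F0 of the kind the CWT-based systems rely on. It does not implement VAW-GAN itself. The preprocessing (interpolating unvoiced frames, taking log F0, z-normalizing), the Mexican hat wavelet, the truncation of the wavelet at ±5 scales, and all function names are assumptions chosen for illustration. They follow common practice in CWT-based prosody modeling rather than details given on this page.

```python
# Hypothetical helpers for illustration only. The wavelet choice, support
# truncation, and preprocessing steps are assumptions, not the paper's settings.
import bisect
import math


def interpolate_log_f0(f0):
    """Return log F0 with unvoiced frames (F0 <= 0) filled by linear
    interpolation in the log domain; edges hold the nearest voiced value."""
    voiced = [i for i, f in enumerate(f0) if f > 0]
    if not voiced:
        raise ValueError("utterance has no voiced frames")
    out = []
    for i, f in enumerate(f0):
        if f > 0:
            out.append(math.log(f))
        elif i < voiced[0]:
            out.append(math.log(f0[voiced[0]]))
        elif i > voiced[-1]:
            out.append(math.log(f0[voiced[-1]]))
        else:
            # Nearest voiced neighbours on each side of the unvoiced frame.
            j = bisect.bisect_left(voiced, i)
            left, right = voiced[j - 1], voiced[j]
            w = (i - left) / (right - left)
            out.append((1 - w) * math.log(f0[left]) + w * math.log(f0[right]))
    return out


def z_normalize(x):
    """Zero-mean, unit-variance normalization (population std)."""
    mean = sum(x) / len(x)
    std = math.sqrt(sum((v - mean) ** 2 for v in x) / len(x))
    return [(v - mean) / std for v in x] if std > 0 else [0.0] * len(x)


def mexican_hat(t):
    """Mexican hat (Ricker) mother wavelet with unit L2 norm."""
    c = 2.0 / (math.sqrt(3.0) * math.pi ** 0.25)
    return c * (1.0 - t * t) * math.exp(-t * t / 2.0)


def cwt_decompose(signal, scales):
    """Direct-sum CWT: one coefficient sequence per scale (scales in frames)."""
    n = len(signal)
    coeffs = []
    for s in scales:
        half = int(math.ceil(5 * s))  # assumed truncation at +/- 5 scales
        row = []
        for tau in range(n):
            acc = 0.0
            for k in range(max(0, tau - half), min(n, tau + half + 1)):
                acc += signal[k] * mexican_hat((k - tau) / s)
            row.append(acc / math.sqrt(s))
        coeffs.append(row)
    return coeffs


def lg_convert_f0(f0, src_mean, src_std, tgt_mean, tgt_std):
    """LG linear transformation: map source log-F0 mean/std to the target's.
    Statistics are over voiced frames; unvoiced frames (F0 <= 0) stay 0."""
    return [
        math.exp(tgt_mean + (tgt_std / src_std) * (math.log(f) - src_mean))
        if f > 0 else 0.0
        for f in f0
    ]
```

For instance, `cwt_decompose(z_normalize(interpolate_log_f0(f0)), [2 ** (i + 1) for i in range(10)])` would yield ten octave-spaced F0 components. The number of scales and their spacing here are assumptions, and the direct sum is chosen for readability rather than speed.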
-----------------------> Emotional Speech Samples <-----------------------
(1) VAW-GAN (SP+CWT) vs. VAW-GAN (SP+CWT+C) (Proposed)
Columns: Source | CWT-VAWGAN | C-CWT-VAWGAN (Proposed) | Target
(CWT-VAWGAN = VAW-GAN (SP+CWT); C-CWT-VAWGAN = VAW-GAN (SP+CWT+C))
Rows: Neutral-to-Angry, Neutral-to-Sleepy
(2) VAW-GAN (SP+F0+C) vs. VAW-GAN (SP+CWT+C) (Proposed)
Columns: Source | VAW-GAN (SP+F0+C) | C-CWT-VAWGAN (Proposed) | Target
Rows: Neutral-to-Angry, Neutral-to-Sleepy