1. Speech-side prosody control
The speech-side prosody embedding can control the prosody of a specific frame.
- The red line denotes the 1st dimension of the prosody embedding.
- The green line denotes the 2nd dimension of the prosody embedding.
Text: I had a dear friend, once a brown terrier, "skye" they called her.
No Adjustment
Adjusted 1st dimension (pitch)
Adjusted 2nd dimension (amplitude)
Text: But when it came to breaking in, that was a bad time for me.
No Adjustment
Adjusted 1st dimension (pitch)
Adjusted 2nd dimension (amplitude)
Text: I know nothing about it, but Fanny must teach me.
No Adjustment
Adjusted 1st dimension (pitch)
Adjusted 2nd dimension (amplitude)
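The frame-level adjustment in the samples above can be pictured as shifting one dimension of the prosody embedding over a chosen range of frames. The following is only a minimal sketch under stated assumptions: the array layout (frames x dimensions), the function name `adjust_prosody`, and the offset value are hypothetical and are not taken from the actual model.

```python
import numpy as np

def adjust_prosody(embedding, dim, start, end, offset):
    """Return a copy of `embedding` with `offset` added to dimension `dim`
    for frames in the half-open range [start, end)."""
    adjusted = np.array(embedding, dtype=float, copy=True)
    adjusted[start:end, dim] += offset
    return adjusted

# Hypothetical frame-level embedding: 6 frames x 2 dimensions.
# Following the labels on this page, dimension 0 is treated as pitch
# and dimension 1 as amplitude; the values themselves are placeholders.
prosody = np.zeros((6, 2))

# Shift the 1st dimension for frames 2 and 3 only.
raised_pitch = adjust_prosody(prosody, dim=0, start=2, end=4, offset=0.5)
```

Only the selected frames and the selected dimension change; the original embedding is left untouched, so the adjusted and unadjusted versions can be synthesized side by side.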
2. Text-side prosody control
The text-side prosody embedding can control the prosody of a specific phoneme.
- The red line denotes the 1st dimension of the prosody embedding.
- The blue line denotes the 2nd dimension of the prosody embedding.
- The yellow line denotes the 3rd dimension of the prosody embedding.