

I want to program and train a voice cloner, in part to learn about this area of AI, and in part to use as a prototype of audio for testing and getting feedback from early adopters before recording in a studio with voice actors.

For the prototype, I have a set of recordings from voice actors. I would like to record my voice, in English or other languages, then run a neural network and produce audio with the same text, intonation and emotion but with roughly the actors' voices. It doesn't need to be perfect; 80% right and believable would be enough to get good feedback and reach a final version of the script before recording. I have 30 minutes to one hour of utterances from each voice I want to clone.

The closest I have found is Resemble.ai, which has an impressive video, but the public plan is only in English and other languages are prohibitively expensive. The engineer published a master's thesis as an open-source project, but this project does only text-to-speech, not speech-to-speech. Another startup is play.ht, but again it seems to be English-only. This open source project seems to do what I want, cloning Kate Winslet's voice, but it has no installation instructions and so I haven't tried it yet.

Can you recommend an open-source project, ideally in Python and TensorFlow, to roughly replace a voice with another?

Note: This question is similar to What is the State-of-the-Art open source Voice Cloning tool right now?, except that that question is old and the project mentioned only does text-to-speech, not speech-to-speech.

Additional projects that might be of interest:

Neural Voice Cloning with a Few Samples - NeurIPS 2018 (Sercan O. Arik, Jitong Chen, Kainan Peng, Wei Ping, Yanqi Zhou)
A neural voice cloning system is introduced, using a few audio samples to create personalized speech interfaces. Two approaches are explored: speaker adaptation, which fine-tunes a multi-speaker model with cloning samples, and speaker encoding, which trains a separate model to infer new speaker embeddings from cloning audios. Both methods achieve good performance in terms of speech naturalness and similarity to the original speaker.
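
To make the difference between the two approaches in the Neural Voice Cloning paper mentioned in this question more concrete, here is a toy sketch in plain Python. It is not the paper's architecture: short lists of numbers stand in for audio features, the "multi-speaker model" is a hypothetical additive function, and the "encoder" is a hand-written average standing in for a trained network. All function names are made up for illustration. It only shows the contrast between fitting a new speaker embedding by gradient descent (speaker adaptation) and inferring it in a single pass (speaker encoding).

```python
import math

# Toy illustration only: vectors stand in for audio features, and the
# "multi-speaker model" is a hypothetical additive model, not a real network.

def generate(content, speaker_embedding):
    """Frozen toy model: output features = content + speaker embedding."""
    return [c + s for c, s in zip(content, speaker_embedding)]

def adapt_speaker(samples, dim, steps=200, lr=0.1):
    """Speaker adaptation (toy): fit a new embedding by gradient descent on
    the cloning samples while the model itself stays fixed."""
    emb = [0.0] * dim
    for _ in range(steps):
        grad = [0.0] * dim
        for content, target in samples:
            out = generate(content, emb)
            for i in range(dim):
                # Gradient of the mean squared error w.r.t. the embedding.
                grad[i] += 2.0 * (out[i] - target[i]) / len(samples)
        emb = [e - lr * g for e, g in zip(emb, grad)]
    return emb

def encode_speaker(samples, dim):
    """Speaker encoding (toy): map cloning samples to an embedding in one
    pass. A real system would use a separately trained encoder network."""
    emb = [0.0] * dim
    for content, target in samples:
        for i in range(dim):
            emb[i] += (target[i] - content[i]) / len(samples)
    return emb

def cosine_similarity(a, b):
    """Cosine similarity between two embeddings."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Made-up "voice" and three made-up cloning samples (content, target) pairs.
true_voice = [0.5, -1.0, 2.0]
contents = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.5, 0.5, 1.0]]
samples = [(c, generate(c, true_voice)) for c in contents]

adapted = adapt_speaker(samples, dim=3)
encoded = encode_speaker(samples, dim=3)
print("adapted:", adapted)
print("encoded:", encoded)
print("similarity:", cosine_similarity(adapted, encoded))
```

In this toy setting both routes recover roughly the same embedding; the practical difference is that adaptation needs an optimization loop per new voice, while encoding needs only one forward pass once the encoder exists.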
