Coqui TTS has blown my mind!

Linux for blind general discussion blinux-list at redhat.com
Wed Feb 9 13:09:09 UTC 2022


Sorry! My English is not good. How can I make it speak Chinese? Please give a
sample command line. Thank you!
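
(For anyone reading along, a hedged sketch of what that command line might
look like; the Chinese model name below is an assumption, so check the model
list first:)

$ tts --list_models
$ # the model name is an assumption; pick a zh-CN entry from the list above
$ tts --model_name "tts_models/zh-CN/baker/tacotron2-DDC-GST" \
      --text "你好，这是一个测试。" --out_path chinese.wav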

On Wed, 9 Feb 2022, Linux for blind general discussion wrote:

> Date: Wed, 09 Feb 2022 11:59:26 +0000
> From: Linux for blind general discussion <blinux-list at redhat.com>
> To: blinux-list at redhat.com
> Subject: Re: Coqui TTS has blown my mind!
> 
> Hello Chrys,
>
> I think the problem is that Python 3.10 is not supported yet.
>
> https://pypi.org/project/TTS/
>
>
> Though I'm not sure why. Maybe some of the backing libraries are not
> yet compatible; I remember this being a problem in the past with new
> releases of TensorFlow.
>
>
> Perhaps a virtual environment with a lower Python version could do the trick?
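>
> Something along these lines might work, assuming an older interpreter
> such as python3.9 is installed (any version the package supports should do):
>
> $ python3.9 -m venv ~/coqui-venv
> $ source ~/coqui-venv/bin/activate
> $ pip install TTS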
>
>
> Best regards
>
>
> Rastislav
>
>
> On 9. 2. 2022 at 11:48, Linux for blind general discussion wrote:
>> Howdy,
>>
>> just want to try coqui again (after a while) and just got this:
>> $ pip3 install TTS
>> Defaulting to user installation because normal site-packages is not
>> writeable
>> ERROR: Could not find a version that satisfies the requirement TTS
>> ERROR: No matching distribution found for TTS
>>
>> any ideas?
>>
>> cheers chrys
>>
>> On 09.02.22 at 11:40, Linux for blind general discussion wrote:
>>> Howdy Rastislav,
>>>
>>> Yeah, Coqui is awesome. It was initially part of Mozilla's TTS and STT efforts.
>>> We really should have a speech-dispatcher driver for that :).
>>>
>>> By the way, keep up your great work! Just take a look at the C#
>>> speech-dispatcher bindings.
>>>
>>> cheers chrys
>>>
>>> On 09.02.22 at 11:25, Linux for blind general discussion wrote:
>>>> Hello everyone,
>>>>
>>>> Maybe I've discovered America, but yesterday I more or less randomly
>>>> came across:
>>>>
>>>> https://erogol.github.io/ddc-samples/
>>>>
>>>>
>>>> And the voice has completely blown my mind!
>>>>
>>>> Like, I knew the TTS field had advanced significantly in recent
>>>> years, but I thought the new neural voices were mostly closed features of
>>>> companies like Google or Microsoft.
>>>>
>>>> I had no idea we had something so beautiful on Linux and completely
>>>> open-source!
>>>>
>>>>
>>>> Plus, it's not just the license that makes this so interesting, but also
>>>> the usability.
>>>>
>>>> There were the DeepMind papers even before, and some open projects trying
>>>> to implement them, but the level of completeness and usability varied
>>>> significantly. Even if a project was usable, getting it to work required
>>>> some effort (at least for the projects I saw).
>>>>
>>>>
>>>> With Coqui, the situation is completely different.
>>>>
>>>> As the above-mentioned blog says, all you need to do is:
>>>>
>>>>
>>>> $ pip3 install TTS
>>>>
>>>> $ tts --text "Hello, this is an experimental sentence."
>>>>
>>>>
>>>> And you have a synthesized result!
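>>>>
>>>> To control where the audio lands, the CLI also takes an output path
>>>> (--out_path in the versions I've seen); aplay below is just one way to
>>>> play the result:
>>>>
>>>> $ tts --text "Hello, this is an experimental sentence." --out_path hello.wav
>>>> $ aplay hello.wav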
>>>>
>>>>
>>>> Or you can launch the server:
>>>>
>>>> $ tts-server
>>>>
>>>>
>>>> And play in the web browser. Note that the audio is sent only after it's
>>>> fully synthesized, so you'll need to wait a bit before you can listen to it.
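>>>>
>>>> The server can also be queried directly; a hedged sketch, assuming the
>>>> default port 5002 and the /api/tts endpoint:
>>>>
>>>> $ curl "http://localhost:5002/api/tts?text=Hello%20from%20the%20server" -o server.wav
>>>> $ aplay server.wav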
>>>>
>>>>
>>>> The only problematic part is the limit on decoder steps, which is set to
>>>> 500 by default.
>>>>
>>>> I'm not sure why they set it so low; with this value, the TTS is
>>>> unable to speak longer sentences.
>>>>
>>>>
>>>> Fortunately, the fix is very easy. All I needed to do was to open
>>>> ~/.local/lib/python3.8/site-packages/TTS/tts/configs/tacotron_config.py
>>>>
>>>> and modify the line:
>>>>
>>>>        max_decoder_steps: int = 500
>>>>
>>>> to
>>>>
>>>>        max_decoder_steps: int = 0
>>>>
>>>>
>>>> which seems to disable the limit.
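>>>>
>>>> (The path above is specific to Python 3.8; if the file lives somewhere
>>>> else on your system, this one-liner prints its actual location:)
>>>>
>>>> $ python3 -c "import TTS.tts.configs.tacotron_config as m; print(m.__file__)"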
>>>>
>>>>
>>>> After this step, I can synthesize very long sentences, and the quality
>>>> is absolutely gorgeous!
>>>>
>>>>
>>>> So I wanted to share. I may actually be the last person to discover it
>>>> here, though I did not see it mentioned in TTS discussions on this list.
>>>>
>>>>
>>>> I've even thought about creating a speech-dispatcher module for this
>>>> (a rough sketch is below). It would certainly be doable, though I'm
>>>> afraid of what the synthesis would sound like with the irregularities of
>>>> screen-reader navigation. These voices are intended for reading longer
>>>> texts and consistent phrases, with punctuation, complete information, etc.
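>>>>
>>>> A minimal sketch of that idea, assuming speech-dispatcher's generic
>>>> module mechanism; the file names and quoting are assumptions, and this
>>>> is untested:
>>>>
>>>> # in /etc/speech-dispatcher/speechd.conf
>>>> AddModule "coqui" "sd_generic" "coqui-generic.conf"
>>>>
>>>> # in /etc/speech-dispatcher/modules/coqui-generic.conf
>>>> # ($DATA is the placeholder sd_generic replaces with the text to speak)
>>>> GenericExecuteSynth "tts --text \'$DATA\' --out_path /tmp/coqui.wav && aplay /tmp/coqui.wav"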
>>>>
>>>> The intonation would probably get a bit weird with, for example, just
>>>> half a sentence, as happens when navigating a document or webpage line by
>>>> line.
>>>>
>>>>
>>>> Another limitation is speed. On my laptop, the realtime factor
>>>> (processing duration / audio length) is around 0.8, which means it can
>>>> handle real-time synthesis at the default speed without delays: 10
>>>> seconds of audio takes roughly 8 seconds to compute.
>>>>
>>>>
>>>> The situation would get more complicated at higher speeds, though: at
>>>> double the rate, those same 8 seconds of processing would buy only about
>>>> 5 seconds of listening time, pushing the effective realtime factor above 1.
>>>>
>>>> It wouldn't be impossible, but one would need a GPU to handle
>>>> significantly higher speech rates.
>>>>
>>>>
>>>> So I wonder.
>>>>
>>>>
>>>> But anyway, this definitely made my day. :)
>>>>
>>>>
>>>> Best regards
>>>>
>>>>
>>>> Rastislav
>>>>
>>>>
>>>>
>>>>