Coqui TTS has blown my mind!

Linux for blind general discussion blinux-list at redhat.com
Wed Feb 9 11:59:26 UTC 2022


Hello Chrys,

I think the problem is that Python 3.10 is not supported yet.

https://pypi.org/project/TTS/


Though I'm not sure why. Maybe some of the backing libraries are not
yet compatible; I remember this being a problem in the past with new
releases of TensorFlow.


Perhaps a virtual environment with a lower Python version could do the trick?
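
For example, assuming Python 3.9 is still installed alongside 3.10 (adjust
the interpreter name to whatever older version your distribution provides):

$ python3.9 -m venv ~/coqui-venv
$ source ~/coqui-venv/bin/activate
$ pip install TTS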


Best regards


Rastislav


On 9. 2. 2022 at 11:48, Linux for blind general discussion wrote:
> Howdy,
>
> just want to try coqui again (after a while) and just got this:
> $ pip3 install TTS
> Defaulting to user installation because normal site-packages is not
> writeable
> ERROR: Could not find a version that satisfies the requirement TTS
> ERROR: No matching distribution found for TTS
>
> any ideas?
>
> cheers chrys
>
> On 09.02.22 at 11:40, Linux for blind general discussion wrote:
>> Howdy Rastislav,
>>
>> Yeah, Coqui is awesome. It was initially part of Mozilla's TTS and STT efforts.
>> We really should have a speech-dispatcher driver for that :).
>>
>> By the way, keep up your great work! Just take a look at the C#
>> speech-dispatcher bindings.
>>
>> cheers chrys
>>
>> On 09.02.22 at 11:25, Linux for blind general discussion wrote:
>>> Hello everyone,
>>>
>>> Maybe I've discovered America, but yesterday, more or less by accident, I
>>> came across:
>>>
>>> https://erogol.github.io/ddc-samples/
>>>
>>>
>>> And the voice has completely blown my mind!
>>>
>>> Like, I knew the TTS field had advanced significantly in recent
>>> years, but I thought the new neural voices were mostly closed features of
>>> companies like Google or Microsoft.
>>>
>>> I had no idea we had something so beautiful on Linux and completely
>>> open-source!
>>>
>>>
>>> Plus, it's not just the license that makes this so interesting, but also
>>> the usability.
>>>
>>> There were the DeepMind papers even before, and some open projects trying
>>> to implement them, but the level of completeness and usability varied
>>> significantly; even when a project was usable, getting it to work required
>>> some effort (at least among the projects I saw).
>>>
>>>
>>> With Coqui, the situation is completely different.
>>>
>>> As the above-mentioned blog says, all you need to do is:
>>>
>>>
>>> $ pip3 install TTS
>>>
>>> $ tts --text "Hello, this is an experimental sentence."
>>>
>>>
>>> And you have a synthesized result!
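>>>
>>> By default the synthesized audio lands in a WAV file in the current
>>> directory; if I remember the options correctly, you can also choose the
>>> output file yourself:
>>>
>>> $ tts --text "Hello, this is an experimental sentence." --out_path hello.wav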
>>>
>>>
>>> Or you can launch the server:
>>>
>>> $ tts-server
>>>
>>>
>>> And play in the web browser. Note that the audio is sent only after it's
>>> fully synthesized, so you'll need to wait a bit before you can listen to it.
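>>>
>>>
>>> The server also exposes a simple HTTP endpoint; assuming the default port
>>> (5002 at the time of writing), something like this should fetch the audio
>>> from the command line as well:
>>>
>>> $ curl "http://localhost:5002/api/tts?text=Hello%20world" --output hello.wav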
>>>
>>>
>>> The only problematic part is the limit on decoder steps, which is set to
>>> 500 by default.
>>>
>>> I'm not sure why they set it so low; with this value, the TTS is
>>> unable to speak longer sentences.
>>>
>>>
>>> Fortunately, the fix is very easy. All I needed to do was open
>>> ~/.local/lib/python3.8/site-packages/TTS/tts/configs/tacotron_config.py
>>>
>>> and modify the line:
>>>
>>>        max_decoder_steps: int = 500
>>>
>>> to
>>>
>>>        max_decoder_steps: int = 0
>>>
>>>
>>> which seems to disable the limit.
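>>>
>>>
>>> If you'd rather script the change, a one-liner along these lines should do
>>> the same (adjust the path to match your Python version):
>>>
>>> $ sed -i 's/max_decoder_steps: int = 500/max_decoder_steps: int = 0/' \
>>>     ~/.local/lib/python3.8/site-packages/TTS/tts/configs/tacotron_config.py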
>>>
>>>
>>> After this step, I can synthesize very long sentences, and the quality
>>> is absolutely glamorous!
>>>
>>>
>>> So I wanted to share. I may actually be the last person discovering it
>>> here, though I did not see it mentioned in TTS discussions on this list.
>>>
>>>
>>> I've even thought about creating a speech-dispatcher driver for this. It
>>> would certainly be doable, though I'm afraid of what the synthesis would
>>> sound like with the irregularities of navigating with a screen reader.
>>> These voices are intended for reading longer texts and consistent
>>> phrases, with punctuation, complete information, etc.
>>>
>>> The intonation would probably get a bit weird with, for example, just
>>> half a sentence, as happens when navigating a document or webpage line by
>>> line.
>>>
>>>
>>> Another limitation is speed. On my laptop, the real-time
>>> factor (processing duration / audio length) is around 0.8, which means it
>>> could handle real-time synthesis at the default speech rate without delays.
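>>>
>>> For example, at a factor of 0.8, a sentence that plays for 10 seconds takes
>>> about 8 seconds to synthesize, so the audio is ready before playback would
>>> catch up.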
>>>
>>>
>>> The situation would get more complicated at higher speech rates, though:
>>> speaking the same text faster yields less audio for the same amount of
>>> processing, so the effective factor rises above 1.
>>>
>>> It wouldn't be impossible, but one would need a GPU to handle
>>> significantly higher speech rates.
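>>>
>>>
>>> For what it's worth, the tts command seems to accept a --use_cuda flag for
>>> running on a GPU, something like this (I haven't tested it myself):
>>>
>>> $ tts --text "Hello, this is an experimental sentence." --use_cuda true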
>>>
>>>
>>> So I wonder whether it would be worth trying.
>>>
>>>
>>> But anyway, this definitely made my day. :)
>>>
>>>
>>> Best regards
>>>
>>>
>>> Rastislav
>>>
>>>
>>>
>>>




