open source tts programs?

ana · April 30, 2024, 11:30pm

i wanna have my own tts setup for making videos and such like lilian does, tho its hard finding open source tts programs/voices, lilian uses a microsoft product and me being the foss-brained linuxmaxxer that i am i dont wanna go near that, so im wondering if any of yous have any info on this

soyl · April 30, 2024, 11:40pm

It’s not sure if it’s possible to get Curses working on Linux using Wine, though it does know that @Pankit has a self-described “total abomination” setup that works. Not sure if they’re able to share some insight, but if you’re a linuxoid they might be able to help with this specific thing

Though considering it’s possible to get the Minecraft narrator, which usually uses the default Windows voices, working with Flite or whatever, maybe that would work for Curses too

vesper · April 30, 2024, 11:45pm

i’ve not tested either of the two here as of yet (configuring a program myself is a little more work, especially as i’m fine using my own voice and just want the text to speech setup for parsing twitch chat and similar), but these are a couple i’ve bookmarked already on github for when i do have time to work on them.

can’t help with the speech to text end of the functions, though. not familiar enough.

the local setup here currently is OSS, though not F. i use a modified setup of TTS by K running on SAMMI for my twitch chat needs, which is around 12 USD from recollection. it’s probably not what you’re looking for, though.

ana · April 30, 2024, 11:49pm

dont need the speech to text part, only text to speech (wont be streaming), thank you

zqlk · May 1, 2024, 12:09am

There is of course the venerable espeak, for that authentic 90s charm.

ana · May 1, 2024, 12:10am

only one english female voice included, and id like for the tts to still be understandable when heard

ana · May 1, 2024, 12:17am

ok i think ive found a voice/config i really like, its using piper-tts’s cori and i set the sample rate to 26000, what do yous think
(google drive upload cause i cant figure out a different way to upload the audio file)

Pankit · May 1, 2024, 12:18am

so yeah, I don’t think my setup is very recommendable or stable but I can at least give some pointers on how I’ve set up a few things. This isn’t really structured, I’m just trying to get the basic information out, so please excuse the word-vomit.

I’ve used Flite and Piper, both of which were linked above. Flite is much more robotic (even compared to older Windows TTS) and Piper is a more modern-sounding neural TTS.

for Flite, getting my distro’s (Arch) package was what got it to work in programs that optionally require it (namely Minecraft). Compiling it from source can give you more voice options. I think there might be a way to use external voices without recompiling, but I haven’t looked into that.

I got Piper from the AUR package piper-tts-bin, but it doesn’t come with any voices, so I manually installed a few. Voice download links and instructions are in the README of Piper’s GitHub repo.

if you’re just looking to use the voice in videos, you can export a .wav file pretty easily using either one’s commands.

flite -t "Hello world" -o "hello.wav"
echo "Hello world" | piper-tts --model ~/piper-voices/en_US-kathleen-low.onnx -f "hello.wav"
(make sure --model points to wherever you downloaded the voice)
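
For a video with several lines, the Piper export command can be wrapped in a loop. This is only a rough sketch: `lines_to_wavs`, the `line-001.wav` naming, and the `PIPER_BIN` variable are made up for illustration. The binary name is an assumption too (the AUR package installs `piper-tts`, other installs may call it `piper`).

```shell
# Assumption: the binary is called piper-tts (AUR); set PIPER_BIN if yours differs.
PIPER_BIN="${PIPER_BIN:-piper-tts}"

# lines_to_wavs MODEL OUTDIR < script.txt
# Writes OUTDIR/line-001.wav, line-002.wav, ... for each non-empty input line.
# The body runs in a subshell, so its variables don't leak into your session.
lines_to_wavs() (
    model=$1
    outdir=$2
    n=0
    mkdir -p "$outdir" || exit 1
    while IFS= read -r line || [ -n "$line" ]; do
        [ -n "$line" ] || continue          # skip blank lines
        n=$((n + 1))
        out=$(printf '%s/line-%03d.wav' "$outdir" "$n")
        printf '%s\n' "$line" | "$PIPER_BIN" --model "$model" -f "$out" || exit 1
    done
)

# Example (paths are placeholders):
#   lines_to_wavs ~/piper-voices/en_US-kathleen-low.onnx ./clips < script.txt
```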

I also have a hotkey that lets me type something to play as TTS into a virtual microphone, which is the actually messy part. I’ll elaborate on that in a bit, if you’d like.

ana · May 1, 2024, 12:27am

hmm yeah the virtual microphone bit might be helpful too since im usually stuck being micless in games

dont be afraid to get into detail too, i can program, i can make my own wrapper for smth if need be

Pankit · May 1, 2024, 10:34pm

sorry for taking a while to get back about this. I'll go over the hotkeyed command I use for starters, then the virtual mic setup. Since it seems like you're fine with using Piper, I'll just explain that.

The TTS script

my cursed setup has me using Wofi as a textbox to enter a message. Wofi is an application launcher first and foremost, but you can use it as a somewhat scriptable menu too. I don't give it any menu options in this case, but it's still usable as a textbox that lets you write a line of text to stdout, so it's useful for this case.

the problem with recommending wofi for this is that it's Wayland only, and also unmaintained. If you can't use it, you might have to convert this to something using a launcher that works for your system, but there's many alternatives out there that fill very similar roles.

with that out of the way, here's the command I made:

echo $(wofi -d -H 50 -p "Say something via TTS...") | piper-tts --model ~/piper-voices/en_US-kathleen-low.onnx --output_raw | aplay -r 16000 -f S16_LE -t raw -

this pipes the Wofi input to Piper, which then pipes the audio to aplay to actually play it. Make sure to set Piper's --model flag to the path of your preferred voice, and to set the -r flag in aplay to whatever sample rate is suitable for your voice (otherwise the speed and pitch will be off).
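
Rather than hardcoding the rate, it can probably be read from the voice itself. The sketch below assumes the voice's `.onnx.json` config sits next to the `.onnx` file and contains a `sample_rate` field, which is how the Piper voices I've seen are laid out. `voice_sample_rate` is a made-up helper name, and grepping JSON like this is crude, so treat it as a starting point.

```shell
# voice_sample_rate MODEL.onnx
# Prints the first "sample_rate" value found in MODEL.onnx.json.
# Assumption: the config file is named <model>.json and stores the rate as a plain integer.
voice_sample_rate() {
    grep -o '"sample_rate"[[:space:]]*:[[:space:]]*[0-9][0-9]*' "$1.json" \
        | head -n 1 \
        | grep -o '[0-9][0-9]*$'
}

# Example (model path is a placeholder):
#   model=~/piper-voices/en_US-kathleen-low.onnx
#   rate=$(voice_sample_rate "$model")
#   echo "Hello world" | piper-tts --model "$model" --output_raw | aplay -r "$rate" -f S16_LE -t raw -
```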

then, I just hotkeyed that (meta+alt+t in my case, but whatever works for you). If you try it like this, whatever you typed should be played into your default speaker. The prompt looks like this for me.

Putting this through a virtual microphone

I set up a virtual mic in PulseAudio using these commands. It creates a virtual speaker that I can play TTS into, which then goes to the virtual mic. Sometimes these disappear and I have to recreate them. I think it might just be after rebooting but I'm not sure.

pactl load-module module-null-sink sink_name="virtual_speaker" sink_properties=device.description="virtual_speaker"
pactl load-module module-remap-source master="virtual_speaker.monitor" source_name="virtual_mic" source_properties=device.description="virtual_mic"
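
Modules loaded with `pactl load-module` only live as long as the sound server is running, so they're gone after it restarts (a reboot, for instance). A small sketch for recreating them only when they're missing, so it's safe to run at login or right before the TTS hotkey. `has_device` and `ensure_virtual_mic` are made-up names, and this assumes the device name is the second column of `pactl list short` output.

```shell
# has_device NAME
# Reads `pactl list short sinks` (or sources) on stdin and succeeds
# if a device with exactly that name appears in column 2.
has_device() {
    awk -v name="$1" '$2 == name { found = 1 } END { exit !found }'
}

# Loads the two modules from the commands above, but only if they're missing.
ensure_virtual_mic() {
    pactl list short sinks | has_device virtual_speaker ||
        pactl load-module module-null-sink sink_name="virtual_speaker" \
            sink_properties=device.description="virtual_speaker"
    pactl list short sources | has_device virtual_mic ||
        pactl load-module module-remap-source master="virtual_speaker.monitor" \
            source_name="virtual_mic" source_properties=device.description="virtual_mic"
}

# Example: call ensure_virtual_mic from a login script or before the TTS command.
```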

then, I need to make the TTS actually play into that speaker instead of the default one. There's probably some fancy way to do this, but I just make the TTS play something really long, giving me time to change its output to "virtual_speaker" in pavucontrol. In my experience this has made it continue to output to virtual_speaker with each run.
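
One possible "fancy way", as a sketch: swap aplay for pacat (PulseAudio's own playback client), which takes the target sink as an option, so the pavucontrol step shouldn't be needed. This isn't the setup described above, just an alternative. `play_raw_to_sink` is a made-up helper name, and it assumes Piper's raw output is 16-bit mono, same as the aplay flags assume.

```shell
# play_raw_to_sink SINK RATE
# Plays raw signed 16-bit little-endian mono audio from stdin on the given sink.
play_raw_to_sink() {
    pacat --playback --device="$1" --format=s16le --rate="$2" --channels=1
}

# Example (model path is a placeholder, 16000 matches the low-quality voice used above):
#   echo "Hello world" \
#     | piper-tts --model ~/piper-voices/en_US-kathleen-low.onnx --output_raw \
#     | play_raw_to_sink virtual_speaker 16000
```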

after that, it should go to your virtual mic! This should technically work on its own, but there's a couple other things I do for quality-of-life.

Additional things

when you play the TTS through the virtual speaker you, unsurprisingly, won't be able to hear it yourself. If you want to hear it as well, you can set up your virtual mic to output to whatever output device you listen through as well.

there's ways to do this through pactl, I'm sure, but as of right now I just manually do it through qpwgraph. Just connect virtual_mic's outputs to whatever output device you're using, like this. I do have to redo this fairly regularly, I think it resets when I suspend my system.
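
For the pactl route, module-loopback is probably the closest equivalent to drawing that connection in qpwgraph. A sketch, untested on this exact setup: `monitor_virtual_speaker` is a made-up name, and the sink name you pass in is whatever `pactl list short sinks` shows for your headphones or speakers.

```shell
# monitor_virtual_speaker SINK
# Copies everything played into virtual_speaker to SINK as well,
# so the TTS is audible locally. Prints the index of the loaded module.
monitor_virtual_speaker() {
    pactl load-module module-loopback source="virtual_speaker.monitor" sink="$1"
}

# Example (sink name is a placeholder):
#   id=$(monitor_virtual_speaker my_headphones_sink)
#   pactl unload-module "$id"    # stop monitoring again
```

Like the other modules, this one won't survive a sound server restart, so it would need to be reloaded too.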

also, Wofi does not appear over fullscreen applications by default, which can be a problem for games, but this is very easily changed by adding the line layer=overlay to the file ~/.config/wofi/config

and that's currently how I have things set up! I'm sorry it's so hacky, I wasn't really expecting to need to share it anytime soon, but I hope it can at least be helpful as a reference.

Five · May 2, 2024, 3:02pm

zenity should work as a drop-in replacement for wofi in the script:

zenity --forms --text="TTS Message" --add-entry ""

(it’s packaged in ubuntu and fedora, not sure which installation method is best for other distros)