It’s 7 PM. I sit down at my Fedora Linux PC and start an MMORPG. We want to brawl with Red Alliance over some systems they attacked; the usual stuff on a Friday evening in EVE. While we are waiting for a Titan to bridge us to the fighting area, a tune comes to mind. “Carola, I wanna hear hurricanes.” My hard-drive LED blinks for a second, I hear Carola say “I found one match,” and the music starts.
What sounded like science fiction twenty years ago is now a reality for many PC users. Linux users can have it too, by installing “Carola” as their Personal Voice Assistant (PVA)[1].
Carola
The first thing people often ask is, “Why did you name it Carola?” That question rests on a common misconception: Carola is not the project name. It’s the keyword the PVA reacts to by default, similar to “Alexa” or “OK, Google” for those who are familiar with those products. You can configure this keyword. You can also configure other things, such as your location, which applications to use by default when opening media files, which CardDAV server to use when looking up contact information, etc. These settings can be personalized for each user. Some of them can even be changed by voice command (e.g., the name, the default TTS engine, and the default apps).
In 2021, I read an article about the Speech-To-Text (STT) system Vosk[2] and started to play a bit with it. The installation was easy, but there was no use case beyond writing what you said to the screen. A few hours and a hundred lines of Java code later, I could give my PC simple commands. After a few more days of work, it was capable of executing more complex commands. Today, you can tell him/her/it to start apps, redirect audio streams, control audio and video playback, call someone, handle incoming calls, and more. If you have a smart home (which I don’t), you can even switch on the light in the kitchen. As for why I chose “Carola”: it was the word that the STT system I was using at the time recognized most reliably.
Note: This PVA has no English translation yet, because it was developed in German. I will use rough translations here, which should work once someone helps with translating the config. There are videos of Carola that show these interactions in practice[6]. For now, because a few dependencies are unavailable from Fedora Linux’s default repositories, you will need to install the system manually. But don’t be afraid: it is all described in the README, and it is pretty simple.
The time of eSpeak has run out
A voice assistant doesn’t just react to your speech. It can also reply to you. You might expect it to sound like a creepy robot voice straight out of the 1960s. But trust me, Fedora Linux can do better. Naturally, eSpeak was the first choice, because it was installed on the system by default. But today we can choose (even by voice command) which speech engine we want to use.
Text-To-Speech (TTS) systems
Text-To-Speech systems turn written text into an audible waveform. Some systems output the waveform directly. Others produce MP3 or WAV files, which you can then play with, for example, sox. Over the last year, we’ve experimented with several TTS systems. One outputs the text via the Samsung TTS engine on an Android device. It sounds great, but it requires your cellphone and a special server application.
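Here is a minimal Python sketch of a say-style wrapper that shows the "write a file, then play it" pattern. It assumes espeak-ng (which can write WAV files with `-w`) and the `play` command from sox are installed. The function names are made up for illustration, and this is not PVA's own say script.

```python
import os
import subprocess
import tempfile


def build_tts_commands(text: str, wav_path: str, voice: str = "de") -> list:
    """Return the two commands of a 'write a WAV file, then play it' pipeline."""
    # espeak-ng: -v selects the voice/language, -w writes the audio to a WAV file
    synthesize = ["espeak-ng", "-v", voice, "-w", wav_path, text]
    # "play" is installed with sox; -q hides its progress display
    play = ["play", "-q", wav_path]
    return [synthesize, play]


def say(text: str, voice: str = "de") -> None:
    """Speak `text`: synthesize into a temporary WAV file, play it, delete it."""
    fd, wav_path = tempfile.mkstemp(suffix=".wav")
    os.close(fd)  # espeak-ng reopens the file by name
    try:
        for command in build_tts_commands(text, wav_path, voice):
            subprocess.run(command, check=True)
    finally:
        os.remove(wav_path)
```

A script that passes its command-line arguments to `say()` would take the text to speak as an argument, which is what a TTS program plugged into PVA needs to do.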
One of the first speech engines I tried was MBROLA[3]. It produces more understandable audio than eSpeak, but it’s still far from being “good”, as it relies on eSpeak. Think of it as a voice back end: eSpeak turns the text into phonemes, and MBROLA turns those phonemes into audio. Unfortunately, it is not currently available from Fedora Linux’s default repositories.
Next was Pico2Wave, which still uses the same technique as eSpeak, but on a higher level. Still, it does not meet our expectations for speech quality. Also, it has to be extracted from an old Ubuntu package, because it isn’t available from the Fedora Linux repositories.
With MaryTTS[4], we reach the first speech processor that produces human speech patterns combined with accent and dialect. MaryTTS was developed in Germany at Saarland University and the German Research Center for Artificial Intelligence (DFKI). It was not designed to run on your local PC (though it does quite well there). Rather, it is meant to run on a network and offer speech output to any kind of client that asks for it. Because modern PCs have far more CPU power than it requires, you can also run it standalone on your PC. However, running it locally requires compiling the source code, which can be a little tricky.
MaryTTS comes with different languages and one remarkable voice for Germans: an old Bavarian woman. This voice model was trained on an old speech archive in Munich, and it’s so good that you’d think your PC is at least seventy years old. It also makes you feel like you are giving orders to an old woman who should be enjoying her retirement, which makes commanding your PC awkward; trust me.
The best of the available TTS systems is GTTS[5]. It is available from the default Fedora Linux repositories. It produces the best sound quality for a wide variety of languages and, for standard voices, “the first 4 million characters are free each month”[7]. The downside is that the text is sent to Google’s servers[8] to be converted into audible speech. This might be a privacy concern, depending on the nature of the data you are sending. Be mindful that the PVA will repeat parts of what you said if it did not understand your command, and when it answers your questions. For this reason, GTTS is not the default TTS. However, it is preconfigured in the PVA repo[1] and ready to use.
Let’s talk about privacy
You just read that your TTS system could rat you out to Google. That raises an obvious question: “Does the STT system do this too?” It doesn’t. Vosk runs entirely on your PC, and no cloud services are involved. But of course, your assistant needs to know things in order to assist you.
One of the simpler functions is producing a weather report. For this to work, the assistant needs to relay your question to the weather provider’s API server. The provider can reasonably assume that you normally aren’t interested in the weather for places where you do not live. So the server can deduce where you live from the city you ask about most often, and it can corroborate that deduction with your device’s IP address.
Consequently, if you configure a service in your PVA’s config, you should ask yourself whether this service will cause a privacy problem for you. Many PVA functions won’t, because they work locally or use services under your control (the CalDAV and CardDAV address book services, for example). The same is true for the upcoming IMAP feature: you will use the same email provider that your email app is already configured to use, so it raises no extra concerns. But what if you chose GTTS because those short text fragments seemed “no big deal,” and the PVA then reads your incoming email aloud? This is where the choice of TTS engine becomes more and more important.
One of PVA’s functions is playing music on command. By itself, this function may not seem concerning. However, an upcoming feature might change this. It will let you ask for music with abstract requests like “Jazz”. At the moment, the PVA searches for this term in the filenames and metadata it finds in your MP3 archive. In the future, it will keep a local list of every track you asked for in the past and produce a top-songs list for the search term.
It will record every file you added to the playlist in a database, count how many times each file matches your abstract request, and play the song(s) with the best score. On the plus side, you get your favorite music. But what happens if an external plugin or a hacked app takes this list and exfiltrates the data to someone who pays for this kind of information?
Now this feature becomes a privacy concern. There is no great risk at the moment since this feature needs to be enabled and, for now, PVA is not widespread enough to be a target. But this may change and you should be aware of these kinds of privacy concerns before they become a problem.
As a best effort to address such privacy concerns, PVA will disable features that require external communication by default. So if you use the base installation, there should be very few privacy concerns. There is one known exception — all texts sent to the PVA app are recorded in ~/.var/log/pva.log. This should make it easier to find flaws in the STT engine and track down other problems.
Always keep in mind that privacy can also be undermined by third-party add-ons.
What can you expect from your assistant?
PVA auto-configures itself with a basic configuration on first start-up. For example, it adds the default freedesktop.org paths for your pictures, music, documents, and videos. It creates a local cache and config directory where you can place your own versions of the config files, add new ones, or overwrite existing ones. User customizations are usually added to the existing configuration, but you can also overwrite existing values. Before we dig deeper, let’s present some more of PVA’s features.
The Weather Report app is a classic; no assistant would be complete without it. All it needs to know is the name of your hometown. The weather provider is wttr.in. You can point your browser at this URL to find your city’s unique identifier. You would not believe how many towns called “Neustadt” exist in Germany alone. Because it is a web service, you don’t need to install anything. It works out of the box using cURL.
Asking your PVA for the time is another classic, and it also works out of the box. You can also ask your PC how it feels: “Carola, what is the current load?” Or, more abstractly, “Carola, how do you feel?”
For playing audio, PVA uses QMMP by default. It comes with an easy-to-use command-line interface and a rich feature set. You will need to install this app before you can use it; luckily, Fedora Linux ships it. QMMP gives you remote control over volume, track number, track position, and playback, and it reports what is currently playing. So you can ask your PVA what is playing when it is playing random tracks.
Controlling QMMP by voice is one of the features I can no longer do without. I often end up in situations where I have full-screen windows with complex code on the screen. If you lose your focus, you lose your workflow. Developers call this “being in the flow” or “in the tunnel”. Here, voice control comes in very handy. You do not need to divert your focus from your work. You can just say what you want, and it “magically” happens!
The phone-call feature works in a similar way. You can ask your SIP software to call a contact in your CardDAV address book without diverting your focus from your work. As this is a complex process, it needs a little extra care: a special internal function handles the parsing of the command sentence. “Carola, call Andreas” is enough to initiate a call. The first match in the address book will be called. If you know more than one person with the same name, I hope they have nicknames. Also, since one contact might have multiple phone numbers (e.g., for home and for work), you can specify which number should be called: “Carola, call Andreas at work.”
It may not look like a complex problem, but consider: which of the words the PVA receives are part of the name, and which are just binding words like “at” in the example above? That is not easy to determine. So we need to follow a precise syntax. But it needs to be natural, or humans will not use it. Another important thing when interacting with humans is that they tend to vary the structure of their commands. Parsing a human sentence is more complex for a computer than you might think. It’s natural for you, but the opposite of logic for a computer. Keep this in mind as you read further; it is a recurring challenge.
Voice control also comes in handy for controlling video playback while you exercise. You are likely out of reach of a mouse or a remote (KDE Connect), and you don’t want to stop your workout just because someone asks you something. With voice control, you can ask the PC to pause playback and then ask it to resume after you have answered the question or otherwise dealt with the interruption.
For audio and video players that offer an MPRIS2 interface on D-Bus, you can control them on the spot without adding their CLI (command-line interface) commands to your config. Thanks to MPRIS2, you can even control Netflix or YouTube in your Firefox browser. OK, you can’t currently choose the track to watch, but you can change the volume and (un)pause playback. And all of this can be done with the same set of commands in your config.
There are many situations where voice control is superior or even necessary. What I haven’t told you yet is that the first device I deployed PVA to was a PinePhone. Smartphones can be used for many things. You might use one as an MP3 player while you drive, or as a navigation tool. PVA doesn’t work with GNOME Maps (yet). But controlling a PinePhone by voice while driving will become more and more important in the Linux community, so hopefully further advancements will be made in this area. Fun fact/tip: if you use it as an MP3 player, don’t turn it up too loud. Or better yet, use an external speaker system[6].
If you use Thunderbird to manage your email, note that it can compose and send an email using only CLI arguments. Consequently, your PVA can compose and send email using Thunderbird. All you need to do is tell your PVA the recipient and the subject, and then dictate the body. It will also do some minor interpretation for you. While I still write my emails by hand, I can imagine situations where this could be useful. I have not had an opportunity to test this way of composing email with a person with disabilities, but it might be interesting.
The PVA can also be handy for short reminders. You can tell your PVA when and what it should remind you about. For example, “Carola, remind me in 10 minutes to check the kitchen” or “Carola, remind me at 10 to call Andreas”. If the PVA understood you, it will respond and acknowledge your reminder.
The best comes last. With Twinkle, your PVA can take a call and interact with the caller just as it does with you. One thing I have not explained yet is that your PVA requires authorization codes for vital or potentially dangerous operations. I find it reminiscent of a scene from “Star Trek”.
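The freedesktop.org xdg-user-dirs tool stores the default folders in a small file, `~/.config/user-dirs.dirs`, with lines such as `XDG_PICTURES_DIR="$HOME/Pictures"`. The article does not say how PVA reads these paths. The Python sketch below only shows one possible way to do it, and the function names are hypothetical.

```python
import os
from pathlib import Path


def parse_user_dirs(text: str, home: str) -> dict:
    """Parse the XDG_*_DIR="..." lines of a user-dirs.dirs file."""
    dirs = {}
    for line in text.splitlines():
        line = line.strip()
        if not line.startswith("XDG_") or "=" not in line:
            continue  # skip comments and anything unexpected
        key, value = line.split("=", 1)
        if not key.endswith("_DIR"):
            continue
        name = key[len("XDG_"):-len("_DIR")].lower()  # XDG_PICTURES_DIR -> pictures
        dirs[name] = value.strip('"').replace("$HOME", home)
    return dirs


def user_dirs() -> dict:
    """Read user-dirs.dirs (honouring $XDG_CONFIG_HOME) if it exists."""
    home = str(Path.home())
    config_home = os.environ.get("XDG_CONFIG_HOME") or os.path.join(home, ".config")
    path = os.path.join(config_home, "user-dirs.dirs")
    try:
        with open(path, encoding="utf-8") as fh:
            return parse_user_dirs(fh.read(), home)
    except FileNotFoundError:
        return {}
```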
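The whole weather request comes down to one HTTP GET. Here is a Python sketch that builds the wttr.in URL you would otherwise pass to cURL. It assumes wttr.in's `format=3` (one-line report) and `lang` query parameters, as well as its convention of writing spaces in place names as `+`. The function names are hypothetical, and PVA itself uses cURL rather than Python.

```python
import urllib.parse
import urllib.request


def wttr_url(city: str, lang: str = "de", fmt: str = "3") -> str:
    """Build a wttr.in URL; wttr.in writes spaces in place names as '+'."""
    location = urllib.parse.quote(city.replace(" ", "+"), safe="+")
    query = urllib.parse.urlencode({"format": fmt, "lang": lang})
    return f"https://wttr.in/{location}?{query}"


def fetch_weather(city: str, timeout: float = 10.0) -> str:
    """Fetch the report, much like `curl` on the same URL would."""
    # wttr.in chooses between an HTML page and plain text partly by
    # User-Agent, so identify as curl to get plain text.
    request = urllib.request.Request(wttr_url(city), headers={"User-Agent": "curl"})
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8").strip()
```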
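To illustrate the binding-word problem, here is a minimal Python sketch. It splits "call <name> at <label>" with a regular expression and then takes the first matching contact. A plain dict stands in for the CardDAV data, and the sketch assumes the wake word has already been stripped. Everything here is hypothetical and is not PVA's actual parser.

```python
import re
from typing import Dict, Optional

# "call <name>" with an optional trailing "at <label>", e.g. "call andreas at work"
CALL = re.compile(r"^call (?P<name>.+?)(?: at (?P<label>\w+))?$")


def resolve_call(sentence: str, address_book: Dict[str, Dict[str, str]]) -> Optional[str]:
    """Return the number to dial, or None if nothing matches."""
    match = CALL.match(sentence.strip().lower())
    if match is None:
        return None
    name, label = match.group("name"), match.group("label")
    for contact, numbers in address_book.items():  # dicts keep insertion order: first match wins
        if name in contact.lower():
            if label is None:
                return next(iter(numbers.values()), None)  # no label given: first number
            return numbers.get(label)
    return None
```

The lazy name group `(.+?)` lets "at work" split off when it is present. A name that itself contains the word "at" would still confuse it, which is exactly the kind of ambiguity described above.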
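MPRIS2 is a standard D-Bus interface. Every compliant player registers a bus name starting with `org.mpris.MediaPlayer2.` and exposes the object `/org/mpris/MediaPlayer2`. Below is a minimal Python sketch that drives it through the `dbus-send` tool. The verb-to-method table and the function names are hypothetical, and PVA's own implementation may differ.

```python
import re
import subprocess
from typing import List

MPRIS_PATH = "/org/mpris/MediaPlayer2"
PLAYER_IFACE = "org.mpris.MediaPlayer2.Player"

# Spoken verbs (hypothetical) mapped to methods of the MPRIS2 Player interface
METHODS = {"play": "Play", "pause": "Pause", "toggle": "PlayPause",
           "stop": "Stop", "next": "Next", "previous": "Previous"}


def player_command(bus_name: str, action: str) -> List[str]:
    """dbus-send call for a Player method such as Pause or Next."""
    return ["dbus-send", "--session", "--type=method_call", f"--dest={bus_name}",
            MPRIS_PATH, f"{PLAYER_IFACE}.{METHODS[action]}"]


def volume_command(bus_name: str, volume: float) -> List[str]:
    """Set the Player's Volume property (0.0 = silent, 1.0 = 100 %)."""
    return ["dbus-send", "--session", "--type=method_call", f"--dest={bus_name}",
            MPRIS_PATH, "org.freedesktop.DBus.Properties.Set",
            f"string:{PLAYER_IFACE}", "string:Volume", f"variant:double:{volume}"]


def list_players() -> List[str]:
    """Ask the session bus which MPRIS2 players are currently running."""
    reply = subprocess.run(
        ["dbus-send", "--session", "--print-reply", "--dest=org.freedesktop.DBus",
         "/org/freedesktop/DBus", "org.freedesktop.DBus.ListNames"],
        capture_output=True, text=True, check=True).stdout
    return re.findall(r'string "(org\.mpris\.MediaPlayer2\.[^"]+)"', reply)
```

Because every player speaks the same interface, one set of voice commands can cover QMMP, a browser tab, or any other MPRIS2 client.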
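Thunderbird's `-compose` option takes a comma-separated list of `key='value'` fields. As far as I know, it opens a pre-filled compose window. The Python sketch below builds that call from the three dictated parts. The function name and the apostrophe handling are assumptions, not PVA's code.

```python
from typing import List


def compose_args(to: str, subject: str, body: str) -> List[str]:
    """Build a Thunderbird -compose call with recipient, subject and body filled in."""
    def field(value: str) -> str:
        # Values are wrapped in single quotes; swap ASCII apostrophes for the
        # typographic one so dictated text like "it's" cannot close the quote early.
        return "'" + value.replace("'", "\u2019") + "'"

    spec = ",".join(f"{key}={field(value)}"
                    for key, value in (("to", to), ("subject", subject), ("body", body)))
    return ["thunderbird", "-compose", spec]
```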
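A reminder request has to become a due time and a task. This Python sketch handles the two sentence shapes shown above. It assumes the STT output has already been converted to digits ("10", not "ten") and that the wake word has been removed. The patterns and names are hypothetical.

```python
import re
from datetime import datetime, timedelta
from typing import Optional, Tuple

RELATIVE = re.compile(r"^remind me in (\d+) (minutes?|hours?) to (.+)$")
ABSOLUTE = re.compile(r"^remind me at (\d{1,2})(?::(\d{2}))? to (.+)$")


def parse_reminder(sentence: str, now: datetime) -> Optional[Tuple[datetime, str]]:
    """Turn a reminder request into (due time, task), or None if it is not one."""
    text = sentence.strip().lower()
    match = RELATIVE.match(text)
    if match:
        amount, unit, task = int(match.group(1)), match.group(2), match.group(3)
        step = timedelta(hours=1) if unit.startswith("hour") else timedelta(minutes=1)
        return now + amount * step, task
    match = ABSOLUTE.match(text)
    if match:
        hour, minute, task = int(match.group(1)), int(match.group(2) or 0), match.group(3)
        if hour > 23 or minute > 59:
            return None
        due = now.replace(hour=hour, minute=minute, second=0, microsecond=0)
        if due <= now:  # that time has already passed today, so schedule it for tomorrow
            due += timedelta(days=1)
        return due, task
    return None
```

The returned time could then be handed to a timer or scheduler. That part is left out here.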
“Carola, reboot PC.”
“Authorization code alpha is needed to perform this operation.”
“Carola, authorization code alpha three four seven.”
If the code is correct, the requested action will proceed. Requiring these authorization codes helps to alleviate the fear that someone might hack your PC through the PVA and cause trouble. At the moment, though, about the worst that could happen is that you unwittingly send spam to someone or your PC is left playing music all day long. However, if you have home automation running and configured, it is probably better not to let Twinkle answer the phone.
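The authorization step is essentially a small state machine: remember the pending action, then release it only for the right spoken code. Below is a Python sketch of that idea. It assumes the code is spoken as a name followed by single digits and that the wake word has already been stripped. The class name and the config shape are hypothetical.

```python
import hmac
from typing import Dict, Optional, Tuple

DIGITS = {"zero": "0", "one": "1", "two": "2", "three": "3", "four": "4",
          "five": "5", "six": "6", "seven": "7", "eight": "8", "nine": "9"}


def parse_code(sentence: str) -> Optional[Tuple[str, str]]:
    """'authorization code alpha three four seven' -> ('alpha', '347')."""
    words = sentence.strip().lower().split()
    if len(words) < 4 or words[:2] != ["authorization", "code"]:
        return None
    if not all(word in DIGITS for word in words[3:]):
        return None
    return words[2], "".join(DIGITS[word] for word in words[3:])


class AuthGuard:
    """Holds back one dangerous action until the right spoken code arrives."""

    def __init__(self, codes: Dict[str, str]):
        self.codes = codes      # hypothetical config: code name -> digits
        self.pending = None     # (code name, action) waiting for authorization

    def request(self, code_name: str, action: str) -> str:
        self.pending = (code_name, action)
        return f"Authorization code {code_name} is needed to perform this operation."

    def answer(self, sentence: str) -> Optional[str]:
        """Return the action to run if the code is right, else None."""
        parsed = parse_code(sentence)
        if self.pending is None or parsed is None:
            return None
        code_name, action = self.pending
        self.pending = None     # one attempt per request
        name, digits = parsed
        expected = self.codes.get(code_name, "")
        # constant-time comparison, a cheap habit when comparing secrets
        if name == code_name and expected and hmac.compare_digest(digits, expected):
            return action
        return None
```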
Let’s take a look behind the curtain
The full list of features is long and detailed, so I will focus on some basics.
PVA comes with a rudimentary form of hard-coded human responses. You can modify and expand them as you like. But there is no intelligence in them. You can, however, build reaction chains that are so long that a normal human cannot detect the repeating phrases.
Here are two examples. Let’s ask the PVA for its name.
reaction:"what is your name","","my name is %KEYWORD"
Here is a more complex example that uses the word “not”.
reaction:"this works","not","of course it does"
reaction:"this works not","","uh oh, please contact my developer"
Because the absence of the word “not” is crucial to the meaning of a sentence, reactions contain “positive” and “negative” terms. The rule is as follows.
Positive words MUST be inside the sentence; negative words MUST NOT be inside the sentence. In developer terms, it can be written as “if (p && !n) do …”.
If your reaction texts give a human a clue about what to say next, and you can anticipate it, then you can build complex reaction chains that simulate a conversation. I have even heard from people using this feature that they like talking to their PC (and these are not your stereotypical nerds). As you can use the same trigger for multiple reactions, alternative chains are possible.
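The "if (p && !n)" rule is easy to sketch. The Python below parses reaction lines written with plain ASCII quotes and returns every matching answer. It assumes that the positive term is matched as a phrase, that negative words are matched as whole words separated by spaces, and that %KEYWORD is replaced by the wake word. PVA's real matcher may behave differently.

```python
import re
from typing import List, Tuple

# reaction:"<positive phrase>","<negative words>","<answer>"
REACTION = re.compile(r'^reaction:"([^"]*)","([^"]*)","([^"]*)"$')


def parse_reactions(lines: List[str]) -> List[Tuple[str, List[str], str]]:
    reactions = []
    for line in lines:
        match = REACTION.match(line.strip())
        if match:
            positive, negative, answer = match.groups()
            # Assumption: several negative words would be separated by spaces
            reactions.append((positive, negative.split(), answer))
    return reactions


def react(sentence: str, reactions, keyword: str = "carola") -> List[str]:
    """Return the answers of all reactions whose rule holds: if (p && !n)."""
    text = sentence.strip().lower()
    words = set(text.split())
    answers = []
    for positive, negatives, answer in reactions:
        if positive in text and not any(word in words for word in negatives):
            answers.append(answer.replace("%KEYWORD", keyword))
    return answers
```

Because several reactions can share a trigger, a caller could pick one of the returned answers at random to vary the conversation.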
Starting applications
Part of the basic functionality is starting the apps that you name. In the beginning, there was a hard-coded list of apps. But now, you can extend it via the config. Let’s take a quick look at it.
app:"office","openoffice4"
app:"txt","gedit"
app:"pdf","evince"
app:"gfx","gnome-open"
app:"mail","thunderbird"
The corresponding voice commands would be “Carola, start mail” or “Carola, open mail”. In free form, you could also say “Carola, start Krita” (Krita is an open source painting app that is available on Fedora Linux). You can configure several alternative versions of the command sentences, either as regular expressions (regex) or as multiple entries in the config file. These apps are also used in complex commands, like searching for files and opening the results with an app. For example: “Carola, search pictures of Iceland and open them with Krita.” This command causes the PVA to search your configured picture paths for filenames matching “iceland” and then open them with Krita. This works for all apps, as long as their launchers accept filenames as command-line arguments. Even if this isn’t the case for your favorite app, you might still be able to write a small “wrapper” script in Bash and use that script as the app target for the PVA.
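The search-and-open command can be sketched in a few lines of Python: match the sentence, collect the files whose names contain the term, and append them to the app's command line. The sentence pattern, the `search_paths` mapping, and the function names are assumptions made for illustration.

```python
import re
from pathlib import Path
from typing import Dict, List, Optional

SEARCH = re.compile(r"^search (?P<kind>\w+) of (?P<term>.+?) and open them with (?P<app>.+)$")


def find_files(term: str, roots: List[str]) -> List[Path]:
    """All files below `roots` whose name contains `term` (case-insensitive)."""
    term = term.lower()
    hits = []
    for root in roots:
        for path in sorted(Path(root).expanduser().rglob("*")):
            if path.is_file() and term in path.name.lower():
                hits.append(path)
    return hits


def handle_search(sentence: str, search_paths: Dict[str, List[str]],
                  apps: Dict[str, str]) -> Optional[List[str]]:
    """Build the command line that opens every match with the requested app."""
    match = SEARCH.match(sentence.strip().lower())
    if match is None:
        return None
    files = find_files(match.group("term"), search_paths.get(match.group("kind"), []))
    app = match.group("app")
    command = apps.get(app, app)  # configured app key, or a free-form program name
    return command.split() + [str(path) for path in files]
```

Passing the result to `subprocess.Popen` would launch the app without blocking the assistant.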
Via voice command, you can switch apps out for a configured alternative on the fly.
alternatives:"firefox","web","firefox"
alternatives:"google chrome","web","google-chrome"
alternatives:"chromium free","web"," chromium-freeworld"
alternatives:"chromium privacy","web"," chromium-privacy-browser"
alternatives:"openoffice","office","openoffice4"
alternatives:"libreoffice","office","libreoffice"
By using the “use {alternative}” syntax, you select what you want to use next. For example, “Carola, use Firefox” or “Carola, use Chromium free”. This works for any app in the app list. But how are these commands defined?
command:"start","STARTAPP","app",""
command:"open","OPENAPP","app",""
Quite simple: “Carola, start Firefox app” will start Firefox. “Carola, start Firefox” would also work because the term “app” is filtered out.
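Here is a Python sketch of the pieces described above: reading app and alternatives lines, handling "use {alternative}", and resolving "start/open <name>" with the filler word "app" dropped. It assumes the config uses plain ASCII quotes and that the wake word has already been stripped. All names are hypothetical.

```python
import re
from typing import Dict, List, Optional, Sequence, Tuple

ENTRY = re.compile(r"^(app|alternatives):(.*)$")
FIELD = re.compile(r'"([^"]*)"')


def load_config(lines: List[str]) -> Tuple[Dict[str, str], Dict[str, Tuple[str, str]]]:
    """Collect app:"key","command" and alternatives:"name","key","command" entries."""
    apps: Dict[str, str] = {}
    alternatives: Dict[str, Tuple[str, str]] = {}
    for line in lines:
        match = ENTRY.match(line.strip())
        if match is None:
            continue
        # strip() also tolerates stray spaces such as " chromium-freeworld"
        fields = [field.strip() for field in FIELD.findall(match.group(2))]
        if match.group(1) == "app" and len(fields) == 2:
            apps[fields[0]] = fields[1]
        elif match.group(1) == "alternatives" and len(fields) == 3:
            alternatives[fields[0]] = (fields[1], fields[2])
    return apps, alternatives


def use_alternative(sentence: str, apps: Dict[str, str],
                    alternatives: Dict[str, Tuple[str, str]]) -> bool:
    """Handle "use <alternative>" by pointing its app key at the new command."""
    match = re.match(r"^use (.+)$", sentence.strip().lower())
    if match is None or match.group(1) not in alternatives:
        return False
    key, command = alternatives[match.group(1)]
    apps[key] = command
    return True


def start_command(sentence: str, apps: Dict[str, str],
                  fillers: Sequence[str] = ("app",)) -> Optional[List[str]]:
    """Resolve "start/open <name> [app]" to a command line; nothing is executed."""
    match = re.match(r"^(?:start|open) (.+)$", sentence.strip().lower())
    if match is None:
        return None
    # Drop filler words, so "start firefox app" and "start firefox" agree.
    name = " ".join(word for word in match.group(1).split() if word not in fillers)
    if not name:
        return None
    return apps.get(name, name).split()  # configured key, else a free-form program name
```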
Next come the positive and negative word lists again, this time in commands:
command:"how is the weather","CURRENTWEATHERNOW","","tomorrow"
command:"how will the weather be","CURRENTWEATHERNEXT","","tomorrow"
command:"how will the weather be tomorrow","CURRENTWEATHERTOMORROW","",""
The commands in the second column are some of PVA’s internal function names. They are coded internally because processing the result can be tricky. You can, however, outsource those commands to an external Bash script. Below is an example that shows how to call a custom Bash script.
replacements:"h d m i","hdmi"
replacements:"u s b","usb"
command:"switch .* to kopfhörer","EXEC:pulse.outx:x%0x:xhdmi"
command:"switch .* to lautsprecher","EXEC:pulse.outx:x%0x:xdefault"
command:"switch .* to .* now","EXEC:pulse.outx:x%0x:x%1"
The last of the above commands switches the output device of a named (first “.*”) running app in PulseAudio to the named (second “.*”) device. As you can see, it uses a regular-expression-like syntax, although that syntax is not fully featured. All three of the above commands call the script pulse.out with two parameters, that is, “pulse.out <procname> <devicename>”. (Kopfhörer and Lautsprecher are German for headphones and loudspeakers.) Now there is no need to start pavucontrol anymore. Instead, you can just grab your headset and tell your PC to use it.
There is no limit to the number of .* wildcards in the command or execution part. Feel free to code a response for something like, “Carola, switch on the light in the kitchen.”
command:"switch .* the light in .*","EXEC:smarthomex:x%0x:x%1"
If you are wondering about the “x:x” character sequences in the execution part: they stand in for the spaces between arguments, because plain spaces tend to appear in filenames themselves.
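This Python sketch shows how the wildcards and the x:x separators fit together. It turns a command pattern and its `EXEC:` action into an argument list. It only understands `.*` and `%N` references, which fits the "regular-expression-like, not fully featured" description above. How PVA does this internally is an assumption, and the function name is hypothetical.

```python
import re
from typing import List, Optional


def expand_exec(pattern: str, action: str, sentence: str) -> Optional[List[str]]:
    """Match a PVA-style command pattern and turn its EXEC: action into argv."""
    if not action.startswith("EXEC:"):
        return None
    # Every ".*" becomes a capture group; all other text must match literally.
    regex = "^" + "(.*)".join(re.escape(part) for part in pattern.split(".*")) + "$"
    match = re.match(regex, sentence)
    if match is None:
        return None
    groups = [value.strip() for value in match.groups()]
    # Split on "x:x" first, so spaces inside captured values stay inside one argument.
    arguments = action[len("EXEC:"):].split("x:x")
    return [re.sub(r"%(\d+)", lambda ref: groups[int(ref.group(1))], arg)
            for arg in arguments]
```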
Oops, it sent your mail to Hermine instead of Hermann
All these voice commands only work correctly when the STT system works flawlessly. Of course, it often does not. Don’t have false hopes; it’s not perfect. However, PVA can auto-correct some common errors through configured replacements.
PVA can correct simple mistakes.
replacements:"hi länder","highlander"
replacements:"cash","cache"
replacements:"karola","carola"
Or, it can perform context replacements. These are only done if a command has been recognized.
contextreplacements:"STARTAPP","kreta","krita"
contextreplacements:"STARTAPP","konfig","config"
contextreplacements:"ADDTITLE","föge","füge"
The first context replacement above handles “Krita”, which sounds like the German word “Kreta” (the Greek island of Crete). Vosk tends to hear “Kreta”, so we need to replace it, but not every time: if we perform a web search for “Kreta”, it would be counterproductive to replace it with “Krita”. You can add simple replacements to the config. I suggest that you use the user config for these, because other users may encounter different problems.
If you don’t know why a command is not being recognized correctly, you can always check the log at ~/.var/log/pva.log.
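The two kinds of replacement differ only in when they run: plain replacements run before command recognition, and context replacements run after it. In the Python sketch below, `recognize` is a hypothetical callback standing in for PVA's command matcher, and whole-word matching is an assumption.

```python
import re
from typing import Callable, List, Optional, Tuple


def replace_words(text: str, wrong: str, right: str) -> str:
    """Replace `wrong` only where it stands as whole word(s)."""
    return re.sub(r"\b" + re.escape(wrong) + r"\b", right, text)


def correct(sentence: str,
            replacements: List[Tuple[str, str]],
            context_replacements: List[Tuple[str, str, str]],
            recognize: Callable[[str], Optional[str]]) -> Tuple[Optional[str], str]:
    """Apply plain replacements, recognize the command, then its context replacements."""
    text = sentence.strip().lower()
    for wrong, right in replacements:  # always applied
        text = replace_words(text, wrong, right)
    command = recognize(text)          # e.g. "STARTAPP", or None
    for context, wrong, right in context_replacements:
        if context == command:         # only for the recognized command
            text = replace_words(text, wrong, right)
    return command, text
```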
Contribution
If you want to contribute to this project, feel free to do so. It is available on GitHub[1]. Since Vosk now supports eighteen languages, translating the texts into other languages would be a great place for a contributor to start.
For PVA to be installable from Fedora Linux’s repositories, Vosk and its libraries need to be rebuilt from Fedora sources. We tried to do so earlier this year, but we failed when we ran into some exotic math library dependencies that we couldn’t get to compile on Fedora Linux. We hope that a skilled C developer can solve this problem and get the resulting packages reviewed by Fedora packagers.
I hope I have sparked some interest among our dear readers. The world of PVA needs your awesome configs!
Best regards,
Marius Schwarz
References
[1] https://github.com/Cyborgscode/Personal-Voice-Assistent
[2] https://alphacephei.com/vosk
[3] https://github.com/numediart
[5] Available from the Fedora Linux repositories: dnf install gtts
[6] http://static.bloggt-in-braunschweig.de/Pinephone-PVA-TEST1.mp4
[7] https://cloud.google.com/text-to-speech
[8] https://cloud.google.com/text-to-speech/docs/reference/rest
Update 13.7.2022
It’s now possible to install PVA on Fedora Linux if you add the repository from this article:
https://marius.bloggt-in-braunschweig.de/2022/07/13/pva-carol-hat-ihr-eigenes-repo-bekommen/
At the moment, the included language models are for German (DE); a small English one will follow soon. Keep in mind that you need an English translation of all config settings, or it won’t work in your favor. If you want to install other language models, consult the README.md for the Alpha Cephei website, where you can simply download them as ZIP files.
Heliosstyx
Thank you for this article, it’s very interesting. For inexperienced users, the article is too technical and partly hard to understand. I suggest dividing this article into several parts and using simpler examples, so that the restructured article can be read by everyone.
I hope it helps. I think you are a German speaker, so: besten Dank für den Artikel (many thanks for the article).
Mehdi Haghgoo
Thank you for the nice tool. I haven’t finished reading the article yet, because it is a bit too long. I have a question: once the PVA is installed, will it be listening for commands all the time?
Jont Allen
I use speech recognition on my Apple phone, and it is amazing. My student Feipeng Li, who wrote many papers with me on the underlying features in the speech signal, now works at Apple. Is there a connection? Did he work on the Apple speech recognition system? He won’t say, following Apple policy.
Here is his PhD thesis:
Li, Feipeng (2009); PhD Thesis pdf
Go to
https://auditorymodels.org/index.php?n=Main.Conferences
and search for Feipeng. The first one that comes up is his thesis. But there are many others that describe how his research works, at finding fundamental speech features.
Jont Allen Jul 1, 2022
James
Fascinating. I haven’t read the thesis (I can’t find it at that address), but are speech recognition models still using hidden Markov models to statistically extract key features, or have deep learning models taken over more recently?
Marius Schwarz
Yes, it will. The Vosk tool (pva.py) runs in an endless loop and sends results to PVA whenever it thinks it has found something.
Jont Allen
To read about how speech recognition works, when it works well, please read the papers by Feipeng Li. These are posted on my website under the topic of “Journal Publications”
Once there, search for “Feipeng”
Feipeng now works for Apple, but due to company requirements, he cannot tell me what he works on. However, the Apple iPhone automatic speech recognition (ASR) system works amazingly well. I dictate all my “texts” using the iPhone ASR system.
To use it, TAP on the microphone once you start writing a text, and dictate your message. You have the option to make corrections before you send it by tapping on your message (the dictated text).
Jont
Marius Schwarz
Dear Jont,
If you can dig deep into VOSK and improve recognition quality with the tools VOSK offers to the common user, please, please do so. It would HELP A LOT with ghost words, which pop up once in a while from random white noise the mic picks up. That would be really great.
Marius
Darvond
Here’s a question: Why Carola instead of Mycroft? https://mycroft.ai/
Second question: If Mbrola isn’t in by default, what of Mimic?
And the other TTS engines, such as Festival, BRLTTY, and Flite?
Marius Schwarz
1) I checked Mycroft several times over the years, but couldn’t warm to it. AFAIK, their focus today is on alternative “devices” for Alexa. PVA is open and flexible, uses Vosk with 18 supported languages, is easy to configure with a text editor, and has cool features, e.g., MPRIS2, DAV, phone, and email support (soon™).
2) No big deal:
Just configure your default TTS executable/script like this (sprachausgabe = speech output):
alternatives:"normale sprachausgabe","say","envx:xVOICE=%VOICEx:x/usr/local/sbin/say"
alternatives:"google sprachausgabe","say","envx:xVOICE=%VOICEx:xGTTS=1x:x/usr/local/sbin/say"
alternatives:"samsung sprachausgabe","say","/usr/local/sbin/samsungtts"
alternatives:"piko sprachausgabe","say","envx:xVOICE=%VOICEx:x/usr/local/sbin/psay"
alternatives:"pico sprachausgabe","say","envx:xVOICE=%VOICEx:x/usr/local/sbin/psay"
alternatives:"marie sprachausgabe","say","envx:xVOICE=%VOICEx:x/usr/local/sbin/msay"
alternatives:"roboter sprachausgabe","say","envx:xVOICE=%VOICEx:x/usr/local/sbin/robotsay"
or overwrite the default “say” with your version:
app:"say","env VOICE=%VOICE /usr/local/sbin/say"
As long as the executable/script takes the text to output as an argument, you’re fine.
Keith
Granted it needs more work, but what about Genie and OVAL’s approach to VA?
https://github.com/stanford-oval
Will PVA have Home Assistant integration?
Does PVA support satellite speakers communicating to a single local PVA server?
Marius Schwarz
No idea what Genie does exactly, but a better parser would be a great addition. But TBH, I don’t think you will need it for most jobs your PC usually does for you. Playing music and video, searching for pics, docs, and files and opening them, calling people, opening maps, getting your weather report, switching on the light? No AI is needed for those tasks. Clever parsing of common-sense sentences is more than enough.
HA Integration:
The next step will be switching to a client-server model, where PVA will run permanently as a process with subprocesses of any kind. The new system will, for example, allow receiving mail via IMAP and informing the user about its content. Add a plugin system, and you can control and react to anything you like.
Remote speakers… you mean something like those Alexa boxes? That’s not planned. If you asked me right now, I would say: get a Raspberry Pi 3 or 4 with 4-8 GB of memory, install PVA with a small language model, add an NFS mount to your main data storage, and you’re good to go. Vosk does not need that much CPU power; it runs fine on a PinePhone. Regarding what I said about the plugin system, I can also imagine an interconnect module between different instances. The question is whether we have media players that can switch their output across device borders; in other words, what do you want them to interact about? Could be interesting. If you want multi-device audio playback, I already built one in 2018. PulseAudio made it happen… hmm… it also used qmmp as the player, which means PVA and LAHA would work together. Opportunities emerge. BTW: it was a very cool moment when my desktop, laptop, tablet, and phone played the same music at the same time.
David Brownburg
Great article!
qwer2qw
I need POLISH language out of the box
w23
Mycroft: why not move Mycroft to Fedora?
Vivian
OK, maybe I’m a dummy, but those installation instructions made my head spin and I started to feel nauseous :/
Can’t there be just a simple few-click install via Flatpak or a Snap? Anything like that? I can’t help it: I just can’t be bothered with complicated things these days when I’m relaxing at my computer…
Thanks if you ever solve this installation thing.
Marius Schwarz
Due to license issues, Fedora can’t ship the needed dependencies. I would like to build easy-to-use RPMs for you, but I can’t, due to the same issues. As soon as we can, we will add it to the repo, and you can install it like any other app.
To solve this, a skilled C/C++ developer is needed to compile all of Vosk’s dependencies for 64-bit from their sources, which fails at the moment.
Daniel M
Lovely! Have you considered Mimic? Mimic 3 came out recently and works locally while sounding great!
Marius Schwarz
No, but any installed TTS can be used; it’s just one more alternative, one more command to call.
ungleer
Maybe https://github.com/rhasspy/larynx is another choice for an offline TTS engine. I find the online demo quite interesting…
(OT, but integration with speech-dispatcher is planned for v2, too)
Marius Schwarz
Curious as I am, I tried it out, and it failed miserably. The server script does not handle anything other than the Debian file system layout. Docker couldn’t even run the container without insecure chmods on /var/run/docker.sock, and it was unable to use .local/share/larynx/voices, even with explicit rw permission on the volume mount option (-v) and after I manually created all the directories and files it needs to run. At that point, I stopped wasting time.
The demo voices sound ok, so it’s worth reinvesting time, once those bugs are fixed.
foodle
Have you thought about Mimic? Recently released Mimic 3 works locally and sounds fantastic!
rewg4e
Mimic 3 uses Ubuntu, not Fedora.
BTW, where is the bitmessage package?
Tony
[tony@chainsaw Downloads]$ cat /etc/system-release
Fedora release 36 (Thirty Six)
[tony@chainsaw Downloads]$ sudo dnf search mimic
Last metadata expiration check: 3:11:19 ago on Wed 27 Jul 2022 01:06:20 PM.
================================================== Name Exactly Matched: mimic ==================================================
mimic.x86_64 : Mycroft’s TTS engine
mimic.i686 : Mycroft’s TTS engine
Fedora Workstation
You are sitting in the terminal entering commands manually, so why do you need a voice assistant?)))
Tony
Your hands are missing?
sam
My comment is not related to this topic, so sorry. Your bug reporting app is so advanced that only a genius developer can use it. At the last step, mine asks for an API for a Bugzilla page, and I don’t even understand this. I don’t know what an API is! Please make your bug reporting system friendlier for normal users. Thanks.
Gregory Bartholomew
Hi Sam:
Fedora’s “user friendly” bug reporting system is https://ask.fedoraproject.org/.
M
Nice idea and we can see the author put a lot of work into it.
It needs better installation and setup instructions (self-contained and complete), or an installation script.
Marius Schwarz
There is now a repo for easy rpm installation.
Due to the size of the “large” speech models, it will only contain the base parts.
Installing the model is easy: unzip the package from the Alpha Cephei model page into the /usr/share/pva/ directory.
http://repo.linux-am-dienstag.de/x86_64/fedora/35/pva-base-1-6.x86_64.rpm
As an example:
http://repo.linux-am-dienstag.de/x86_64/fedora/35/pva-vosk-model-de-small-1-1.x86_64.rpm
Gregory Bartholomew
@Marius
Would you like me to add your last comment to the end of your article?
Marius Schwarz
Better not, it’s already outdated info.
You can add this:
Update 13.7.2022:
It’s now possible to install PVA on Fedora if you add the repo from this article:
https://marius.bloggt-in-braunschweig.de/2022/07/13/pva-carol-hat-ihr-eigenes-repo-bekommen/
At the moment, the included language models are for DE; a small English one will follow soon. Keep in mind, you need an English translation of all config settings, or it doesn’t work in your favour. If you want to install other language models, consult the README.md for the Alpha Cephei website, where you can simply download them as zip files.
Gregory Bartholomew
Done!
M
Attempting to download the first RPM produces an error: “potential security risk”.
Attempting to list the directory produces an error: “you are not authorized”.
And did you mean to say that the second RPM is a language model? Does that mean you have the language models in that directory?
Your instructions need to be more explicit, and you need to test your deployment from some account other than your own.
Both testing and documentation take time.
Marius Schwarz
That’s why the hint for the installation of the repo has been added to the article.
Add that repo with GPG checks, and you can easily install and erase the software from the right place.
Jont Allen
Response to James re hidden Markov models vs. more recent machine learning methods:
I don’t know the answer, but I “spoke” with Steve Levinson Jul 9, 2022, who published some of the early work on this topic. Unfortunately, he has lost the ability to talk. So I cannot ask him the question. Even if I could, he may not know the answer.
The person who should know the answer is my colleague Mark Hasegawa-Johnson, a professor at the University of Illinois. We are in the same department.
Tim S
If Linux is going to have a voice assistant, I nominate Wimpy of Wimpy’s World to be the voice.
iptables
From a privacy standpoint: am I going to be able to completely uninstall this service?
Thank you.
Marius Schwarz
For homemade installations, this should remove everything:
rm -rf /etc/pva /usr/share/pva /home//.cache/pva /home//.config/pva
pip3 uninstall vosk
dnf erase sounddevice gtts
If you have more apps in /usr/local/sbin/, you need to delete those files manually.
With the RPM it’s just “dnf erase pva*”