Getting Involved

Wed Jun 4 08:42:37 BST 2025

On Wed, Jun 4, 2025, at 10:26 AM, Bridger Reed-Lewis wrote:
> Shervin, thank you for sharing this with us. I am now very interested
> in getting a separate computer to try these out. At the moment, I have a
> Steam Deck, but I do have plans to purchase a used ThinkPad X1 Carbon 10th
> gen. Once I get that, which will be in a few weeks, let's chat more about
> these programs, because it seems like things have changed since I last
> really dived into Linux. I guess if they're available, I'd like to see
> about working with KDE to develop a standard for prepackaged accessibility
> tools, even if that's through a third party. What do you think?
>
> On Tue, Jun 3, 2025 at 10:56 PM Shervin Emami <shervin.emami at gmail.com>
> wrote:
>
>> Hi Benson,
>>
>> For around 8 years now I've been using speech recognition for many hours
>> per day to control the KDE desktop, since that is my main way of working
>> despite having chronic RSI. Since my background is in robotics & AI, I've
>> tried many different alternative interfaces, and gave a talk specifically
>> for Linux (https://www.youtube.com/watch?v=3aQfwS5pyrg).
>>

Thanks.

>> If you're mostly interested in dictation, you should consider OpenAI
>> Whisper, because it's possibly the most reliable ASR engine at the moment
>> but also importantly it has very good ASR support for many different
>> languages and dialects and accents, whereas most ASR is targeted for USA
>> and UK English.
>>

There are efforts to diversify this. Mozilla, for example, has some
data-collection projects under way.

>> Whisper doesn't have an interface for directly dictating into applications
>> though. And it's not able to do any commands. I currently use the
>> open-source Faster-Whisper for dictation (with my own code for typing into
>> applications), and the open-source Kaldi-Active-Grammar + Dragonfly for
>> typing commands including for programming / typing things on the terminal
>> or controlling the mouse. (
>> https://dragonfly.readthedocs.io/en/latest/kaldi_engine.html)
>> It's not a clean solution though; the interaction I've set up between
>> Kaldi and Whisper is quite hacky. But it proves that it is very possible!
>>

Thanks for this.
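
For anyone following along, the dictation half of that setup could be
sketched roughly as below. This is only my guess at the general shape, not
Shervin's actual code: the helper names (join_segments, build_type_command,
dictate_file) are made up for illustration, the "small" model size and CPU
int8 settings are arbitrary choices, and typing via xdotool is an assumption
that only works under X11, not Wayland.

```python
import shutil
import subprocess


def join_segments(segments):
    """Join the text of transcribed segments into one line of dictation."""
    return " ".join(seg.text.strip() for seg in segments if seg.text.strip())


def build_type_command(text):
    """Build an xdotool command that types text into the focused window."""
    # "--" stops xdotool from reading dictated text starting with "-" as an option.
    return ["xdotool", "type", "--clearmodifiers", "--", text]


def dictate_file(wav_path, model_size="small"):
    """Transcribe one recording and type the result into the focused window."""
    # Imported lazily so the helpers above work without faster-whisper installed.
    from faster_whisper import WhisperModel

    model = WhisperModel(model_size, device="cpu", compute_type="int8")
    # transcribe() yields segments lazily; each segment carries a .text field.
    segments, _info = model.transcribe(wav_path)
    text = join_segments(segments)
    # Only try to type if xdotool is actually installed (X11 sessions).
    if text and shutil.which("xdotool"):
        subprocess.run(build_type_command(text), check=True)
    return text
```

Calling dictate_file("note.wav") (a hypothetical recording) would return the
recognised text and, on an X11 session with xdotool installed, type it into
whichever window has focus. Voice commands would still need something like
the Dragonfly grammar side, which this sketch does not cover.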

>> Talon ASR (free but closed-source at https://talonvoice.com/) has a
>> community that has been slowly working towards better interaction with the
>> desktop by voice. They are making excellent solutions for Windows & Mac,
>> where user interface elements have a standardised API. But they say Linux
>> has been a nightmare to have anything close to a standardised way of
>> knowing which UI elements are on the screen. And Wayland pretty much
>> doesn't allow interaction with the screen at all! So unfortunately
>> Talon has a community of eager developers that want to improve usability of
>> speech recognition on Linux, but they have basically given up on Linux
>> recently due to these two issues.
>>
>> Cheers,
>> Shervin Emami.
>>