Python Speech Recognition, Dictation and Coding

Python makes simple speech recognition possible, using engines and APIs such as Google, Amazon, and PocketSphinx. This can be used for ultra low bit rate speech coding in ham radio.

Python offers an easy interface to several speech recognition engines, such as the Google Speech API and PocketSphinx. The latter has the advantage that it also works offline, but it has to be compiled during installation, and so far I have not got this to work. The Google API is free for limited usage, does not require registration, and recently gained a German language recognizer. Its recognition performance is surprisingly good, even in a noisy environment and at some distance from the microphone.

Here I present two Python programs using the Google API, one for English and one for German. Each records speech up to a pause, recognizes it, reads the recognized text back using the “espeak” text-to-speech synthesizer so that it can be checked for correctness, and then appends the text to the text file given as the program's command-line argument. This makes them suitable for dictation, as a replacement for keyboard and display for text input.
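The loop described above (listen up to a pause, recognize, read back, append) can be sketched roughly as follows. This is a minimal sketch and not the code of the two programs themselves: it assumes the third-party SpeechRecognition package (with PyAudio for microphone access) and an installed “espeak” command, and the function names espeak_command, append_text and dictate are made up for illustration.

```python
import subprocess
import sys


def espeak_command(text, voice="en"):
    """Build the espeak command line that reads `text` aloud."""
    return ["espeak", "-v", voice, text]


def append_text(path, text):
    """Append one recognized utterance as a new line to the dictation file."""
    with open(path, "a", encoding="utf-8") as f:
        f.write(text + "\n")


def dictate(path, language="en-US", voice="en"):
    """Listen, recognize, read back, and append until Ctrl-C is pressed."""
    # Assumption: the SpeechRecognition package is installed; it is imported
    # here so that the helpers above can be used without it.
    import speech_recognition as sr

    recognizer = sr.Recognizer()
    with sr.Microphone() as source:
        try:
            while True:
                # listen() returns once the speaker pauses
                audio = recognizer.listen(source)
                try:
                    text = recognizer.recognize_google(audio, language=language)
                except sr.UnknownValueError:
                    continue  # nothing intelligible was heard
                except sr.RequestError as err:
                    print("Recognition service unavailable:", err, file=sys.stderr)
                    continue
                # Read the text back for checking, then store it
                subprocess.run(espeak_command(text, voice), check=False)
                append_text(path, text)
        except KeyboardInterrupt:
            pass  # Ctrl-C ends the dictation
```

Calling dictate("text.txt") would start an English dictation under these assumptions; for German, dictate("text.txt", language="de-DE", voice="de") would be the corresponding call.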

An interesting application might be ultra low bit rate speech coding. The recognized text could be transmitted using, for instance, PSK31 or PSK63, and the receiver would take the text and re-synthesize it as speech. In this way, speech communication would be possible over long distances on shortwave, using low-power transmitters and small antennas. A PTT microphone, a small speaker, and perhaps a Raspberry Pi with a small USB sound card would be sufficient. Another possibility is to use keywords and speech output to control FT8 for portable operation (like “call CQ on 20m”). Instead of a speech pause as the termination signal for the recognizer, a PTT button could be used. What I am still missing is a suitable text interface to PSK. For portable operation, maybe in a handheld device, the PocketSphinx engine would be more desirable because it doesn't need an internet connection.
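To get a feeling for the bit rates involved, the on-air time of a recognized sentence can be estimated. The sketch below uses the symbol rates of PSK31 (31.25 baud) and PSK63 (62.5 baud); the average of 6.5 bits per character (Varicode including the gap between characters) and the speaking rate of 150 words per minute are rough assumptions of mine, and the function names are made up for illustration.

```python
def transmit_seconds(text, baud, avg_bits_per_char=6.5):
    """Estimate the on-air time for `text` at one bit per symbol.

    avg_bits_per_char is an assumed average Varicode length, not a
    measured value; real text will deviate from it.
    """
    return len(text) * avg_bits_per_char / baud


def speaking_seconds(text, words_per_minute=150):
    """Estimate how long it takes to speak `text` at an assumed rate."""
    return len(text.split()) * 60 / words_per_minute


message = "call CQ on 20m"
print(f"speaking: {speaking_seconds(message):.1f} s")
for mode, baud in (("PSK31", 31.25), ("PSK63", 62.5)):
    print(f"{mode}: {transmit_seconds(message, baud):.1f} s")
```

With these assumed numbers, PSK31 comes out somewhat slower than speaking the same sentence, so the receiver would hear the re-synthesized speech with a delay; PSK63 is closer to real time.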

Run the English version in a (Linux) terminal window, with the name of the dictation text file (here “text.txt”) as the argument, by typing the following (use python or python3, depending on your installation):

python(3) speech_recognition_file.py text.txt

For the German version, type:

python(3) sprach_erkennung_file.py text.txt

The program then explains itself using speech output (speaker or headphones required).

The programs are terminated by pressing Ctrl-C; by then, the recognized speech has been appended to the content of the file “text.txt”.

Have fun, 73, Gerald, DL5BBN

Python files:

Python Speech Recognition, Dictation and Coding
